I’m perplexed. I’ve downloaded the zipped file geodatabases for Defra/RPA’s CROME data for the years 2016 through 2021. You don’t get a choice of download format when requesting the whole dataset (ArcGIS Web Application).
I have a geodataframe comprising the field boundaries for a farm (`land_covers`). I’m using
```python
import geopandas as gp

gp.read_file(pathspec, engine='pyogrio', bbox=tuple(land_covers.total_bounds.tolist()))
# same call also tried with engine='fiona'
# or
gp.read_file(pathspec, engine='fiona', mask=land_covers)
```
where the `pathspec` for 2021 is

```
'zip:///home/guy/data/Agreed/gov_environment/CROME/RPA_CropMapOfEngland2021_FGDB_Full.zip!data.gdb'
```

and for 2020 is

```
'zip:///home/guy/data/Agreed/gov_environment/CROME/RPA_CropMapOfEngland2020_FGDB_Full.zip!FGDB.gdb'
```
The only (repeat: only) difference I can readily see is inside the zipfiles: for 2021 there are `data.gdb` and `docs` directories, whereas for 2020 (and earlier years, in fact) there is only a `FGDB.gdb` directory. But if I replace the bbox/mask parameter with a simple `rows=10`, I get 10 rows back. It’s only the `bbox` and/or `mask` parameters that fail to find any data.
For 2021, I get the desired intersection. For 2020, I get

```
/home/guy/anaconda3/envs/geocube/lib/python3.10/site-packages/pyogrio/raw.py:120: UserWarning: Layer 'b'Crop_Map_Of_England_2020'' does not have any features to read
  result = ogr_read(
```
All data are in EPSG:27700, as confirmed by

```python
from pyogrio import read_info
read_info(pathspec)
```

which returns

```
{'crs': 'EPSG:27700',
 'encoding': 'UTF-8',
 'fields': array(['prob', 'county', 'cromeid', 'lucode', 'shape_Length',
        'shape_Area'], dtype=object),
 'dtypes': array(['float64', 'object', 'object', 'object', 'float64', 'float64'],
       dtype=object),
 'geometry_type': 'MultiPolygon',
 'features': 31436011,
 'capabilities': {'random_read': 1,
  'fast_set_next_by_index': 1,
  'fast_spatial_filter': 1}}
```
where you can also see that pyogrio’s `read_info` function reports there to definitely be features present!
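Since `read_info` looks healthy, one thing I also double-checked is the shape of the bbox tuple I’m passing. A minimal sketch with an arbitrary synthetic box (the coordinates are invented) confirms that `total_bounds` yields `(minx, miny, maxx, maxy)`, which is the order the `bbox` parameter expects:

```python
# Confirm the ordering of GeoDataFrame.total_bounds: it is
# (minx, miny, maxx, maxy), matching what the bbox parameter expects.
import geopandas as gp
from shapely.geometry import box

gdf = gp.GeoDataFrame(geometry=[box(5, 2, 9, 7)], crs="EPSG:27700")
bounds = tuple(gdf.total_bounds.tolist())
print(bounds)  # (5.0, 2.0, 9.0, 7.0)
```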
Both the pyogrio and fiona versions I’m using support the `bbox` parameter, but only my fiona supports the `mask` parameter. None of these calls work. So I seem to have features present. I definitely have features: I can read them in and get a geodataframe, and they’re in the same CRS as my field geodataframe. I’ve tried unzipping and reading from the uncompressed location. I’ve tried creating a `data.gdb` symlink pointing to `FGDB.gdb`, just in case something funky is happening with assumed locations within the directory structure.
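And for completeness: if my query bbox somehow didn’t overlap the layer’s extent, an empty result would be correct rather than a bug, so that’s worth checking too. A generic plain-tuple check (the extents below are made-up stand-ins for `land_covers.total_bounds` and the layer extent a tool like `ogrinfo -so` would report):

```python
# Check whether a query bbox overlaps a layer extent at all.
# Both extents here are hypothetical stand-ins, not the real values.
def bboxes_intersect(a, b):
    """a and b are (minx, miny, maxx, maxy) tuples."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

farm_bbox = (400000.0, 300000.0, 401000.0, 301000.0)  # hypothetical farm bounds
layer_extent = (0.0, 0.0, 700000.0, 1300000.0)        # hypothetical England-wide extent
print(bboxes_intersect(farm_bbox, layer_extent))  # True
```

In my case the farm plainly sits inside England, so extent overlap shouldn’t be the issue.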
My point of entry is obviously geopandas, but that relies on either pyogrio or fiona, and neither of them works. I don’t know whether it’s ultimately some lower-level dependency of theirs that’s failing, or whether both are failing on the same problem in the data. But I can’t think what to look for next. As I say, I can read in a sample of rows from the data. I can’t think what the data problem might be: the paths within the zip archives changed prior to 2021, but even the older archives work when I’m not using the bbox or mask parameter, and they’re all in the same CRS, etc.
I know I’m not providing a reproducible example here, but based on the above, where should I focus my attention? I’m in the classic “it should work, but it doesn’t” hole.