I’m working with a GeoDataFrame of about 4 million records. My problem is read/write speed, and also the memory limit, though memory is my second priority.
I was wondering if there is a package like Dask that could read/write the data fast and also work out-of-memory?
There is dask-geopandas (https://github.com/geopandas/dask-geopandas), a new implementation of GeoPandas on top of Dask. If you write to Parquet with .to_parquet(), the speed might be acceptable.
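A minimal sketch of that workflow, assuming the source is a file readable by pyogrio/fiona (the filename "data.gpkg", output directory, and partition count are placeholders to tune for your data):

```python
import dask_geopandas

# Read the source in parallel chunks; npartitions controls how many
# pieces the ~4M rows are split into, so each fits in memory.
ddf = dask_geopandas.read_file("data.gpkg", npartitions=16)

# Write partitioned Parquet; each partition becomes its own file,
# so the full dataset never has to be held in memory at once.
ddf.to_parquet("data_parquet/")

# Later, read it back lazily / out-of-core.
ddf2 = dask_geopandas.read_parquet("data_parquet/")
```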
There is a proposed spec for more efficiently encoding the geometry column(s), but I don’t know how expensive the geometry encoding is relative to everything else. In your case, you could write the file both with and without the geometry column and see what the difference is.
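If you want to try that comparison, here is a rough sketch using plain GeoPandas (file names are hypothetical; dropping the active geometry column yields an ordinary pandas DataFrame, whose .to_parquet() skips the geometry serialization entirely):

```python
import time
import geopandas as gpd

gdf = gpd.read_file("data.gpkg")  # placeholder source file

# Write with the geometry column included.
t0 = time.perf_counter()
gdf.to_parquet("with_geom.parquet")
print("with geometry:   ", time.perf_counter() - t0, "s")

# Write only the attribute columns, no geometry.
t0 = time.perf_counter()
gdf.drop(columns="geometry").to_parquet("no_geom.parquet")
print("without geometry:", time.perf_counter() - t0, "s")
```

The gap between the two timings tells you how much of the write cost is the geometry encoding versus the rest of the data.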