How to save out-of-memory Xarray file to .csv

Thanks for posting this interesting question.

I’m not convinced that this is actually the exact data you want. Based on your description, it looks like you want a dataset with 1121 different points in it (and 31411 timesteps), i.e. a 2D array. However, here you have a 3D array (31411 x 1121 x 1121), which is a subsample of your original datacube. This is 1121 times bigger than what you want, which could be why you are running out of memory.
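To make the size difference concrete, here is a small sketch on synthetic data. The array sizes, coordinate values, and variable names (`cube`, `point_lats`, `point_lons`) are made up for illustration and are not taken from your dataset. It compares passing plain lists of coordinates, which selects every lat/lon combination, with passing indexers that share a dimension, which selects one value per point.

```python
import numpy as np
import xarray as xr

# Small synthetic cube standing in for the real data (hypothetical sizes).
cube = xr.DataArray(
    np.zeros((4, 10, 12)),
    dims=("time", "lat", "lon"),
    coords={
        "time": np.arange(4),
        "lat": np.linspace(-45, 45, 10),
        "lon": np.linspace(0, 330, 12),
    },
)

# Three points of interest, given as parallel lat/lon lists.
point_lats = [-45.0, 5.0, 45.0]
point_lons = [0.0, 150.0, 330.0]

# Plain lists are treated as independent (orthogonal) indexers:
# the result contains every lat/lon combination.
outer = cube.sel(lat=point_lats, lon=point_lons, method="nearest")

# Indexers that share the "point" dimension are applied pairwise:
# the result contains one value per (lat, lon) pair.
pointwise = cube.sel(
    lat=xr.DataArray(point_lats, dims="point"),
    lon=xr.DataArray(point_lons, dims="point"),
    method="nearest",
)

print(outer.dims, outer.shape)
print(pointwise.dims, pointwise.shape)
```

With N points, the first result has N x N values per timestep and the second has only N.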

You should probably be using Xarray’s “vectorized indexing” here. For example, something like:

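import xarray as xr
# Build indexers that share a new "point" dimension
lat_index = xr.DataArray(gem_df.lat, dims="point")
lon_index = xr.DataArray(gem_df.lon, dims="point")
# Select the nearest grid cell for each (lat, lon) pair
data = xarray_data_ssp126.sel(lon=lon_index, lat=lat_index, method="nearest")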

I’m still not certain that will work properly with Dask. But at least it should be the right data structure.
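Once the selection has the (time, point) shape, one possible way to get it into a .csv without building the whole table at once is to write it in blocks of timesteps. The following is only a sketch on synthetic data: the variable name `tas`, the block size `step`, the output path, and the contents of `gem_df` are all assumptions, and I have not checked how it behaves on a Dask-backed dataset of your size.

```python
import os
import tempfile

import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the real dataset; "tas" is a hypothetical variable name.
ds = xr.Dataset(
    {"tas": (("time", "lat", "lon"), np.random.rand(6, 4, 5))},
    coords={
        "time": pd.date_range("2000-01-01", periods=6),
        "lat": np.linspace(-30, 30, 4),
        "lon": np.linspace(0, 120, 5),
    },
)

# Stand-in for the table of point locations.
gem_df = pd.DataFrame({"lat": [-30.0, 10.0, 30.0], "lon": [0.0, 60.0, 120.0]})

# Point-wise (vectorized) selection: one nearest grid cell per row of gem_df.
lat_index = xr.DataArray(gem_df["lat"].to_numpy(), dims="point")
lon_index = xr.DataArray(gem_df["lon"].to_numpy(), dims="point")
points = ds.sel(lat=lat_index, lon=lon_index, method="nearest")

# Write the result in blocks of timesteps so that only one block
# is converted to a DataFrame at a time.
csv_path = os.path.join(tempfile.mkdtemp(), "points.csv")
step = 2  # timesteps per block (arbitrary choice for this sketch)
for i, start in enumerate(range(0, points.sizes["time"], step)):
    block = points.isel(time=slice(start, start + step))
    frame = block["tas"].to_dataframe().reset_index()
    frame.to_csv(
        csv_path,
        mode="w" if i == 0 else "a",  # create the file, then append
        header=(i == 0),              # write the header only once
        index=False,
    )

result = pd.read_csv(csv_path)
print(result.shape)
```

The output is in long format, with one row per (time, point) pair and the selected `lat` and `lon` as columns.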

Try this out and report back here!
