Hello! I’m trying to use xvec zonal statistics to calculate the mean snow fraction for each month over a 20-year timespan across 934 hydrologic basin geometries. My workstation has 64 GB of memory. The main dataset is an 81 GB zarr store with coordinates time, x, and y and a variable called snow_fraction. The snow_fraction variable is chunked as (252, 1000, 1000), so each chunk is 0.94 GB, and the underlying Dask array is 96 chunks in 3 graph layers.
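For reference, here is roughly how I open the data (paths and variable names are simplified, not my exact ones):

import xarray as xr
import geopandas as gpd

# open the 81 GB zarr store lazily with dask, keeping the on-disk chunking
ds = xr.open_zarr("F:/EnDI/EnDI_data/snow_data/snow_fraction.zarr")

# the 934 hydrologic basin polygons
basins = gpd.read_file("F:/EnDI/EnDI_data/basins.gpkg")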
I then use xvec.zonal_stats to find the mean snow fraction in each basin. The result is a dataset with chunk size (geometry: 1, time: 252), which dramatically increases the number of graph layers to 8410.
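The call looks roughly like this (coordinate names match my dataset; "agg" is just my name for the result):

# mean snow fraction within each basin polygon
agg = ds.xvec.zonal_stats(
    basins.geometry,
    x_coords="x",
    y_coords="y",
    stats="mean",
)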
I rechunk the aggregated mean into one big chunk (geometry: 934, time: 252). I’m not sure whether this was necessary, but the chunks were so small that I thought increasing the chunk size would be more efficient.
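Roughly:

# merge the tiny per-geometry chunks into a single chunk per dimension
agg = agg.chunk({"geometry": -1, "time": -1})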
Then I encode the dataset as recommended in the xvec reading and writing files tutorial and try to save it. The final encoded dataset is only 919 kiB, yet I get a memory error when I write it out as a .nc file.
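The encode-and-save step looks roughly like this (I’m paraphrasing the encode call from the tutorial, so the exact line may be slightly off):

# convert the shapely geometries into a CF-compliant representation for NetCDF
encoded = agg.xvec.encode_cf()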
encoded.to_netcdf(path = "F:/EnDI/EnDI_data/snow_data/snow_fraction.nc", mode = "w")
MemoryError: Unable to allocate 2.75 MiB for an array with shape (16, 150, 300) and data type float32
I’m not sure why I’m getting this memory error when the final dataset I’m trying to save is so small. Is there something else I should have done to make the zonal statistics step more memory-safe?