Hello! I’m trying to use xvec zonal statistics to calculate the mean snow fraction for each month over a 20-year timespan across 934 hydrologic basin geometries. My workstation has 64 GB of memory. The main dataset is an 81 GB zarr store with coordinates time, x, and y and a variable called snow_fraction. The snow_fraction variable is chunked as (252, 1000, 1000), so each chunk is 0.94 GB, and the underlying Dask array is 96 chunks in 3 graph layers.
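For reference, here is roughly how I open the data (paths and variable names are simplified, not my exact ones):

import xarray as xr
import geopandas as gpd

# open the 81 GB zarr store lazily with dask, keeping the on-disk chunking
ds = xr.open_zarr("F:/EnDI/EnDI_data/snow_data/snow_fraction.zarr")

# the 934 hydrologic basin polygons
basins = gpd.read_file("F:/EnDI/EnDI_data/basins.gpkg")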
I then use xvec.zonal_stats to find the mean snow fraction in each basin. The result is a dataset with chunk size (geometry: 1, time: 252), which dramatically increases the number of graph layers to 8410.
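The call looks roughly like this (coordinate names match my dataset; "agg" is just my name for the result):

# mean snow fraction within each basin polygon
agg = ds.xvec.zonal_stats(
    basins.geometry,
    x_coords="x",
    y_coords="y",
    stats="mean",
)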
I rechunk the aggregated mean into one big chunk (geometry: 934, time: 252). I’m not sure whether this was necessary, but the chunks were so small that I thought increasing the chunk size would be more efficient.
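Roughly:

# merge the tiny per-geometry chunks into a single chunk per dimension
agg = agg.chunk({"geometry": -1, "time": -1})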
Then I encode the dataset as recommended in the xvec reading and writing files tutorial and try to save it. The final encoded dataset is only 919 kiB, yet I get a memory error when I write it out as a .nc file.
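The encode-and-save step looks roughly like this (I’m paraphrasing the encode call from the tutorial, so the exact line may be slightly off):

# convert the shapely geometries into a CF-compliant representation for NetCDF
encoded = agg.xvec.encode_cf()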
encoded.to_netcdf(path = "F:/EnDI/EnDI_data/snow_data/snow_fraction.nc", mode = "w")
MemoryError: Unable to allocate 2.75 MiB for an array with shape (16, 150, 300) and data type float32
I’m not sure why I’m getting this memory error when the final dataset I’m trying to save is so small. Is there something else I should have done to make the zonal statistics step more memory-safe?