Hi there, thanks very much for all of your efforts. I’m hoping you can help me with an issue I’m encountering in this space. I’m not sure whether the issue is with Zarr or Dask exactly, so apologies if I’m misunderstanding something.
I have a Zarr dataset (WRF model output converted from NetCDF) of around 15TB in total. It started out as daily NetCDF files containing hourly data, and there are around 150 variables overall, hence the size. Allowing access this way would be hugely beneficial for making the data more widely available.
The issue I’m having, however, is structuring the data so that individual lat/long points are accessible, e.g. “for this specific lat/long, return this variable as a time series for the 20 year period”.
What I’ve tried so far:
- Converting the entire dataset to Zarr (appending 1 year at a time) with some basic chunking to keep chunk sizes manageable.
- Using rechunker to rechunk, e.g., one variable across the entire time series, with the south_north and west_east chunk lengths set to produce a reasonable chunk size.
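To make “reasonable chunk size” concrete, here is the rough arithmetic for a chunk that spans the whole time axis. This is only a sketch: it assumes float32 values and hourly data for 2001 to 2019 (taken from the catalogue entry name), and the tile sizes in the loop are hypothetical examples rather than the values I actually used.

```python
from datetime import date

BYTES_PER_VALUE = 4  # assumption: variables are stored as float32


def chunk_mebibytes(n_time, n_south_north, n_west_east,
                    bytes_per_value=BYTES_PER_VALUE):
    """Uncompressed size in MiB of one (time, south_north, west_east) chunk."""
    return n_time * n_south_north * n_west_east * bytes_per_value / 2**20


# Hourly time steps from 2001-01-01 up to (not including) 2020-01-01.
n_hours = (date(2020, 1, 1) - date(2001, 1, 1)).days * 24

# Hypothetical square spatial tiles, each chunk covering the full time series.
for tile in (1, 10, 20):
    size = chunk_mebibytes(n_hours, tile, tile)
    print(f"{tile} x {tile} tile: {size:.1f} MiB per chunk")
```

The point is that with the full time series in one chunk, the spatial tile has to stay small, since the chunk size grows with the square of the tile edge.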
In each case I access the data and try to filter down to the relevant point with something like the following (note I’m using a Dask cluster with ~10 workers):
import intake
catalogue = intake.open_catalog('./catalogue.yaml')
wrf_data = catalogue.wrf_2001_2019.to_dask()
wrf_t2_point = wrf_data.T2.where(((wrf_data.XLONG == wrflon) & (wrf_data.XLAT == wrflat)), drop=True)
This takes a long time. I can see that the Dask workers initially do all of the get-items operations, but the local machine then works for about half an hour at 100% CPU and memory, presumably evaluating all of the chunks before creating the filtered result.
I have a feeling I’m really misunderstanding how to do this. What I want in the end is simply a data-frame-type object that I can work with and visualise locally, which should be a tiny amount of raw data.
Any tips or points in the right direction would be really appreciated.