Update on the rechunked version
Rechunking Configuration
import os

import gcsfs
import zarr
from rechunker import rechunk

url = f"{os.environ['SCRATCH_BUCKET']}/ERA5_HiRes_Hourly.zarr"
zg = zarr.open_group(url)  # source group to be rechunked

# fresh GCS filesystem handle with listing caches disabled,
# so repeated runs don't see stale directory listings
fs = gcsfs.GCSFileSystem(skip_instance_cache=True, use_listings_cache=False)

temp_path = f"{os.environ['SCRATCH_BUCKET']}/ERA5_HiRes_Hourly_rechunk/temp.zarr"
target_path = f"{os.environ['SCRATCH_BUCKET']}/ERA5_HiRes_Hourly_rechunk/target.zarr"
temp_store = zarr.storage.FSStore(temp_path)
target_store = zarr.storage.FSStore(target_path)

target_chunks = {
    'tp': (7305, 103, 10),  # (time, latitude, longitude)
    'time': None,       # None = leave this array's chunks unchanged
    'longitude': None,
    'latitude': None,
}
max_mem = '8GB'  # per-worker memory budget for intermediate chunks

r = rechunk(zg, target_chunks, max_mem, target_store, temp_store=temp_store)
# executed on a 20-worker dask cluster w/ 40 GB memory each
r.execute()
This takes a long time (like an hour), but it is pretty stable and reliable.
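One optional follow-up: the new store has no consolidated metadata (which is why I pass consolidated=False when opening it below). If you want faster opens, you can consolidate it yourself; a one-liner, assuming the rechunk completed successfully:

# write consolidated metadata into the target store
zarr.consolidate_metadata(target_store)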
Now do flox groupby with map-reduce
I open the data and specify even longer chunks in time, so that the time axis is completely contiguous (a single chunk).
import xarray as xr

target_path = f"{os.environ['SCRATCH_BUCKET']}/ERA5_HiRes_Hourly_rechunk/target.zarr"
dsr = xr.open_dataset(
    target_path, engine="zarr", consolidated=False,
    chunks={'time': -1, 'latitude': 103, 'longitude': 10},  # time=-1: one chunk spanning the whole axis
)
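Before running the groupby, it's worth confirming that the time axis really is a single chunk, since that layout is what makes the map-reduce path work well. A quick check:

# chunks is a per-dimension tuple of chunk sizes for the dask-backed array;
# the time entry should be a single value spanning the whole axis
print(dsr.tp.chunks)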
The groupby now executes flawlessly with map-reduce.
import flox.xarray

method = "map-reduce"
tpmr = flox.xarray.xarray_reduce(
    dsr.tp, dsr.time.dt.hour,
    func="mean",
    method=method,
)
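Note that xarray_reduce just builds the dask graph; the heavy lifting happens when the result is computed. For example:

# triggers the map-reduce computation; the result has 24 entries,
# one mean precipitation field per hour of the day
hourly_means = tpmr.compute()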
I used the same cluster configuration with 40 GB of memory per worker, but memory usage never rose above about 10 GB per worker, meaning I could have gotten by with much cheaper workers.
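For reference, the cluster itself was set up along these lines. This is only a sketch using dask-gateway (what Pangeo Cloud provides); the available options and their names, e.g. worker_memory, depend on your deployment:

from dask_gateway import Gateway

gateway = Gateway()
options = gateway.cluster_options()
options.worker_memory = 40   # GB per worker -- option name is deployment-specific
cluster = gateway.new_cluster(options)
cluster.scale(20)            # 20 workers, matching the rechunking run
client = cluster.get_client()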
This is what my dask task stream looked like. It took about 4 minutes to process 678 GB of data.
So I think this illustrates two key takeaways:
- Rechunking is really, really helpful for solving these performance problems. That was the outcome of the epic discussion in "Best practices to go from 1000s of netcdf files to analyses on a HPC cluster?", which led to the creation of the rechunker package.
- @dcherian’s flox package performs flawlessly when given properly chunked data.
Neither of these packages existed two years ago. I am tempted to say we have nailed it. However, I think there is more work to do to make rechunker faster and more memory efficient.
@fmaussion - you should try rechunking the data on your HPC system and see how it goes.