Optimizing climatology calculation with Xarray and Dask

Update on the rechunked version

Rechunking Configuration

import zarr
import os
from rechunker import rechunk
import gcsfs

url = f"{os.environ['SCRATCH_BUCKET']}/ERA5_HiRes_Hourly.zarr"
zg = zarr.open_group(url)

fs = gcsfs.GCSFileSystem(skip_instance_cache=True, use_listings_cache=False)

temp_path = f"{os.environ['SCRATCH_BUCKET']}/ERA5_HiRes_Hourly_rechunk/temp.zarr"
target_path = f"{os.environ['SCRATCH_BUCKET']}/ERA5_HiRes_Hourly_rechunk/target.zarr"

temp_store = zarr.storage.FSStore(temp_path)
target_store = zarr.storage.FSStore(target_path)

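# chunk shape for tp, in dimension order (time, latitude, longitude,
# judging by the open_dataset call further down)
target_chunks = {
    'tp': (7305, 103, 10),
    # None tells rechunker to leave these coordinate arrays as they are
    'time': None,
    'longitude': None,
    'latitude': None,
}

# memory budget per worker that rechunker plans its copy operations around
max_mem = '8GB'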

r = rechunk(zg, target_chunks, max_mem, target_store, temp_store=temp_store)

# done with a 20 worker dask cluster w/ 40 GB memory each
r.execute()

This takes a long time (about an hour), but it is stable and reliable.
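
For a rough sense of what the target chunks mean in bytes, here is some back-of-the-envelope arithmetic. It assumes tp is stored as float32 (4 bytes per value), which the post does not state, and the helper name chunk_nbytes is made up for illustration.

```python
from math import prod


def chunk_nbytes(chunk_shape, itemsize):
    """Uncompressed size in bytes of one chunk with the given shape."""
    return prod(chunk_shape) * itemsize


tp_chunk = (7305, 103, 10)  # target chunk shape from the rechunk config
itemsize = 4                # ASSUMPTION: float32; use 8 for float64

nbytes = chunk_nbytes(tp_chunk, itemsize)
print(f"{nbytes / 1e6:.1f} MB per chunk")  # 30.1 MB per chunk
```

Under that assumption each target chunk is about 30 MB uncompressed, far below the 8GB max_mem budget.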

Now do flox groupby with map-reduce

I open the data and specify even longer chunks in time such that the time axis is totally contiguous.

import os
import xarray as xr

target_path = f"{os.environ['SCRATCH_BUCKET']}/ERA5_HiRes_Hourly_rechunk/target.zarr"
dsr = xr.open_dataset(
    target_path, engine="zarr", consolidated=False,
    # -1 puts the whole time axis in a single dask chunk
    chunks={'time': -1, 'latitude': 103, 'longitude': 10}
)

The groupby now executes flawlessly with map-reduce.

import flox.xarray

method = "map-reduce"
# mean of tp grouped by hour of day
tpmr = flox.xarray.xarray_reduce(dsr.tp, dsr.time.dt.hour,
                                 func="mean",
                                 method=method)
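
To make the "map-reduce" idea concrete, here is a toy NumPy sketch of a grouped mean computed that way. This is not flox's implementation, just the general pattern as I understand it: each block produces per-group sums and counts (map), those small intermediates are added up (reduce), and the division happens once at the end. The function names and the synthetic data are invented for illustration.

```python
import numpy as np

N_GROUPS = 24  # hours of the day


def block_sums_counts(values, hours):
    """Map step: per-hour sum and count for one block of the time axis."""
    sums = np.bincount(hours, weights=values, minlength=N_GROUPS)
    counts = np.bincount(hours, minlength=N_GROUPS)
    return sums, counts


def mapreduce_hourly_mean(values, hours, block_size):
    """Grouped mean over hour of day, combining per-block intermediates."""
    total_sums = np.zeros(N_GROUPS)
    total_counts = np.zeros(N_GROUPS, dtype=np.int64)
    for start in range(0, len(values), block_size):
        stop = start + block_size
        sums, counts = block_sums_counts(values[start:stop], hours[start:stop])
        total_sums += sums      # reduce step: intermediates are only 24 values
        total_counts += counts
    return total_sums / total_counts


rng = np.random.default_rng(0)
n_hours = 24 * 30                  # 30 days of synthetic hourly data
values = rng.random(n_hours)
hours = np.arange(n_hours) % 24    # hour-of-day label for each time step

result = mapreduce_hourly_mean(values, hours, block_size=100)
direct = values.reshape(30, 24).mean(axis=0)  # reference: plain grouped mean
print(np.allclose(result, direct))
```

The sketch splits along time to show the combine step. With the time axis in a single chunk, as in the dataset opened above, each spatial block should already hold every hour, so I would expect very little left to combine.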

I used the same cluster configuration, with 40 GB of memory per worker, but memory usage never rose above about 10 GB per worker, so I could have gotten by with a much cheaper setup.

This is what my dask task stream looked like. It took about 4 minutes to do 678 GB of data. :sunglasses:
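
As a quick sanity check on those numbers, the implied throughput works out as follows. The worker count is an assumption on my part: it presumes the same 20-worker cluster that was used for the rechunking step.

```python
data_gb = 678    # from the post
elapsed_min = 4  # "about 4 minutes"
n_workers = 20   # ASSUMPTION: same 20-worker cluster as the rechunk step

total_gb_per_s = data_gb / (elapsed_min * 60)
per_worker_mb_per_s = total_gb_per_s * 1000 / n_workers

print(f"{total_gb_per_s:.1f} GB/s total, {per_worker_mb_per_s:.0f} MB/s per worker")
```

That is roughly 2.8 GB/s across the cluster, or on the order of 140 MB/s per worker if the assumption holds.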

So I think this illustrates two key takeaways.

Neither of these packages (rechunker and flox) existed two years ago. I am tempted to say we have nailed it. :rocket: However, I think there is more work to do to make rechunker faster and more memory efficient.

@fmaussion - you should try rechunking the data on your HPC system and see how it goes.
