Nice, Deepak! I tried again with map-reduce using my original 20-year data, which is in chunks of {'time': 24}. I opened it with larger chunks, {'time': 480}, which seemed to make a big difference.
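Roughly what I mean, as a sketch (the store path, variable name, and monthly grouping are placeholders, not my actual setup):

```python
import flox.xarray
import xarray as xr

# Hypothetical store path and variable name, just for illustration.
# The data is chunked {'time': 24} on disk; open it with bigger chunks.
ds = xr.open_zarr("s3://my-bucket/20yr-data.zarr", chunks={"time": 480})

# Map-reduce groupby via flox (grouping by month here is just an example).
monthly_mean = flox.xarray.xarray_reduce(
    ds["my_var"],
    ds["time"].dt.month,
    func="mean",
    method="map-reduce",
)
result = monthly_mean.compute()
```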
This ran in 2:30 on my 20-node dask cluster with 40 GB of memory per worker! That seems amazingly fast! As you can see, the dask cluster is pretty happy.
However, you can also see plenty of memory spikes reaching 16 GB or more, so I was very happy to have the headroom of 40 GB per worker. In my experience, once the workers come under memory pressure and start spilling to disk, it's over.
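On the spilling point, the thresholds at which dask workers start spilling, pausing, or getting restarted are configurable. A quick sketch of the knobs I mean (the fractions below are just the documented defaults, not a recommendation):

```python
import dask

# Dask distributed worker-memory thresholds, expressed as fractions of the
# per-worker memory limit (40 GB in my case).
dask.config.set({
    "distributed.worker.memory.target": 0.60,     # start spilling managed data to disk
    "distributed.worker.memory.spill": 0.70,      # spill based on process memory
    "distributed.worker.memory.pause": 0.80,      # pause accepting new tasks
    "distributed.worker.memory.terminate": 0.95,  # nanny restarts the worker
})
```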
This was my first time really doing a deep dive on flox. It's remarkable.