Managing Parallel Dask Graphs

Hi all!

I’ve been working with @darothen and the Brightband team on ExtremeWeatherBench (EWB) for a bit over a year now. We’ve done a lot of great work, and I’ve been able to really dial in parts of the code by drawing on years of discussion and (mostly) solved edge cases across Pangeo and GitHub.

There are a few critical tasks for us as we prepare a paper on the outputs, including a few leftover bugs post-AMS, but one overarching challenge I’m personally trying to tackle for EWB is making the processing infrastructure as robust as possible. By that I mean the EWB backend that goes from datasets → evaluation results should balance memory, compute, i/o, and speed based on the machine it runs on. It’s fun to throw it at a 1.5TB, 196-core GCP behemoth, but I also want it to be as efficient as possible on a MacBook Air, HPC, etc.

With that, we’ve used joblib as the main interface for parallelizing across distinct cases, with some numba tooling in the background for derivations like MLCAPE and IVT. joblib is wonderful, but there are some limitations we’ve had to build around (e.g. joblib’s workers do not release memory between delayed calls, in an attempt to be more efficient).
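For anyone unfamiliar with the pattern, here’s a minimal sketch of what per-case fan-out with joblib looks like. `evaluate_case` is a hypothetical stand-in, not the actual EWB function, and I’m using the threading backend here just so the snippet is self-contained; in practice the default process-based `loky` backend is where the memory-retention caveat above bites, since reused worker processes may hold on to memory allocated inside a task.

```python
from joblib import Parallel, delayed

def evaluate_case(case_id):
    # stand-in for one case's subset/align/score pipeline
    return {"case": case_id, "rmse": float(case_id) * 0.1}

# Fan out over independent cases; n_jobs bounds concurrency.
# With the default "loky" backend, workers are reused across tasks,
# so per-task memory may not be returned to the OS until shutdown.
results = Parallel(n_jobs=2, backend="threading")(
    delayed(evaluate_case)(i) for i in range(4)
)
```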

Our workflow is essentially as follows:

  1. A user has a forecast dataset they can load, ideally a zarr, either locally or on a cloud store
  2. They choose a target, such as GHCNh data
  3. They use our events metadata, which includes the time and location for each case
  4. They then choose which metrics they want to evaluate with
  5. Finally, they run the evaluation and get a resulting table of metrics broken down by initialization time, lead time, forecast source, target source, etc.
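The steps above can be sketched roughly as follows. Everything here is illustrative (toy data, a toy `rmse`, plain dicts instead of the real EWB objects); it’s just to show the shape of the dataset → target → cases → metrics → table flow.

```python
def rmse(forecast, target):
    # toy metric over paired values
    n = len(forecast)
    return (sum((f - t) ** 2 for f, t in zip(forecast, target)) / n) ** 0.5

forecast = {"model_a": {"case_1": [1.0, 2.0], "case_2": [3.0, 4.0]}}  # step 1
target = {"case_1": [1.1, 1.9], "case_2": [2.5, 4.5]}                 # step 2
cases = ["case_1", "case_2"]                                          # step 3: events metadata
metrics = {"rmse": rmse}                                              # step 4

# step 5: evaluate and flatten into table rows
rows = []
for source, data in forecast.items():
    for case in cases:
        for name, fn in metrics.items():
            rows.append({"source": source, "case": case,
                         "metric": name, "value": fn(data[case], target[case])})
```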

So, the parallelization comes in at the case level: how many cases get processed simultaneously. As you can imagine, increasing the number of joblib jobs rapidly increases memory usage and clogs i/o, especially outside of a data center environment. I’m interested in opinions from the community on whether there’s any nuance or approach here that may be even more efficient. Has anyone else successfully built an approach with independent parallel dask graphs?
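To make the question concrete, here’s one shape this could take, a hedged sketch rather than anything we’ve settled on: each case becomes its own small delayed graph, and batching bounds how many graphs execute at once so memory and i/o stay within what the machine can handle. `evaluate_case` is again a hypothetical stand-in, and I’m using the threaded scheduler so the snippet runs anywhere; a `distributed` client with per-case futures (or a semaphore) would be the heavier-weight variant.

```python
import dask

@dask.delayed
def evaluate_case(case_id):
    # stand-in for one case's subset/align/score pipeline
    return case_id * 2

case_ids = list(range(6))
batch_size = 2  # at most this many case graphs execute concurrently

results = []
for i in range(0, len(case_ids), batch_size):
    batch = [evaluate_case(c) for c in case_ids[i:i + batch_size]]
    # each dask.compute call runs one bounded batch of independent graphs
    results.extend(dask.compute(*batch, scheduler="threads"))
```

The appeal over a single fused graph is that a failed or memory-hungry case only affects its own batch, and `batch_size` becomes the one knob to tune per machine.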