Workflow suggestions with xarray/xclim and Dask cluster.adapt()

Hi all, I do some heavy processing with the xclim/xarray packages, but I've always been stumped on using Dask cluster.adapt() to scale up and down based on HPC availability. xclim lets me parallelize over 1000+ CPUs, but I very often need to process a list of climate models. If I use cluster.adapt(), it ramps up the number of workers in the cluster, but once the currently processing model completes, all the workers are released, and you end up spending most of your time going in and out of the HPC queue. At its core, I'm trying to do something highly parallel (xclim) inside a serial for loop (i.e., processing one climate model at a time). Trying to do everything fully in parallel would likely explode the task graph, so some of the processing needs to be kept serial.

I was wondering if anyone has tips on how to approach this kind of workflow while still taking advantage of cluster.adapt(). Perhaps there is a way to map over a Dask DataFrame with only one partition, so the outer loop stays serial but the workers aren't released between models. A Slurm job array is a great way to process one model at a time with highly parallel xarray tasks, but I'd like to know whether this can be done with cluster.adapt() so you can stay within a Jupyter notebook. I usually use cluster.scale() for these types of large jobs, but then you need a sense of how much wallclock time to request up front. cluster.adapt() would let you run indefinitely and cycle workers in and out based on HPC availability, which sounds appealing.
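For what it's worth, one pattern that might get at this: keep a single adaptive cluster alive for the whole session and set a non-zero `minimum` in `cluster.adapt()`, so a floor of workers survives between models while the outer loop stays a plain serial `for`. A minimal sketch below, using `LocalCluster` as a stand-in for a Slurm-backed cluster (the pattern is the same with `dask_jobqueue.SLURMCluster`); `process_model` and the model names are placeholders for the real xclim/xarray computation:

```python
from dask.distributed import Client, LocalCluster
import dask.array as da

# LocalCluster stands in for SLURMCluster here so the sketch is runnable anywhere;
# processes=False keeps everything in one process for the demo
cluster = LocalCluster(n_workers=1, threads_per_worker=1, processes=False)

# minimum > 0 keeps a floor of workers alive between models, so adaptive
# scaling doesn't send you back through the HPC queue after each one
cluster.adapt(minimum=1, maximum=4)
client = Client(cluster)

def process_model(name):
    # placeholder for the per-model xclim/xarray work
    x = da.ones((1000, 1000), chunks=(250, 250))
    return float(x.mean().compute())

results = {}
for model in ["model-a", "model-b", "model-c"]:  # serial outer loop
    results[model] = process_model(model)

client.close()
cluster.close()
```

Each `.compute()` blocks until that model is done, which enforces the serial ordering and keeps the task graph limited to one model at a time.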

Thanks

Hi @jalder,

I am not sure about your whole workflow, but from the last bit of your description it sounds like this simple wrapper around dask_jobqueue.SLURMCluster that I developed might be helpful:

GitHub - maawoo/draco: A custom wrapper around dask-jobqueue's SLURMCluster

My M.Sc.-level students and I use it on our university's HPC system, but hopefully it works as-is on your Slurm-managed HPC as well (feedback / PRs are welcome!). My motivation was to make it easier for students to work interactively with Jupyter notebooks on the HPC system. We use micromamba / Pixi to manage our Python environments and access the HPC system via VSCode / Positron and Remote-SSH. Slurm resources can then be requested in the first cell of a notebook, e.g.:

```python
from draco import start_slurm_cluster

dask_client, cluster = start_slurm_cluster()
```

By default it targets a 1.5 h walltime on the 'short' queue (3 h max walltime on our system) and uses cluster.adapt(…) to keep requesting resources.
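In case it helps to see what such a wrapper does underneath, the equivalent direct dask_jobqueue calls look roughly like the sketch below. The queue name and walltime follow the defaults described above; the core and memory numbers are illustrative placeholders, not draco's actual values, and running this requires a Slurm system:

```python
from dask.distributed import Client
from dask_jobqueue import SLURMCluster

cluster = SLURMCluster(
    queue="short",        # partition with a 3 h max walltime in this example
    walltime="01:30:00",  # 1.5 h per worker job, well under the queue limit
    cores=8,              # illustrative; match your node/partition limits
    memory="32GB",
)

# adapt() keeps (re)submitting worker jobs as Slurm grants resources,
# so worker jobs that hit their walltime are replaced automatically
cluster.adapt(minimum=1, maximum=20)
client = Client(cluster)
```

Because each worker job has a short walltime, adaptive scaling continuously cycles workers through the queue, which is what lets a notebook session outlive any single Slurm allocation.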