Dask Execution Issue: Successful in Notebook, Fails in SLURM

Sumanshekhar17 · July 31, 2023, 9:45pm

Hi all,
I’m working on analyzing 4-dimensional oceanographic model (ROMS) data using xarray. I understand that xarray leverages Dask for its lazy computation and chunking capabilities. When I run my analysis script in a Jupyter Notebook session allocated with 60 GB of memory, everything executes smoothly without any issues. However, when I attempt to run the exact same script with an equivalent memory allocation (60 GB) through a Slurm job script, the job gets terminated due to memory errors.

I’ve encountered errors such as TimeoutError and CommClosedError, plus an Out Of Memory (OOM) message from the Slurm scheduler indicating that the job was killed for exceeding its memory limit.

I’m puzzled as to why the script runs perfectly in the Jupyter environment but faces memory issues when executed via Slurm, even though the memory allocation is the same in both scenarios. Could there be any underlying differences in how Dask handles memory or computations in these two environments? Any insights or suggestions would be greatly appreciated.

navidcy · August 1, 2023, 4:27pm

Maybe not useful, but I noticed xroms (GitHub: xoceanmodel/xroms), a package for working with ROMS ocean model output in xarray. I've never used it, nor ROMS itself. Sorry if this is spam.

ircwaves · August 1, 2023, 4:32pm

How many Dask workers are you using? IIRC there is a per-worker memory limit setting for the Dask cluster. If it isn't specified, Dask might try to use all the resources it can see on the node, not just the ones Slurm allows. (Caveat: I’m assuming Slurm is similar to TORQUE/PBS in its resource management.)
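A minimal sketch of setting that per-worker limit explicitly, assuming the job is submitted with --mem and --cpus-per-task (Slurm then exports SLURM_MEM_PER_NODE, in megabytes, and SLURM_CPUS_PER_TASK). The helper names slurm_worker_settings and start_cluster and the 10% headroom are hypothetical choices; LocalCluster(n_workers=..., threads_per_worker=..., memory_limit=...) is the dask.distributed API, where memory_limit applies to each worker.

```python
import os


def slurm_worker_settings(n_workers=None, env=None):
    """Hypothetical helper: derive LocalCluster kwargs from the Slurm allocation.

    Assumes the job was submitted with --mem (exported as SLURM_MEM_PER_NODE,
    in MB) and optionally --cpus-per-task (exported as SLURM_CPUS_PER_TASK).
    """
    env = os.environ if env is None else env
    mem_mb = int(env["SLURM_MEM_PER_NODE"])
    cpus = int(env.get("SLURM_CPUS_PER_TASK", "1"))
    if n_workers is None:
        n_workers = cpus  # default: one single-threaded worker per CPU
    threads = max(1, cpus // n_workers)
    # Keep ~10% of the allocation as headroom (an assumed margin), integer math.
    usable_bytes = mem_mb * 1024**2 * 9 // 10
    return {
        "n_workers": n_workers,
        "threads_per_worker": threads,
        "memory_limit": usable_bytes // n_workers,  # bytes, per worker
    }


def start_cluster(**overrides):
    """Start a LocalCluster sized to the Slurm job and return a Client."""
    # Imported lazily so the helper above works even without dask installed.
    from dask.distributed import Client, LocalCluster

    settings = slurm_worker_settings(n_workers=overrides.pop("n_workers", None))
    settings.update(overrides)
    return Client(LocalCluster(**settings))
```

In the batch script this would be called once near the top, e.g. client = start_cluster(n_workers=4), before opening the dataset, so that the per-worker limits add up to about 90% of the Slurm allocation rather than being left to auto-detection.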

Sumanshekhar17 · August 1, 2023, 6:15pm

Thank you for your reply, @navidcy. I’m utilizing the xroms package, which facilitates reading ROMS outputs more effectively and aids in post-processing the data. However, my primary challenge revolves around the proper use of Dask. From my understanding, xarray employs Dask under the hood. This leads me to question whether I need to provide explicit instructions to Dask to create a local cluster, or if xarray handles this automatically. @ircwaves, this is the reason I haven’t explicitly defined a Dask client or local cluster in my code.

dcherian · August 1, 2023, 7:43pm

@Sumanshekhar17 this is going to be hard to answer without looking at the code. Can you put your notebook in a GitHub Gist?

navidcy · August 1, 2023, 8:16pm

My general experience with Dask and xarray is that it works fine up to a point, and beyond that you need to tweak things and think of problem-specific solutions depending on the specific computation, how your data are chunked, etc.
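Chunk size is usually the first thing to check. A rough, hypothetical back-of-the-envelope sketch, assuming float64 data and a ROMS-like 4-D variable chunked as (ocean_time, s_rho, eta_rho, xi_rho) = (1, 30, 500, 600); the copies=3 factor (input, intermediate, output per thread) is an assumed rule of thumb, not something Dask guarantees.

```python
from math import prod


def chunk_nbytes(chunk_shape, itemsize=8):
    """Bytes in one chunk: element count times bytes per element (8 for float64)."""
    return prod(chunk_shape) * itemsize


def rough_peak_bytes(chunk_shape, n_threads, itemsize=8, copies=3):
    """Crude working-set guess: every thread holds `copies` chunk-sized arrays.

    `copies=3` is an assumed rule of thumb; reductions, rechunking or
    misaligned operations can need far more.
    """
    return chunk_nbytes(chunk_shape, itemsize) * n_threads * copies


# Hypothetical ROMS-like chunking: (ocean_time, s_rho, eta_rho, xi_rho)
chunks = (1, 30, 500, 600)
print(chunk_nbytes(chunks) / 1e6, "MB per chunk")  # 72.0 MB per chunk
print(rough_peak_bytes(chunks, n_threads=8) / 1e9, "GB peak (rough)")
```

If the rough peak approaches the per-worker memory limit, smaller chunks or fewer threads per worker are the usual levers.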

Sumanshekhar17 · August 1, 2023, 8:58pm

Hi Deepak, here is the code: "Doubt in Dask" (GitHub Gist).