Dask Execution Issue: Successful in Notebook, Fails in SLURM

Hi all,
I’m working on analyzing 4-dimensional oceanographic model (ROMS) data using xarray. I understand that xarray leverages Dask for its lazy computation and chunking capabilities. When I run my analysis script in a Jupyter Notebook session allocated with 60 GB of memory, everything executes smoothly without any issues. However, when I attempt to run the exact same script with an equivalent memory allocation (60 GB) through a Slurm job script, the job gets terminated due to memory errors.

I’ve encountered errors such as TimeoutError, CommClosedError, and Out Of Memory from the Slurm scheduler, indicating that the job was killed due to memory constraints.

I’m puzzled as to why the script runs perfectly in the Jupyter environment but faces memory issues when executed via Slurm, even though the memory allocation is the same in both scenarios. Could there be any underlying differences in how Dask handles memory or computations in these two environments? Any insights or suggestions would be greatly appreciated.

Maybe not useful, but I came across xoceanmodel/xroms on GitHub ("Work with ROMS ocean model output with xarray"). I have never used it, nor ROMS itself. Sorry if this is spam.

How many Dask workers are you using? IIRC there is a memory-per-worker setting for the Dask cluster. If it is not specified, Dask might try to use all the resources it can see, not just the ones SLURM allows. (Caveat: I’m assuming SLURM is similar to TORQUE/PBS in its resource management.)
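
To make the memory-per-worker idea concrete, here is a minimal sketch, not a tested recipe for your cluster. It assumes the job was submitted with `--cpus-per-task` and `--mem` (or `--mem-per-cpu`), so that Slurm exports `SLURM_CPUS_PER_TASK` and `SLURM_MEM_PER_NODE` (or `SLURM_MEM_PER_CPU`) with memory in megabytes. The helper names `slurm_worker_settings` and `start_client`, the default of 4 workers, and the 0.9 headroom factor are my own illustrative choices.

```python
import os


def slurm_worker_settings(environ=None, n_workers=4, headroom=0.9):
    """Derive LocalCluster keyword arguments from the Slurm allocation.

    Slurm exports memory sizes in megabytes. `headroom` keeps part of the
    allocation free for the main process; 0.9 is an arbitrary choice.
    """
    environ = os.environ if environ is None else environ
    cpus = int(environ.get("SLURM_CPUS_PER_TASK", n_workers))
    n_workers = max(1, min(n_workers, cpus))

    if "SLURM_MEM_PER_NODE" in environ:      # exported when --mem is used
        mem_mb = int(environ["SLURM_MEM_PER_NODE"])
    elif "SLURM_MEM_PER_CPU" in environ:     # exported when --mem-per-cpu is used
        mem_mb = int(environ["SLURM_MEM_PER_CPU"]) * cpus
    else:
        mem_mb = None                        # probably not inside a Slurm job

    if mem_mb is None:
        memory_limit = "auto"                # fall back to Dask's default
    else:
        memory_limit = int(mem_mb * 1024**2 * headroom / n_workers)

    return {
        "n_workers": n_workers,
        "threads_per_worker": max(1, cpus // n_workers),
        "memory_limit": memory_limit,        # per worker, in bytes
    }


def start_client():
    """Start a local cluster sized to the Slurm allocation."""
    # Imported here so the helper above can be used without Dask installed.
    from dask.distributed import Client, LocalCluster

    cluster = LocalCluster(**slurm_worker_settings())
    return Client(cluster)
```

In a batch script, call `start_client()` inside an `if __name__ == "__main__":` block before opening the dataset, since the worker processes may re-import the main module. Printing the returned client in both the notebook and the Slurm job should show whether the two runs really get the same workers and memory.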

Thank you for your reply, @navidcy. I’m utilizing the xroms package, which facilitates reading ROMS outputs more effectively and aids in post-processing the data. However, my primary challenge revolves around the proper use of Dask. From my understanding, xarray employs Dask under the hood. This leads me to question whether I need to provide explicit instructions to Dask to create a local cluster, or if xarray handles this automatically. @ircwaves, this is the reason I haven’t explicitly defined a Dask client or local cluster in my code.

@Sumanshekhar17, this is going to be hard to answer without looking at the code. Can you put your notebook in a GitHub Gist?

My general experience with Dask and xarray is that it works fine up to a point, and beyond that it requires you to tweak things and come up with problem-specific solutions depending on the computation, how your data is chunked, etc.
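
As a rough illustration of why chunking matters for memory, the sketch below estimates the size of one chunk and a crude peak when several threads each hold a few chunks. The chunk shape, the dimension order (ocean_time, s_rho, eta_rho, xi_rho), the thread count, and the factor of 3 chunks per task are all hypothetical values picked for the example, not taken from the actual dataset.

```python
import math


def chunk_nbytes(chunk_shape, itemsize=8):
    """Bytes held by one chunk (itemsize=8 corresponds to float64)."""
    return math.prod(chunk_shape) * itemsize


def rough_peak_bytes(chunk_shape, n_threads, chunks_per_task=3, itemsize=8):
    """Crude estimate: every thread holds a few chunks at the same time.

    chunks_per_task is a guess (inputs plus output of one operation).
    """
    return chunk_nbytes(chunk_shape, itemsize) * n_threads * chunks_per_task


# Hypothetical chunking of one 4D variable: (ocean_time, s_rho, eta_rho, xi_rho)
chunk = (24, 30, 500, 600)
print(f"one chunk: {chunk_nbytes(chunk) / 1e9:.2f} GB")                     # one chunk: 1.73 GB
print(f"8 threads: {rough_peak_bytes(chunk, n_threads=8) / 1e9:.1f} GB")    # 8 threads: 41.5 GB
```

With numbers like these, a 60 GB allocation leaves little margin, so smaller chunks or fewer threads per worker may be worth trying.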

Hi Deepak, here is the code: Doubt in Dask · GitHub