When I start the local cluster, I can specify `n_workers` and `threads_per_worker`.
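For example, something like this (the numbers are arbitrary placeholders):

```python
from dask.distributed import Client, LocalCluster

# Hypothetical settings: this is exactly the choice I'm asking about
cluster = LocalCluster(n_workers=4, threads_per_worker=2)
client = Client(cluster)
```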
Is there a good rule of thumb or procedure for setting these optimally, and for testing whether I have set them optimally, based on my system information (e.g. number of CPUs, memory, etc.)?
I have read a lot of documentation and Stack Overflow, but I can't seem to find a clean explanation that I understand and can explain clearly to students in my Climate Data class.
I am hoping someone here can help. Thanks!
Thanks for sharing this interesting question!
In my experience, distributed's `LocalCluster` with no options automatically chooses these parameters in a more-or-less optimal way. Are you finding problems with these defaults?
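For instance, with no arguments at all, distributed picks workers, threads, and memory limits from your machine's CPU count and RAM (a minimal sketch):

```python
from dask.distributed import Client, LocalCluster

# With no arguments, dask inspects the machine and picks
# n_workers, threads_per_worker, and a per-worker memory limit.
cluster = LocalCluster()
client = Client(cluster)
print(cluster)  # the repr shows what was chosen
```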
The general constraints are:
- The total amount of memory in the cluster should not exceed the total amount of memory on your computer (and really should be less, since you presumably have other applications running)
- The total number of cores should not exceed the number of cores in your computer. Most laptops have 4 cores or fewer, so there is not much wiggle room here. Total cores is `n_workers * threads_per_worker` (see the sketch after this list).
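A rough sanity check of both constraints might look like this (a sketch; `psutil` is a dependency of distributed, and the 4-worker choice is just an example):

```python
import os
import psutil  # installed as a dependency of distributed

n_cores = os.cpu_count()
total_mem = psutil.virtual_memory().total

n_workers = 4  # example choice, not a recommendation
threads_per_worker = max(1, n_cores // n_workers)
memory_limit = int(total_mem * 0.8) // n_workers  # leave headroom for other apps

# Constraint check: total cores must fit on the machine
assert n_workers * threads_per_worker <= n_cores
```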
Within those constraints, you can play with different numbers of workers. You can also have just one worker with multiple threads. Threads can share memory, while processes (and workers) cannot. I have never known a good way to decide between processes and threads.
I personally tend not to use the distributed scheduler on a local machine, except when I need the dashboard to debug something. The default threaded scheduler seems to work just as well, if not better; there is overhead involved in running the distributed scheduler.
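For example, this runs on the default threaded scheduler with no cluster or client at all (a minimal sketch):

```python
import dask.array as da

# No Client created, so dask falls back to its default threaded scheduler
x = da.random.random((20_000, 20_000), chunks=(2_000, 2_000))
result = x.mean().compute()
```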
Thanks for the reply @rabernat. I'm working on my department's Linux cluster. For some reason I had the impression that I was supposed to set threads and workers in some optimal way, or my code would not be distributed across all the available cores by default. Seems I misunderstood. I will do some testing.
https://docs.dask.org/en/latest/setup/single-machine.html has a short discussion on choosing between a mix of threads and processes. Typically it comes down to whether or not your computation holds Python’s global interpreter lock.
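In code, that choice looks roughly like this (a sketch; pick one or the other):

```python
from dask.distributed import Client

# NumPy/pandas-heavy work mostly releases the GIL, so threads work well:
client = Client(processes=False)  # one process, multiple threads

# Pure-Python work holds the GIL, so spread it across processes instead:
# client = Client(n_workers=4, threads_per_worker=1)
```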
Another consideration is that the HDF5 library cannot parallelize reads across threads (it serializes access with a global lock). Since I work with large collections of netCDF files, and file reads are usually the slowest part of my workflow, I always go with multiple processes.
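Concretely, that means something like this (a sketch; the worker count and file pattern are made up, and I'm assuming xarray for the reads):

```python
import xarray as xr
from dask.distributed import Client

# Single-threaded workers: HDF5 reads can then run in parallel across processes
client = Client(n_workers=8, threads_per_worker=1)

ds = xr.open_mfdataset("data/*.nc", parallel=True)  # hypothetical file pattern
```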