Hi all!
I am working on a cluster with 3 nodes/machines. Each node/machine has its own cores, CPUs and RAM. I am trying to run a script that needs some parallel computing in order to open the files I want to use. So, on the first machine I set up a dask.distributed LocalCluster and Client to perform the parallel computing across its cores, like this:
# Set up the dask scheduler and client
import xarray as xr
from dask.distributed import Client, LocalCluster

cluster = LocalCluster(n_workers=16, threads_per_worker=2, processes=True)
client = Client(cluster)
print(client.dashboard_link)

# Now open the files with parallel computing
chunks = {'time_counter': 1, 'y': 3057, 'x': 4321}
dataset = xr.open_dataset('/home/files/ATM_FLUXES/HF_1995_2D.nc', chunks=chunks, engine='h5netcdf')
# lat1, lat2, lon1, lon2 are the index bounds of the region I want
mld = dataset.somxlt02.isel(y=slice(lat1, lat2), x=slice(lon1, lon2))
MLD = mld.compute()
Then I run the rest of my program normally on the 1st machine.
When I try to run the same script (but with different lons and lats) on the 2nd machine/node, setting up a dask.distributed client as described above, I notice two things:
- The script running on the 1st machine is slowing down
- The script running on the 2nd machine is slowing down and, almost every time, eventually crashes or breaks down.
The datasets I am handling are the same on the 2 nodes, and the Python libraries are the same. I can connect to each machine over SSH without a password (and I connect to each machine manually, because I want to control the lons and lats of my script each time).
Basically, I am trying to run the same script for different regions of the global ocean, and I would like to run the script for each region (and therefore perform the required parallel computing) on a different node.
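For reference, here is a minimal sketch of what each per-region run boils down to; the argparse interface and the run_region() helper are just to illustrate the workflow, not my exact script:

import argparse

import xarray as xr
from dask.distributed import Client, LocalCluster


def run_region(lat1, lat2, lon1, lon2):
    # One LocalCluster per node, as in the snippet above
    cluster = LocalCluster(n_workers=16, threads_per_worker=2, processes=True)
    client = Client(cluster)
    print(client.dashboard_link)

    chunks = {'time_counter': 1, 'y': 3057, 'x': 4321}
    dataset = xr.open_dataset('/home/files/ATM_FLUXES/HF_1995_2D.nc',
                              chunks=chunks, engine='h5netcdf')
    mld = dataset.somxlt02.isel(y=slice(lat1, lat2), x=slice(lon1, lon2))
    return mld.compute()


if __name__ == '__main__':
    # Launched manually on each node (after ssh'ing into it) with the
    # index bounds of the region that node should process
    parser = argparse.ArgumentParser(description='Process one ocean region')
    parser.add_argument('--lat1', type=int, required=True)
    parser.add_argument('--lat2', type=int, required=True)
    parser.add_argument('--lon1', type=int, required=True)
    parser.add_argument('--lon2', type=int, required=True)
    args = parser.parse_args()
    MLD = run_region(args.lat1, args.lat2, args.lon1, args.lon2)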
Any ideas on how I can run the same script, with its own dask.distributed client, on different nodes without everything crashing?
Kind regards,
Sofi