Hi all! I am a physical oceanographer and new to Python. I recently watched a tutorial about reducing the processing time of computationally expensive data operations (if I understood it correctly) by using dask="parallelized".

My issue is that I am dealing with 4D (sliced) files with dimensions [time, depth, lat, lon] = [365, 32, 56, 48]. I have found two ways of handling the data:

A) The “lazy” xarray reading of data like this:

temp = xr.open_dataset('/home/directory/T_2011_2D.nc')['votemper'][:,:,:,:]

Here IPython reads the file instantly, but later, when I use temp (e.g. in a for loop), it takes ages.

B) The very slow xarray reading of data like this:

temp[:,:,:,:] = xr.open_dataset('/home/directory/T_2011_2D.nc')['votemper'][:,:,:,:]

Here IPython takes a long time to open/read/load the temperature values, but afterwards processing temp (e.g. in a for loop) takes only a minute.
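From what I understand, a middle ground between A) and B) is to open lazily and then call .load() once, so the whole variable is read in a single pass before any loop. Here is a tiny sketch of what I mean on an in-memory stand-in (dimension names as in my data, sizes made up, since I cannot attach my file):

```python
import numpy as np
import xarray as xr

# On my real file this would be:
#   temp = xr.open_dataset("/home/directory/T_2011_2D.nc")["votemper"].load()
# open_dataset is lazy; .load() then reads everything into memory in one
# pass, so later indexing in a for loop hits a numpy array, not the disk.

# Tiny in-memory stand-in just to show the object you end up with:
da = xr.DataArray(
    np.random.rand(4, 3, 5, 6).astype("float32"),
    dims=("time", "depth", "lat", "lon"),
    name="votemper",
)
temp = da.load()          # already in memory here, so this is a no-op
print(type(temp.values))  # a plain numpy ndarray
print(temp.shape)         # (4, 3, 5, 6)
```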

Now I am not sure whether there is a clean solution, but I recently learned about dask="parallelized" and some higher-order programming that could offer a compromise in the time IPython takes to read and process the large dataset. This matters because I have to repeat the same procedure for 7 other variables of the same dimensions, and over many boxes in the ocean.
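For concreteness, this is a minimal sketch of what I understand xr.apply_ufunc with dask="parallelized" to do, on made-up synthetic arrays (it assumes dask is installed; names and sizes are invented):

```python
import numpy as np
import xarray as xr

# Synthetic stand-ins for my variables (names and sizes are made up):
depths = np.array([1.0, 5.0, 10.0, 20.0])
mld = xr.DataArray(np.random.rand(10, 5, 6) * 20.0,
                   dims=("time", "lat", "lon"))

def nearest_depth_index(mld_np, depths_np):
    """Index of the depth level closest to the mixed-layer depth.

    Pure numpy: broadcast depths onto a trailing axis, take the
    absolute difference, argmin over that axis. Output shape == input shape.
    """
    return np.abs(mld_np[..., None] - depths_np).argmin(axis=-1)

# dask="parallelized" applies the plain-numpy function chunk by chunk,
# lazily and in parallel, instead of loading the whole array at once.
idx = xr.apply_ufunc(
    nearest_depth_index,
    mld.chunk({"time": 5}),        # chunking is what makes it lazy/parallel
    kwargs={"depths_np": depths},
    dask="parallelized",
    output_dtypes=[np.int64],
)
result = idx.compute()
print(result.shape)  # (10, 5, 6)
```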

The for loop looks like this:

mld2d = np.asarray(MLD)      ## make it a numpy array
depths2 = np.asarray(depths)
depths2d = np.tile(depths2[:, None, None], (1, len(latp) + 2, len(lonp) + 2))  ## 3D (depth, lat, lon)

idx2dT = np.nan * np.zeros(temp[:, 0, :, :].shape)
idx2d = np.nan * np.zeros(mld2d.shape)

depthsmask = depths2d * maskfile2

for day in range(len(time)):
    for j in range(len(latp)):
        for i in range(len(lonp)):
            if np.isnan(mld2d[day, j, i]):
                idx2dT[day, j, i] = np.nan
            else:
                idx2dT[day, j, i] = np.abs(depthsmask[:, j, i] - mld2d[day, j, i]).argmin()
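I also wondered whether the triple loop could be replaced by numpy broadcasting; this is my attempt on small synthetic stand-ins for my arrays (sizes and values invented), which I believe gives the same result as the loop:

```python
import numpy as np

# Synthetic stand-ins shaped like my arrays (sizes reduced):
ntime, ndepth, nlat, nlon = 6, 5, 4, 3
depths2 = np.linspace(1.0, 50.0, ndepth)
depthsmask = np.tile(depths2[:, None, None], (1, nlat, nlon))  # (depth, lat, lon)
mld2d = np.random.rand(ntime, nlat, nlon) * 50.0
mld2d[0, 0, 0] = np.nan                                        # a point with no MLD

# Broadcast to (time, depth, lat, lon) and argmin over the depth axis;
# this one call replaces the whole day/j/i loop.
idx2dT = np.abs(depthsmask[None, :, :, :] - mld2d[:, None, :, :]).argmin(axis=1).astype(float)
idx2dT[np.isnan(mld2d)] = np.nan  # restore NaN where the MLD is undefined
```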

Can anyone help me with this?

Thank you in advance for your time and help,

Kind regards,

Sofi