For reference, I’ve also asked the same question at
and in the coiled slack page. I will update this discussion in case I have a positive answer from those places.
I’m using dask.distributed and a coiled cluster to open with xarray multiple NetCDF files. Nothing fancy here.
But there is also a preprocess_xarray function running during the opening of files. This preprocess_xarray function is completely “empty” in my 1st showcase example. However, all the NetCDF files are being downloaded to the machine running the cluster (I’m monitoring my network usage). If this function had stuff in it, It would use my local memory …
I’ve tried many things, adding a @delayed decorator to force this function to run on the cluster… and so on with no success.
I have a reproducible example here code to showcase issues with dask distributed running preprocess on local machine instead of cluster · GitHub . this is a very simplified version of a bigger script available at GitHub - aodn/aodn_cloud_optimised: Cloud optimised data formats
Also the reason why the preprocess_xarray is not part of the class in my example is because that function can’t be serialized and used with partial.
Here is another version of that gist, but slightly different. The preprocess function is creating a dummy variable, and all of the memory of my local machine is being used
Cheers,
Loz