Why is processing slower when I use the cluster?

lei · August 10, 2022, 1:27am

Hi,
I noticed that data processing is slower with the GatewayCluster than without it.

Please see: MITgcm ECCOv4 Example — Pangeo Gallery documentation

In that notebook, Ryan Abernathey compares the speed with and without the cluster, and he says the cluster should speed things up. But that is not the result I get. Why?

Thanks

geynard · August 23, 2022, 5:43pm

Hi @lei,

Could you be more precise about which cell you’re comparing in the notebook?

If you’re talking about %time ds.THETA.isel(k=0).mean(dim='time').load(), then this is probably because the selected data is so small that using a Dask Gateway cluster just adds some overhead.

Maybe the notebook is misleading about that.
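To see why a small selection can be slower on a cluster, here is a toy cost model. It is an assumption for illustration, not how Dask actually schedules work: runtime = a fixed overhead (for example scheduling and communication) + the work split evenly across workers, with local overhead ignored. The worker counts and the 10 s overhead are made-up numbers, not measurements from the notebook.

```python
def runtime(work_s: float, workers: int, overhead_s: float = 0.0) -> float:
    """Toy model (assumption): fixed overhead plus perfectly parallel work."""
    return overhead_s + work_s / workers


def break_even_work(local_workers: int, cluster_workers: int, cluster_overhead_s: float) -> float:
    """Total work (in single-worker seconds) above which the cluster beats local.

    Solves  cluster_overhead_s + W / cluster_workers == W / local_workers  for W.
    """
    gain_per_work_second = 1 / local_workers - 1 / cluster_workers
    if gain_per_work_second <= 0:
        # The cluster has no more workers than the local machine, so it never wins.
        return float("inf")
    return cluster_overhead_s / gain_per_work_second


if __name__ == "__main__":
    # Hypothetical numbers: 4 local workers, 20 cluster workers, 10 s cluster overhead.
    print(runtime(40, 4), runtime(40, 20, 10.0))    # small job: 10.0 12.0 -> cluster slower
    print(runtime(400, 4), runtime(400, 20, 10.0))  # 10x larger: 100.0 30.0 -> cluster faster
    print(f"{break_even_work(4, 20, 10.0):.1f}")    # 50.0
```

Under this model a job about 10x larger flips the result. Real Dask runs also depend on chunking, read bandwidth from cloud storage and worker memory, which the model ignores.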

lei · August 24, 2022, 2:05am

Thanks very much, @geynard

Yes, I was talking about %time ds.THETA.isel(k=0).mean(dim='time').load().
When calculating the mean SST in cells 9 and 10 without the cluster, it takes about 15 s. In cell 13, with the cluster, it takes about 17 s.

I also tried some other calculations (yes, on small datasets, as you mentioned). The results show that the cluster does not speed up the calculation.

geynard · August 24, 2022, 12:10pm

I don’t really know this dataset or its variables, nor the computations other than the first mean() we talked about.

But the cluster will probably help if you compute over the whole k dimension.

ThomasMGeo · August 24, 2022, 2:28pm

I would try with a dataset that’s ~10x larger and see what the results are.
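One way to follow this suggestion is to time the same computation at several sizes, with and without the cluster, and watch where the speedup crosses 1. A minimal standard-library sketch follows. `run_local` and `run_cluster` are hypothetical callables you would write yourself, for example wrapping the notebook's .load() call on a growing slice of the data with and without the Gateway client active. Because xarray/Dask operations are lazy, each wrapper must trigger computation (e.g. .load()); otherwise only graph construction is timed.

```python
import time
from typing import Callable, Dict, Iterable, Tuple


def time_call(fn: Callable[[int], object], size: int, repeats: int = 3) -> float:
    """Return the best wall-clock time (seconds) of fn(size) over several repeats."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(size)
        best = min(best, time.perf_counter() - start)
    return best


def scaling_table(
    sizes: Iterable[int],
    run_local: Callable[[int], object],
    run_cluster: Callable[[int], object],
    repeats: int = 3,
) -> Dict[int, Tuple[float, float, float]]:
    """For each problem size, record (local_s, cluster_s, speedup = local_s / cluster_s)."""
    results = {}
    for size in sizes:
        local_s = time_call(run_local, size, repeats)
        cluster_s = time_call(run_cluster, size, repeats)
        speedup = local_s / cluster_s if cluster_s > 0 else float("inf")
        results[size] = (local_s, cluster_s, speedup)
    return results


if __name__ == "__main__":
    # Stand-in workload (assumption): replace with wrappers around the real .load() calls.
    def busy(n: int) -> int:
        return sum(i * i for i in range(n))

    table = scaling_table([10_000, 100_000], busy, busy, repeats=1)
    for size, (local_s, cluster_s, speedup) in table.items():
        print(f"size={size:>8}  local={local_s:.4f}s  cluster={cluster_s:.4f}s  speedup={speedup:.2f}x")
```

Taking the best of several repeats reduces noise, but later repeats may benefit from cached reads, so it is worth also looking at the first (cold) run when the data comes from cloud storage.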
