Why is processing slower when using the cluster?

Hi,
I noticed that data processing is slower with the GatewayCluster than without it.

Please see the "MITgcm ECCOv4 Example" notebook in the Pangeo Gallery documentation.

Ryan Abernathey intended to compare the speed with and without a cluster. According to his notes in that notebook, the cluster should speed things up, but my results show the opposite. Why?

Thanks


Hi @lei,

Could you be more precise on the cell you’re comparing in the notebook?

If you’re talking about %time ds.THETA.isel(k=0).mean(dim='time').load(), then this is probably because the selected data is so small that using a Dask Gateway cluster just adds overhead.
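To make the overhead argument concrete, here is a toy cost model. It is only a sketch: the functions and all the numbers (2 local cores, 10 workers, 5 s of fixed overhead) are hypothetical values picked for illustration, not measurements from the notebook, and it ignores I/O and network transfer entirely.

```python
def local_time(work_s, n_local_cores):
    """Estimated wall time without a cluster: work split over local cores."""
    return work_s / n_local_cores


def cluster_time(work_s, n_workers, overhead_s):
    """Estimated wall time with a cluster: fixed overhead plus split work."""
    return overhead_s + work_s / n_workers


def break_even_work(n_local_cores, n_workers, overhead_s):
    """Work (in core-seconds) above which the cluster becomes faster.

    Solves work / n_local_cores = overhead_s + work / n_workers for work.
    Only meaningful when n_workers > n_local_cores.
    """
    return overhead_s / (1.0 / n_local_cores - 1.0 / n_workers)


# Hypothetical setup: 2 local cores, 10 cluster workers, 5 s fixed overhead.
for work in (10.0, 100.0):
    t_local = local_time(work, 2)
    t_cluster = cluster_time(work, 10, 5.0)
    print(f"work={work:6.1f} s  local={t_local:5.1f} s  cluster={t_cluster:5.1f} s")

print("break-even work:", break_even_work(2, 10, 5.0), "core-seconds")
```

Under these assumed numbers a small job is slower on the cluster, while a job ten times larger is clearly faster, which is the pattern I would expect here.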

Maybe the notebook is misleading about that.


Thanks very much, @geynard

Yes, I was talking about %time ds.THETA.isel(k=0).mean(dim='time').load().
When calculating the mean SST in cells 9 and 10 without the cluster, it takes about 15 s. In cell 13, with the cluster, it takes about 17 s.

I also tried some other calculations (yes, on small data, as you mentioned). The results show that the cluster does not speed up the calculation.

I don’t really know this dataset or its variables, nor the computations in the notebook other than the first mean() we talked about.

But the cluster will probably help if you compute over the whole k dimension.

I would try with a dataset that’s ~10x larger and see what the results are.
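A minimal sketch of how such a comparison could be set up is below. It makes several assumptions: the data is a synthetic random array standing in for THETA (not the ECCO dataset), `timed` and `make_theta` are hypothetical helpers, and an in-process `dask.distributed` client stands in for the Gateway cluster, so it will not reproduce the network and startup overhead of a real GatewayCluster. On a real deployment you would connect the client to your Gateway cluster instead and keep the rest the same.

```python
import time

import dask.array as da
import numpy as np
from dask.distributed import Client


def timed(label, fn):
    """Run fn(), print its wall time, and return the result."""
    start = time.perf_counter()
    result = fn()
    print(f"{label}: {time.perf_counter() - start:.3f} s")
    return result


def make_theta(nk, nt=12, ny=90, nx=90):
    """Synthetic stand-in for THETA with dims (time, k, y, x), one chunk per time step."""
    rng = da.random.RandomState(0)
    return rng.random_sample((nt, nk, ny, nx), chunks=(1, nk, ny, nx))


# Small job: time mean of a single level, like isel(k=0).mean(dim='time').
small = make_theta(nk=1)[:, 0].mean(axis=0)
# Larger job: time mean over 50 levels.
large = make_theta(nk=50).mean(axis=0)

# 1) No cluster: the default local threaded scheduler.
local_small = timed("local,   1 level  ", lambda: small.compute(scheduler="threads"))
local_large = timed("local,   50 levels", lambda: large.compute(scheduler="threads"))

# 2) Distributed scheduler (in-process stand-in for a Gateway cluster).
client = Client(processes=False, dashboard_address=None)
try:
    dist_small = timed("cluster, 1 level  ", lambda: client.compute(small).result())
    dist_large = timed("cluster, 50 levels", lambda: client.compute(large).result())
finally:
    client.close()

# Sanity check: both schedulers must give the same answer.
np.testing.assert_allclose(local_small, dist_small)
np.testing.assert_allclose(local_large, dist_large)
```

Timings depend heavily on the machine and the cluster, so compare the ratio between the small and large cases rather than the absolute numbers.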