Compute time series for 70,000 locations (Speed up the processing)

I’ve had good experiences with Dask’s P2P shuffling for reducing RAM usage during large rechunks. It might be worth trying by adjusting the config:

```python
import dask

dask.config.set({
    # use peer-to-peer (P2P) rechunking instead of the default task-based method
    "array.rechunk.method": "p2p",
    # disable task fusion, which has interfered with P2P rechunking in the past
    "optimization.fuse.active": False,
})
```
I’m not sure whether `optimization.fuse.active` still needs to be `False`. It fixed an issue for me a while ago and was recommended in a GitHub issue, but it may no longer be necessary.
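
In case it helps, here is a minimal sketch of how the setting could be applied to your use case. The array shape and chunk sizes are made up for illustration; the idea is to rechunk from time-sliced chunks (all locations per chunk) to location-sliced chunks (full time series per chunk), which is the access pattern you want for extracting per-location series. Note that P2P rechunking requires the distributed scheduler:

```python
import dask
import dask.array as da
from dask.distributed import Client

client = Client()  # P2P only works with the distributed scheduler

with dask.config.set({"array.rechunk.method": "p2p"}):
    # hypothetical (time, location) array: time-sliced chunks spanning all 70,000 locations
    arr = da.random.random((40_000, 70_000), chunks=(1_000, 70_000))
    # rechunk so each chunk holds the full time series for a subset of locations
    ts = arr.rechunk((40_000, 500))
    # e.g. pull the complete series for one location
    series = ts[:, 0].compute()
```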

Here is a blog post and here is a thread in the Pangeo Discourse about it.