Error re-chunking MUR SST Zarr dataset on Pangeo hub using shared filesystem

aimeeb · March 2, 2020, 2:33pm

I am looking to rechunk the MUR SST dataset, which is currently published to AWS public datasets in Zarr format with a chunk configuration of {'time': 5, 'lat': 1799, 'lon': 3600}, into chunks that are larger in the time dimension. I am planning to test 60 x 1799 x 3600, 180 x 1023 x 2047 and 379 x 439 x 360.
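For a rough sense of scale, here is a minimal sketch that computes the uncompressed size of one chunk for each of these shapes. The 4-byte item size (float32 in memory) is an assumption on my part and not something I have checked against the store; `chunk_shapes` and `chunk_nbytes` are just illustrative names.

```python
from math import prod

# Assumption: each value occupies 4 bytes once loaded (float32).
# Adjust ITEMSIZE_BYTES if the variable decodes to a different dtype.
ITEMSIZE_BYTES = 4

# Chunk shapes as (time, lat, lon); the first is the current layout,
# the others are the candidate layouts to test.
chunk_shapes = {
    "current": (5, 1799, 3600),
    "60x1799x3600": (60, 1799, 3600),
    "180x1023x2047": (180, 1023, 2047),
    "379x439x360": (379, 439, 360),
}


def chunk_nbytes(shape, itemsize=ITEMSIZE_BYTES):
    """Return the uncompressed size in bytes of one chunk of this shape."""
    return prod(shape) * itemsize


for name, shape in chunk_shapes.items():
    mib = chunk_nbytes(shape) / 2**20
    print(f"{name:>14}: {prod(shape):>11,} elements, {mib:,.0f} MiB uncompressed")
```

Under that assumption, the two largest candidates come out well above 1 GiB per chunk in memory, which matters when sizing the workers.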

If possible, I would like to do this using an existing Pangeo hub, since these resources are already part of the Pangeo efforts. However, I have run into an error using the shared file system which appears on at least 1 of N dask workers:

ValueError: array not found at path 'mask'

When I inspect the worker logs, it’s usually just 1 of the N workers that sees this error, which makes me think there is some latency before the NFS mount becomes consistent across workers. When I look at the files on each worker afterwards, they look the same. Could this be a race condition?
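One way I could probe the race-condition theory is to retry the failing lookup a few times with a short pause and see whether it eventually succeeds on the affected worker. The helper below is only a diagnostic sketch: `call_with_retry` and its parameters are hypothetical names, and retrying on `ValueError` is an assumption based on the error above.

```python
import time


def call_with_retry(fn, exc_types=(ValueError,), attempts=5, delay_s=1.0):
    """Call fn() and retry on the given exceptions, pausing between tries.

    Re-raises the last exception if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except exc_types as err:
            if attempt == attempts:
                raise
            # Log each failure so the worker logs show how long the
            # array took to become visible, if it ever does.
            print(f"attempt {attempt} failed: {err!r}; retrying in {delay_s}s")
            time.sleep(delay_s)


# Hypothetical usage on a worker, where `group` is an already opened
# Zarr group on the shared mount:
#     mask = call_with_retry(lambda: group["mask"])
```

If the lookup succeeds on a later attempt, that points towards delayed visibility on the shared mount rather than a missing array; if it never succeeds, the cause is likely something else.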

I’m using https://aws-uswest2.pangeo.io/, with 4 workers configured to share the volume mount, each allocated 62 GB of memory (since 64 GB is the maximum available per instance, as I understand it).

The notebook with the error and corresponding dask config file are here: https://gist.github.com/abarciauskas-bgse/4a6cdda3bbaa29da80aa4e10d5532b45

cc @rsignell @scottyhq

Any ideas welcome.

rabernat · March 3, 2020, 7:50pm

Responded here: https://github.com/pangeo-data/pangeo/issues/765
