Loading ensembles using intake

I’m trying to use intake-esm to load ensemble zarr stores into xarray. The ensemble data lives in S3-compatible object storage served by MinIO. I can load a single store directly with xarray.open_zarr(), for example:

import s3fs
import xarray as xr

fs = s3fs.S3FileSystem(client_kwargs={"endpoint_url": 'https://wifire-data.sdsc.edu:9000',"verify": False},anon=True)

mapper = fs.get_mapper('/burnpro3d/d/80/aa/run_80aa1808-adc0-42c1-88e5-abfe3cfc5c41/quicfire.zarr')

ds = xr.open_zarr(mapper, consolidated=False,  drop_variables=['surfEnergy'])
ds

However, I’m not sure which parameters to pass in to load the zarr files using intake. Here’s a snippet:

import intake

col = intake.open_esm_datastore('quicfire_ensemble_collection.json')
col_subset = col.search(run_id='80aa1808-adc0-42c1-88e5-abfe3cfc5c41')
col_subset.df.head()

dsets = col_subset.to_dataset_dict(zarr_kwargs={"consolidated": False, "drop_variables":['surfEnergy']},
                                   storage_options={"token": "anon", "client_kwargs":{"endpoint_url": 'https://wifire-data.sdsc.edu:9000',"verify": False}})

This fails with:

OSError: 
            Failed to open zarr store.

            *** Arguments passed to xarray.open_zarr() ***:

            - store: /burnpro3d/d/80/aa/run_80aa1808-adc0-42c1-88e5-abfe3cfc5c41/quicfire.zarr
            - kwargs: {'consolidated': False, 'drop_variables': ['surfEnergy']}

            *** fsspec options used ***:

            - root: /burnpro3d/d/80/aa/run_80aa1808-adc0-42c1-88e5-abfe3cfc5c41/quicfire.zarr
            - protocol: None

            ********************************************

Any thoughts on how to configure storage_options for a custom endpoint_url?

@ben, are you able to modify the path in the catalog to include the s3:// prefix? For example, change

/burnpro3d/d/80/aa/run_80aa1808-adc0-42c1-88e5-abfe3cfc5c41/quicfire.zarr

to

s3://burnpro3d/d/80/aa/run_80aa1808-adc0-42c1-88e5-abfe3cfc5c41/quicfire.zarr

and rerun your code.
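
If the catalog lists many stores, the prefix could be added in bulk rather than by hand. Below is a minimal sketch using only the standard library. The column name `path` and the helper names are assumptions for illustration; the real column is whatever the catalog JSON declares for its assets.

```python
import csv
import io


def with_s3_prefix(path: str) -> str:
    """Return the path as an s3:// URL; leave entries that already name a protocol alone."""
    if "://" in path:
        return path
    return "s3://" + path.lstrip("/")


def rewrite_catalog_csv(csv_text: str, path_column: str = "path") -> str:
    """Rewrite one column of a catalog CSV so every entry carries the s3:// prefix.

    `path_column` is an assumed column name, not taken from the actual catalog.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row[path_column] = with_s3_prefix(row[path_column])
        writer.writerow(row)
    return out.getvalue()
```

With the protocol in the path, fsspec no longer has to guess it, which is what the "protocol: None" line in the error points at.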

Also, can you try the main branch of intake-esm?

python -m pip install git+https://github.com/intake/intake-esm@main

There are some recent bug fixes and enhancements there that address some of the fsspec-related issues.

When using the main branch, zarr_kwargs is replaced by xarray_open_kwargs, so this call

dsets = col_subset.to_dataset_dict(zarr_kwargs={"consolidated": False, "drop_variables":['surfEnergy']},
                                   storage_options={"token": "anon", "client_kwargs":{"endpoint_url": 'https://wifire-data.sdsc.edu:9000',"verify": False}})

becomes

dsets = col_subset.to_dataset_dict(xarray_open_kwargs={"consolidated": False, "drop_variables":['surfEnergy']},
                                   storage_options={"token": "anon", "client_kwargs":{"endpoint_url": 'https://wifire-data.sdsc.edu:9000',"verify": False}})

Prepending s3:// to the path and changing "token": "anon" to "anon": True in storage_options got it working! Thank you!
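
For anyone landing here later, the two fixes combined might look like the sketch below. It assumes the main-branch xarray_open_kwargs argument from the reply above and catalog paths that already start with s3://. The helper names build_storage_options and load_run are made up for illustration.

```python
ENDPOINT_URL = "https://wifire-data.sdsc.edu:9000"


def build_storage_options(endpoint_url: str = ENDPOINT_URL) -> dict:
    """s3fs options for anonymous access to a custom (MinIO) endpoint."""
    return {
        "anon": True,  # replaces the earlier {"token": "anon"}
        "client_kwargs": {"endpoint_url": endpoint_url, "verify": False},
    }


def load_run(catalog_json: str, run_id: str) -> dict:
    """Open every zarr store for one run; needs intake-esm and network access."""
    import intake  # imported lazily so build_storage_options works without intake

    col = intake.open_esm_datastore(catalog_json)
    subset = col.search(run_id=run_id)
    return subset.to_dataset_dict(
        xarray_open_kwargs={"consolidated": False, "drop_variables": ["surfEnergy"]},
        storage_options=build_storage_options(),
    )
```

Usage would then be something like dsets = load_run('quicfire_ensemble_collection.json', '80aa1808-adc0-42c1-88e5-abfe3cfc5c41').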
