I have netCDF Level 3 data in a Ceph object store behind the S3-compatible Ceph Object Gateway (RGW). I would like to explore using Icechunk and VirtualiZarr to manage Zarr views of the data, as described here. I'm having trouble figuring out the right way to set up the reader in VirtualiZarr. By default I get an error suggesting that AWS S3 is assumed:
In [5]: options = {
...: "key": creds["AWS_ACCESS_KEY_ID"],
...: "secret": creds["AWS_SECRET_ACCESS_KEY"],
...: "client_kwargs": {"endpoint_url": creds["AWS_ENDPOINT_URL"]},
...: }
In [6]: vds = vz.open_virtual_dataset("s3://my-radosgw-bucket/myfile.nc", reader_options={"storage_options": options})
...
File .venv/lib/python3.12/site-packages/virtualizarr/utils.py:35, in ObstoreReader.__init__(self, store, path)
31 import obstore as obs
33 parsed = urlparse(path)
---> 35 self._reader = obs.open_reader(store, parsed.path)
GenericError: Generic S3 error: Error performing HEAD https://s3..amazonaws.com/my-radosgw-bucket/myfile.nc in 2.394350046s, after 10 retries, max_retries: 10, retry_timeout: 180s - HTTP error: error sending request
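Reading the traceback, my guess is that ObstoreReader builds its store from the URL alone and so falls back to AWS defaults (hence the empty region in the hostname). I suspect I need to construct the obstore store against my RGW endpoint myself, roughly like the untested sketch below (the keyword names are my guess at obstore's S3Store config options), but I don't see how to hand such a store to open_virtual_dataset:

import obstore as obs
from obstore.store import S3Store

# Untested guess: point an obstore S3Store at the Ceph RGW endpoint explicitly
store = S3Store(
    "my-radosgw-bucket",
    access_key_id=creds["AWS_ACCESS_KEY_ID"],
    secret_access_key=creds["AWS_SECRET_ACCESS_KEY"],
    endpoint=creds["AWS_ENDPOINT_URL"],
    virtual_hosted_style_request=False,  # assuming RGW wants path-style addressing
)
reader = obs.open_reader(store, "myfile.nc")  # mirrors what ObstoreReader does internally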
I think my issue is similar to what is discussed here. However, if I follow Tom's example:
In [15]: vds = vz.open_virtual_dataset(
...: f"{creds['AWS_ENDPOINT_URL']}/my-radosgw-bucket/myfile.nc",
...: backend=HDFVirtualBackend,
...: )
This time I get a 403 Forbidden, and I'm not sure how to pass my S3 credentials through this code path.
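For what it's worth, I also wondered whether the underlying client might pick the credentials up from environment variables, something like the untested sketch below, but I don't know whether the obstore-based reader actually reads these:

import os

# Untested guess: export the credentials and hope the underlying S3 client reads them
os.environ["AWS_ACCESS_KEY_ID"] = creds["AWS_ACCESS_KEY_ID"]
os.environ["AWS_SECRET_ACCESS_KEY"] = creds["AWS_SECRET_ACCESS_KEY"]
os.environ["AWS_ENDPOINT_URL"] = creds["AWS_ENDPOINT_URL"]

vds = vz.open_virtual_dataset(
    f"{creds['AWS_ENDPOINT_URL']}/my-radosgw-bucket/myfile.nc",
    backend=HDFVirtualBackend,
)

Any advice on the right way to wire this up would be very much appreciated!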