Hello,
I am successfully using Zarr + S3, but now I want to improve my solution.
import s3fs
import xarray


def get_xarray_from_s3(bucket_name: str, dataset_name: str) -> xarray.Dataset:
    """
    Basic function to load an xarray Dataset from an S3 bucket.

    Args:
        bucket_name: name of the bucket
        dataset_name: refined name of the dataset, can be a path too

    Returns:
        xarray.Dataset object with the data from S3
    """
    check_aws_env_vars()  # helper defined elsewhere (not shown)
    s3_out = s3fs.S3FileSystem(anon=False)
    return xarray.open_zarr(
        store=s3fs.S3Map(
            root=f"s3://{bucket_name}/{dataset_name}.zarr", s3=s3_out, check=False
        )
    )
This is my current working version.
Now I would like something like the following, where I can pass a list of datasets in the bucket:
from typing import List

import s3fs
import xarray


def get_xarray_from_s3_multiple(bucket_name: str, dataset_names: List[str]) -> xarray.Dataset:
    """
    Basic function to load several datasets from an S3 bucket into one xarray Dataset.

    Args:
        bucket_name: name of the bucket
        dataset_names: refined names of the datasets, each can be a path too

    Returns:
        xarray.Dataset object with the combined data from S3
    """
    check_aws_env_vars()  # helper defined elsewhere (not shown)
    s3_out = s3fs.S3FileSystem(anon=False)
    fileset = [s3_out.open(f"s3://{bucket_name}/{dataset_name}.zarr") for dataset_name in dataset_names]
    return xarray.open_mfdataset(fileset, engine="zarr", consolidated=True)
But this does not work; it fails with the following error:
ValueError: Starting with Zarr 2.11.0, stores must be subclasses of BaseStore, if your store exposes the MutableMapping interface wrap it in Zarr.storage.KVStore. Got ()
I tried wrapping the s3_out.open() object in zarr.storage.KVStore, but then I ran into a TypeError.
Does anyone know how to open multiple Zarr archives at once?