Thank you both. I try to work in notebooks and avoid setting up CLIs. (I’m using a Ceph S3 bucket.)
For the record, here are the 3 ways I uploaded the same 14 GB Zarr store, with this common setup:

```python
import os

import fsspec
import s3fs
import zarr

base_dir = "my/full/path/"
zarr_directory = "my.zarr"
endpoint_url = 'http://my_url:my_port'

# Credentials (access_key / secret_access_key defined elsewhere);
# the environment variables are what the plain fsspec filesystem picks up
os.environ['AWS_ACCESS_KEY_ID'] = access_key
os.environ['AWS_SECRET_ACCESS_KEY'] = secret_access_key

# Explicit s3fs filesystem, used by the Zarr copy below
s3 = s3fs.S3FileSystem(key=access_key,
                       secret=secret_access_key,
                       endpoint_url=endpoint_url)

# fsspec filesystem pointed at the same Ceph endpoint
fs = fsspec.filesystem('s3', endpoint_url=endpoint_url)

local_zarr_dir = base_dir + zarr_directory
```
**1. fsspec way**

```python
fs.put(local_zarr_dir, "zarr-fsspec", recursive=True)
```
The `zarr-fsspec` bucket gets:

```
Total Objects: 3091
Total Size: 13.3 GiB
```
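To check the totals without leaving the notebook, the same `fs` object can report them (`fs.find` and `fs.du` are standard fsspec methods):

```python
# Count objects and total bytes in the destination bucket
n_objects = len(fs.find("zarr-fsspec"))
total_bytes = fs.du("zarr-fsspec")
print(f"{n_objects} objects, {total_bytes / 2**30:.2f} GiB")
```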
**2. Zarr way**

```python
store1 = zarr.DirectoryStore(base_dir + zarr_directory)  # source: local directory store
store2 = s3fs.S3Map(root='zarr-tt', s3=s3, check=False)  # destination: S3 key-value store
zarr.copy_store(store1, store2, if_exists='replace')
```
The `zarr-tt` bucket gets:

```
Total Objects: 3091
Total Size: 13.3 GiB
```
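A quick sanity check that the copy is readable straight from S3, reusing the `store2` mapper (assuming the store root is a Zarr group):

```python
# Open the remote store read-only and print its hierarchy
root = zarr.open(store2, mode='r')
print(root.tree())
```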
**3. FUSE mount way**

Mounting the `zarr-fuse` bucket to a fresh local directory and copying the store into that mounted directory (the copy itself is sketched below).
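Once the bucket is mounted, the copy is ordinary filesystem code; a minimal sketch, assuming a hypothetical mount point `/mnt/zarr-fuse`:

```python
import shutil

mount_dir = "/mnt/zarr-fuse"  # hypothetical FUSE mount point for the zarr-fuse bucket

# Copy the store's contents into the mount root
# (hence no my.zarr prefix appears in the bucket)
shutil.copytree(local_zarr_dir, mount_dir, dirs_exist_ok=True)
```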
The `zarr-fuse` bucket gets:

```
Total Objects: 3095
Total Size: 13.2 GiB
```

(The four extra objects are presumably zero-byte directory markers created by the FUSE client.)
In all three cases I end up with the store’s contents at the bucket root, without the `my.zarr` name as a key prefix. I can of course fix that before uploading, but it makes the copy trickier than, for example, copying a PMTiles file or a COG. (One possible fix is sketched below.)
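If the goal is just to keep the `my.zarr` prefix, one option is to make the name part of the destination key rather than renaming anything locally. A sketch, not tested on my setup; note that fsspec’s trailing-slash semantics for `put` have varied across versions, so a dry run on a scratch bucket is worth doing:

```python
# fsspec: upload under an explicit my.zarr/ prefix inside the bucket
fs.put(local_zarr_dir, f"zarr-fsspec/{zarr_directory}", recursive=True)

# Zarr: root the destination mapper at bucket/prefix instead of the bucket root
store2 = s3fs.S3Map(root=f"zarr-tt/{zarr_directory}", s3=s3, check=False)
zarr.copy_store(store1, store2, if_exists='replace')
```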
How much of an access penalty would I take if I used a gzipped Zarr, which would give me a “filename with the contents inside”?
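(For what it’s worth, my understanding is that gzip is a stream format with no random access, so a `.tar.gz` would mean fetching and decompressing the whole 14 GB for any read. A ZIP archive keeps entries individually addressable, and Zarr can read one directly via `zarr.ZipStore`. A minimal local sketch, with a made-up `my_zarr.zip` filename:)

```python
# Pack the store into a single .zip; entries stay individually addressable
zip_store = zarr.ZipStore(base_dir + "my_zarr.zip", mode='w')
zarr.copy_store(store1, zip_store)
zip_store.close()

# Read it back; chunks are pulled out of the archive on demand
root = zarr.open(zarr.ZipStore(base_dir + "my_zarr.zip", mode='r'))
```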
Cheers