Experience with no-egress-fee object storage?

What are the options if you want object storage with no egress fees?

I’ve heard of the on-prem Open Storage Network and the cloud services Wasabi and Cloudflare R2 but I’m wondering whether anyone has direct experience with these or other solutions they would be willing to share.

As a specific example: if I were running a Pangeo deployment on AWS us-east-1, how would the performance of Open Storage Network object storage on a 100 Gbps network node at the Massachusetts Green HPCC facility compare to S3 in us-east-1?

We profiled this and described the results here:

http://gallery.pangeo.io/repos/earthcube2020/ec20_abernathey_etal/cloud_storage.html

Here is what we wrote:

We tested three cloud storage providers outside of Google Cloud: Wasabi (a commercial service), Jetstream (an NSF-operated cloud), and OSN (an NSF-sponsored storage provider). These curves all show a similar pattern: good scaling for low worker counts, then a plateau. We interpret this as a saturation of the network bandwidth between the data and the compute location. Of these three, Jetstream saturated at the lowest value (2 GB/s), then Wasabi (3.5 GB/s, but possibly not fully saturated yet), then OSN (5.5 GB/s). Also noteworthy is the fact that, for smaller core counts, OSN was actually faster than GCS within Google Cloud. These results are likely highly sensitive to network topology. However, they show unambiguously that OSN is an attractive choice for cloud-style storage and on-demand big-data processing.
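For anyone wanting to check numbers like these against their own setup, here is a minimal sketch of how an aggregate read throughput in GB/s can be computed. This is not the benchmark code from the paper. The helper names (`gb_per_s`, `timed_read`) and the `read_fn` callable are hypothetical, and a real test would read from the object store with many workers rather than from memory.

```python
import time


def gb_per_s(n_bytes, seconds):
    """Convert a byte count and an elapsed time into GB/s (1 GB = 1e9 bytes)."""
    return n_bytes / seconds / 1e9


def timed_read(read_fn):
    """Run read_fn (hypothetical: any callable that returns the number of
    bytes it read) and return (bytes_read, elapsed_seconds)."""
    start = time.perf_counter()
    n_bytes = read_fn()
    elapsed = time.perf_counter() - start
    return n_bytes, elapsed


# Stand-in reader so the sketch runs without network access.
n_bytes, elapsed = timed_read(lambda: len(bytes(10_000_000)))
rate = gb_per_s(n_bytes, elapsed)
```

Repeating a measurement like this while increasing the worker count is one way to see where the curve stops scaling and plateaus.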

If you want to try some data from the real MGHP OSN, here is an example dataset to play with:

import xarray as xr
url = "https://mghp.osn.xsede.org/cnh-bucket-1/llc4320_zarr"
ds = xr.open_dataset(url, engine='zarr').isel(time=0)

Perfect @rabernat! Thanks so much!

Quick question: I see you used regular https: access for OSN rather than s3:. Was that for performance? There is s3: access also, right?
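For comparison, here is a rough sketch of what `s3:`-style access to the same dataset might look like. It assumes the OSN endpoint accepts anonymous, path-style S3 requests and that `s3fs` is installed, neither of which is confirmed in this thread. The helper names `https_to_s3` and `open_via_s3` are made up for illustration.

```python
from urllib.parse import urlparse


def https_to_s3(url):
    """Split an https object URL into an s3:// URL plus fsspec/s3fs options.

    Assumes path-style addressing: https://<endpoint>/<bucket>/<key>.
    """
    parts = urlparse(url)
    endpoint = f"{parts.scheme}://{parts.netloc}"
    s3_url = "s3://" + parts.path.lstrip("/")
    storage_options = {
        "anon": True,  # assumption: the bucket allows anonymous reads
        "client_kwargs": {"endpoint_url": endpoint},
    }
    return s3_url, storage_options


def open_via_s3(url):
    """Open the Zarr store through the S3 API (needs xarray, zarr, s3fs)."""
    import xarray as xr

    s3_url, storage_options = https_to_s3(url)
    return xr.open_dataset(
        s3_url,
        engine="zarr",
        backend_kwargs={"storage_options": storage_options},
    )


s3_url, storage_options = https_to_s3(
    "https://mghp.osn.xsede.org/cnh-bucket-1/llc4320_zarr"
)
```

Calling `open_via_s3(...)` with the https URL from the example above would then be the S3 counterpart of the plain https read, if those assumptions hold.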

This is very useful info writ large, thanks for taking the time to run these tests. It will inform some of our own :)

Glad you find this helpful! The same results are published and citable here: https://www.essoar.org/doi/abs/10.1002/essoar.10508824.2

and also here (with much more context about our vision for cloud-native data repositories)

I’m not sure if the reality matches the marketing - but Internet2 is supposed to address this for .edu users in the US

AWS and Azure are part of the project, other vendors are in progress (allegedly)
