What are the options if you want an object storage solution with no egress fees?
I’ve heard of the on-prem Open Storage Network and the cloud services Wasabi and Cloudflare R2, but I’m wondering whether anyone has direct experience with these or other solutions that they would be willing to share.
As a specific example: if I were running a Pangeo deployment on AWS us-east-1, how would the performance of Open Storage Network object storage on a 100GBps network node at the Massachusetts Green HPCC facility compare to S3 on us-east-1?
We tested three cloud storage providers outside of Google Cloud: Wasabi (a commercial service), Jetstream (an NSF-operated cloud), and OSN (an NSF-sponsored storage provider). These curves all show a similar pattern: good scaling for low worker counts, then a plateau. We interpret this as a saturation of the network bandwidth between the data and the compute location. Of these three, Jetstream saturated at the lowest value (2 GB/s), then Wasabi (3.5 GB/s, but possibly not fully saturated yet), then OSN (5.5 GB/s). Also noteworthy is the fact that, for smaller core counts, OSN was actually faster than GCS within Google Cloud. These results are likely highly sensitive to network topology. However, they show unambiguously that OSN is an attractive choice for cloud-style storage and on-demand big-data processing.
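If you want to produce the same kind of scaling curve for your own compute/storage pairing, here is a minimal sketch of a throughput measurement. It is not the harness used for the numbers quoted above. `measure_throughput`, `fake_read`, and `FAKE_STORE` are hypothetical names, and the in-memory store with a fixed sleep is only a stand-in so the snippet runs without credentials or a network. For a real test you would pass in a function that fetches one object or chunk from your S3-compatible endpoint.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def measure_throughput(read_chunk, keys, n_workers):
    """Read every key using n_workers threads.

    Returns (total_bytes, bytes_per_second) for the whole batch.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        sizes = list(pool.map(lambda key: len(read_chunk(key)), keys))
    # Guard against a zero interval on very fast, tiny batches.
    elapsed = max(time.perf_counter() - start, 1e-9)
    total = sum(sizes)
    return total, total / elapsed


# Hypothetical stand-in for an object store: 32 chunks of 1 MB each.
FAKE_STORE = {f"chunk-{i}": bytes(1_000_000) for i in range(32)}


def fake_read(key):
    """Pretend to fetch one chunk; the sleep imitates request latency."""
    time.sleep(0.01)
    return FAKE_STORE[key]


if __name__ == "__main__":
    # Sweep the worker count to see where throughput stops improving.
    for n in (1, 2, 4, 8, 16):
        total, rate = measure_throughput(fake_read, list(FAKE_STORE), n)
        print(f"{n:>2} workers: {total / 1e6:.0f} MB at {rate / 1e9:.3f} GB/s")
```

With the stand-in reader, throughput simply rises with the number of threads, because there is no shared link to saturate. Against a real remote store, the worker count at which the rate stops rising is the plateau described above.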
If you want to try some data from the real MGHP OSN, here is an example dataset to play with: