If you are like I used to be, you often pull your in-situ ocean data down to a local copy on your desktop or at your HPC centre. You have wget and FTP scripts running all over the place, filling up directories with NetCDF files. For a start, version control and tracking updates and provenance can be painful. And then there is all the disk space and duplication.
I don't know what in-situ ocean observation data might be available behind an S3 gateway somewhere. I am aware that lots of data is available on OPeNDAP / THREDDS servers (NOAA NODC, for example: https://data.nodc.noaa.gov/thredds/catalog.html). But my experience using xarray with OPeNDAP has been mixed, at best. I do recognise that many factors can degrade OPeNDAP performance, not least my connection speed to the rest of the world.
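For context, here is a minimal sketch of the kind of xarray-over-OPeNDAP access I mean. It assumes the server exposes OPeNDAP under the default THREDDS `dodsC` service path (servers can be configured differently), that xarray has an OPeNDAP-capable backend such as netCDF4 or pydap installed, and that dask is available for lazy loading. The helper names `thredds_opendap_url` and `open_remote` are my own, not part of any library.

```python
from urllib.parse import urlsplit, urlunsplit


def thredds_opendap_url(server: str, dataset_path: str) -> str:
    """Build an OPeNDAP URL for a dataset on a THREDDS server.

    Assumes the default THREDDS layout, where OPeNDAP access lives
    under /thredds/dodsC/. Only the scheme and host of `server` are used.
    """
    parts = urlsplit(server)
    path = "/thredds/dodsC/" + dataset_path.lstrip("/")
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


def open_remote(url: str, chunks=None):
    """Open a remote dataset lazily, so only metadata is read up front.

    xarray is imported here so the URL helper above works without it.
    Passing `chunks` makes xarray wrap variables in dask arrays (dask
    must be installed); data is fetched only when values are computed.
    """
    import xarray as xr

    return xr.open_dataset(url, chunks={} if chunks is None else chunks)
```

You would call `thredds_opendap_url("https://data.nodc.noaa.gov", path)` with a real dataset path copied from the catalog, pass the result to `open_remote`, and then subset with `.sel(...)` before calling `.load()` so that only the slice you need crosses the network. In my experience the chunk sizes and the size of each request matter a lot for how well this performs.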
Is there already a resource somewhere that documents where best to remotely access public in-situ ocean datasets? And best practice for doing so, maybe with examples? (I note @rabernat has this)
I may be missing some obvious resources. If not, is this something some of us could work on?
PS: if someone has already put all of the NOAA NODC data into Zarr format, let us know!