Hi Pangeo! I work on the IRI/LDEO Data Library. The group I belong to has been hosting hundreds of mostly climate-related datasets in the Data Library for 20+ years. We’re looking into replacing some of our in-house software with the tools you’ve been developing and integrating.
I’m not deep in the Pangeo weeds yet, but from the outside it looks like the Pangeo approach to data access doesn’t make a clear separation of interface from implementation. As of today, to read a Zarr dataset from the Pangeo catalog a client has to have the current version of the Zarr library. If tomorrow you were to switch from Zarr to TileDB, or even upgrade from the current version of Zarr to some future version that’s not backwards-compatible, then all the binders that have been written to work with today’s setup would stop working. I fear that this could hinder progress after a while.
In “Closed Platforms vs. Open Architectures for Cloud-Native Earth System Analytics”, Ryan and Joe acknowledge that
“In geoscience, we have had an excellent remote-data-access protocol for a long time: the ‘Open-source Project for a Network Data Access Protocol’ or OPeNDAP.”
but their position is that the OPeNDAP protocol, while it’s nice for smaller datasets,
“was simply not designed with petascale applications and massively parallel distributed processing in mind.”
While it may be true that the designers of OPeNDAP didn’t explicitly have this kind of processing in mind, it’s not obvious to me why it wouldn’t be possible to use it at PB scale. I certainly agree that existing implementations of the protocol don’t scale well, but that’s a different assertion with different consequences. I’m interested in revisiting the question of whether OPeNDAP could be a viable implementation-agnostic interface to Pangeo data.
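To make the "implementation-agnostic" point concrete, here is a minimal sketch of how a DAP2 subset request is expressed: just a URL with a constraint expression, which says nothing about whether the server stores the data as NetCDF, Zarr, or something else. The server URL, the variable name `sst`, and the helper `dap2_subset_url` are all hypothetical and only for illustration; this is not how any particular server or client has to do it.

```python
def dap2_subset_url(base_url, variable, slices, response="dods"):
    """Build a DAP2 request URL for a hyperslab of one variable.

    slices holds one (start, stop, stride) tuple per dimension.
    DAP2 writes each dimension as [start:stride:stop], with stop inclusive.
    """
    hyperslab = "".join(
        f"[{start}:{stride}:{stop}]" for start, stop, stride in slices
    )
    return f"{base_url}.{response}?{variable}{hyperslab}"


# Hypothetical dataset: 12 time steps, a 100-row latitude band,
# and every second longitude point.
url = dap2_subset_url(
    "https://example.org/opendap/sst.nc",  # made-up server and path
    "sst",                                 # made-up variable name
    [(0, 11, 1), (100, 199, 1), (0, 359, 2)],
)
print(url)
```

A client only needs to know this request syntax and the response encoding; whatever the server reads from on the back end could change without the client noticing.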
I have more to say on the subject, but I know that by reading a couple of blog posts I’m only seeing the tip of the iceberg of what’s been happening in this community. Before I write more, does anyone have more recent or more detailed information about the problems involved with scaling OPeNDAP? Or on why that wouldn’t be desirable even if it were technically possible?