Parallel creation of STAC catalogs

Just seconding what @sharkinsspatial said in simpler terms: this is a database problem. The minute you start thinking about how to update something consistently from multiple processes, you are effectively building a database. There are 70 years of research and theory on how to do this right.

I think that many of the problems that our community faces in terms of data management boil down to the fact that we often use files where a database would be more appropriate.

From the perspective of maintaining a big data archive in the cloud, I’m basically ready to retract this blog post I wrote almost six years ago:

After several years of trying to put that into practice, I no longer think that object storage by itself is a silver bullet for building a cloud-native data repository. It simply doesn’t offer the sorts of transactional guarantees you need to do this right.
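To make the missing guarantee concrete, here is a minimal sketch of a lost update. It assumes a made-up in-memory `store` dict standing in for an object store with plain whole-object get/put and no conditional writes; the `catalog.json` layout and the `links` table are hypothetical and not taken from any real STAC tooling. Two workers read the same catalog, each appends a link, and the second write silently discards the first. The same two updates expressed as SQLite transactions both survive.

```python
import json
import sqlite3

# Hypothetical stand-in for an object store: whole-object get/put, last write wins.
store = {"catalog.json": json.dumps({"links": []})}

def get(key):
    return json.loads(store[key])

def put(key, value):
    store[key] = json.dumps(value)

# Two workers each read the catalog before either one writes it back.
seen_by_a = get("catalog.json")
seen_by_b = get("catalog.json")

seen_by_a["links"].append("item-a.json")
put("catalog.json", seen_by_a)

seen_by_b["links"].append("item-b.json")
put("catalog.json", seen_by_b)  # silently overwrites worker A's update

file_links = get("catalog.json")["links"]

# The same two updates as database transactions: each insert is atomic,
# so neither worker can clobber the other.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE links (href TEXT PRIMARY KEY)")
for href in ("item-a.json", "item-b.json"):
    with db:  # commits on success, rolls back on error
        db.execute("INSERT INTO links (href) VALUES (?)", (href,))

db_links = [row[0] for row in db.execute("SELECT href FROM links ORDER BY href")]

print(file_links)  # only worker B's link remains
print(db_links)    # both links are present
```

The interleaving is written out by hand here so the result is deterministic; with real parallel processes the same overwrite happens only some of the time, which is what makes it hard to notice.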

Sorry for going meta! :laughing:
