Vision of global observation data exchange

Dear Pangeo-Community,

I have been working in geo- and atmospheric science for over 10 years and have built up deep knowledge of software development and data science.
I am pretty sure you have noticed the success of platforms in many areas.
In many businesses the platform wins, because it is open to many different stakeholders and everyone on the platform benefits from everyone else. Pangeo works in a similar way :wink:

So my idea is to build an open-source platform for exchanging observation data globally.

  1. All of the data on the platform is free and open data
  2. Everyone can build an upload/parsing job to contribute data
  3. State-of-the-art APIs are built on top of that database

So what do you think about that idea?
I am really looking forward to your feedback.

Best regards
Daniel Lassahn

intake is a great tool for combining different data sources behind a unified API.

Which data sizes are you thinking of?

An open-source platform for exchanging observation data is a great idea – and one that is actively being developed by several groups! What kind of “observational” data are you thinking of? Weather? Ocean? Climate? Land? Seismic? Space? Hydrological? Biogeochemical? Acoustic? I can think of different repositories for each of these, and a lot of work is already underway to make that data findable and interoperable.

At what level are you trying to approach the problem?

  • Do you have data you want to upload somewhere?
  • Are you looking for data?
  • Do you have general software dev and data science skills and want to contribute?
  • Do you have access to funding to move this forward?

There have been attempts before to commercialize access to this kind of data (e.g. Planet OS, previously Marinexplore, now Intertrust - https://planetos.com/ but it looks like it could be defunct now), and it has proven challenging to make a go of it. On the open standards side, I think the OGC APIs (https://ogcapi.ogc.org/) will be part of the solution for building interoperable services.

But perhaps you are already much further along in your vision here – if so, I’d be keen to learn more!

Dear jmunroe,

My personal interest is weather and, to some extent, climate data, i.e. parameters relevant to the atmosphere. But I think the platform should not be limited to that. The only limitation I would set is that it does not share gridded data, because that would place totally different requirements on the underlying database technology.

In the end, I think it is just about bringing together all the data that is already out there. For example, most of the data from the German weather service can be accessed via wetterdienst.

I am an entrepreneur, but I do not think that such data should be commercialized. I am not a fan of selling free data; that is not a good business. People, scientists and companies should use the data to build businesses, knowledge and research on top of it.

I think the main benefit is the harmonisation of data. So if I want air temperature data for a specific region across borders, it is just one query away.
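
To make the harmonisation idea concrete, here is a minimal sketch, assuming two hypothetical providers that report air temperature with different column names and units (Kelvin vs. Fahrenheit). Each provider gets a small mapping function into one common schema; after that, a cross-border regional query is a single filter. The provider formats, the schema, the sample records and the `query` helper are all illustrative assumptions, not an existing API.

```python
import pandas as pd

# Hypothetical common schema that every provider is mapped into.
COLUMNS = ["station_id", "lat", "lon", "time", "parameter", "value", "unit"]


def from_provider_a(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical provider A: air temperature in Kelvin, column 'TT'."""
    return pd.DataFrame({
        "station_id": "A-" + df["station"].astype(str),
        "lat": df["lat"],
        "lon": df["lon"],
        "time": pd.to_datetime(df["date"], utc=True),
        "parameter": "air_temperature",
        "value": df["TT"] - 273.15,  # Kelvin -> degC
        "unit": "degC",
    })[COLUMNS]


def from_provider_b(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical provider B: air temperature in Fahrenheit, column 'temp_f'."""
    return pd.DataFrame({
        "station_id": "B-" + df["id"].astype(str),
        "lat": df["latitude"],
        "lon": df["longitude"],
        "time": pd.to_datetime(df["timestamp"], utc=True),
        "parameter": "air_temperature",
        "value": (df["temp_f"] - 32.0) * 5.0 / 9.0,  # Fahrenheit -> degC
        "unit": "degC",
    })[COLUMNS]


def query(obs, parameter, bbox, start, end):
    """Select one parameter inside a lon/lat bounding box and a UTC time window."""
    lon_min, lat_min, lon_max, lat_max = bbox
    t0, t1 = pd.Timestamp(start, tz="UTC"), pd.Timestamp(end, tz="UTC")
    mask = (
        (obs["parameter"] == parameter)
        & obs["lon"].between(lon_min, lon_max)
        & obs["lat"].between(lat_min, lat_max)
        & obs["time"].between(t0, t1)
    )
    return obs[mask]


# Made-up sample records, one raw format per provider.
raw_a = pd.DataFrame({"station": [1, 2], "lat": [50.1, 53.6], "lon": [8.7, 10.0],
                      "date": ["2021-07-01T12:00", "2021-07-01T12:00"],
                      "TT": [295.15, 290.15]})
raw_b = pd.DataFrame({"id": ["X"], "latitude": [49.6], "longitude": [6.1],
                      "timestamp": ["2021-07-01T12:00"], "temp_f": [68.0]})

obs = pd.concat([from_provider_a(raw_a), from_provider_b(raw_b)], ignore_index=True)
result = query(obs, "air_temperature", bbox=(5.0, 47.0, 11.0, 51.0),
               start="2021-07-01", end="2021-07-02")
print(result[["station_id", "value", "unit"]])
```

The point of the design is that the per-provider mapping is the only part that differs between sources; the query side never needs to know where a record came from.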

So here is what I think is needed to realize such a project:

  • a structured API for building import modules (I am thinking of a base class covering such things). Maybe intake could be the basis for this.
  • everything after the import module is implemented once: automatically creating the scheduler, calling the import module, and creating a worker instance (I am thinking of AWS Lambda) to execute it all
  • some rules for the data
  • a scheduler (at the beginning we can start by fetching data daily)
  • a database model
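
The components in the list above could fit together roughly as follows. This is a minimal sketch under several assumptions: `ImportModule`, `Observation`, `validate`, `init_db` and `run_import` are hypothetical names, SQLite stands in for the real database, and the scheduler/AWS Lambda worker is reduced to a plain function call. A contributor would only write the `fetch`/`parse` subclass; everything else is the shared code that is implemented once.

```python
import sqlite3
from abc import ABC, abstractmethod
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable


@dataclass(frozen=True)
class Observation:
    """Hypothetical minimal data model: one value at one station and time."""
    station_id: str
    time: datetime
    parameter: str
    value: float
    unit: str


class ImportModule(ABC):
    """Hypothetical base class that contributors subclass in a merge request."""
    name = "unnamed"
    schedule = "daily"  # at the beginning, every import is fetched once a day

    @abstractmethod
    def fetch(self) -> bytes:
        """Download the raw data from the provider."""

    @abstractmethod
    def parse(self, raw: bytes) -> Iterable[Observation]:
        """Convert raw provider data into Observation records."""


# Hypothetical data rules: accepted parameters and their agreed unit.
RULES = {"air_temperature": "degC", "precipitation": "mm"}


def validate(obs: Observation) -> bool:
    """Accept only known parameters in the agreed unit with timezone-aware times."""
    return RULES.get(obs.parameter) == obs.unit and obs.time.tzinfo is not None


def init_db(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS observations ("
        " source TEXT, station_id TEXT, time TEXT, parameter TEXT,"
        " value REAL, unit TEXT, PRIMARY KEY (station_id, time, parameter))"
    )


def run_import(module: ImportModule, conn: sqlite3.Connection) -> int:
    """Shared code written once; a scheduler or serverless worker would call this."""
    rows = [
        (module.name, o.station_id, o.time.isoformat(), o.parameter, o.value, o.unit)
        for o in module.parse(module.fetch())
        if validate(o)
    ]
    conn.executemany("INSERT OR REPLACE INTO observations VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


class CsvDemoImport(ImportModule):
    """Toy contributor module: 'fetches' an in-memory CSV instead of downloading."""
    name = "csv-demo"

    def fetch(self) -> bytes:
        return (b"station,time,temp_c\n"
                b"S1,2021-07-01T12:00:00+00:00,21.5\n"
                b"S2,2021-07-01T12:00:00+00:00,n/a\n")

    def parse(self, raw: bytes) -> Iterable[Observation]:
        for line in raw.decode().strip().splitlines()[1:]:  # skip header row
            station, time, temp = line.split(",")
            try:
                value = float(temp)
            except ValueError:
                continue  # skip values that cannot be parsed
            yield Observation(station, datetime.fromisoformat(time),
                              "air_temperature", value, "degC")


conn = sqlite3.connect(":memory:")
init_db(conn)
first = run_import(CsvDemoImport(), conn)
second = run_import(CsvDemoImport(), conn)  # a daily re-run replaces, not duplicates
count = conn.execute("SELECT COUNT(*) FROM observations").fetchone()[0]
print(first, second, count)  # 1 1 1
```

Using `INSERT OR REPLACE` on a (station, time, parameter) key keeps a re-run idempotent, so the scheduler can safely retry a failed import without creating duplicates.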

Everyone should be able to define an import into the database by adding an import class via merge request. The API to retrieve the data could be built on FastAPI, for example, and everyone who is registered has full access. The costs can be divided among all users, weighted by their use of the API.
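
The usage-weighted cost split mentioned above is simple arithmetic: each user pays total_cost × (their API calls / all API calls). A minimal sketch, assuming usage is counted as API calls per user per billing period (bytes served or compute time would work the same way); the user names and amounts are made up, and the even split when nobody used the API is an assumption.

```python
from typing import Dict


def split_costs(total_cost: float, api_calls: Dict[str, int]) -> Dict[str, float]:
    """Split an infrastructure bill among users in proportion to their API calls."""
    if not api_calls:
        return {}
    total_calls = sum(api_calls.values())
    if total_calls == 0:
        # Assumption: if nobody used the API this period, split the bill evenly.
        return {user: total_cost / len(api_calls) for user in api_calls}
    return {user: total_cost * calls / total_calls for user, calls in api_calls.items()}


# Made-up usage numbers for one billing period.
shares = split_costs(1000.0, {"uni-a": 600, "startup-b": 300, "agency-c": 100})
print(shares)  # {'uni-a': 600.0, 'startup-b': 300.0, 'agency-c': 100.0}
```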

I think Microsoft, Amazon or Google would be interested in providing infrastructure for such an open-data project.

In the end, we will save a lot of disk space worldwide if we have one global observation database that can be accessed easily and quickly. And people around the world will save a lot of time.

Welcome @meteoDaniel! What you describe is very similar to the vision we have for Pangeo Forge.

We have been working hard on this project for more than a year. In fact, right now we are at a SciPy sprint where we have a dozen new contributors creating new recipes.

It would be great if we could align our efforts!

Here’s an example of how Pangeo Forge works. In the following repo, we have a recipe for the Global Precipitation Climatology Project.

Pangeo Forge knows about this repo; it is linked here: Pangeo-Forge

On that page, you can copy and paste code to open the data directly:

import xarray as xr
store = 'https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/gpcp-feedstock/gpcp.zarr'
ds = xr.open_dataset(store, engine='zarr', chunks={})
ds

We can instantly load and plot the entire dataset, derived from over 9000 distinct netCDF files:

mask = (ds.precip >= 0) & (ds.precip < 1000)
precip_mean = ds.precip.where(mask).mean("time").compute()
precip_mean.plot()

@meteoDaniel should correct me if I am misunderstanding, but I think the request here is primarily for observational time-series data, i.e. “station data” rather than gridded data products.

Like the German weather service, I think most (all?) of these weather observations are already available on one system or another. On the Canadian side, I’d point to MSC GeoMet and its API for a similar solution.

Weather observations are obviously not only of interest on a country-by-country level – their value increases when they are combined. Are you familiar with the Global Telecommunication System (GTS)? That’s the system national weather centres use to share observational data with each other. Finding out whether there are groups/organizations that provide access to that system could be useful.

Regarding APIs, again I think it is really helpful to look at the OGC APIs, specifically OGC API - Environmental Data Retrieval (EDR).
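
As an illustration of what an EDR-style request for station observations might look like, here is a sketch that only builds query URLs and does not contact any server. The base URL `https://example.org/edr` and the collection `surface-obs` are hypothetical; the `position` and `area` query types with `coords` (WKT, lon/lat order), `datetime` and `parameter-name` follow the EDR query pattern, but check the specification and the target server's collection metadata before relying on exact parameter names or response formats.

```python
from urllib.parse import parse_qs, quote, urlencode, urlparse

BASE_URL = "https://example.org/edr"  # hypothetical EDR server


def _edr_url(base_url, collection_id, query_type, coords, parameters, start, end):
    """Build an EDR-style URL: {base}/collections/{id}/{query_type}?coords=..."""
    params = {
        "coords": coords,                        # WKT geometry, lon/lat order
        "parameter-name": ",".join(parameters),  # requested observed properties
        "datetime": f"{start}/{end}",            # ISO 8601 interval
    }
    path = f"{base_url.rstrip('/')}/collections/{quote(collection_id, safe='')}/{query_type}"
    return f"{path}?{urlencode(params, quote_via=quote)}"


def edr_position_url(base_url, collection_id, lon, lat, parameters, start, end):
    """Time series at a single point, e.g. one station."""
    return _edr_url(base_url, collection_id, "position",
                    f"POINT({lon} {lat})", parameters, start, end)


def edr_area_url(base_url, collection_id, bbox, parameters, start, end):
    """All observations inside a lon/lat bounding box, e.g. a cross-border region."""
    x0, y0, x1, y1 = bbox
    ring = f"{x0} {y0}, {x1} {y0}, {x1} {y1}, {x0} {y1}, {x0} {y0}"  # closed ring
    return _edr_url(base_url, collection_id, "area",
                    f"POLYGON(({ring}))", parameters, start, end)


url = edr_area_url(BASE_URL, "surface-obs", (5.0, 47.0, 11.0, 51.0),
                   ["air_temperature"], "2021-07-01T00:00:00Z", "2021-07-02T00:00:00Z")
print(url)
params = parse_qs(urlparse(url).query)  # decode again to see what the server receives
```

A pythonic wrapper around an existing EDR service would mostly consist of helpers like these plus an HTTP call and a parser for whatever response format the server returns.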

I suppose my thought on this idea is that there are already many active efforts in this area. Standardization and harmonization of weather observation data has a long history. Rather than trying to build that infrastructure from scratch, is there an opportunity to create a pythonic wrapper around an existing service that provides this kind of access to global weather observational data?
