Advice on modernizing a climate visualization web application for the Pangeo software stack

Hello Pangeo community, I developed and maintain a climate projection dashboard application (link) that I’m hoping to update in 2025 to the Pangeo toolset and modern formats. I’m primarily looking for suggestions on the mapping and data storage format.

The current web application relies on WMS via the THREDDS Data Server (TDS), which has worked fine for us for more than a decade. However, my organization is trying to reduce the number of web servers for security reasons, so I may need to shut down my TDS server in the coming year(s). The data are 6 km resolution model projections for CONUS, but there are around 30,000 possible maps. Generating the maps on the fly with WMS worked well because most of the maps will never be requested.

If I had to retire our TDS server, two possible solutions would be to pre-render tiles (which would still require a WMS tile server) or to switch to something like Cloud Optimized GeoTIFFs (COGs). I’ve recently switched to the Pangeo toolset (Python + xarray), so I find COGs a very interesting option, especially if it would mean we’d no longer need a backend mapping server. COGs could also work nicely since the data aren’t particularly high resolution and we’d only need maybe four or five zoom levels. COGs would also avoid the problem of needing to generate and store hundreds of thousands of tiles that will likely never be seen (or rather, the tiling is built into the COG). Does anyone have experience or advice on converting NetCDF data into COGs for pan-and-zoom data access?

We should also probably make the distinction between the data (float32) and a picture of the data (RGB byte arrays). I’m assuming that with COGs and the available JavaScript mapping APIs I’d need to pre-render the data into RGB images, thereby locking in the color choices and data ranges? Rendering the COG data on the fly sounds great, but may not perform well on resource-limited clients, like mobile devices.
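To make the data-versus-picture distinction concrete, here is a minimal sketch in plain Python. It assumes a simple two-color linear ramp and a fixed data range; the function name `render_rgb`, the colors, and the range are hypothetical and not taken from the current application.

```python
def render_rgb(values, vmin, vmax,
               low=(49, 54, 149), high=(165, 0, 38), nodata=(0, 0, 0)):
    """Map float values to RGB byte triples with a fixed range and color ramp.

    vmin/vmax and the low/high colors are illustrative assumptions. Once the
    pixels are written out, these choices can no longer be changed client-side.
    """
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    pixels = []
    for v in values:
        if v is None or v != v:  # None or NaN marks missing data
            pixels.append(nodata)
            continue
        # Clamp to the chosen range, then scale to 0..1.
        t = min(max((v - vmin) / (vmax - vmin), 0.0), 1.0)
        # Linear interpolation between the two ramp colors, per channel.
        pixels.append(tuple(round(a + t * (b - a)) for a, b in zip(low, high)))
    return pixels


# Two different out-of-range values collapse to the same color, so the
# original floats cannot be recovered from the rendered image.
saturated = render_rgb([12.0, 40.0], vmin=0.0, vmax=10.0)
```

Storing the float32 values instead keeps the range and colormap adjustable, at the cost of doing this mapping in the browser for every tile.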

The second component I’m hoping for advice on is the data storage and formats. Since it is a data visualization application, it takes a lot of data to drive the client-side charts and graphs. When the application was converted to JavaScript (circa 2020) I adopted a bespoke JSON format with gzip compression. Our server hosts thousands of these small gzipped JSON files to avoid a server-side database backend. It currently works well, but there might be formats that fit into the Pangeo software stack better, which would make updating and expanding the web application easier. Writing out custom JSON files containing primarily numerical data was rather tedious. I’d prefer a binary format that is already serialized, if there is such a thing in JavaScript.

I’m wondering if using a JavaScript Zarr implementation would work here? It would be great to write out larger Zarr stores with xarray and then have them consumed client-side with a JavaScript API that really leverages chunking. A quick Google search shows two different JavaScript Zarr API projects, but I am not sure if either is mature enough to build an entire web application around. The concept is appealing though. Rather than writing tens to hundreds of thousands of small JSON files, I could export a few large Zarr stores with a clever chunking scheme. Instead of one JSON file per US county, it could be one Zarr store for all CONUS counties, chunked by county. If the JavaScript Zarr API supported chunk seeking and loading well, it would make producing, updating, and expanding the data store far easier. Does anyone have suggestions on whether this is a viable option, or whether there is a better way to package a lot of numerical data for JavaScript consumption without a backend service?
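As a rough illustration of the "chunked by county" idea, the sketch below computes which stored object a client would need to fetch for one county. It assumes the Zarr v2 convention of naming chunks by their dot-separated chunk indices (Zarr v3 uses a different default key layout), and the array shape, chunk shape, and variable name are hypothetical.

```python
def chunk_key(index, chunks, separator="."):
    """Return the Zarr v2 chunk key that holds the element at `index`.

    `index` is an element position and `chunks` is the chunk shape; each
    chunk index is the element index floor-divided by the chunk size.
    """
    if len(index) != len(chunks):
        raise ValueError("index and chunks must have the same number of dimensions")
    return separator.join(str(i // c) for i, c in zip(index, chunks))


# Hypothetical layout: rows are counties, columns are years.
N_YEARS = 150
CHUNKS = (1, N_YEARS)  # one chunk per county, all years together


def county_chunk_path(array_name, county_row):
    """Path of the single chunk holding one county's full time series."""
    return f"{array_name}/{chunk_key((county_row, 0), CHUNKS)}"


path = county_chunk_path("tasmax", 42)  # "tasmax/42.0"
```

With this layout a county chart needs one small request per variable, much like the current one-JSON-file-per-county scheme, while the store is still written as a single array.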

Happy New Year, everyone, and thanks for any suggestions you might have. It is fun to think about how a dashboard visualization web application could be downstream of the Pangeo software stack.

Hi @jalder,

We’ve done a bit of work in this area over the past few years and I’m happy to provide some pointers that reflect how we’re thinking about the space right now.

First, two bits of history:

  1. When I was at CarbonPlan, we built a number of dashboard-like things that used Zarr directly in the browser. This requires a fair bit of data engineering work up front but works quite well if the data can be pre-processed and rarely changes. → GitHub - carbonplan/maps: interactive multi-dimensional data-driven web maps
  2. Xpublish - we built Xpublish to solve the Xarray-to-API problem. You may be particularly interested in the Xpublish-WMS router (GitHub - xpublish-community/xpublish-wms: WMS router for xpublish) but there are a number of other options available (Zarr, OPeNDAP, OGC EDR, etc.).

Now for something more current. At AGU last month, I gave a talk titled Seamless Arrays: A Full Stack, Cloud-Native Architecture for Fast, Scalable Data Access. In it I showed how we used Icechunk (a very fast transactional storage engine for Zarr) with Xpublish to expose data in a very scalable and performant way. I think you could do something similar with Zarr/Icechunk+Xpublish WMS – and of course, we’d be happy to help!

Last point: @abkfenris and I are giving the Pangeo Showcase on January 29th, where we’ll be talking about Xpublish and more.

From a Unidata perspective, the TDS is still being actively maintained, and we are happy to provide guidance via our support email:

support-thredds@unidata.ucar.edu