First 2023 Pangeo showcase at the Feb 1 community meeting!

Welcome back to the Pangeo Showcase for Winter/Spring 2023!


When: Wednesday February 1st 4pm EST
Where: Launch Meeting - Zoom

Title: Xarray-Datatree: Hierarchical Data Structures for Multi-Model Science

Tom Nicholas (ORCiD: 0000-0002-2176-0530 | Twitter: @TEGNicholasCode)
Julius Busecke (ORCiD: 0000-0001-8571-865X)

Real scientific workflows often require working with many heterogeneous but related datasets. Examples in geoscience include: (1) scenario simulations by many different climate models in the same intercomparison project, (2) simulation data at multiple resolutions from a convergence scan or sub-grid-scale study, and (3) observational and simulation data of the same region.

There is a need for a general, high-level data structure that can organize such data in an accessible way, whilst remaining flexible enough to adapt to the user's mental model of their data. It should be intuitive, so that simple operations such as calculating average climatologies remain simple to express. It should also serialize to a commonly used data format, so as not to create backwards-compatibility problems.

The new xarray-datatree package [1] solves these problems by providing a tree-like hierarchical data structure that is general enough to be useful in a wide variety of cases. Datatree extends xarray, generalizing xarray.Dataset so that it builds on an interface many geoscientists are already familiar with. Analysis operations can be mapped over a whole tree, so simple operations stay simple to express even over complex, heterogeneous datasets.

Datatree is inspired by netCDF. Xarray's highest-level object is currently the xarray.Dataset, which stores a collection of arrays sharing a coordinate system and corresponds to a single group in a netCDF file. A DataTree object is instead a structured, hierarchical collection of Datasets, which maps to multiple netCDF groups. Serialization to and from netCDF files is therefore possible with datatree, so backwards compatibility is maintained.

We will explain the model of datatree, its relation to netCDF and Zarr, and how to use the data structure to simplify your own work. We will also give examples of using datatree with real geoscience datasets, such as CMIP6 model data [2].
[1] xarray-contrib/datatree (GitHub): WIP implementation of a tree-like hierarchical data structure for xarray.
[2] Tom Nicholas, "Easy IPCC Part 1: Multi-Model Datatree", pangeo (Medium).
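To make the data model described above a little more concrete, here is a toy, pure-Python sketch of the idea: a tree whose nodes each hold a dataset, whose node paths mirror netCDF group paths, and over which one operation (here a time mean, a stand-in for a climatology) is mapped in a single call. This is NOT the xarray-datatree API. The names Node, walk, map_over_tree and time_mean, and the dict-of-lists "datasets" standing in for xarray.Dataset, are all hypothetical, and the temperature values are made up for illustration; see [1] for the real package.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable, Dict, Iterator, List, Tuple

# Hypothetical stand-in for an xarray.Dataset: variable name -> values.
ToyDataset = Dict[str, List[float]]


@dataclass
class Node:
    """Hypothetical tree node: an optional toy dataset plus named child nodes."""

    data: ToyDataset = field(default_factory=dict)
    children: Dict[str, "Node"] = field(default_factory=dict)

    def walk(self, path: str = "/") -> Iterator[Tuple[str, "Node"]]:
        """Yield (path, node) pairs depth-first; paths mimic netCDF group paths."""
        yield path, self
        for name, child in self.children.items():
            yield from child.walk(f"{path.rstrip('/')}/{name}")

    def map_over_tree(self, func: Callable[[ToyDataset], ToyDataset]) -> "Node":
        """Return a new tree with func applied to every node that holds data."""
        new_data = func(self.data) if self.data else {}
        new_children = {name: child.map_over_tree(func) for name, child in self.children.items()}
        return Node(new_data, new_children)


def time_mean(ds: ToyDataset) -> ToyDataset:
    """Collapse each variable to its mean, a toy analogue of a climatology."""
    return {var: [mean(values)] for var, values in ds.items()}


# Made-up numbers, grouped by illustrative experiment and model names.
tree = Node(children={
    "historical": Node(children={
        "ModelA": Node({"tas": [287.0, 288.0]}),
        "ModelB": Node({"tas": [286.0, 290.0]}),
    }),
    "ssp585": Node(children={
        "ModelA": Node({"tas": [290.0, 292.0]}),
    }),
})

averaged = tree.map_over_tree(time_mean)
for path, node in averaged.walk():
    print(path, node.data)
```

Running it prints one line per group, ending with `/ssp585/ModelA {'tas': [291.0]}`; the intermediate groups (`/historical`, `/ssp585`) hold no data of their own, much like netCDF groups that only contain subgroups. The sketch returns a new tree rather than modifying the original, in the spirit of xarray operations returning new objects; how the real DataTree handles empty nodes and mapping is documented in [1].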
Relevant material:
BXplorer for JupyterLab - YouTube

  • 5-15 minutes - Community showcase
  • 5-15 minutes - Q&A / Community check-in
  • 20-35 minutes - Agenda and open discussion