Title: “Arraylake: A Cloud-Native Data Lake Platform for Earth System Science”
Invited Speakers: Ryan Abernathey (ORCID ID: 0000-0001-5999-4917) and Joe Hamman (ORCID ID: 0000-0001-7479-8439), Earthmover
When: Wednesday Oct 4, 12PM EDT
Where: Launch Meeting - Zoom
Abstract:
The vast amount of earth system data available today is an incredible resource for understanding our planet and confronting the challenge of climate change. Traditionally, a few large organizations have provided most of the data, and users have downloaded data to local computers. This way of working is becoming increasingly infeasible as data volumes grow and as AI-based methods demand direct access to full-scale data archives. With essentially infinite compute and storage capacity, cloud computing has the potential to revolutionize our interaction with weather and climate data, allowing everyone to bring their own compute workloads to bear against a single shared copy of the data. In recent years, through our work in the Pangeo project, we have prototyped a cloud-native approach to weather and climate data, combining scalable computing technologies such as Xarray and Dask with analysis-ready, cloud-optimized data in formats like Zarr. While these tools show great potential, they remain difficult for many scientists and institutions to deploy and use in an operational context.
Motivated by this challenge, we founded Earthmover, a company aimed at democratizing access to state-of-the-art cloud-native data analytics, and built Arraylake, a data platform that enables teams of any size to manage and analyze weather and climate data in the cloud. Arraylake users can access high-quality public datasets alongside their own private data, all via the high-performance Zarr data standard. This talk describes Arraylake’s architecture, its novel version control system for data, and its approach to supporting all common climate data formats (NetCDF, HDF5, GRIB, TIFF, Zarr) via a single, user-friendly interface. Via a short demo, we illustrate how Arraylake helps overcome common data management challenges that have so far limited widespread adoption of cloud computing in earth system science.
- 20 minutes - Community Showcase
- 40 minutes - Showcase discussion/Community check-ins