ESIP Cloud Computing Cluster Oct 30 - Arraylake: A Cloud-Native Data Lake Platform for Earth System Science

There are many reasons you might want to join this encore presentation of Pangeo Showcase: “Arraylake: A Cloud-Native Data Lake Platform for Earth System Science”, such as:

  1. You missed it!
  2. Like a classic movie, you could watch it over and over again.
  3. You were multi-tasking the first time around but you have a new fiscal year resolution to not multi-task anymore.
  4. You want to invite friends and watch them watch it.

Regardless of the reason, here is your chance.

Topic: Arraylake: A Cloud-Native Data Lake Platform for Earth System Science

When: Monday, October 30th, 10:30-11:30 am PT / 1:30-2:30 pm ET / 7:30-8:30 pm CEST

Who: Ryan Abernathey and Joe Hamman, Founders of Earthmover

Where: Find joining information on the ESIP Community Calendar

NOTE: THIS DAY AND TIME IS DIFFERENT FROM THE ORIGINAL ESIP COMMUNITY CALENDAR. The ESIP community calendar should be updated with this new day and time soon.

The vast amount of earth system data available today is an incredible resource for understanding our planet and confronting the challenge of climate change. Traditionally, a few large organizations have provided most of the data, and users have downloaded data to local computers. This way of working is becoming increasingly infeasible as data volumes grow and as AI-based methods demand direct access to full-scale data archives. With essentially infinite compute and storage capacity, cloud computing has the potential to revolutionize our interaction with weather and climate data, allowing everyone to bring their own compute workloads to bear against a single shared copy of the data. Over the past several years, via our work in the Pangeo project, we have prototyped a cloud-native approach to weather and climate data, combining scalable computing technologies such as Xarray and Dask with analysis-ready, cloud-optimized data in formats like Zarr. While these tools show great potential, they remain difficult for many scientists and institutions to deploy and use in an operational context.

Motivated by this challenge, we founded Earthmover, a company aimed at democratizing access to state-of-the-art cloud-native data analytics, and built Arraylake, a data platform that enables teams of any size to manage and analyze weather and climate data in the cloud. Arraylake users can access high-quality public datasets alongside their own private data, all via the high-performance Zarr data standard. This talk describes Arraylake’s architecture, its novel version control system for data, and its approach to supporting all common climate data formats (NetCDF, HDF5, GRIB, TIFF, Zarr) via a single, user-friendly interface. Via a short demo, we illustrate how Arraylake helps overcome common data management challenges that have so far limited widespread adoption of cloud computing in earth system science.

  • 5 minutes - Welcome and Announcements
  • 30 minutes - Presentation
  • 25 minutes - Q + A