Cloud-Native Benchmarking: Pangeo Community Meeting June 4th Discussion Topic

When: Wednesday, June 4, 2025 at 4PM EST
Where: Launch Meeting - Zoom
Abstract:

A recurring question posed to and within our community is “What chunk size and shape should I store my data in?” or, more generally, “How should large n-dimensional datasets be structured to optimize for cloud storage and network-based access?” While a common response is “it depends (on the storage provider, dataset, libraries used, use case(s)…)”, we believe there are common best practices and examples to share.

While there have been lively discussions in response to these questions, we cannot point to a central location for sharing best practices and case studies. During this Pangeo session, we will review current and past benchmarking activities (contributions welcome) and discuss where this type of guidance and/or example benchmarks should live in the cloud-native community’s ecosystem. Please also check out @maxrjones’s brainstorming document for this guidance.

Agenda:

  • 5 minutes - Welcome and a show of hands (:raised_hand:) from anyone who wants to share a use case.
  • 10-20 minutes - Sharing of current and past benchmarking activities. What questions are you interested in answering? What is your methodology and what questions or challenges are you up against?
  • 20-30 minutes - Discussion of common best practices and benchmarking methods and approaches.
  • 10 minutes - Discussion of where this information should live + wrap up.

Thanks for organizing this discussion Aimee!

FYI, anyone is welcome to contribute to the brainstorming doc on a data cube best practices guide. I think editing just requires a HackMD account. The goal there is a synthesis of knowledge gained from existing benchmarking and “real-world” usage, backed by reproducible examples.


Great idea @aimeeb and @maxrjones! Excited to join.

Thanks for organizing this. Excited to join this!