Cloud-Native Benchmarking: Pangeo Community Meeting June 4th Discussion Topic

aimeeb · May 27, 2025, 8:08pm

When: Wednesday, June 4, 2025 at 4 PM ET
Where: Launch Meeting - Zoom
Abstract:

A recurring question to and within our community is "What chunk size and shape should I store my data in?" Or, more generally, "How should large n-dimensional datasets be structured to optimize for cloud storage and network-based access?" While a common response is "it depends (on the storage provider, dataset, libraries used, use case(s)…)", we believe there are common best practices and examples worth sharing.

While there have been lively discussions in response to these questions, we cannot point to a central location for sharing best practices and case studies. During this Pangeo session, we will review current and past benchmarking activities (contributions welcome) and discuss where this type of guidance and/or example benchmarks should live in the cloud-native community's ecosystem. Please also check out @maxrjones' brainstorming document for this guidance.

Agenda:

5 minutes - Welcome and call for anyone who wants to share a use case.
10-20 minutes - Sharing of current and past benchmarking activities. What questions are you interested in answering? What is your methodology and what questions or challenges are you up against?
20-30 minutes - Discussion of common best practices and benchmarking methods and approaches
10 minutes - Discussion of where this information should live + wrap up.

maxrjones · May 28, 2025, 12:46pm

Thanks for organizing this discussion, Aimee!

FYI, anyone is welcome to contribute to the brainstorming doc on a data cube best practices guide. I think editing just requires a HackMD account. The goal there is a synthesis of knowledge gained from existing benchmarks and "real-world" usage, backed by reproducible examples.

norlandrhagen · May 29, 2025, 3:54pm

Great idea @aimeeb and @maxrjones! Excited to join.

jbusecke · May 30, 2025, 6:48pm

Thanks for organizing this. Excited to join!
