Is there a standardised approach to benchmarking systems?

Hi Pangeans!

Is there a standardised approach to benchmarking systems (i.e. storage + Dask + Jupyter)? I'm aware of pangeo-data/benchmarking (Benchmarking & Scaling Studies of the Pangeo Platform on GitHub), but it's a couple of years old and the Binder link is broken.

Thanks

Niall


Hi Niall! I have come to the conclusion that it's quite hard to benchmark all the different layers of a system in one go. There are so many moving parts, and once Dask gets involved, everything becomes very sensitive to the details of the calculation you have chosen (for a deep dive, check out Dask+xarray and swap memory polution on local linux cluster).
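To make that concrete, here is a minimal sketch of the sensitivity problem: two computations on the same array can give wildly different timings, so a single Dask number is hard to interpret. The array shape, chunking, and cluster setup are all placeholder choices, not a recommended benchmark:

```python
import time
import dask.array as da
from dask.distributed import Client

client = Client()  # local cluster by default; point at your own scheduler if you have one

# ~3.2 GB random array; shape and chunking are arbitrary illustrative choices
x = da.random.random((20_000, 20_000), chunks=(2_000, 2_000))

for label, computation in [
    ("embarrassingly parallel mean", x.mean()),
    ("communication-heavy (x + x.T).sum()", (x + x.T).sum()),
]:
    start = time.perf_counter()
    computation.compute()
    print(f"{label}: {time.perf_counter() - start:.1f} s")
```

The first computation parallelises trivially; the second forces chunks to move between workers, so the same cluster and the same data produce very different numbers.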

The best I have managed to do is benchmark the distributed read throughput of the storage system. You can read that notebook here: Big Arrays, Fast: Profiling Cloud Storage Read Throughput — Pangeo Gallery documentation
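The basic idea is to time a full parallel read of a Zarr array and divide bytes moved by wall time. A hedged sketch of that pattern (the store path and variable name are hypothetical, gcsfs is assumed installed, and this is not the notebook's exact code):

```python
import time
import xarray as xr
from dask.distributed import Client

client = Client()  # spread the reads across workers

# Hypothetical Zarr store and variable; substitute a real dataset
ds = xr.open_zarr("gs://your-bucket/your-dataset.zarr", consolidated=True)
var = ds["some_variable"]

start = time.perf_counter()
var.sum().compute()  # cheap reduction that still forces every chunk to be read
elapsed = time.perf_counter() - start

nbytes = var.nbytes
print(f"read {nbytes / 1e9:.1f} GB in {elapsed:.1f} s "
      f"-> {nbytes / elapsed / 1e6:.0f} MB/s aggregate")
```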


FWIW, I'm trying to revive this project a bit (you might have already seen it), as we are currently exploring the European Open Science Cloud through some Pangeo-related use cases. But as @rabernat says, this covers just a tiny part of what a benchmark should, and the important thing is to be able to analyze the results. So it can help measure storage and network throughput, but that is only one benchmark among many.

In the HPC world, you're certainly aware of all the benchmarks that exist (Linpack, HPCG, IO500, etc.). Those help with raw performance and a bit more.

For cloud storage, we're currently deploying object storage at CNES, but we haven't yet found a reference benchmarking tool. We'll probably use Dask and Zarr at some point to stay close to real use cases, but again, we'll only be measuring storage throughput.
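In the meantime, something as simple as streaming a large test object through s3fs gives a first raw-throughput number below the Zarr/Dask layer. A rough sketch, with the endpoint, bucket, and object key all hypothetical:

```python
import time
import s3fs

# Hypothetical S3-compatible endpoint; substitute your own object store
fs = s3fs.S3FileSystem(
    anon=False,
    client_kwargs={"endpoint_url": "https://s3.example.cnes.fr"},
)

key = "my-bucket/large-test-object.bin"  # hypothetical pre-uploaded test object
size = fs.info(key)["size"]

start = time.perf_counter()
with fs.open(key, "rb", block_size=64 * 2**20) as f:
    while f.read(64 * 2**20):  # stream the whole object in 64 MB reads
        pass
elapsed = time.perf_counter() - start

print(f"{size / 1e6:.0f} MB in {elapsed:.1f} s -> {size / elapsed / 1e6:.0f} MB/s")
```

This only exercises a single sequential stream, of course; a fuller test would fan the reads out across Dask workers, which is where the Zarr-based approach comes in.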