Is there a standardised approach to benchmarking systems?

rabernat · October 11, 2022, 12:29pm

Hi Niall! I have come to the conclusion that it's quite hard to benchmark all the different layers of a system in one go. There are so many moving parts, and once Dask gets involved in particular, the results become very sensitive to the details of the calculation you have chosen (for a deep dive, see the topic "Dask+xarray and swap memory polution on local linux cluster").

The best I have managed to do is benchmark the distributed read throughput of the storage system. You can read that notebook here: Big Arrays, Fast: Profiling Cloud Storage Read Throughput (Pangeo Gallery documentation)
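
To give a rough idea of what a storage read-throughput benchmark measures, here is a minimal sketch. It is not the code from the notebook: it uses only the Python standard library, reads a local temporary file as a stand-in for cloud storage, and uses threads in place of Dask workers. The function names (`read_chunk`, `measure_read_throughput`, `run_benchmark`) and the sizes are made up for illustration. Because the file has just been written, the reads will probably be served from the OS page cache, so the numbers say little about real disk or network speed.

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor


def read_chunk(path, offset, size):
    """Read up to `size` bytes starting at `offset`; return the number of bytes read."""
    with open(path, "rb") as f:
        f.seek(offset)
        return len(f.read(size))


def measure_read_throughput(path, chunk_size, n_workers):
    """Read the whole file in chunks using n_workers threads.

    Returns (bytes_read, throughput_in_MB_per_s).
    """
    total_size = os.path.getsize(path)
    offsets = range(0, total_size, chunk_size)
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        n_bytes = sum(pool.map(lambda off: read_chunk(path, off, chunk_size), offsets))
    # Guard against a zero interval on very coarse clocks.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return n_bytes, n_bytes / elapsed / 1e6


def run_benchmark(total_bytes=8 * 2**20, chunk_size=2**20, worker_counts=(1, 2, 4)):
    """Write a throwaway file of random bytes, then time reads at each worker count."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(total_bytes))
        path = tmp.name
    try:
        return {n: measure_read_throughput(path, chunk_size, n) for n in worker_counts}
    finally:
        os.remove(path)


if __name__ == "__main__":
    for n, (n_bytes, mb_per_s) in run_benchmark().items():
        print(f"{n} workers: {n_bytes} bytes at {mb_per_s:.1f} MB/s")
```

The point of sweeping the worker count is to see how throughput scales with parallelism, which is the one layer that can be isolated reasonably well. On real object storage you would swap the local file for remote reads and expect very different numbers.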

