Is there a standardised approach to benchmarking systems?

Hi Niall! I have come to the conclusion that it’s quite hard to benchmark all the layers of a system in one go. There are many moving parts, and once Dask gets involved the results become very sensitive to the details of the calculation you have chosen (for a deep dive, see the thread "Dask+xarray and swap memory polution on local linux cluster").

The best I have managed to do is benchmark the distributed read throughput of the storage system. You can read that notebook here: Big Arrays, Fast: Profiling Cloud Storage Read Throughput (Pangeo Gallery documentation)
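To give a rough idea of what measuring read throughput in isolation means, here is a minimal sketch. It is not the code from the notebook: it uses only the Python standard library, reads a local temporary file rather than cloud storage, and uses threads rather than a Dask cluster. The helper names (`read_chunk`, `measure_throughput`, `make_test_file`) are hypothetical and chosen just for this illustration.

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor


def read_chunk(path, offset, nbytes):
    """Read up to nbytes starting at offset; return the number of bytes read."""
    with open(path, "rb") as f:
        f.seek(offset)
        return len(f.read(nbytes))


def measure_throughput(path, chunk_bytes, n_workers):
    """Read the whole file in chunks with n_workers threads.

    Returns (total_bytes_read, throughput_in_MB_per_s).
    """
    size = os.path.getsize(path)
    offsets = range(0, size, chunk_bytes)
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        total = sum(pool.map(lambda off: read_chunk(path, off, chunk_bytes), offsets))
    # Guard against a zero interval on very small files.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return total, total / elapsed / 1e6


def make_test_file(n_bytes):
    """Create a temporary file of random bytes (a stand-in for a real dataset)."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(os.urandom(n_bytes))
    return path


if __name__ == "__main__":
    path = make_test_file(8 * 1024 * 1024)
    try:
        # Vary only the number of parallel readers; keep everything else fixed.
        for n_workers in (1, 2, 4):
            total, mb_per_s = measure_throughput(path, 1024 * 1024, n_workers)
            print(f"{n_workers} workers: {total} bytes at {mb_per_s:.1f} MB/s")
    finally:
        os.remove(path)
```

The point of the design is to change one variable at a time (here, the number of parallel readers) while doing no computation on the data, so that the number you get reflects the storage layer and not the rest of the stack. On a local file the operating system's page cache will dominate the result, so treat the printed figures as an illustration of the method only.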
