What's Next - Software - In-Memory Performance

mrocklin · December 7, 2023, 2:38pm

This is part of a follow-on conversation from the “What’s next for Pangeo” discussion that took place on 2023-12-06. It falls under the overarching software topic.

To help achieve good performance in Pangeo, we should consider accelerating the slow parts of our stack. This might be done in several ways:

- Rust (this is what was mentioned during the call) or other low-level options (C/C++, Numba)
- Smarter algorithms
- General old-fashioned tuning

I’d propose that, before we dedicate effort here, we profile some common workloads. Do we understand where the bottlenecks are?
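As a minimal sketch of what profiling a workload could look like, the Python below runs the standard-library `cProfile` and `pstats` modules over a toy pipeline. The functions `load_chunk`, `reduce_chunk`, and `workload` are hypothetical stand-ins: `time.sleep()` only mimics storage latency and does not touch S3. It assumes Python 3.9+ for `Stats.get_stats_profile()`. Note that `cProfile` only records the thread it is enabled in, so multithreaded Dask runs would need other tools, such as the Dask dashboard's profiler.

```python
import cProfile
import io
import pstats
import time


def load_chunk(n):
    # Hypothetical stand-in for reading one chunk from object storage;
    # time.sleep() mimics network latency, it does not touch S3.
    time.sleep(0.01)
    return list(range(n))


def reduce_chunk(values):
    # Hypothetical stand-in for the in-memory computation on one chunk.
    return sum(v * v for v in values)


def workload(n_chunks=20, chunk_size=1_000):
    # Toy "load, then compute" pipeline run serially in one thread,
    # so cProfile can attribute time to each stage.
    return sum(reduce_chunk(load_chunk(chunk_size)) for _ in range(n_chunks))


def profile_workload():
    profiler = cProfile.Profile()
    profiler.enable()
    result = workload()
    profiler.disable()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer).sort_stats("cumulative")
    # Per-function summary keyed by function name (Python 3.9+).
    func_profiles = stats.get_stats_profile().func_profiles
    # Human-readable report restricted to the two stages of interest.
    stats.print_stats("load_chunk|reduce_chunk")
    return result, func_profiles, buffer.getvalue()


if __name__ == "__main__":
    result, func_profiles, report = profile_workload()
    print(report)
```

In this toy setup `load_chunk` should account for most of the cumulative time, which is the kind of signal that would point effort at I/O rather than at compute.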

mrocklin · December 7, 2023, 2:39pm

In my experience with the end-to-end performance of large-scale dataframe computations on the cloud, S3 access is the primary bottleneck to consider. Other parts of the stack could be 10x slower than the machine allows and we wouldn’t really notice.
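One way to see why the rest of the stack can hide behind storage latency is a simple runtime model. The sketch below is an illustration under stated assumptions, not a measurement: it compares a serial model (runtime = I/O + compute) with an idealized overlapped model (runtime = max(I/O, compute)), as when reads of later chunks proceed while earlier chunks compute. The 100 s of I/O and 5 s of compute used below are made-up numbers.

```python
def serial_runtime(io_seconds, compute_seconds):
    # No overlap: each chunk waits for its read, then computes.
    return io_seconds + compute_seconds


def overlapped_runtime(io_seconds, compute_seconds):
    # Idealized pipelining (an assumption): reads of later chunks proceed
    # while earlier chunks compute, so the slower stage sets the pace.
    return max(io_seconds, compute_seconds)


def slowdown_from_compute(io_seconds, compute_seconds, factor, model):
    # Ratio of runtime after making compute `factor` times slower
    # to the baseline runtime, under the given runtime model.
    baseline = model(io_seconds, compute_seconds)
    slowed = model(io_seconds, compute_seconds * factor)
    return slowed / baseline


if __name__ == "__main__":
    # Made-up numbers: 100 s of I/O, 5 s of compute at machine speed.
    for model in (serial_runtime, overlapped_runtime):
        for factor in (10, 30):
            ratio = slowdown_from_compute(100, 5, factor, model)
            print(f"{model.__name__:>18}, compute x{factor}: {ratio:.2f}x runtime")
```

With these numbers, a 10x compute slowdown costs about 43% in the serial model but nothing in the overlapped one; only once compute exceeds I/O (a 30x slowdown gives 150 s against 100 s) does it start to show. Real pipelines sit somewhere between the two models, which is one more reason to measure.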

I think that profiling and benchmarking would be useful here. I would encourage this community to assemble a set of representative small-scale benchmarks that can help inform future development work.
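A small benchmark suite could start as simply as the harness below, a minimal sketch built on the standard-library `timeit` module. The two benchmark functions are hypothetical placeholders; a real suite would swap in representative workloads (for example opening a dataset, a groupby reduction, or rechunking) at small scale. Reporting the minimum alongside the median is a design choice: the minimum is least affected by background noise, while the median shows typical cost. Established tools such as airspeed velocity (asv) add history tracking on top of the same idea.

```python
import statistics
import timeit


def run_benchmarks(benchmarks, repeat=5, number=3):
    """Time each zero-argument callable and summarize per-call seconds."""
    results = {}
    for name, func in benchmarks.items():
        # timeit.repeat returns total seconds for `number` calls, per repeat.
        totals = timeit.repeat(func, repeat=repeat, number=number)
        per_call = [t / number for t in totals]
        results[name] = {
            "min": min(per_call),  # least noisy estimate of best-case cost
            "median": statistics.median(per_call),  # typical cost
        }
    return results


# Hypothetical placeholders for representative small-scale workloads.
def bench_sum_of_squares():
    return sum(i * i for i in range(100_000))


def bench_sort_floats():
    return sorted((i * 7919) % 100_003 / 3.0 for i in range(100_000))


BENCHMARKS = {
    "sum_of_squares": bench_sum_of_squares,
    "sort_floats": bench_sort_floats,
}

if __name__ == "__main__":
    for name, stats in run_benchmarks(BENCHMARKS).items():
        print(f"{name:>16}: min {stats['min']:.4f}s  median {stats['median']:.4f}s")
```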
