June 16, 2021: SmartSim: Online Analytics and Machine Learning for HPC Simulations

Pangeo Showcase talk by Sam Partee, Hewlett Packard Enterprise

Bio

Sam Partee is a Machine Learning Engineer at Hewlett Packard Enterprise (HPE). Sam’s team was part of the Cray AI innovation lab before Cray was acquired by HPE. Sam holds a degree in computer science with a focus on High Performance Computing and Machine Learning.

Abstract

SmartSim is an open source library dedicated to enabling online analysis and Machine Learning (ML) for traditional High Performance Computing (HPC) simulations. SmartSim lets simulations written in C, C++, Fortran, and Python call out to PyTorch, TorchScript, TensorFlow, and any model that supports the ONNX format (e.g. scikit-learn). In addition, the in-transit architecture of SmartSim enables simulation data streaming for online analysis, processing, and training.
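To make the in-transit pattern concrete, here is a minimal sketch of the idea, not SmartSim's actual API. It assumes a hypothetical in-memory `InTransitStore` that stands in for the shared data layer: the simulation loop publishes its state each timestep, asks the store to run a registered model on it, and reads the result back. Every class, function, and key name below is invented for illustration.

```python
# Hypothetical stand-in for an in-transit tensor/model store.
# None of these names are SmartSim APIs; this only illustrates the pattern.
from typing import Callable, Dict, List

Tensor = List[float]


class InTransitStore:
    """Holds tensors and models so that a simulation can call out to ML."""

    def __init__(self) -> None:
        self._tensors: Dict[str, Tensor] = {}
        self._models: Dict[str, Callable[[Tensor], Tensor]] = {}

    def put_tensor(self, key: str, data: Tensor) -> None:
        self._tensors[key] = list(data)

    def get_tensor(self, key: str) -> Tensor:
        return list(self._tensors[key])

    def set_model(self, name: str, fn: Callable[[Tensor], Tensor]) -> None:
        self._models[name] = fn

    def run_model(self, name: str, input_key: str, output_key: str) -> None:
        # Inference happens next to the data, not inside the simulation.
        self._tensors[output_key] = self._models[name](self._tensors[input_key])


def scale_by_half(x: Tensor) -> Tensor:
    """Toy 'model' (assumption): halves every value."""
    return [0.5 * v for v in x]


def run_simulation(store: InTransitStore, steps: int) -> Tensor:
    """Toy simulation loop that requests inference at every timestep."""
    state = [1.0, 2.0, 4.0]
    for step in range(steps):
        store.put_tensor(f"state_{step}", state)                      # stream data out
        store.run_model("correction", f"state_{step}", f"out_{step}")  # online inference
        state = store.get_tensor(f"out_{step}")                       # pull result back
    return state


store = InTransitStore()
store.set_model("correction", scale_by_half)
final_state = run_simulation(store, steps=2)
```

Because every timestep's state stays in the store under its own key, a separate analysis or training process could read the same data while the simulation keeps running, which is the streaming use case the abstract describes.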

In this talk, we detail the SmartSim architecture and provide benchmarks, including online inference and throughput on multiple Cray XC50 supercomputers. We also walk through examples, including how we used SmartSim to run a 12-member ensemble of global-scale, high-resolution ocean simulations, each spanning 19 compute nodes, all communicating with the same ML architecture at each simulation timestep. Lastly, we present our plans for open source community involvement and describe current development directions and research.
