Compiler selection for cloud instances

zcobell · May 13, 2021, 3:31pm

Hi All -

I spoke with @rsignell about some of the challenges being faced with cloud HPC deployments. One of them is getting Intel compiler licenses set up properly on AWS systems. That got me thinking about an interesting talk from a C++ conference and some testing I did afterward.

CppCon 2016: Tim Haines “Improving Performance Through Compiler Switches…"

Like many people, I’ve always assumed that “the Intel compilers are just the fastest,” but that may not be a safe assumption without looking at what the compilers are actually doing. One thing I took away from the talk is that Intel enables “fast math” (unsafe floating-point) optimizations at -O2, its default optimization level, whereas the GNU compilers never enable them unless you ask (e.g., with -ffast-math, or with -Ofast, which implies it). That was a bit of a shock to me and made me curious about what it might mean for some of our solvers.
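One reason fast math can change answers is that it allows the compiler to reassociate floating-point operations, and IEEE 754 addition is not associative. A minimal illustration in Python (Python floats are IEEE doubles; this shows the arithmetic effect only, not what any particular compiler emits, and `grouped_sums` is a hypothetical helper):

```python
import math

def grouped_sums(a: float, b: float, c: float) -> tuple[float, float]:
    """Return (a + b) + c and a + (b + c), both in IEEE 754 double precision."""
    left = (a + b) + c
    right = a + (b + c)
    return left, right

left, right = grouped_sums(0.1, 0.2, 0.3)
print(left, right, left == right)  # 0.6000000000000001 0.6 False

# Cancellation makes the effect larger: the 1.0 is lost when it is added to
# 1e16 first, because adjacent doubles near 1e16 are 2 apart.
big_first = (1e16 + 1.0) - 1e16
small_last = (1e16 - 1e16) + 1.0
print(big_first, small_last)  # 0.0 1.0

# math.fsum tracks the rounding error and returns the correctly rounded sum.
print(math.fsum([1e16, 1.0, -1e16]))  # 1.0
```

A solver's reductions (dot products, residual norms) are exactly the kind of loops a compiler may reorder under fast math, which is why results can shift slightly between builds.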

When the compilers are put on a level playing field, they generally deliver similar performance, but it will always be critical to test. My own testing on the new Bridges-2 system at the Pittsburgh Supercomputing Center found that the latest GNU and the latest Intel compilers performed about the same on its AMD EPYC 7742 chips. I’d be curious to see how they compare with the Intel compiler running on Intel chips.

The optimization flags I used in testing were:

| Vendor | Version | Flags |
|--------|---------|-------|
| Intel | 20.4 | `-O3 -mavx2` |
| GNU | 10.2.0 | `-O3 -march=znver2 -ffast-math -funroll-loops` |

Note: Intel does not currently have an AMD Zen-specific tuning flag, but AVX2 operations are the main performance driver, so we targeted that instead. You could also (I believe) use the Intel Skylake optimization set on Zen 2.
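Since the choice of flags hinges on which vector extensions the node supports (Zen 2 supports AVX2 but not AVX-512), it can help to check what an instance actually advertises before picking a target. A minimal sketch, assuming a Linux node whose /proc/cpuinfo contains a `flags` line; `cpu_flags` and `vector_support` are hypothetical helper names:

```python
def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()

def vector_support(flags: set[str]) -> dict[str, bool]:
    """Report which SIMD extensions relevant to the flag choices are present."""
    return {
        "avx2": "avx2" in flags,
        "fma": "fma" in flags,
        "avx512f": "avx512f" in flags,
    }

# Trimmed, illustrative excerpt; a real node lists many more flags.
sample = "processor\t: 0\nflags\t\t: fpu sse2 avx avx2 fma\n"
print(vector_support(cpu_flags(sample)))
# {'avx2': True, 'fma': True, 'avx512f': False}
```

On a real node, pass the contents of /proc/cpuinfo instead of the sample string.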

In a 256-core (i.e., two-node) test using OpenMPI 4 and the ADCIRC model, the results were very close:

| Vendor | Wallclock Time (min) | Seconds per Simulation Day |
|--------|----------------------|----------------------------|
| Intel | 201.20 | 670.67 |
| GNU | 203.82 | 679.40 |
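The two columns can be cross-checked against each other: total wallclock seconds divided by seconds per simulated day gives the simulated run length (about 18 days, derived from the table rather than stated in the post), and the per-day costs give the relative gap. A quick sketch using only the numbers in the table; the helper names are hypothetical:

```python
# Numbers copied from the results table.
results = {
    "Intel": {"wallclock_min": 201.20, "sec_per_sim_day": 670.67},
    "GNU": {"wallclock_min": 203.82, "sec_per_sim_day": 679.40},
}

def simulated_days(wallclock_min: float, sec_per_sim_day: float) -> float:
    """Simulated days implied by total wallclock time and cost per simulated day."""
    return wallclock_min * 60.0 / sec_per_sim_day

def percent_slower(slow: float, fast: float) -> float:
    """How much longer `slow` takes than `fast`, as a percentage of `fast`."""
    return (slow - fast) / fast * 100.0

for vendor, r in results.items():
    print(vendor, round(simulated_days(**r), 1))  # "Intel 18.0", then "GNU 18.0"

gap = percent_slower(results["GNU"]["sec_per_sim_day"],
                     results["Intel"]["sec_per_sim_day"])
print(f"GNU is {gap:.1f}% slower per simulated day")  # GNU is 1.3% slower per simulated day
```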

Ultimately, Intel did have a slight advantage for this particular model (about 1.3% per simulated day), and it’s possible a more exotic set of flags could bring the times closer still, but I haven’t had time to test that yet. However, if licensing is a pain or a nonstarter, the GNU compilers do a pretty good job. They’re what we currently use on AWS, which lets us avoid moving licenses around every time we fire up a ParallelCluster instance.

rsignell · May 14, 2021, 1:38pm

@zcobell, thanks so much for posting this! That presentation by Tim Haines is awesome. I’m sharing it with all my modeling colleagues.

One thing that really struck me was the advice to compile everything for the specific architecture being used instead of relying on generic libraries and executables.

I notice in your examples you specified the architecture explicitly (e.g., -march=znver2).

Have you also had good luck with -march=native?

zcobell · May 14, 2021, 3:32pm

Yes, I typically use -march=native, or -xHost with the Intel compiler on Intel CPUs. In this case, though, I was being explicit with my flags while testing things out.
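With -march=native it can be worth confirming what the compiler actually resolved the target to, particularly on cloud instances where the instance type determines the CPU. GCC reports its effective target settings via `gcc -march=native -Q --help=target`; the sketch below runs that command and extracts the `-march=` line. It assumes `gcc` is on the PATH and that the output keeps GCC’s usual `-march=   <value>` layout; `parse_march` and `native_march` are hypothetical names:

```python
import shutil
import subprocess
from typing import Optional

def parse_march(help_target_output: str) -> Optional[str]:
    """Return the value reported for -march= in `gcc -Q --help=target` output."""
    for line in help_target_output.splitlines():
        fields = line.split()
        # Lines look like "  -march=    znver2"; skip anything else.
        if len(fields) >= 2 and fields[0] == "-march=":
            return fields[1]
    return None

def native_march(compiler: str = "gcc") -> Optional[str]:
    """Ask the compiler what -march=native resolves to on this machine."""
    if shutil.which(compiler) is None:
        return None  # compiler not on PATH, nothing to query
    result = subprocess.run(
        [compiler, "-march=native", "-Q", "--help=target"],
        capture_output=True, text=True, check=True,
    )
    return parse_march(result.stdout)

# Offline example using a trimmed, illustrative excerpt of the output format.
sample = "  -march=                     \t\tznver2\n  -mtune=                     \t\tznver2\n"
print(parse_march(sample))  # znver2
```

On a node, `print(native_march())` shows the resolved target, which makes it easy to compare what different instance types would get.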

When working on a new system, I like to compile all of my underlying libraries myself where possible and where it makes sense, so I understand which features are enabled or disabled. HDF5 is a good example: the versions typically installed on a system seem to have some threading options enabled, but I prefer them disabled, which lets me read files while running simulations are still writing them without risking errors. Tuning the underlying libraries is a nice additional benefit and could make a difference for performance-critical libraries; PETSc comes to mind.
