Compiler selection for cloud instances

Hi All -

I spoke with @rsignell about some of the challenges being faced with the cloud HPC implementations, one of which was properly getting the Intel compiler licenses set up on the AWS systems. It got me thinking about an interesting talk given at a C++ conference and some subsequent testing I did.

CppCon 2016: Tim Haines, “Improving Performance Through Compiler Switches…”

Like many, I’ve always assumed that “the Intel compilers are just the fastest,” but that may not be a safe assumption without really looking into what the compilers are doing. One thing I took away from the talk is that Intel enables the “fast math” (unsafe floating-point) optimizations at -O2, its default optimization level, whereas the GNU compiler suite never enables them automatically. That was a bit of a shocker to me and made me curious about what it might mean for some of our solvers.
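To make the “unsafe” part concrete: fast-math lets the compiler reassociate floating-point operations, and floating-point addition is not associative, so reassociation can change answers. A quick illustration of the effect (plain awk arithmetic here, not compiler-generated code):

```shell
# Floating-point addition is not associative, which is one reason
# fast-math reassociation can change results:
awk 'BEGIN {
    a = 1e16; b = -1e16; c = 1.0
    print (a + b) + c   # prints 1: the 1.0 survives
    print a + (b + c)   # prints 0: the 1.0 is absorbed into -1e16 and lost
}'
```

Whether that difference matters is exactly the kind of thing a solver-by-solver test has to answer.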

It seems that when all of these compilers are put on a level playing field you can generally get similar results; however, it will always be critical to test. My own testing on the new Bridges-2 system at the Pittsburgh Supercomputing Center found that the latest GNU and the latest Intel were pretty much level in terms of performance on those AMD 7742 chips. I’d be curious how the performance compares when the Intel compiler runs on Intel chips.

The optimization flags I used in testing were:

Vendor   Version   Flags
Intel    20.4      -O3 -mavx2
GNU      10.2.0    -O3 -march=znver2 -ffast-math -funroll-loops
  • Note: Intel does not currently have an AMD Zen-specific tuning flag, but AVX2 operations are the main performance driver, so we targeted that instruction set. You could also (I believe) use the Intel Skylake optimization set on Zen 2.
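For reference, this is roughly what those flag sets look like on a compile line. The compiler wrappers and source-file name below are placeholders, not the actual ADCIRC build system:

```shell
# Hypothetical compile lines for the two toolchains tested above;
# mpiifort/mpif90 and solver.F90 are placeholders.
INTEL_FLAGS="-O3 -mavx2"
GNU_FLAGS="-O3 -march=znver2 -ffast-math -funroll-loops"

# mpiifort $INTEL_FLAGS -c solver.F90   # Intel 20.4
# mpif90   $GNU_FLAGS   -c solver.F90   # GNU 10.2.0
echo "$GNU_FLAGS"
```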

In a 256-core (i.e., two-node) test using OpenMPI 4 and the ADCIRC model, the results were pretty close.

Vendor   Wallclock Time (min)   Seconds per Simulation Day
Intel    201.20                 670.67
GNU      203.82                 679.40
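As a quick consistency check on those numbers (assuming “seconds per simulation day” means wallclock seconds per simulated day), wallclock seconds divided by the per-day rate gives the simulated span, and both rows come out to the same run length:

```shell
# Implied simulated duration (days) for each row of the table:
awk 'BEGIN { printf "%.1f\n", (201.20 * 60) / 670.67 }'   # Intel
awk 'BEGIN { printf "%.1f\n", (203.82 * 60) / 679.40 }'   # GNU
```

Both print 18.0, so the two rows describe the same ~18-day simulation, as expected.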

Ultimately, Intel did have a slight advantage for this particular model, and it’s possible a more exotic set of flags could close the gap further, though I haven’t had time to test that yet. However, if licensing is a pain or a nonstarter, the GNU compilers do a pretty good job; they’ve been our choice on AWS at present, since that sidesteps moving licenses around every time we fire up a ParallelCluster instance.


@zcobell, thanks so much for posting this! That presentation by Tim Haines is awesome. I’m sharing it with all of my modeling colleagues.

One thing that really made an impression on me was compiling everything for the specific architecture being used instead of relying on generic libraries and executables.

I notice in your examples you specified the architecture explicitly (e.g. -march=znver2).

Did you also have good luck with -march=native?

Yes, I’ll typically use -march=native (or, with Intel CPUs and the Intel compiler, -xHost); however, in this case I was being explicit with my flags while testing things out.

When working on a new system, I like to compile all of my underlying libraries myself when I can and when it makes sense, partly to make sure I understand which features are enabled or disabled. A good example for me is HDF5: the versions typically installed on a system seem to have some threading options enabled, but I like them disabled, which lets me read files while a running simulation is still writing them without risking errors. Tuning up the underlying libraries is a nice additional benefit, and could make a difference for performance-critical libraries. PETSc comes to mind.
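For what it’s worth, with an autotools HDF5 source tree the relevant knob is --enable-threadsafe/--disable-threadsafe. A sketch of the kind of configure line I mean (the install prefix and compiler wrappers are placeholders for your own environment):

```shell
# Sketch: configure HDF5 with thread-safety explicitly off.
# (HDF5 has historically treated --enable-threadsafe as unsupported
# in combination with the parallel and Fortran options anyway.)
./configure \
    CC=mpicc FC=mpif90 \
    --prefix="$HOME/opt/hdf5" \
    --enable-parallel \
    --enable-fortran \
    --disable-threadsafe
# make -j && make install
```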