Hi All -
I spoke with @rsignell about some of the challenges of cloud HPC deployments. One of them is getting the Intel compiler licenses set up on the AWS systems. It got me thinking about an interesting talk from a C++ conference and some testing I did afterward.
CppCon 2016: Tim Haines “Improving Performance Through Compiler Switches…”
Like many, I've always held the bias that “the Intel compilers are just the fastest,” but that may not be a safe assumption without looking at what the compilers are actually doing. One thing I took away from the talk is that the Intel compilers enable “fast math” (value-unsafe floating-point) optimizations by default, so they are already active at `-O2`, the default optimization level (Intel's default floating-point model is `-fp-model fast=1`). GCC, by contrast, never enables them at its standard `-O` levels; you have to request them explicitly with `-ffast-math` (or `-Ofast`). That was a bit of a shocker to me and made me curious about what it might mean for some of our solvers.
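Fast-math modes matter for solver output because they let the compiler regroup (reassociate) floating-point operations, and floating-point addition is not associative. A minimal illustration in Python, whose floats are IEEE 754 doubles like a C `double`. It only shows the arithmetic fact that regrouping a sum can change its value; it is not a reproduction of what any particular compiler does to ADCIRC.

```python
# Floating-point addition is not associative: the grouping changes the result.
# Under strict IEEE 754 semantics a compiler must keep the source order;
# "fast math" modes allow it to regroup operations like these.
big = 1.0e16
small = 1.0

left = (big + -big) + small   # 0.0 + 1.0 -> 1.0
right = big + (-big + small)  # -1e16 + 1.0 rounds back to -1e16, so the 1.0 is lost -> 0.0

print(f"(a + b) + c = {left}")
print(f"a + (b + c) = {right}")
```

The same three numbers give 1.0 or 0.0 depending only on grouping, which is why results from a fast-math build can drift slightly from a strict build.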
It seems that when all of these compilers are put on a level playing field, you can generally get similar performance, but it is always critical to test. My own testing on the new Bridges-2 system at the Pittsburgh Supercomputing Center found that the latest GNU and Intel compilers were essentially even in performance on its AMD EPYC 7742 chips. I'd be curious how they compare when the Intel compiler runs on Intel chips.
The optimization flags I used in testing were:
| Vendor | Version | Flags |
|---|---|---|
| Intel | 20.4 | `-O3 -mavx2` |
| GNU | 10.2.0 | `-O3 -march=znver2 -ffast-math -funroll-loops` |
- Note: The Intel compiler does not currently have a tuning flag specific to AMD Zen, but AVX2 instructions are the main performance driver, so we targeted that optimization instead. You could also (I believe) use Intel's Skylake optimization set on Zen 2.
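Since both flag sets assume the hardware supports AVX2, it can be worth confirming that on the compute nodes (for example, whatever instance type a ParallelCluster setup launches) before building. A minimal sketch, assuming a Linux node where the kernel lists CPU features on the `flags` line of `/proc/cpuinfo`; the helper names `cpu_flags` and `supports_avx2` are hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the feature flags on the first 'flags' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def supports_avx2(cpuinfo_text: str) -> bool:
    """True if the CPU advertises AVX2, the instruction set both flag sets above target."""
    return "avx2" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")  # Linux-only; assumed to exist on the compute node
    if cpuinfo.exists():
        print("AVX2 available:", supports_avx2(cpuinfo.read_text()))
    else:
        print("/proc/cpuinfo not found; run this on a Linux compute node")
```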
In a 256-core test (i.e., two nodes) using OpenMPI 4 and the ADCIRC model, the results were pretty close.
Vendor | Wallclock Time (min) | Seconds Per Simulation Day |
---|---|---|
Intel | 201.20 | 670.67 |
GNU | 203.82 | 679.40 |
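As a quick sanity check, the two columns in the table are consistent with each other: wallclock seconds divided by seconds per simulated day gives the simulated period, and the ratio of the wallclock times gives the gap between the builds. The roughly 18-day run length below is inferred from these numbers, not something stated about the test setup.

```python
# Figures from the 256-core ADCIRC comparison table.
results = {
    "Intel": {"wallclock_min": 201.20, "sec_per_sim_day": 670.67},
    "GNU": {"wallclock_min": 203.82, "sec_per_sim_day": 679.40},
}

for vendor, r in results.items():
    # Wallclock seconds / seconds per simulated day = simulated days covered by the run.
    sim_days = r["wallclock_min"] * 60 / r["sec_per_sim_day"]
    print(f"{vendor}: ~{sim_days:.1f} simulated days")

intel, gnu = results["Intel"], results["GNU"]
slowdown = gnu["wallclock_min"] / intel["wallclock_min"] - 1
print(f"GNU build is {slowdown:.1%} slower than the Intel build")
```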
Ultimately, Intel did have a slight advantage (about 1.3% in wallclock time) for this particular model, and it's possible a more exotic set of GNU flags could close the gap further, but I haven't had time to test that yet. However, if licensing is a pain or a nonstarter, the GNU compilers do a pretty good job. They are our current choice on AWS because they let us avoid moving licenses around every time we fire up a ParallelCluster instance.