On 29 July 2022 19:57:51 CEST, Jason Bacon <jtocino%gmx.com@localhost> wrote:
> On 7/29/22 04:11, Dr. Thomas Orgis wrote:
>
> I've heard arguments that conflicts between MPI packages and their
> dependents are OK, but that won't fly on an HPC cluster where many
> different users are running different software. Maintaining entirely
> separate pkgsrc trees for each MPI implementation would cost man-hours
> in a field where there's an extreme talent shortage.
I object. The classic case used to be that MPI is part of the platform, and that is still true for fancy high-end systems with custom interconnects. You don't tend to mix and match vendors for the same application. There is a clear standard, and software is supposed to work with the MPI at hand (if it's fresh enough).
A parallel install (pun unintentional) of multiple MPIs would be a colossal time-sink and source of pain, with separate builds of fftw and boost ... and of anything else that uses MPI. Do you really want multiple incarnations of Boost-MPI, with a matrix of bindings for every pyXY?!
With that thinking ... you could also build all the higher-level algebra stuff against multiple BLAS implementations ... and cramming all those options into a single tree causes a lot of complexity.
Choose one default BLAS to use for packages, and one MPI. Build a separate tree for the other choice. Otherwise you are (further down) on the path to insanity, IMHO.
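
Just to sketch what I mean by separate trees (prefixes are made up for illustration, and the MPI knob is hypothetical; PKGSRC_BLAS_TYPES is the BLAS selector pkgsrc already has), each tree bootstrapped with its own --prefix could carry a mk.conf roughly like:

  # /usr/pkg-openblas-ompi/etc/mk.conf  (tree A: openblas + Open MPI)
  LOCALBASE=          /usr/pkg-openblas-ompi
  PKGSRC_BLAS_TYPES=  openblas
  # hypothetical knob -- whatever mechanism the tree uses to prefer one MPI:
  #PKGSRC_MPI_TYPE=   openmpi

  # /usr/pkg-netlib-mpich/etc/mk.conf  (tree B: the other choices)
  LOCALBASE=          /usr/pkg-netlib-mpich
  PKGSRC_BLAS_TYPES=  netlib
  #PKGSRC_MPI_TYPE=   mpich

Each tree makes exactly one consistent choice, and nothing inside it ever has to ask which BLAS or MPI a dependency was built against.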
The actual question here is how to manage the CUDA (and ROCm, oneAPI) stuff, which will likely come in via an external toolchain. In a single tree, which package is now GPU-optimized, which is built for Intel, which for AMD? It is far simpler for HPC to say
module switch env env/pkg2022Q1-cuda-gcc-openmpi
or
module switch env env/pkg2022Q1-intel-impi
to then know which software set you got. Incidentally, we build versioned trees anyway, as some users want to use the same builds for a set of computations over years, while others want bleeding-edge CUDA stuff and a more modern R (or the ancient python-2.7 stuff).
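
The env modules behind such a switch don't need to be fancy, either. A minimal Tcl modulefile sketch for one such tree (the prefix and the PKGSRC_PREFIX variable are purely illustrative) would be roughly:

  #%Module1.0
  ## env/pkg2022Q1-cuda-gcc-openmpi -- pkgsrc tree built with GCC, CUDA and Open MPI
  conflict        env
  set prefix      /sw/env/pkg2022Q1-cuda-gcc-openmpi
  prepend-path    PATH            $prefix/bin
  prepend-path    MANPATH         $prefix/man
  prepend-path    PKG_CONFIG_PATH $prefix/lib/pkgconfig
  setenv          PKGSRC_PREFIX   $prefix

The conflict line keeps users from loading two environments at once, so "module switch env env/..." is the only way to change trees, and whatever they build or run picks up exactly one software set.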