Peano
Peano has three different approaches to handling system- and compiler-specific settings and properties:
The page below discusses these particular toolchains and provides some information on settings for some vendors. There is a dedicated subpage for some machines that we use quite a lot for Peano. Before we dive into particular toolchains, some general remarks on how compiler dependencies are managed:
Peano relies on a header tarch/compiler/CompilerSpecificSettings.h. This header inspects the compiler's predefined preprocessor macros to determine the compiler and its version, and then includes the compiler-specific flavour of the settings header that it finds most appropriate. You may always include your own file derived from one of the other headers in the directory.
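The dispatch idea can be sketched as follows, assuming a header that selects a compiler flavour purely from predefined macros. The macro COMPILER_FLAVOUR_HYPOTHETICAL and the function compilerFlavour() are hypothetical and not part of Peano; in the real header each branch would include a compiler-specific file rather than define a string. The predefined macros __INTEL_LLVM_COMPILER, __NVCOMPILER, __clang__ and __GNUC__ are the standard ones set by icpx, nvc++, Clang and GCC.

```cpp
// Hypothetical sketch of a compiler-dispatch header; not Peano's actual file.
#include <string>

// icpx and nvc++ also define __clang__ and/or __GNUC__ for compatibility,
// so the more specific checks must come first.
#if defined(__INTEL_LLVM_COMPILER)
  // Real header: include the flavour file for Intel's LLVM-based compilers here.
  #define COMPILER_FLAVOUR_HYPOTHETICAL "intel-llvm"
#elif defined(__NVCOMPILER)
  #define COMPILER_FLAVOUR_HYPOTHETICAL "nvidia"
#elif defined(__clang__)
  #define COMPILER_FLAVOUR_HYPOTHETICAL "clang"
#elif defined(__GNUC__)
  #define COMPILER_FLAVOUR_HYPOTHETICAL "gnu"
#else
  #define COMPILER_FLAVOUR_HYPOTHETICAL "unknown"
#endif

// Reports which branch of the dispatch above was taken.
inline std::string compilerFlavour() {
  return COMPILER_FLAVOUR_HYPOTHETICAL;
}
```

Ordering the checks from most to least specific is the essential design choice: a plain __clang__ or __GNUC__ test alone would misclassify icpx and nvc++.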
Whenever we find incompatibilities between different compilers, we try to resolve them through defines within the compiler-specific settings. This way, we avoid spreading "fixes" over the whole code base. The settings are also used to configure machine specifics such as the default alignment. Most expressions within the compiler-specific settings header can be overwritten manually via defines. Consult the file's implementation for details.
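The "overwrite via defines" pattern typically wraps each default in an #ifndef guard, so a -D flag on the command line wins over the header's choice. The sketch below assumes this pattern; the macro DEFAULT_ALIGNMENT_HYPOTHETICAL, the constant defaultAlignment and the struct AlignedScratch are hypothetical names, not Peano's.

```cpp
// Hypothetical sketch of an overridable default in a compiler-specific header.
#include <cstddef>

// Users may override the default on the command line, e.g.
//   -DDEFAULT_ALIGNMENT_HYPOTHETICAL=128
#ifndef DEFAULT_ALIGNMENT_HYPOTHETICAL
  #if defined(__AVX512F__)
    // 512-bit vector registers: align to 64 bytes
    #define DEFAULT_ALIGNMENT_HYPOTHETICAL 64
  #else
    #define DEFAULT_ALIGNMENT_HYPOTHETICAL 32
  #endif
#endif

constexpr std::size_t defaultAlignment = DEFAULT_ALIGNMENT_HYPOTHETICAL;
static_assert((defaultAlignment & (defaultAlignment - 1)) == 0,
              "the alignment must be a power of two");

// A scratch buffer that picks up the configured alignment.
struct alignas(DEFAULT_ALIGNMENT_HYPOTHETICAL) AlignedScratch {
  double values[8];
};
```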
The config.h generated by the build system also feeds into the compiler-specific settings. First of all, it defines a few generic constants such as SharedTBB or Parallel. These are classic macro symbols. Whenever certain macro combinations have knock-on effects on other features, they should be covered within CompilerSpecificSettings and in turn be mapped onto further macros.
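Mapping a macro combination onto a further macro might look like the following. This is a sketch assuming that Parallel marks an MPI build and SharedTBB a TBB-based shared-memory build; the derived macro REQUIRES_THREAD_SAFE_MPI_HYPOTHETICAL and the function requiresThreadSafeMPI() are hypothetical.

```cpp
// Hypothetical sketch: derive a further macro from config.h symbols.
// Assumption: Parallel marks an MPI build, SharedTBB a TBB build.
#if defined(Parallel) && defined(SharedTBB)
  // TBB tasks may call MPI concurrently, so the MPI library would have to be
  // initialised with full thread support.
  #define REQUIRES_THREAD_SAFE_MPI_HYPOTHETICAL 1
#else
  #define REQUIRES_THREAD_SAFE_MPI_HYPOTHETICAL 0
#endif

// Code elsewhere queries the derived symbol instead of repeating the combination.
constexpr bool requiresThreadSafeMPI() {
  return REQUIRES_THREAD_SAFE_MPI_HYPOTHETICAL == 1;
}
```

The benefit is that the knock-on logic lives in one place: if another shared-memory backend later needs the same treatment, only this condition changes.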
There are exceptions to this rule: If you use a certain GPU backend, you might have to annotate functions in a particular way. Such information is not covered within CompilerSpecificSettings (it has nothing to do with a particular compiler choice) but is found directly within the headers of the respective namespace in the technical architecture.
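A backend-specific annotation of this kind could be sketched as below, assuming a CUDA-style backend where functions must carry __host__ __device__ to be callable on the GPU. The macro GPU_CALLABLE_HYPOTHETICAL and the function squaredNorm3() are illustrative only; __CUDACC__ is the macro nvcc defines when compiling CUDA code.

```cpp
// Hypothetical annotation macro that would live in a GPU namespace header,
// not in CompilerSpecificSettings.
#if defined(__CUDACC__)
  // CUDA compilation (e.g. nvcc): make the function callable on host and device
  #define GPU_CALLABLE_HYPOTHETICAL __host__ __device__
#else
  // Plain host build: the annotation vanishes
  #define GPU_CALLABLE_HYPOTHETICAL
#endif

// Example kernel helper carrying the annotation.
GPU_CALLABLE_HYPOTHETICAL inline double squaredNorm3(const double* v) {
  return v[0] * v[0] + v[1] * v[1] + v[2] * v[2];
}
```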
Peano provides support for Intel's (oneAPI) toolchain through two routes: the Intel Trace Analyzer and Collector (ITAC) interface and the Instrumentation and Tracing Technology (ITT). To activate either of them, you have to reconfigure with one of the two options below:
Broadly speaking, the toolchain alters the code in two ways:
The enumeration shows that the name "Intel toolchain" is misleading. We do not actually tailor the build to the Intel toolchain; we tailor the setup to the Intel analysis tools. Most production runs will not use the Intel toolchain but rather set all Intel-specific compiler and linker flags manually.
At the moment, we pass the following flags to the compiler when the Intel toolchain is activated:
Even though you have switched to the Intel loggers, you will not get any trace information if you build in release mode. To get traces or annotation info, you have to switch to the trace mode (or the assertions or debug mode); compare the general remarks on Peano's build modes.
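The reason release builds yield no traces is that trace calls are typically compiled away below a certain build level. The sketch below assumes such a scheme; the level macros, TRACE_HYPOTHETICAL, emitTrace() and the counter traceEventsEmitted are hypothetical, and Peano's actual macro names and values may differ.

```cpp
#include <iostream>
#include <string>

// Hypothetical build levels; Peano's real names and values may differ.
#define BUILD_RELEASE_HYPOTHETICAL 0
#define BUILD_TRACE_HYPOTHETICAL 1
#ifndef BUILD_LEVEL_HYPOTHETICAL
  #define BUILD_LEVEL_HYPOTHETICAL BUILD_RELEASE_HYPOTHETICAL
#endif

inline int traceEventsEmitted = 0;  // counts events that reached the logger

inline void emitTrace(const std::string& event) {
  ++traceEventsEmitted;
  std::clog << "trace: " << event << '\n';  // an Intel logger would annotate here
}

#if BUILD_LEVEL_HYPOTHETICAL >= BUILD_TRACE_HYPOTHETICAL
  #define TRACE_HYPOTHETICAL(event) emitTrace(event)
#else
  // Release build: the call disappears, so no tool ever sees an event.
  #define TRACE_HYPOTHETICAL(event) ((void)0)
#endif

inline void computeStep() {
  TRACE_HYPOTHETICAL("computeStep");
  // ... actual work ...
}
```

Compiled without further defines, computeStep() emits nothing; with -DBUILD_LEVEL_HYPOTHETICAL=1 every call reaches emitTrace().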
If you trace your code with the Intel tools, the traces quickly grow to an unmanageable size, or the performance might degrade. Therefore, we disable tracing by default. The first trace command then enables it.
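The "disabled until the first trace command" behaviour can be sketched with an atomic flag that the first caller flips. The class LazyTracer is hypothetical; where the comment says "resume", a real build would call the tool's own routine, for example __itt_resume() for ITT, which we assume rather than take from Peano's sources.

```cpp
#include <atomic>
#include <cstddef>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical tracer that starts paused and switches itself on at the first
// trace command.
class LazyTracer {
public:
  void trace(const std::string& event) {
    // exchange() returns the previous value, so only the very first caller
    // sees false and performs the switch-on.
    if (!enabled_.exchange(true)) {
      enableCalls_.fetch_add(1);
      // resume: tool-specific call (e.g. __itt_resume()) would go here
    }
    std::lock_guard<std::mutex> lock(mutex_);
    events_.push_back(event);
  }

  bool enabled() const { return enabled_.load(); }
  int enableCalls() const { return enableCalls_.load(); }

  std::size_t eventCount() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return events_.size();
  }

private:
  std::atomic<bool> enabled_{false};
  std::atomic<int> enableCalls_{0};
  mutable std::mutex mutex_;
  std::vector<std::string> events_;
};
```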
For further discussion on logging, please see the generic logging description for Peano.
We found that the newer Intel compilers provide a flag
which means you don't have to link against TBB manually anymore. It should also provide all the include paths. It is not clear exactly which flags it sets (possibly more than a few include paths), so we found this route more reliable than adding TBB paths and libraries manually.
SYCL is directly supported via icpx. There are only a few things to do:
configure will test whether your C compiler is compatible with your linker flags. However, the -fsycl flag is unknown to the default C compiler (usually gcc), so configure will fail. By letting the C compiler variable point to the Intel C++ compiler, you ensure that the sanity check within the configuration phase passes.
Intel's MPI wrapper is mpiicpc, even though Intel now wants you to use icpx instead of icc/icpc. To tell the MPI wrapper that you want to use icpx, you have to set some environment variables:
The NVIDIA toolchain requires us to make both CXX and CC point to the C++ compiler, as configure passes the same arguments to both, and only the C++ compiler understands them.
Some compute kernels of Peano extensions such as ExaHyPE are not available with the NVIDIA tools, as the compiler is rather picky: it refuses, for example, to place temporary array variables on the call stack. You might therefore have to adjust the software configuration.
Yet to be written. Peano offers an NVIDIA logger.
NVIDIA's compiler does not support all of OpenMP, and its OpenMP implementation is rather picky when it comes to copy constructors and other C++ features. Peano has internal workarounds for all of these items, i.e. the build should in principle succeed, yet the code might be slightly slower than its Intel or native Clang counterpart.
The OpenMP offloading requires us to specify both the GPU OpenMP target and a corresponding CUDA version. It is important to recognise that --with-gpu=omp activates the offloading only from Peano's point of view. As long as the NVIDIA compiler is not told which device to use, it will give you
devices if you run the code.
We require -cuda to be passed via the compiler and linker flags to enable Unified Shared Memory (USM).
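To check at runtime whether offloading actually reaches a device, one can query the OpenMP runtime and run a small target region. The sketch below uses only standard OpenMP routines and directives; the helper names visibleOffloadDevices() and saxpy() are hypothetical, and the compiler flags that select the device are deliberately not shown. Without a device (or without OpenMP), the target region simply runs on the host.

```cpp
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Number of offload devices the OpenMP runtime sees (0 without OpenMP).
inline int visibleOffloadDevices() {
#ifdef _OPENMP
  return omp_get_num_devices();
#else
  return 0;
#endif
}

// y = a*x + y, offloaded via an OpenMP target region if a device is available.
// Precondition: x.size() >= y.size().
inline void saxpy(double a, const std::vector<double>& x, std::vector<double>& y) {
  const int n = static_cast<int>(y.size());
  const double* xp = x.data();
  double* yp = y.data();
  #pragma omp target teams distribute parallel for map(to: xp[0:n]) map(tofrom: yp[0:n])
  for (int i = 0; i < n; ++i) {
    yp[i] = a * xp[i] + yp[i];
  }
}
```

Printing visibleOffloadDevices() at startup is a cheap way to confirm that the build is not silently falling back to the host.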
NVIDIA's NVTX logging is supported by picking the
toolchain.
For the AMD toolchain, we need to load either the ROCm or the AOMP module. Both modules ship AMD's modified version of Clang, which is the compiler we use. We again need to point both CXX and CC to this Clang compiler.
The Clang compiler does not support omp_get_mapped_ptr() at the moment. We have implemented a workaround, mk_omp_get_mapped_ptr() in src/tarch/multicore/omp/MkUtils.h, which provides the same functionality.
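One way such a workaround could be built from older OpenMP routines is sketched below. We do not know how MkUtils.h actually implements it; this version relies on omp_target_is_present() and a target data region with use_device_ptr, inside which the pointer is rebound to its device address. The names mappedPtrSketch() and hostDeviceId() are hypothetical.

```cpp
#ifdef _OPENMP
#include <omp.h>
#endif

// Device id of the host as seen by the OpenMP runtime (0 without OpenMP).
inline int hostDeviceId() {
#ifdef _OPENMP
  return omp_get_initial_device();
#else
  return 0;
#endif
}

// Hypothetical stand-in for omp_get_mapped_ptr(ptr, device): returns the
// device address corresponding to hostPtr, or nullptr if nothing is mapped.
inline void* mappedPtrSketch(void* hostPtr, int device) {
#ifdef _OPENMP
  if (device == omp_get_initial_device()) {
    return hostPtr;  // on the host device, both addresses coincide
  }
  if (!omp_target_is_present(hostPtr, device)) {
    return nullptr;  // not mapped on that device
  }
  void* devicePtr = nullptr;
  // Inside this region, hostPtr refers to the corresponding device address.
  #pragma omp target data device(device) use_device_ptr(hostPtr)
  {
    devicePtr = hostPtr;
  }
  return devicePtr;
#else
  (void)device;
  return hostPtr;
#endif
}
```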
For the configuration with OpenMP GPU offloading, we again specify --with-gpu=omp to make Peano GPU-aware. Furthermore, we need to add AMD-specific offloading instructions:
where <gpu-arch> is either gfx906 (MI50), gfx908 (MI100), or gfx90a (MI200 series).
On Ubuntu and many other systems, clang is not shipped with OpenMP support by default. Instead, you have to install the package libomp-dev.
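A quick way to confirm that clang picks up OpenMP is to compile a tiny translation unit with -fopenmp: if the runtime package is missing, the omp.h include typically fails. The helpers openmpSpecVersion() and openmpMaxThreads() below are hypothetical; _OPENMP is the standard macro holding the supported specification date (e.g. 201511 for OpenMP 4.5).

```cpp
#ifdef _OPENMP
#include <omp.h>
#endif

// The _OPENMP date macro, or 0 if the compiler did not enable OpenMP.
constexpr long openmpSpecVersion() {
#ifdef _OPENMP
  return _OPENMP;
#else
  return 0;
#endif
}

// Threads an OpenMP parallel region would use by default (1 without OpenMP).
inline int openmpMaxThreads() {
#ifdef _OPENMP
  return omp_get_max_threads();
#else
  return 1;
#endif
}
```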
Should you require OpenMP offloading (i.e. GPU) support, you have to add the OpenMP target instructions:
Our code base relies on several Python packages. There's a requirements specification file which you can use to ensure that all requirements are met:
where the requirements.txt file resides within Peano's root directory. Depending on your system, you might have to use pip3 or install the packages in your user space by adding --user.