Peano 4
ExaHyPE employs Peano's domain decomposition and adds tasking on top. Peano's domain decomposition is used for both MPI and multithreading, while the tasking only affects the shared memory parallelisation. Both parallelisation paradigms are controlled via the Python API.
In ExaHyPE, you enable load balancing by adding a load balancing configuration to your solver's Python script; a sketch is given below.
There are multiple different load balancing schemes available through Peano's generic load balancing toolbox.
It is important to study the constructor parameter list of exahype2::LoadBalancingConfiguration. These arguments allow you to constrain the properties of the domain decomposition. For example, you can instruct the load balancing to aim for partitions that are well-balanced across ranks up to 80%, and you can also constrain the tree (subpartition) size to 200 cells. This means that no partition on a rank will be split further as long as this rank's load is no more than roughly 20% off the ideal balance, and a partition will never be split up if it holds fewer than 200 cells.
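As an illustration, the sketch below enables the load balancing from the Python script. It assumes the Python API offers a set_load_balancing call that takes the load balancing scheme and the constructor arguments of exahype2::LoadBalancingConfiguration as strings, as in recent ExaHyPE 2 releases; the scheme name and the argument order of the configuration are assumptions and may differ in your installation.

```python
# Sketch only: assumes project is an exahype2.Project and that the Python API
# provides set_load_balancing(scheme, configuration) as in recent ExaHyPE 2 versions.
project.set_load_balancing(
    "toolbox::loadbalancing::strategies::RecursiveBipartition",
    # Hypothetical argument order: balance ranks up to a quality of 0.8 (80%) and
    # never split a tree (subpartition) that holds fewer than 200 cells.
    "new ::exahype2::LoadBalancingConfiguration(0.8, 200)",
)
```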
There is always a trade-off between the different settings: If you increase the quality of the domain decomposition, the time per time step goes down, but the load balancing itself takes considerably longer. If you pick a minimum partition size, the grid will first be built up until it can accommodate this constraint, and only then will it be decomposed. Grid decomposition, however, is both time-consuming and (temporarily) requires quite a lot of memory.
While ExaHyPE makes it relatively straightforward to parallelise a code, tuning a particular experiment is often tricky. As ExaHyPE is an engine supporting multiple application domains, and as the "perfect" load balancing strategy depends strongly on the application type, the chosen experiment and the machine used, it can only provide vague heuristics that work in most cases. To get really good performance, you will have to experiment with the load balancing settings.
Here are some recommendations if you use recursive load balancing. They describe how to tweak the three parameters that the ExaHyPE configuration offers.
Some ExaHyPE solvers add further task parallelism to this decomposition. In general, the enclave solvers tend to perform better on massively parallel systems.
Most users who write extensions to ExaHyPE that need some global data exchange are users who add new fields to their solver. Examples are global statistics that they want to track over the solution or global variables that they control.
Each MPI rank holds one instance of the solver, and all threads on an MPI rank share this instance. Peano splits up the domain into chunks along the space-filling curve. If you use an enclave solver, it then splits these chunks further up into tasks. That is, per-cell operations such as the Riemann solves and the patch postprocessing can run in parallel on each thread.
If you add a new attribute to your solver, this attribute exists in each solver instance on each rank. You therefore have to

1. protect all concurrent accesses to this attribute by the threads of a rank, and
2. reduce the attribute globally over all ranks after each time step.
The first item is relatively simple: You might, for example, decide to protect all accesses to your own variables via a semaphore; these techniques are discussed in the section on protecting shared memory data accesses. There is no specific place to take care of the protection: Just protect the actual data access.
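As a minimal sketch of the first item, the snippet below guards a hypothetical _myGlobalStatistic attribute with one of Peano's semaphores. It assumes the tarch::multicore::BooleanSemaphore and tarch::multicore::Lock classes; the solver, method and attribute names are made up for illustration.

```cpp
// Sketch only: assumes Peano's tarch::multicore semaphore and lock classes;
// MyFancySolver, trackCellStatistic and _myGlobalStatistic are hypothetical names.
#include "tarch/multicore/BooleanSemaphore.h"
#include "tarch/multicore/Lock.h"

namespace {
  tarch::multicore::BooleanSemaphore myStatisticSemaphore;
}

void mynamespace::MyFancySolver::trackCellStatistic(double value) {
  // All threads of a rank share this solver instance, so guard the actual data access.
  tarch::multicore::Lock lock(myStatisticSemaphore);
  _myGlobalStatistic += value;
  // The lock is released automatically when it goes out of scope.
}
```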
The second item requires slightly more work, as you have to globally reduce the attribute: After each time step, you have to ensure that all ranks exchange their attributes with each other, and you then have to merge these data. I recommend not using MPI's data exchange routines directly, but Peano's wrappers around these routines; these wrappers are discussed in the section on reductions. The actual data exchange should be done at the end of each time step. For this, you add
```cpp
void finishTimeStep() override;
```
to your solver. The implementation should first invoke the superclass and then add the additional data exchange.
```cpp
void mynamespace::MyFancySolver::finishTimeStep() {
  MyFancyAbstractSolver::finishTimeStep();

  #ifdef Parallel
  // additional data exchange
  #endif
}
```
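What the additional data exchange looks like depends on your attribute. The sketch below reduces the hypothetical _myGlobalStatistic attribute with a plain MPI_Allreduce; as noted above, Peano ships wrappers around such routines, and their exact names depend on your version, so treat this purely as an illustration of the pattern.

```cpp
// Illustration only: _myGlobalStatistic is a hypothetical solver attribute and the
// reduction uses raw MPI. In practice, prefer Peano's wrappers as recommended above.
#include <mpi.h>

void mynamespace::MyFancySolver::finishTimeStep() {
  MyFancyAbstractSolver::finishTimeStep();

  #ifdef Parallel
  double localValue  = _myGlobalStatistic;
  double globalValue = localValue;
  // Merge the per-rank values; every rank ends up with the same, reduced result.
  MPI_Allreduce(&localValue, &globalValue, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
  _myGlobalStatistic = globalValue;
  #endif
}
```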
ExaHyPE internally realises these two steps for all of its solver attributes, such as the maximum mesh size or the admissible time step sizes. This is the reason why you have to call the superclass implementation, too.