|
| void | barrier (std::function< void()> waitor=[]() -> void {}) |
| |
| void | allReduce (const void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, std::function< void()> waitor=[]() -> void {}) |
| |
| void | reduce (const void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, int root, std::function< void()> waitor=[]() -> void {}) |
| |
| bool | isMessageInQueue (int tag) const |
| | In older DaStGen versions, I tried to find out whether a particular message type is in the MPI queue.
|
| |
| void | logStatus () const |
| | Logs the status of the process onto the log device.
|
| |
| virtual | ~Rank () |
| | The standard destructor calls MPI_Finalize().
|
| |
| bool | init (int *argc, char ***argv) |
| | This operation initializes the MPI environment and the program instance.
|
| |
| void | shutdown () |
| | Shuts down the application.
|
| |
| int | getRank () const |
| | Return rank of this node.
|
| |
| MPI_Comm | getCommunicator () const |
| |
| int | getNumberOfRanks () const |
| |
| bool | isGlobalMaster () const |
| | Is this node the global master process?
|
| |
| void | triggerDeadlockTimeOut (const std::string &className, const std::string &methodName, int communicationPartnerRank, int tag, int numberOfExpectedMessages=1, const std::string &comment="") |
| | Triggers a time out and shuts down the cluster if a timeout is violated.
|
| |
| void | writeTimeOutWarning (const std::string &className, const std::string &methodName, int communicationPartnerRank, int tag, int numberOfExpectedMessages=1) |
| | Writes a warning if relevant.
|
| |
| bool | exceededTimeOutWarningThreshold () const |
| |
| bool | exceededDeadlockThreshold () const |
| |
| void | plotMessageQueues () |
| |
| void | ensureThatMessageQueuesAreEmpty (int fromRank, int tag) |
| | Ensure that there are no messages anymore from the specified rank.
|
| |
| void | setDeadlockWarningTimeStamp () |
| | Memorise global timeout.
|
| |
| void | setDeadlockTimeOutTimeStamp () |
| |
| bool | isInitialised () const |
| |
| void | setTimeOutWarning (int valueInSeconds) |
| | Set time out warning.
|
| |
| void | setDeadlockTimeOut (int valueInSeconds) |
| | Set deadlock time out.
|
| |
| void | setCommunicator (MPI_Comm communicator, bool recomputeRankAndWorld=true) |
| | Set communicator to be used by Peano.
|
| |
| void | suspendTimeouts (bool timeoutsDisabled) |
| |
| int | getProvidedThreadLevelSupport () const |
| | Information on supported thread-level support.
|
| |
Represents a program instance within a cluster.
Thus, this class is a singleton.
The parallel concept is a client-server model (process 0 is the server), where all other active nodes act as workers taking on calculations on demand. So the basic activities of a parallel node are
- receive new root element to work on
- pass back space-tree
- perform additive cycle
- perform multiplicative cycle
The two perform commands have to return several statistics records. Among them are the time needed, some residual characteristics, and flags indicating whether the domain has been refined. Furthermore, the number of communication partners is an interesting detail.
In the near future, this class should become responsible for the error handling. Right now, the error handler installed is the fatal error handler, i.e. the whole parallel application is shut down as soon as an MPI error occurs.
- Author
- Tobias Weinzierl
- Version
- Revision
- 1.51
Definition at line 48 of file Rank.h.
void tarch::mpi::Rank::allReduce ( const void * sendbuf, void * recvbuf, int count, MPI_Datatype datatype, MPI_Op op, std::function< void()> waitor = []() -> void {} )
Wrapper around allreduce
Use this wrapper in Peano to reduce a value over all ranks. We recommend not to use plain MPI within Peano applications, as Peano relies on MPI+X and is strongly dynamic, i.e. you never know how many threads a rank currently employs, and one of the ranks may just have started to send around point-to-point load balancing messages.
Rationale
Peano relies heavily on unexpected, asynchronous message exchange to
admin the workload. Ranks tell other ranks on the fly, for example, if
they delete trees or create new ones. They also use MPI messages to
realise global semaphores.
As a consequence, any reduction runs the risk of introducing a deadlock: Rank A enters an allreduce. Rank B sends something to A (for example a request that it would like to load balance later) and then would enter the allreduce. However, this send of a message to A (and perhaps an immediate receive of a confirmation message) might not go through, as A is busy in the reduction. Therefore, we should never use a blocking allreduce but instead issue a non-blocking one. While the reduction is pending, we can do something else, such as answering further request messages.
This allreduce allows us to do so. The default waitor is a no-op, so most codes pass a receive of pending messages and use allReduce as follows:
<pre>
tarch::mpi::Rank::getInstance().allReduce( ..., [&]() -> void { tarch::services::ServiceRepository::getInstance().receiveDanglingMessages(); } );
</pre>
Rationale behind ifdefs
I wanted to make this routine a normal one that degenerates to nop if you don't translate with MPI support. However, that doesn't work as the signature requires MPI-specific datatypes.
Multithreading
This reduction is completely agnostic of any multithreading. I therefore recommend not to use this routine within any action set.
Attribute semantics
The routine adheres to the plain MPI semantics. As it passes the MPI_Op argument through to MPI, we also support all reduction operators.
In line with MPI, please ensure that the receive buffer already holds the local contribution. Notably, the routine does not copy over the content from sendbuf into recvbuf.
Referenced by swift2::ParticleSpecies::allReduce().
void tarch::mpi::Rank::barrier ( std::function< void()> waitor = []() -> void {} )
Global MPI barrier
I provide a custom barrier. It semantically differs from a native MPI_Barrier, which cannot do anything while it waits. This barrier therefore relies on MPI's non-blocking barrier and gives the user the opportunity to specify what to do while we wait to be allowed to pass the barrier.
The most common pattern how to use the barrier in Peano 4 is to pass
the following functor to the barrier as argument:
<pre>
[&]() -> void { tarch::services::ServiceRepository::getInstance().receiveDanglingMessages(); }
</pre>
Please note that this barrier remains an MPI barrier. It does not act
as barrier between multiple threads. In particular: if you use this
barrier in a multithreaded code, then each thread will launch a barrier
on its own. If the number of threads/tasks per rank differs, deadlocks
might arise. Anyway, it is not a good idea to use this within a
multithreaded part of your code.
@param waitor is my functor that should be called while we wait. By
default, it is empty, i.e. barrier degenerates to a blocking barrier
in the MPI 1.3 sense.
bool tarch::mpi::Rank::init ( int * argc, char *** argv )
This operation initializes the MPI environment and the program instance.
Note that the argv and argc parameters are both in and out parameters. They are passed through to the MPI environment, which may modify them; afterwards, the original parameters from the user are stored within them.
Implementation details
init never uses the log device to report any errors, as the log device usually in turn uses Node's getters. Furthermore, the _initIsCalled flag thus has to be set before the log state operation is invoked.
Upper tag bound
We try to find out how many tags the system supports. Here, the MPI standard is a little weird, as it requires a pointer to a void pointer from which we then read the data. So it seems we get back a pointer to a system variable rather than the system variable itself.
Thanks to Andrew Mallinson (Intel) for pointing this out.
- Returns
- true if initialisation has been successful
- See also
- shutdown
static int tarch::mpi::Rank::reserveFreeTag ( const std::string & fullQualifiedMessageName, int numberOfTags = 1 )
Returns a free tag.
Returns a free tag to be used for a new datatype. Each result is delivered exactly once. The string argument is just for logging.
Details
This operation should write something to the log devices. However, it is static, and the class' log devices are static, too. C++ has no mechanism to define which static entity has to be instantiated first. On some systems, it hence happened that this tag registration was called before the logging device was up and running. The problem is known as the static initialization order problem:
https://isocpp.org/wiki/faq/ctors#static-init-order
So what I do is log to std::cout only. This eliminated the problems here.
void tarch::mpi::Rank::shutdown ( )
Shuts down the application.
Should be the last operation called by the overall application.
Rationale
Originally, I put the shutdown operation into the destructor of Node. The MPI environment consequently was shut down as soon as the operating system terminated the application. However, Scalasca complained on the BlueGene/P that the destruction happened after the return statement of the main method. To make Peano work with Scalasca, I hence moved the MPI shutdown into a method called explicitly before the final return statement.
This seems to be a CLX effect, i.e. the Intel and the GNU compilers worked fine with Scalasca. Hence, I assume that Intel and GNU executables destroy all static objects (singletons) before they return from the main function, whereas CLX destroys the static objects after the return statement and thus makes Scalasca's instrumentation report an error.
void tarch::mpi::Rank::triggerDeadlockTimeOut ( const std::string & className, const std::string & methodName, int communicationPartnerRank, int tag, int numberOfExpectedMessages = 1, const std::string & comment = "" )
Triggers a time out and shuts down the cluster if a timeout is violated.
The implementation does not use MPI_Abort, since that operation seems to require all nodes to be up and running. Instead, the routine uses the system exit function, passing it DEADLOCK_EXIT_CODE as exit code.
The operation should be called only if the deadlock time-out is switched on ( isTimeOutDeadlockEnabled() ) and the deadlock time-out has expired. Use getDeadlockWarningTimeStamp() and the system operation clock() to check the second requirement.
- Parameters
| className | Name of the class that triggers the deadlock shutdown. |
| methodName | Name of the method that triggers the deadlock shutdown. |
| communicationPartnerRank | Rank of the node that should have sent a message but did not. |
void tarch::mpi::Rank::writeTimeOutWarning ( const std::string & className, const std::string & methodName, int communicationPartnerRank, int tag, int numberOfExpectedMessages = 1 )
Writes a warning if relevant.
This operation writes a warning if the code might be running into a timeout (or an excessive wait). The routine assumes that timeouts are enabled and that the user has called setDeadlockTimeoutTimeStamp() before. To avoid an excessive list of timeout warnings, each warning moves the next warning timestamp forward (so you don't get dozens of warnings), and the code also increases the time span it uses from hereon to report on timeouts.
std::chrono::seconds tarch::mpi::Rank::_timeOutWarning (private)
Timeout warning.
How long shall the application wait until it writes a time-out warning? In contrast to _deadlockTimeOut, this value changes over time: whenever we write a warning message, we increase this value. So, by default, you can make it rather small; the code then increases it once a warning has been written, to avoid flooding the terminal with these warnings. Due to this growth it will never be bigger than _deadlockTimeOut, though.
Despite the growth discussion, you can use a timeout warning greater than zero and set _deadlockTimeOut to zero. In this case, you will get warnings that the code thinks it runs into a deadlock, but you never get an actual timeout.
- See also
- writeTimeOutWarning(), _deadlockTimeOut
Definition at line 102 of file Rank.h.