Peano

tarch::mpi::Rank Class Reference

Represents a program instance within a cluster.
#include <Rank.h>
Public Member Functions | |
void | barrier (std::function< void()> waitor=[]() -> void {}) |
void | allReduce (const void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, std::function< void()> waitor=[]() -> void {}) |
void | reduce (const void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, int root, std::function< void()> waitor=[]() -> void {}) |
bool | isMessageInQueue (int tag) const |
In older DaStGen versions, I tried to find out whether a particular message type is in the MPI queue. | |
void | logStatus () const |
Logs the status of the process onto the log device. | |
virtual | ~Rank () |
The standard destructor calls MPI_Finalize(). | |
bool | init (int *argc, char ***argv) |
This operation initializes the MPI environment and the program instance. | |
void | shutdown () |
Shuts down the application. | |
int | getRank () const |
Return rank of this node. | |
MPI_Comm | getCommunicator () const |
int | getNumberOfRanks () const |
bool | isGlobalMaster () const |
Is this node the global master process, i.e. | |
void | triggerDeadlockTimeOut (const std::string &className, const std::string &methodName, int communicationPartnerRank, int tag, int numberOfExpectedMessages=1, const std::string &comment="") |
Triggers a time out and shuts down the cluster if a timeout is violated. | |
void | writeTimeOutWarning (const std::string &className, const std::string &methodName, int communicationPartnerRank, int tag, int numberOfExpectedMessages=1) |
Writes a warning if relevant. | |
bool | exceededTimeOutWarningThreshold () const |
bool | exceededDeadlockThreshold () const |
void | plotMessageQueues () |
void | ensureThatMessageQueuesAreEmpty (int fromRank, int tag) |
Ensure that there are no messages anymore from the specified rank. | |
void | setDeadlockWarningTimeStamp () |
Memorise global timeout. | |
void | setDeadlockTimeOutTimeStamp () |
bool | isInitialised () const |
void | setTimeOutWarning (int valueInSeconds) |
Set time out warning. | |
void | setDeadlockTimeOut (int valueInSeconds) |
Set deadlock time out. | |
void | setCommunicator (MPI_Comm communicator, bool recomputeRankAndWorld=true) |
Set communicator to be used by Peano. | |
void | suspendTimeouts (bool timeoutsDisabled) |
Static Public Member Functions | |
static bool | validateMaxTagIsSupported () |
Just try to find out if a tag is actually supported. | |
static int | reserveFreeTag (const std::string &fullQualifiedMessageName, int numberOfTags=1) |
Return a Free Tag. | |
static void | releaseTag (int tag) |
static Rank & | getInstance () |
This operation returns the singleton instance. | |
static int | getGlobalMasterRank () |
Get the global master. | |
static void | abort (int errorCode) |
A proper abort in an MPI context has to use MPI_Abort. | |
Static Public Attributes | |
static const int | DEADLOCK_EXIT_CODE = -2 |
Private Member Functions | |
Rank () | |
The standard constructor assigns default values to the attributes and checks whether the program is compiled using the -DParallel option. | |
Rank (const Rank &node)=delete | |
You may not copy a singleton. | |
void | receiveDanglingMessagesFromReceiveBuffers () |
Receive any message pending in the MPI receive buffers. | |
Private Attributes | |
bool | _initIsCalled |
Is set true if init() is called. | |
int | _rank |
Rank (id) of this process. | |
int | _numberOfProcessors |
Number of processors available. | |
MPI_Comm | _communicator |
MPI Communicator this process belongs to. | |
std::chrono::seconds | _timeOutWarning |
Timeout warning. | |
std::chrono::seconds | _deadlockTimeOut |
Time to timeout. | |
std::chrono::system_clock::time_point | _globalTimeOutWarning |
std::chrono::system_clock::time_point | _globalTimeOutDeadlock |
bool | _areTimeoutsEnabled |
Global toggle to enable/disable timeouts. | |
Static Private Attributes | |
static tarch::logging::Log | _log |
Logging device. | |
static Rank | _singleton |
static int | _maxTags = -1 |
Set by init() and actually stores the number of valid tags. | |
static int | _tagCounter = 0 |
Count the tags that have already been handed out. | |
Represents a program instance within a cluster.
Thus, this class is a singleton.
The parallel concept is a client-server model (process 0 is the server), where all active nodes act as workers executing calculations on demand. So the basic activities of a parallel node are
The two perform commands have to return several statistic records. Among them are the time needed, some residual characteristics, and flags indicating whether the domain has been refined. Furthermore, the number of communication partners is an interesting detail.
In the near future, this class should become responsible for the error handling. Right now, the error handler set is the fatal error handler. That is, the whole parallel application is shut down as soon as an MPI error occurs.
Constructor & Destructor Documentation

tarch::mpi::Rank::Rank() [private]

The standard constructor assigns default values to the attributes and checks whether the program is compiled using the -DParallel option.

tarch::mpi::Rank::Rank(const Rank &node) = delete [private]

You may not copy a singleton.

virtual tarch::mpi::Rank::~Rank() [virtual]

The standard destructor calls MPI_Finalize().

Member Function Documentation

void tarch::mpi::Rank::abort(int errorCode) [static]
A proper abort in an MPI context has to use MPI_Abort.
Otherwise, only the current rank goes down. Without MPI, I have to call abort(). With an exit() call, I found that the TBB runtime, for example, tends to hang as it tries to tidy up. I don't want any tidying up. I want termination.
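For illustration, a hedged one-liner of how an error handler would typically terminate the whole parallel application (the error code here is an arbitrary example):

  // bring down all ranks, not only the current one
  tarch::mpi::Rank::abort(-1);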
Definition at line 592 of file Rank.cpp.
Referenced by convert::input::PeanoTextPatchFileReader::addDataToPatch(), createDirectory(), tarch::logging::ChromeTraceFileLogger::error(), tarch::logging::CommandLineLogger::error(), tarch::logging::ITACLogger::error(), tarch::logging::ITTLogger::error(), tarch::logging::ScorePLogger::error(), main(), convert::input::PeanoTextPatchFileReader::parse(), convert::input::PeanoTextPatchFileReader::parsePatch(), convert::input::PeanoTextPatchFileReader::parseVariablesDeclaration(), and runTests().
void tarch::mpi::Rank::allReduce(const void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, std::function<void()> waitor = []() -> void {})
Wrapper around allreduce.

Use this wrapper in Peano to reduce a value over all ranks. We recommend not to use plain MPI within Peano applications, as Peano relies on MPI+X and is strongly dynamic, i.e. you never know how many threads a rank currently employs and whether one of the ranks has perhaps just started to send around point-to-point load balancing messages.

Rationale

Peano relies heavily on unexpected, asynchronous message exchange to administer the workload. Ranks tell other ranks on the fly, for example, if they delete trees or create new ones. They also use MPI messages to realise global semaphores. As a consequence, any reduction risks introducing a deadlock: rank A enters an allreduce; rank B sends something to A (for example a request that it would like to load balance later) and then would enter the allreduce, too. However, this send of a message to A (and perhaps an immediate receive of a confirmation message) might not go through, as A is busy in the reduction. Therefore, we should never use a blocking allreduce but instead issue a non-blocking one. While the reduction is pending, we should be able to do something else, such as answering further request messages. This allReduce allows us to do so. The default waitor is a receive of pending messages, so most codes use allReduce as follows:

  tarch::mpi::Rank::getInstance().allReduce(
    ...,
    [&]() -> void {
      tarch::services::ServiceRepository::getInstance().receiveDanglingMessages();
    }
  );
I wanted to make this routine a normal one that degenerates to a nop if you don't compile with MPI support. However, that doesn't work, as the signature requires MPI-specific datatypes.
This reduction is completely agnostic of any multithreading. Therefore, I do recommend not to use this routine within any action set.
The routine adheres to the plain MPI semantics. As it passes the MPI_Op argument through to MPI, we also support all reduction operators.
In line with MPI, please ensure that the receive buffer already holds the local contribution. Notably, the routine does not copy over the content from sendbuf into recvbuf.
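A minimal usage sketch, assuming an MPI build and an initialised Rank; computeLocalContribution() is a hypothetical helper and the waitor follows the pattern shown above:

  double localValue  = computeLocalContribution();  // hypothetical helper
  double globalValue = localValue;                  // recvbuf must already hold the local contribution
  tarch::mpi::Rank::getInstance().allReduce(
    &localValue, &globalValue, 1, MPI_DOUBLE, MPI_SUM,
    []() -> void {
      tarch::services::ServiceRepository::getInstance().receiveDanglingMessages();
    }
  );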
Definition at line 284 of file Rank.cpp.
References getInstance(), logTraceIn, logTraceOut, setDeadlockTimeOutTimeStamp(), setDeadlockWarningTimeStamp(), triggerDeadlockTimeOut(), and writeTimeOutWarning().
Referenced by swift2::ParticleSpecies::allReduce(), applications::exahype2::euler::sphericalaccretion::MassAccumulator::finishAccumulation(), and applications::exahype2::euler::sphericalaccretion::SSInfall::finishTimeStep().
void tarch::mpi::Rank::barrier(std::function<void()> waitor = []() -> void {})

Global MPI barrier.

I provide a custom barrier. It differs semantically from a native MPI_Barrier, as an MPI barrier would not be able to do anything while it waits. I therefore make this barrier rely on MPI's non-blocking barrier and give the user the opportunity to tell me what to do while I wait to be allowed to pass the barrier. The most common pattern in Peano 4 is to pass the following functor to the barrier as argument:

  [&]() -> void {
    tarch::services::ServiceRepository::getInstance().receiveDanglingMessages();
  }

Please note that this barrier remains an MPI barrier. It does not act as a barrier between multiple threads. In particular: if you use this barrier in multithreaded code, each thread will launch a barrier of its own. If the number of threads/tasks per rank differs, deadlocks might arise. Anyway, it is not a good idea to use this barrier within a multithreaded part of your code.

Parameters
waitor | Functor that is called while we wait. By default, it is empty, i.e. the barrier degenerates to a blocking barrier in the MPI 1.3 sense. |
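Putting it together, a typical call then looks as follows (a sketch; most codes pass the receiveDanglingMessages functor shown above):

  tarch::mpi::Rank::getInstance().barrier(
    []() -> void {
      tarch::services::ServiceRepository::getInstance().receiveDanglingMessages();
    }
  );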
Definition at line 352 of file Rank.cpp.
References getInstance(), logTraceIn, logTraceOut, setDeadlockTimeOutTimeStamp(), setDeadlockWarningTimeStamp(), triggerDeadlockTimeOut(), and writeTimeOutWarning().
Referenced by peano4::parallel::Node::shutdown(), and peano4::parallel::SpacetreeSet::traverse().
void tarch::mpi::Rank::ensureThatMessageQueuesAreEmpty(int fromRank, int tag)

Ensure that there are no messages anymore from the specified rank.
Definition at line 74 of file Rank.cpp.
References assertion3.
bool tarch::mpi::Rank::exceededTimeOutWarningThreshold() const

MPI_Comm tarch::mpi::Rank::getCommunicator() const
Definition at line 545 of file Rank.cpp.
References assertion.
Referenced by peano4::parallel::SpacetreeSet::answerQuestions(), peano4::parallel::SpacetreeSet::cleanUpTrees(), toolbox::particles::SieveParticles< T >::exchangeSieveListsGlobally(), tarch::hasNonCriticalAssertionBeenViolated(), tarch::mpi::StringMessage::receiveAndPollDanglingMessages(), exahype2::RefinementControlService::receiveDanglingMessages(), tarch::mpi::BooleanSemaphore::BooleanSemaphoreService::receiveDanglingMessages(), peano4::grid::reduceGridControlEvents(), tarch::mpi::StringMessage::sendAndPollDanglingMessages(), tarch::mpi::BooleanSemaphore::BooleanSemaphoreService::serveLockRequests(), and exahype2::RefinementControlService::triggerSendOfCopyOfCommittedEvents().
int tarch::mpi::Rank::getGlobalMasterRank() [static]
Get the global master.
Peano sets up a logical tree topology on all the ranks. The root of this logical tree, i.e. the rank that is responsible for all other ranks, is the global master. In contrast, every rank also has a local master that tells it what to do. This is the master and the parent within the topology tree. Use the NodePool to identify the rank of a rank's master.
Definition at line 415 of file Rank.cpp.
Referenced by tarch::mpi::BooleanSemaphore::BooleanSemaphoreService::acquireLock(), peano4::parallel::Node::continueToRun(), main(), toolbox::particles::ParticleSet< T >::reduceParticleStateStatistics(), toolbox::particles::ParticleSet< T >::reduceReassignmentStatistics(), tarch::mpi::BooleanSemaphore::BooleanSemaphoreService::releaseLock(), runBenchmarks(), mghype::matrixfree::solvers::Solver::synchroniseGlobalResidualAndSolutionUpdate(), tarch::triggerNonCriticalAssertion(), and peano4::writeCopyrightMessage().
Rank& tarch::mpi::Rank::getInstance() [static]
This operation returns the singleton instance.
Before using this instance, one has to call the init() operation on the instance returned.
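A short sketch of the intended access pattern (assuming init() has already been called elsewhere, e.g. in main()):

  tarch::mpi::Rank& rank = tarch::mpi::Rank::getInstance();
  if (rank.isInitialised()) {
    const int myRank        = rank.getRank();
    const int numberOfRanks = rank.getNumberOfRanks();
    // ... use myRank and numberOfRanks ...
  }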
Definition at line 539 of file Rank.cpp.
Referenced by tarch::mpi::BooleanSemaphore::BooleanSemaphoreService::acquireLock(), toolbox::blockstructured::GlobalDatabase::addGlobalSnapshot(), toolbox::blockstructured::GlobalDatabase::addGlobalSnapshot(), toolbox::particles::TrajectoryDatabase::addParticleSnapshot(), toolbox::particles::TrajectoryDatabase::addParticleSnapshot(), peano4::parallel::SpacetreeSet::addSpacetree(), swift2::ParticleSpecies::allReduce(), allReduce(), peano4::parallel::SpacetreeSet::answerQuestions(), barrier(), peano4::grid::TraversalVTKPlotter::beginTraversal(), peano4::parallel::SpacetreeSet::cleanUpTrees(), peano4::parallel::Node::continueToRun(), peano4::parallel::SpacetreeSet::deleteAllStacks(), toolbox::blockstructured::GlobalDatabase::dumpCSVFile(), toolbox::particles::TrajectoryDatabase::dumpCSVFile(), tarch::logging::Log::error(), peano4::parallel::SpacetreeSet::exchangeAllHorizontalDataExchangeStacks(), toolbox::particles::SieveParticles< T >::exchangeSieveListsGlobally(), tarch::logging::LogFilter::filterOut(), applications::exahype2::euler::sphericalaccretion::MassAccumulator::finishAccumulation(), peano4::parallel::SpacetreeSet::finishAllOutstandingSendsAndReceives(), toolbox::loadbalancing::strategies::Hardcoded::finishStep(), applications::exahype2::euler::sphericalaccretion::SSInfall::finishTimeStep(), toolbox::loadbalancing::strategies::SpreadOutHierarchically::getAction(), peano4::maps::HierarchicalStackMap< T >::getForPush(), peano4::parallel::Node::getGlobalTreeId(), peano4::parallel::SpacetreeSet::getGridStatistics(), peano4::parallel::Node::getId(), toolbox::loadbalancing::CostMetrics::getLightestRank(), peano4::parallel::Node::getLocalTreeId(), tarch::logging::Log::getMachineInformation(), exahype2::LoadBalancingConfiguration::getMinTreeSize(), tarch::mpi::BooleanSemaphore::BooleanSemaphoreService::getNumberOfLockedSemaphores(), toolbox::loadbalancing::strategies::SpreadOut::getNumberOfTreesPerRank(), toolbox::loadbalancing::strategies::SpreadOutOnceGridStagnates::getNumberOfTreesPerRank(), peano4::parallel::Node::getRank(), peano4::parallel::SpacetreeSet::getSpacetree(), peano4::parallel::SpacetreeSet::getSpacetree(), toolbox::loadbalancing::strategies::SplitOversizedTree::getTargetTreeCost(), peano4::parallel::getTaskType(), tarch::hasNonCriticalAssertionBeenViolated(), tarch::logging::Log::info(), peano4::parallel::Node::init(), peano4::parallel::SpacetreeSet::init(), peano4::initParallelEnvironment(), tarch::multicore::initSmartMPI(), toolbox::loadbalancing::AbstractLoadBalancing::isInterRankBalancingBad(), tarch::mpi::BooleanSemaphore::BooleanSemaphoreService::lockSemaphoreOnGlobalMaster(), main(), swift2::parseCommandLineArguments(), peano4::datamanagement::CellMarker::receiveAndPollDanglingMessages(), peano4::grid::AutomatonState::receiveAndPollDanglingMessages(), peano4::grid::GridControlEvent::receiveAndPollDanglingMessages(), peano4::grid::GridStatistics::receiveAndPollDanglingMessages(), peano4::grid::GridTraversalEvent::receiveAndPollDanglingMessages(), peano4::grid::GridVertex::receiveAndPollDanglingMessages(), peano4::parallel::StartTraversalMessage::receiveAndPollDanglingMessages(), peano4::parallel::TreeEntry::receiveAndPollDanglingMessages(), peano4::parallel::TreeManagementMessage::receiveAndPollDanglingMessages(), tarch::mpi::DoubleMessage::receiveAndPollDanglingMessages(), tarch::mpi::IntegerMessage::receiveAndPollDanglingMessages(), tarch::mpi::StringMessage::receiveAndPollDanglingMessages(), exahype2::RefinementControlService::receiveDanglingMessages(), 
peano4::parallel::SpacetreeSet::receiveDanglingMessages(), tarch::mpi::BooleanSemaphore::BooleanSemaphoreService::receiveDanglingMessages(), reduce(), peano4::grid::reduceGridControlEvents(), toolbox::particles::ParticleSet< T >::reduceParticleStateStatistics(), toolbox::particles::ParticleSet< T >::reduceReassignmentStatistics(), tarch::mpi::BooleanSemaphore::BooleanSemaphoreService::releaseLock(), swift2::statistics::reportSearchRadiusVTDt(), swift2::dastgenTest::reportStep(), peano4::parallel::Node::reserveId(), runBenchmarks(), runParallel(), runParallel(), peano4::datamanagement::CellMarker::sendAndPollDanglingMessages(), peano4::grid::AutomatonState::sendAndPollDanglingMessages(), peano4::grid::GridControlEvent::sendAndPollDanglingMessages(), peano4::grid::GridStatistics::sendAndPollDanglingMessages(), peano4::grid::GridTraversalEvent::sendAndPollDanglingMessages(), peano4::grid::GridVertex::sendAndPollDanglingMessages(), peano4::parallel::StartTraversalMessage::sendAndPollDanglingMessages(), peano4::parallel::TreeEntry::sendAndPollDanglingMessages(), peano4::parallel::TreeManagementMessage::sendAndPollDanglingMessages(), tarch::mpi::DoubleMessage::sendAndPollDanglingMessages(), tarch::mpi::IntegerMessage::sendAndPollDanglingMessages(), tarch::mpi::StringMessage::sendAndPollDanglingMessages(), tarch::mpi::BooleanSemaphore::BooleanSemaphoreService::serveLockRequests(), tarch::logging::ChromeTraceFileLogger::setOutputFile(), tarch::logging::CommandLineLogger::setOutputFile(), peano4::parallel::Node::shutdown(), peano4::shutdownParallelEnvironment(), peano4::parallel::SpacetreeSet::split(), step(), peano4::parallel::SpacetreeSet::streamDataFromSplittingTreeToNewTree(), mghype::matrixfree::solvers::Solver::synchroniseGlobalResidualAndSolutionUpdate(), peano4::parallel::tests::PingPongTest::testBuiltInType(), peano4::parallel::tests::PingPongTest::testDaStGenArray(), peano4::parallel::tests::PingPongTest::testDaStGenArrayTreeManagementMessage(), peano4::parallel::tests::PingPongTest::testDaStGenTypeIntegerMessage(), peano4::parallel::tests::PingPongTest::testDaStGenTypeStartTraversalMessage(), peano4::parallel::tests::PingPongTest::testMultithreadedPingPongWithBlockingReceives(), peano4::parallel::tests::PingPongTest::testMultithreadedPingPongWithBlockingSends(), peano4::parallel::tests::PingPongTest::testMultithreadedPingPongWithBlockingSendsAndReceives(), peano4::parallel::tests::PingPongTest::testMultithreadedPingPongWithNonblockingReceives(), peano4::parallel::tests::PingPongTest::testMultithreadedPingPongWithNonblockingSends(), peano4::parallel::tests::PingPongTest::testMultithreadedPingPongWithNonblockingSendsAndReceives(), tarch::mpi::tests::StringTest::testSendReceive(), peano4::parallel::tests::NodeTest::testTagCalculation(), peano4::parallel::SpacetreeSet::traverse(), tarch::triggerNonCriticalAssertion(), exahype2::RefinementControlService::triggerSendOfCopyOfCommittedEvents(), toolbox::loadbalancing::strategies::SpreadOut::triggerSplit(), toolbox::loadbalancing::strategies::SpreadOutOnceGridStagnates::triggerSplit(), tarch::mpi::BooleanSemaphore::BooleanSemaphoreService::tryLockSemaphoreOnGlobalMaster(), tarch::mpi::BooleanSemaphore::BooleanSemaphoreService::unlockSemaphoreOnGlobalMaster(), updateDomainDecomposition(), toolbox::loadbalancing::CostMetrics::updateGlobalView(), toolbox::loadbalancing::Statistics::updateGlobalView(), toolbox::loadbalancing::strategies::RecursiveBipartition::updateLoadBalancing(), 
toolbox::loadbalancing::strategies::SplitOversizedTree::updateLoadBalancing(), toolbox::loadbalancing::strategies::SpreadOut::updateLoadBalancing(), toolbox::loadbalancing::strategies::SpreadOutHierarchically::updateLoadBalancing(), toolbox::loadbalancing::strategies::SpreadOutOnceGridStagnates::updateLoadBalancing(), toolbox::loadbalancing::strategies::RecursiveBipartition::updateState(), toolbox::loadbalancing::strategies::SplitOversizedTree::updateState(), toolbox::loadbalancing::strategies::SpreadOutHierarchically::updateState(), tarch::logging::Log::warning(), peano4::writeCopyrightMessage(), tarch::logging::Statistics::writeToCSV(), and peano4::parallel::Node::~Node().
int tarch::mpi::Rank::getNumberOfRanks() const
Definition at line 552 of file Rank.cpp.
References assertion.
Referenced by peano4::parallel::Node::continueToRun(), toolbox::particles::SieveParticles< T >::exchangeSieveListsGlobally(), toolbox::loadbalancing::strategies::SpreadOutHierarchically::getAction(), peano4::parallel::Node::getGlobalTreeId(), peano4::parallel::Node::getId(), toolbox::loadbalancing::CostMetrics::getLightestRank(), peano4::parallel::Node::getLocalTreeId(), exahype2::LoadBalancingConfiguration::getMinTreeSize(), toolbox::loadbalancing::strategies::SpreadOut::getNumberOfTreesPerRank(), toolbox::loadbalancing::strategies::SpreadOutOnceGridStagnates::getNumberOfTreesPerRank(), peano4::parallel::Node::getRank(), toolbox::loadbalancing::strategies::SplitOversizedTree::getTargetTreeCost(), toolbox::loadbalancing::AbstractLoadBalancing::isInterRankBalancingBad(), main(), peano4::grid::reduceGridControlEvents(), runBenchmarks(), runParallel(), exahype2::RefinementControlService::triggerSendOfCopyOfCommittedEvents(), updateDomainDecomposition(), toolbox::loadbalancing::strategies::SpreadOut::updateLoadBalancing(), toolbox::loadbalancing::strategies::SpreadOutHierarchically::updateLoadBalancing(), toolbox::loadbalancing::strategies::SpreadOutOnceGridStagnates::updateLoadBalancing(), toolbox::loadbalancing::strategies::RecursiveBipartition::updateState(), toolbox::loadbalancing::strategies::SplitOversizedTree::updateState(), and toolbox::loadbalancing::strategies::SpreadOutHierarchically::updateState().
int tarch::mpi::Rank::getRank() const
Return rank of this node.
In the serial version, i.e. without MPI, this operation always returns 0.
Definition at line 529 of file Rank.cpp.
Referenced by toolbox::blockstructured::GlobalDatabase::addGlobalSnapshot(), toolbox::blockstructured::GlobalDatabase::addGlobalSnapshot(), toolbox::particles::TrajectoryDatabase::addParticleSnapshot(), toolbox::particles::TrajectoryDatabase::addParticleSnapshot(), peano4::grid::TraversalVTKPlotter::beginTraversal(), peano4::parallel::SpacetreeSet::cleanUpTrees(), toolbox::blockstructured::GlobalDatabase::dumpCSVFile(), toolbox::particles::TrajectoryDatabase::dumpCSVFile(), peano4::parallel::SpacetreeSet::exchangeAllHorizontalDataExchangeStacks(), toolbox::particles::SieveParticles< T >::exchangeSieveListsGlobally(), tarch::logging::LogFilter::filterOut(), toolbox::loadbalancing::strategies::Hardcoded::finishStep(), peano4::maps::HierarchicalStackMap< T >::getForPush(), peano4::parallel::Node::getGlobalTreeId(), tarch::logging::Log::getMachineInformation(), peano4::grid::reduceGridControlEvents(), runBenchmarks(), tarch::logging::ChromeTraceFileLogger::setOutputFile(), tarch::logging::CommandLineLogger::setOutputFile(), tarch::triggerNonCriticalAssertion(), exahype2::RefinementControlService::triggerSendOfCopyOfCommittedEvents(), toolbox::loadbalancing::CostMetrics::updateGlobalView(), toolbox::loadbalancing::strategies::RecursiveBipartition::updateLoadBalancing(), and toolbox::loadbalancing::strategies::SplitOversizedTree::updateLoadBalancing().
bool tarch::mpi::Rank::init(int *argc, char ***argv)

This operation initializes the MPI environment and the program instance.

Note that the argc and argv parameters are both in and out parameters. Before you pass them to the operation, they are set by the MPI environment. Afterwards, the original parameters from the user are stored within them.
init never uses the log device to report any errors as the log device usually in turn uses Node's getters. Furthermore, the _initIsCalled flag has thus to be set before the log state operation is invoked.
We try to find out how many tags the system does support. Here, the MPI standard is a little bit weird, as it requires a pointer to a void pointer and then we can read the data from there. So it seems we get a pointer to a system variable back rather than the system variable.
Thanks to Andrew Mallinson (Intel) for pointing this out.
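A hedged sketch of the overall lifecycle in a main() routine (the error handling shown is an arbitrary choice):

  int main(int argc, char** argv) {
    if (not tarch::mpi::Rank::getInstance().init(&argc, &argv)) {
      return -1;
    }
    // ... construct services, run the simulation ...
    tarch::mpi::Rank::getInstance().shutdown();
    return 0;
  }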
Definition at line 472 of file Rank.cpp.
References tarch::mpi::DoubleMessage::initDatatype(), tarch::mpi::IntegerMessage::initDatatype(), and tarch::mpi::MPIReturnValueToString().
bool tarch::mpi::Rank::isGlobalMaster() const

Is this node the global master process, i.e. does its rank equal getGlobalMasterRank()? This operation always returns true if the code is not compiled with -DParallel.
Definition at line 420 of file Rank.cpp.
References assertion.
Referenced by applications::exahype2::euler::sphericalaccretion::MassAccumulator::finishAccumulation(), and main().
bool tarch::mpi::Rank::isMessageInQueue(int tag) const

In older DaStGen versions, I tried to find out whether a particular message type is in the MPI queue.
That is, I looked whether a message on this tag does exist, and then I looked whether the memory footprint matches via count. I think this is invalid. MPI really looks only into the number of bytes, so you have to know which type drops in once there is a message on a tag.
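Conceptually, such a check boils down to an MPI_Iprobe on the tag. The following is an illustrative sketch, not necessarily the actual implementation; it only tells you that some message with this tag is pending, not which datatype it carries:

  int        flag = 0;
  MPI_Status status;
  MPI_Iprobe(
    MPI_ANY_SOURCE, tag,
    tarch::mpi::Rank::getInstance().getCommunicator(),
    &flag, &status
  );
  const bool messageInQueue = (flag != 0);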
void tarch::mpi::Rank::logStatus() const

void tarch::mpi::Rank::plotMessageQueues()
void tarch::mpi::Rank::receiveDanglingMessagesFromReceiveBuffers() [private]

Receive any message pending in the MPI receive buffers.
We do poll MPI only every k iterations. Once we have found a message, we wait again k iterations. If the threshold is exceeded and no message is found we however do not reset k (I've tried this and it gives worse runtime): receiveDanglingMessages is typically called when we enter a critical phase of the simulation and we urgently wait for incoming MPI messages. So once we are in that critical regime and have already exceeded the k it would be harakiri to reset this one to 0 again - we only do so if data drops in and we thus may assume that we'll leave the critical phase anyway.
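A conceptual sketch of this polling strategy; the counter, the constant k and the helper are illustrative, not the actual member names:

  static int iterationsSinceLastPoll = 0;
  const int  k                       = 16;   // assumed polling interval
  if (iterationsSinceLastPoll < k) {
    iterationsSinceLastPoll++;               // skip the expensive MPI probe this time
    return;
  }
  // threshold exceeded: probe every iteration from now on
  if (probeAndReceivePendingMessage()) {     // hypothetical helper
    iterationsSinceLastPoll = 0;             // message found, so back off again
  }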
void tarch::mpi::Rank::reduce(const void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, int root, std::function<void()> waitor = []() -> void {})
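The routine mirrors allReduce above, with an additional root rank that receives the result. A hedged usage sketch, assuming the same receive-buffer convention as allReduce (computeLocalContribution() is hypothetical):

  double localValue  = computeLocalContribution();  // hypothetical helper
  double globalValue = localValue;                  // receive buffer holds the local contribution
  tarch::mpi::Rank::getInstance().reduce(
    &localValue, &globalValue, 1, MPI_DOUBLE, MPI_SUM,
    tarch::mpi::Rank::getGlobalMasterRank(),
    []() -> void {
      tarch::services::ServiceRepository::getInstance().receiveDanglingMessages();
    }
  );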
Definition at line 318 of file Rank.cpp.
References getInstance(), logTraceIn, logTraceOut, setDeadlockTimeOutTimeStamp(), setDeadlockWarningTimeStamp(), triggerDeadlockTimeOut(), and writeTimeOutWarning().
Referenced by toolbox::particles::ParticleSet< T >::reduceParticleStateStatistics(), and toolbox::particles::ParticleSet< T >::reduceReassignmentStatistics().
void tarch::mpi::Rank::releaseTag(int tag) [static]

Definition at line 32 of file Rank.cpp.
References _tagCounter.
Referenced by tarch::shutdownNonCriticalAssertionEnvironment().
int tarch::mpi::Rank::reserveFreeTag(const std::string &fullQualifiedMessageName, int numberOfTags = 1) [static]
Return a Free Tag.
Returns a free tag to be used for a new datatype. Each result is delivered exactly once. The string argument is just for logging.
This operation should write something to the log devices. However, it is static and the class' log devices are static, too. C++ has no mechanism to define which static entity has to be instantiated first. On some systems, it hence happened that this tag registration got called before the logging device was up and running. The problem is known as the static initialization order problem:

https://isocpp.org/wiki/faq/ctors#static-init-order

So I log to std::cout only, which eliminates the problem here.
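A typical (sketched) use is to reserve a tag once per message type, e.g. during static initialisation or in an init() routine; the names here are hypothetical:

  static int myMessageTag = tarch::mpi::Rank::reserveFreeTag("myproject::MyMessage");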
Definition at line 39 of file Rank.cpp.
References assertion2.
Referenced by peano4::parallel::Node::init(), tarch::mpi::BooleanSemaphore::BooleanSemaphoreService::init(), peano4::parallel::SpacetreeSet::init(), and tarch::initNonCriticalAssertionEnvironment().
void tarch::mpi::Rank::setCommunicator(MPI_Comm communicator, bool recomputeRankAndWorld = true)

Set communicator to be used by Peano.
Definition at line 576 of file Rank.cpp.
References logError, and tarch::mpi::MPIReturnValueToString().
Referenced by tarch::multicore::initSmartMPI().
void tarch::mpi::Rank::setDeadlockTimeOut(int valueInSeconds)

Set deadlock time out.

Set after how much time a node waiting for an MPI message shall quit and shut down the whole application with an error report. If you pass 0, this feature is switched off.
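A sketch of how the two timeout knobs are typically configured together, early in the program; the concrete values are arbitrary examples:

  tarch::mpi::Rank::getInstance().setTimeOutWarning(30);    // warn after 30 s of waiting
  tarch::mpi::Rank::getInstance().setDeadlockTimeOut(120);  // abort after 120 s; 0 disables the check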
Definition at line 566 of file Rank.cpp.
References assertion, and logInfo.
Referenced by peano4::initParallelEnvironment(), and swift2::parseCommandLineArguments().
void tarch::mpi::Rank::setDeadlockTimeOutTimeStamp()
Definition at line 198 of file Rank.cpp.
Referenced by tarch::mpi::BooleanSemaphore::BooleanSemaphoreService::acquireLock(), allReduce(), barrier(), toolbox::particles::SieveParticles< T >::exchangeSieveListsGlobally(), peano4::parallel::SpacetreeSet::finishAllOutstandingSendsAndReceives(), tarch::mpi::BooleanSemaphore::BooleanSemaphoreService::lockSemaphoreOnGlobalMaster(), peano4::datamanagement::CellMarker::receiveAndPollDanglingMessages(), peano4::grid::AutomatonState::receiveAndPollDanglingMessages(), peano4::grid::GridControlEvent::receiveAndPollDanglingMessages(), peano4::grid::GridStatistics::receiveAndPollDanglingMessages(), peano4::grid::GridTraversalEvent::receiveAndPollDanglingMessages(), peano4::grid::GridVertex::receiveAndPollDanglingMessages(), peano4::parallel::StartTraversalMessage::receiveAndPollDanglingMessages(), peano4::parallel::TreeEntry::receiveAndPollDanglingMessages(), peano4::parallel::TreeManagementMessage::receiveAndPollDanglingMessages(), tarch::mpi::DoubleMessage::receiveAndPollDanglingMessages(), tarch::mpi::IntegerMessage::receiveAndPollDanglingMessages(), tarch::mpi::StringMessage::receiveAndPollDanglingMessages(), reduce(), peano4::datamanagement::CellMarker::sendAndPollDanglingMessages(), peano4::grid::AutomatonState::sendAndPollDanglingMessages(), peano4::grid::GridControlEvent::sendAndPollDanglingMessages(), peano4::grid::GridStatistics::sendAndPollDanglingMessages(), peano4::grid::GridTraversalEvent::sendAndPollDanglingMessages(), peano4::grid::GridVertex::sendAndPollDanglingMessages(), peano4::parallel::StartTraversalMessage::sendAndPollDanglingMessages(), peano4::parallel::TreeEntry::sendAndPollDanglingMessages(), peano4::parallel::TreeManagementMessage::sendAndPollDanglingMessages(), tarch::mpi::DoubleMessage::sendAndPollDanglingMessages(), tarch::mpi::IntegerMessage::sendAndPollDanglingMessages(), and tarch::mpi::StringMessage::sendAndPollDanglingMessages().
void tarch::mpi::Rank::setDeadlockWarningTimeStamp()
Memorise global timeout.
Definition at line 193 of file Rank.cpp.
Referenced by tarch::mpi::BooleanSemaphore::BooleanSemaphoreService::acquireLock(), allReduce(), barrier(), toolbox::particles::SieveParticles< T >::exchangeSieveListsGlobally(), peano4::parallel::SpacetreeSet::finishAllOutstandingSendsAndReceives(), tarch::mpi::BooleanSemaphore::BooleanSemaphoreService::lockSemaphoreOnGlobalMaster(), peano4::datamanagement::CellMarker::receiveAndPollDanglingMessages(), peano4::grid::AutomatonState::receiveAndPollDanglingMessages(), peano4::grid::GridControlEvent::receiveAndPollDanglingMessages(), peano4::grid::GridStatistics::receiveAndPollDanglingMessages(), peano4::grid::GridTraversalEvent::receiveAndPollDanglingMessages(), peano4::grid::GridVertex::receiveAndPollDanglingMessages(), peano4::parallel::StartTraversalMessage::receiveAndPollDanglingMessages(), peano4::parallel::TreeEntry::receiveAndPollDanglingMessages(), peano4::parallel::TreeManagementMessage::receiveAndPollDanglingMessages(), tarch::mpi::DoubleMessage::receiveAndPollDanglingMessages(), tarch::mpi::IntegerMessage::receiveAndPollDanglingMessages(), tarch::mpi::StringMessage::receiveAndPollDanglingMessages(), reduce(), peano4::datamanagement::CellMarker::sendAndPollDanglingMessages(), peano4::grid::AutomatonState::sendAndPollDanglingMessages(), peano4::grid::GridControlEvent::sendAndPollDanglingMessages(), peano4::grid::GridStatistics::sendAndPollDanglingMessages(), peano4::grid::GridTraversalEvent::sendAndPollDanglingMessages(), peano4::grid::GridVertex::sendAndPollDanglingMessages(), peano4::parallel::StartTraversalMessage::sendAndPollDanglingMessages(), peano4::parallel::TreeEntry::sendAndPollDanglingMessages(), peano4::parallel::TreeManagementMessage::sendAndPollDanglingMessages(), tarch::mpi::DoubleMessage::sendAndPollDanglingMessages(), tarch::mpi::IntegerMessage::sendAndPollDanglingMessages(), and tarch::mpi::StringMessage::sendAndPollDanglingMessages().
void tarch::mpi::Rank::setTimeOutWarning(int valueInSeconds)

Set time out warning.

Set after how much time a node waiting for an MPI message shall write a warning that it likely ran into a deadlock. If you pass 0, this feature is switched off.
Definition at line 560 of file Rank.cpp.
References assertion.
Referenced by peano4::initParallelEnvironment().
void tarch::mpi::Rank::shutdown()
Shuts down the application.
Should be the last operation called by the overall application.
Rationale

Originally, I put the shutdown operation into the destructor of Node. The MPI environment consequently was shut down as soon as the operating system terminated the application. However, Scalasca complained on the BlueGene/P that the destruction happened after the return statement of the main method. To make Peano work with Scalasca, I hence moved the MPI shutdown into a method called explicitly before the final return statement.

This seems to be a CLX effect, i.e. the Intel and the GNU compilers worked fine with Scalasca. Hence, I assume that Intel and GNU executables destroy all static objects (singletons) before they return from the main function. CLX destroys the static objects after the return statement and thus makes Scalasca's instrumentation report an error.
Definition at line 395 of file Rank.cpp.
References assertion, logError, logTraceIn, logTraceOut, tarch::mpi::MPIReturnValueToString(), and tarch::mpi::IntegerMessage::shutdownDatatype().
Referenced by peano4::shutdownParallelEnvironment().
void tarch::mpi::Rank::triggerDeadlockTimeOut(const std::string &className, const std::string &methodName, int communicationPartnerRank, int tag, int numberOfExpectedMessages = 1, const std::string &comment = "")
Triggers a time out and shuts down the cluster if a timeout is violated.
The implementation does not use MPI_Abort, since it seems that this operation requires all nodes to be running. Instead, the operation uses the system exit function, passing it DEADLOCK_EXIT_CODE as exit code.
The operation should be called only if the deadlock time-out is switched on ( isTimeOutDeadlockEnabled() ) and the deadlock time-out has expired. Use getDeadlockWarningTimeStamp() and the system operation clock() to check the second requirement.
className | Name of the class that triggers the deadlock shutdown. |
methodName | Name of the method that triggers the deadlock shutdown. |
communicationPartnerRank | Rank of the node that should have sent a message but did not. |
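For illustration, the usual wait-loop pattern around these routines looks roughly as follows; messageHasArrived(), sourceRank and tag are hypothetical stand-ins:

  tarch::mpi::Rank::getInstance().setDeadlockWarningTimeStamp();
  tarch::mpi::Rank::getInstance().setDeadlockTimeOutTimeStamp();
  while (not messageHasArrived()) {                       // hypothetical receive test
    if (tarch::mpi::Rank::getInstance().exceededTimeOutWarningThreshold()) {
      tarch::mpi::Rank::getInstance().writeTimeOutWarning("MyClass", "receive()", sourceRank, tag);
    }
    if (tarch::mpi::Rank::getInstance().exceededDeadlockThreshold()) {
      tarch::mpi::Rank::getInstance().triggerDeadlockTimeOut("MyClass", "receive()", sourceRank, tag);
    }
    tarch::services::ServiceRepository::getInstance().receiveDanglingMessages();
  }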
Definition at line 124 of file Rank.cpp.
References logError.
Referenced by tarch::mpi::BooleanSemaphore::BooleanSemaphoreService::acquireLock(), allReduce(), barrier(), toolbox::particles::SieveParticles< T >::exchangeSieveListsGlobally(), peano4::parallel::SpacetreeSet::finishAllOutstandingSendsAndReceives(), peano4::datamanagement::CellMarker::receiveAndPollDanglingMessages(), peano4::grid::AutomatonState::receiveAndPollDanglingMessages(), peano4::grid::GridControlEvent::receiveAndPollDanglingMessages(), peano4::grid::GridStatistics::receiveAndPollDanglingMessages(), peano4::grid::GridTraversalEvent::receiveAndPollDanglingMessages(), peano4::grid::GridVertex::receiveAndPollDanglingMessages(), peano4::parallel::StartTraversalMessage::receiveAndPollDanglingMessages(), peano4::parallel::TreeEntry::receiveAndPollDanglingMessages(), peano4::parallel::TreeManagementMessage::receiveAndPollDanglingMessages(), tarch::mpi::DoubleMessage::receiveAndPollDanglingMessages(), tarch::mpi::IntegerMessage::receiveAndPollDanglingMessages(), tarch::mpi::StringMessage::receiveAndPollDanglingMessages(), reduce(), peano4::datamanagement::CellMarker::sendAndPollDanglingMessages(), peano4::grid::AutomatonState::sendAndPollDanglingMessages(), peano4::grid::GridControlEvent::sendAndPollDanglingMessages(), peano4::grid::GridStatistics::sendAndPollDanglingMessages(), peano4::grid::GridTraversalEvent::sendAndPollDanglingMessages(), peano4::grid::GridVertex::sendAndPollDanglingMessages(), peano4::parallel::StartTraversalMessage::sendAndPollDanglingMessages(), peano4::parallel::TreeEntry::sendAndPollDanglingMessages(), peano4::parallel::TreeManagementMessage::sendAndPollDanglingMessages(), tarch::mpi::DoubleMessage::sendAndPollDanglingMessages(), tarch::mpi::IntegerMessage::sendAndPollDanglingMessages(), and tarch::mpi::StringMessage::sendAndPollDanglingMessages().
bool tarch::mpi::Rank::validateMaxTagIsSupported() [static]
Just try to find out if a tag is actually supported.
This works after we've called the init() routine. If we call this routine before init(), it will always pass. If the tag is not supported, we write a warning and return false.
Definition at line 454 of file Rank.cpp.
References logWarning.
void tarch::mpi::Rank::writeTimeOutWarning(const std::string &className, const std::string &methodName, int communicationPartnerRank, int tag, int numberOfExpectedMessages = 1)
Writes a warning if relevant.
This operation writes a warning if the code might assume that it runs into a timeout (or excessive wait). The routine assumes that timeouts are enabled and that the user has called

setDeadlockTimeOutTimeStamp()

before. To avoid an excessive list of timeout warnings, a warning is always followed by moving the next warning timestamp forward (so you don't get dozens of warnings), and the code also increases the time span that it uses from here on to report on timeouts.
Definition at line 148 of file Rank.cpp.
References logWarning.
Referenced by tarch::mpi::BooleanSemaphore::BooleanSemaphoreService::acquireLock(), allReduce(), barrier(), toolbox::particles::SieveParticles< T >::exchangeSieveListsGlobally(), peano4::parallel::SpacetreeSet::finishAllOutstandingSendsAndReceives(), peano4::datamanagement::CellMarker::receiveAndPollDanglingMessages(), peano4::grid::AutomatonState::receiveAndPollDanglingMessages(), peano4::grid::GridControlEvent::receiveAndPollDanglingMessages(), peano4::grid::GridStatistics::receiveAndPollDanglingMessages(), peano4::grid::GridTraversalEvent::receiveAndPollDanglingMessages(), peano4::grid::GridVertex::receiveAndPollDanglingMessages(), peano4::parallel::StartTraversalMessage::receiveAndPollDanglingMessages(), peano4::parallel::TreeEntry::receiveAndPollDanglingMessages(), peano4::parallel::TreeManagementMessage::receiveAndPollDanglingMessages(), tarch::mpi::DoubleMessage::receiveAndPollDanglingMessages(), tarch::mpi::IntegerMessage::receiveAndPollDanglingMessages(), tarch::mpi::StringMessage::receiveAndPollDanglingMessages(), reduce(), peano4::datamanagement::CellMarker::sendAndPollDanglingMessages(), peano4::grid::AutomatonState::sendAndPollDanglingMessages(), peano4::grid::GridControlEvent::sendAndPollDanglingMessages(), peano4::grid::GridStatistics::sendAndPollDanglingMessages(), peano4::grid::GridTraversalEvent::sendAndPollDanglingMessages(), peano4::grid::GridVertex::sendAndPollDanglingMessages(), peano4::parallel::StartTraversalMessage::sendAndPollDanglingMessages(), peano4::parallel::TreeEntry::sendAndPollDanglingMessages(), peano4::parallel::TreeManagementMessage::sendAndPollDanglingMessages(), tarch::mpi::DoubleMessage::sendAndPollDanglingMessages(), tarch::mpi::IntegerMessage::sendAndPollDanglingMessages(), and tarch::mpi::StringMessage::sendAndPollDanglingMessages().
Member Data Documentation

tarch::logging::Log tarch::mpi::Rank::_log [static, private]
Logging device.
For the machine name.
If it doesn't work, switch it off in the file CompilerSpecificSettings.h.
int tarch::mpi::Rank::_tagCounter = 0 [static, private]
Count the tags that have already been handed out.
Definition at line 140 of file Rank.h.
Referenced by releaseTag().
std::chrono::seconds tarch::mpi::Rank::_timeOutWarning [private]
Timeout warning.
How long shall the application wait until it writes a time-out warning? Different from _deadlockTimeOut, this value changes over time: if we write a warning message, we increase this value. So, by default, you can make it rather small. The code then will increase it once a warning has been written, to avoid that the terminal is flooded with these warnings. Due to this growth it will never become bigger than _deadlockTimeOut though.

Despite the growth discussion, you can use a timeout warning that is greater than zero and set _deadlockTimeOut to zero. In this case, you will get warnings that the code thinks it runs into a deadlock, but you will never get an actual timeout.