Peano
tarch::mpi::Rank Class Reference

Represents a program instance within a cluster. More...

#include <Rank.h>


Public Member Functions

void barrier (std::function< void()> waitor=[]() -> void {})
 
void allReduce (const void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, std::function< void()> waitor=[]() -> void {})
 
void reduce (const void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, int root, std::function< void()> waitor=[]() -> void {})
 
bool isMessageInQueue (int tag) const
 In older DaStGen versions, I tried to find out whether a particular message type is in the MPI queue.
 
void logStatus () const
 Logs the status of the process onto the log device.
 
virtual ~Rank ()
 The standard destructor calls MPI_Finalize().
 
bool init (int *argc, char ***argv)
 This operation initializes the MPI environment and the program instance.
 
void shutdown ()
 Shuts down the application.
 
int getRank () const
 Return rank of this node.
 
MPI_Comm getCommunicator () const
 
int getNumberOfRanks () const
 
bool isGlobalMaster () const
 Is this node the global master process, i.e. does its rank equal getGlobalMasterRank()?
 
void triggerDeadlockTimeOut (const std::string &className, const std::string &methodName, int communicationPartnerRank, int tag, int numberOfExpectedMessages=1, const std::string &comment="")
 Triggers a time out and shuts down the cluster if a timeout is violated.
 
void writeTimeOutWarning (const std::string &className, const std::string &methodName, int communicationPartnerRank, int tag, int numberOfExpectedMessages=1)
 Writes a warning if relevant.
 
bool exceededTimeOutWarningThreshold () const
 
bool exceededDeadlockThreshold () const
 
void plotMessageQueues ()
 
void ensureThatMessageQueuesAreEmpty (int fromRank, int tag)
 Ensure that there are no messages anymore from the specified rank.
 
void setDeadlockWarningTimeStamp ()
 Memorise global timeout.
 
void setDeadlockTimeOutTimeStamp ()
 
bool isInitialised () const
 
void setTimeOutWarning (int valueInSeconds)
 Set time out warning.
 
void setDeadlockTimeOut (int valueInSeconds)
 Set deadlock time out.
 
void setCommunicator (MPI_Comm communicator, bool recomputeRankAndWorld=true)
 Set communicator to be used by Peano.
 
void suspendTimeouts (bool timeoutsDisabled)
 

Static Public Member Functions

static bool validateMaxTagIsSupported ()
 Just try to find out if a tag is actually supported.
 
static int reserveFreeTag (const std::string &fullQualifiedMessageName, int numberOfTags=1)
 Return a Free Tag.
 
static void releaseTag (int tag)
 
static Rank & getInstance ()
 This operation returns the singleton instance.
 
static int getGlobalMasterRank ()
 Get the global master.
 
static void abort (int errorCode)
 A proper abort in an MPI context has to use MPI_Abort.
 

Static Public Attributes

static const int DEADLOCK_EXIT_CODE = -2
 

Private Member Functions

 Rank ()
 The standard constructor assigns default values to the attributes and checks whether the program has been compiled with the -DParallel option.
 
 Rank (const Rank &node)=delete
 You may not copy a singleton.
 
void receiveDanglingMessagesFromReceiveBuffers ()
 Receive any Message Pending in the MPI/Receive Buffers.
 

Private Attributes

bool _initIsCalled
 Is set true if init() is called.
 
int _rank
 Rank (id) of this process.
 
int _numberOfProcessors
 Number of processors available.
 
MPI_Comm _communicator
 MPI Communicator this process belongs to.
 
std::chrono::seconds _timeOutWarning
 Timeout warning.
 
std::chrono::seconds _deadlockTimeOut
 Time to timeout.
 
std::chrono::system_clock::time_point _globalTimeOutWarning
 
std::chrono::system_clock::time_point _globalTimeOutDeadlock
 
bool _areTimeoutsEnabled
 Global toggle to enable/disable timeouts.
 

Static Private Attributes

static tarch::logging::Log _log
 Logging device.
 
static Rank _singleton
 
static int _maxTags = -1
 Set by init() and actually stores the number of valid tags.
 
static int _tagCounter = 0
 Count the tags that have already been handed out.
 

Detailed Description

Represents a program instance within a cluster.

Thus, this class is a singleton.

The parallel concept is a client-server model (process 0 is the server), where all active nodes act as servers deploying calculations on demand. So the basic activities of a parallel node are:

  • receive new root element to work on
  • pass back space-tree
  • perform additive cycle
  • perform multiplicative cycle

The two perform commands have to return several statistic records. Among them are the time needed, some residual characteristics, and flags indicating whether the domain has been refined. Furthermore, the number of communication partners is an interesting detail.

In the near future, this class should become responsible for the error handling. Right now, the error handler set is the fatal error handler, i.e. the whole parallel application is shut down as soon as an MPI error occurs.

Author
Tobias Weinzierl
Version
Revision 1.51

Definition at line 58 of file Rank.h.

Constructor & Destructor Documentation

◆ Rank() [1/2]

tarch::mpi::Rank::Rank ( )
private

The standard constructor assigns default values to the attributes and checks whether the program has been compiled with the -DParallel option.

If this is not the case, a warning is logged.

Definition at line 263 of file Rank.cpp.

◆ Rank() [2/2]

tarch::mpi::Rank::Rank ( const Rank & node)
private delete

You may not copy a singleton.

◆ ~Rank()

tarch::mpi::Rank::~Rank ( )
virtual

The standard destructor calls MPI_Finalize().

Definition at line 281 of file Rank.cpp.

Member Function Documentation

◆ abort()

void tarch::mpi::Rank::abort ( int errorCode)
static

A proper abort in an MPI context has to use MPI_Abort.

Otherwise, only the current rank goes down. Without MPI, I have to call abort(). With an exit() call, I found that the TBB runtime for example tends to hang as it tries to tidy up. I don't want any tidying up. I want termination.
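
As a rough illustration of this intent (a sketch, not code from the Peano sources; the function name and error code value are made up), a fatal error path would funnel through this static routine rather than calling exit():

    #include <Rank.h>   // include as listed at the top of this page; the path may differ in your build

    void handleFatalConfigurationError() {
      // Tear down all ranks, not just the local one; exit()/abort() would only
      // terminate the calling rank and may leave runtimes such as TBB hanging.
      const int myErrorCode = -1;   // arbitrary value chosen for this sketch
      tarch::mpi::Rank::abort(myErrorCode);
    }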

Definition at line 592 of file Rank.cpp.

Referenced by convert::input::PeanoTextPatchFileReader::addDataToPatch(), createDirectory(), tarch::logging::ChromeTraceFileLogger::error(), tarch::logging::CommandLineLogger::error(), tarch::logging::ITACLogger::error(), tarch::logging::ITTLogger::error(), tarch::logging::ScorePLogger::error(), main(), convert::input::PeanoTextPatchFileReader::parse(), convert::input::PeanoTextPatchFileReader::parsePatch(), convert::input::PeanoTextPatchFileReader::parseVariablesDeclaration(), and runTests().


◆ allReduce()

void tarch::mpi::Rank::allReduce ( const void * sendbuf,
void * recvbuf,
int count,
MPI_Datatype datatype,
MPI_Op op,
std::function< void()> waitor = []() -> void {} )
Wrapper around allreduce.

Use this wrapper in Peano to reduce a value across all ranks. We recommend not to use plain MPI within Peano applications, as Peano relies on MPI+X and is strongly dynamic, i.e. you never know how many threads a rank currently employs or whether one of the ranks has just started to send around point-to-point load balancing messages.

Rationale

Peano relies heavily on unexpected, asynchronous message exchange to administer the workload. Ranks tell other ranks on the fly, for example, if they delete trees or create new ones. They also use MPI messages to realise global semaphores.

As a consequence, any reduction runs the risk of introducing a deadlock: rank A enters an allreduce. Rank B sends something to A (for example a request that it would like to load balance later) and then would enter the allreduce. However, this send of a message to A (and perhaps an immediate receive of a confirmation message) might not go through, as A is busy in the reduction. Therefore, we should never use a blocking allreduce but instead issue a non-blocking one. While the reduction is pending, we should be able to do something else, such as answering further request messages.

This allReduce allows us to do so. The default waitor is a receive of pending messages, so most codes use allReduce as follows:

    tarch::mpi::Rank::getInstance().allReduce( ..., [&]() -> void { tarch::services::ServiceRepository::getInstance().receiveDanglingMessages(); } );

Rationale behind ifdefs

I wanted to make this routine a normal one that degenerates to nop if you don't translate with MPI support. However, that doesn't work as the signature requires MPI-specific datatypes.

Multithreading

This reduction is completely agnostic of any multithreading. Therefore, I do recommend not to use this routine within any action set.

Attribute semantics

The routine adheres to the plain MPI semantics. As it passes the MPI_Op argument through to MPI, we also support all reduction operators.

In line with MPI, please ensure that the receive buffer already holds the local contribution. Notably, the routine does not copy over the content from sendbuf into recvbuf.
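
A minimal sketch of a global sum along the lines described above; the wrapping function and include paths are assumptions for illustration, while the waitor is the functor quoted earlier on this page (its ServiceRepository header is omitted here):

    #include <mpi.h>
    #include <Rank.h>   // include as listed at the top of this page; the path may differ in your build

    double globalSum(double localValue) {
      double result = localValue;   // recvbuf must already hold the local contribution
      tarch::mpi::Rank::getInstance().allReduce(
        &localValue, &result, 1, MPI_DOUBLE, MPI_SUM,
        []() -> void {
          tarch::services::ServiceRepository::getInstance().receiveDanglingMessages();
        }
      );
      return result;
    }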

Definition at line 284 of file Rank.cpp.

References getInstance(), logTraceIn, logTraceOut, setDeadlockTimeOutTimeStamp(), setDeadlockWarningTimeStamp(), triggerDeadlockTimeOut(), and writeTimeOutWarning().

Referenced by swift2::ParticleSpecies::allReduce(), applications::exahype2::euler::sphericalaccretion::MassAccumulator::finishAccumulation(), and applications::exahype2::euler::sphericalaccretion::SSInfall::finishTimeStep().


◆ barrier()

void tarch::mpi::Rank::barrier ( std::function< void()> waitor = []() -> void {})
Global MPI barrier.

I provide a custom barrier. It semantically differs from a native MPI_Barrier, as an MPI barrier would not be able to do anything while it waits. I therefore make this barrier rely on MPI's non-blocking barrier and give the user the opportunity to tell me what to do while I wait to be allowed to pass the barrier.

The most common pattern how to use the barrier in Peano 4 is to pass the following functor to the barrier as argument:

    [&]() -> void { tarch::services::ServiceRepository::getInstance().receiveDanglingMessages(); }

Please note that this barrier remains an MPI barrier. It does not act as a barrier between multiple threads. In particular: if you use this barrier in a multithreaded code, then each thread will launch a barrier of its own. If the number of threads/tasks per rank differs, deadlocks might arise. Either way, it is not a good idea to use this barrier within a multithreaded part of your code.

Parameters
waitor: Functor that is called while we wait. By default, it is empty, i.e. the barrier degenerates to a blocking barrier in the MPI 1.3 sense.
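
Putting the pieces together, a typical call site looks roughly like this (a sketch only; the waitor is the functor quoted above, and its ServiceRepository header is omitted):

    // Rank-global synchronisation point that keeps draining dangling
    // messages while waiting for all ranks to arrive.
    tarch::mpi::Rank::getInstance().barrier(
      [&]() -> void {
        tarch::services::ServiceRepository::getInstance().receiveDanglingMessages();
      }
    );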

Definition at line 352 of file Rank.cpp.

References getInstance(), logTraceIn, logTraceOut, setDeadlockTimeOutTimeStamp(), setDeadlockWarningTimeStamp(), triggerDeadlockTimeOut(), and writeTimeOutWarning().

Referenced by peano4::parallel::Node::shutdown(), and peano4::parallel::SpacetreeSet::traverse().


◆ ensureThatMessageQueuesAreEmpty()

void tarch::mpi::Rank::ensureThatMessageQueuesAreEmpty ( int fromRank,
int tag )

Ensure that there are no messages anymore from the specified rank.

Definition at line 74 of file Rank.cpp.

References assertion3.

◆ exceededDeadlockThreshold()

bool tarch::mpi::Rank::exceededDeadlockThreshold ( ) const

Definition at line 115 of file Rank.cpp.

◆ exceededTimeOutWarningThreshold()

bool tarch::mpi::Rank::exceededTimeOutWarningThreshold ( ) const

Definition at line 106 of file Rank.cpp.

◆ getCommunicator()

◆ getGlobalMasterRank()

int tarch::mpi::Rank::getGlobalMasterRank ( )
static

Get the global master.

Peano sets up a logical tree topology on all the ranks. The root of this logical tree, i.e. the rank that is responsible for all other ranks, is the global master. In contrast, every rank also has a local master that tells it what to do. This is its master and parent within the topology tree. Use the NodePool to identify the rank of a rank's master.

Returns
0

Definition at line 415 of file Rank.cpp.

Referenced by tarch::mpi::BooleanSemaphore::BooleanSemaphoreService::acquireLock(), peano4::parallel::Node::continueToRun(), main(), toolbox::particles::ParticleSet< T >::reduceParticleStateStatistics(), toolbox::particles::ParticleSet< T >::reduceReassignmentStatistics(), tarch::mpi::BooleanSemaphore::BooleanSemaphoreService::releaseLock(), runBenchmarks(), mghype::matrixfree::solvers::Solver::synchroniseGlobalResidualAndSolutionUpdate(), tarch::triggerNonCriticalAssertion(), and peano4::writeCopyrightMessage().


◆ getInstance()

tarch::mpi::Rank & tarch::mpi::Rank::getInstance ( )
static

This operation returns the singleton instance.

Before using this instance, one has to call the init() operation on the instance returned.

Returns
The singleton instance

Definition at line 539 of file Rank.cpp.

Referenced by tarch::mpi::BooleanSemaphore::BooleanSemaphoreService::acquireLock(), toolbox::blockstructured::GlobalDatabase::addGlobalSnapshot(), toolbox::blockstructured::GlobalDatabase::addGlobalSnapshot(), toolbox::particles::TrajectoryDatabase::addParticleSnapshot(), toolbox::particles::TrajectoryDatabase::addParticleSnapshot(), peano4::parallel::SpacetreeSet::addSpacetree(), swift2::ParticleSpecies::allReduce(), allReduce(), peano4::parallel::SpacetreeSet::answerQuestions(), barrier(), peano4::grid::TraversalVTKPlotter::beginTraversal(), peano4::parallel::SpacetreeSet::cleanUpTrees(), peano4::parallel::Node::continueToRun(), peano4::parallel::SpacetreeSet::deleteAllStacks(), toolbox::blockstructured::GlobalDatabase::dumpCSVFile(), toolbox::particles::TrajectoryDatabase::dumpCSVFile(), tarch::logging::Log::error(), peano4::parallel::SpacetreeSet::exchangeAllHorizontalDataExchangeStacks(), toolbox::particles::SieveParticles< T >::exchangeSieveListsGlobally(), tarch::logging::LogFilter::filterOut(), applications::exahype2::euler::sphericalaccretion::MassAccumulator::finishAccumulation(), peano4::parallel::SpacetreeSet::finishAllOutstandingSendsAndReceives(), toolbox::loadbalancing::strategies::Hardcoded::finishStep(), applications::exahype2::euler::sphericalaccretion::SSInfall::finishTimeStep(), toolbox::loadbalancing::strategies::SpreadOutHierarchically::getAction(), peano4::maps::HierarchicalStackMap< T >::getForPush(), peano4::parallel::Node::getGlobalTreeId(), peano4::parallel::SpacetreeSet::getGridStatistics(), peano4::parallel::Node::getId(), toolbox::loadbalancing::CostMetrics::getLightestRank(), peano4::parallel::Node::getLocalTreeId(), tarch::logging::Log::getMachineInformation(), exahype2::LoadBalancingConfiguration::getMinTreeSize(), tarch::mpi::BooleanSemaphore::BooleanSemaphoreService::getNumberOfLockedSemaphores(), toolbox::loadbalancing::strategies::SpreadOut::getNumberOfTreesPerRank(), toolbox::loadbalancing::strategies::SpreadOutOnceGridStagnates::getNumberOfTreesPerRank(), peano4::parallel::Node::getRank(), peano4::parallel::SpacetreeSet::getSpacetree(), peano4::parallel::SpacetreeSet::getSpacetree(), toolbox::loadbalancing::strategies::SplitOversizedTree::getTargetTreeCost(), peano4::parallel::getTaskType(), tarch::hasNonCriticalAssertionBeenViolated(), tarch::logging::Log::info(), peano4::parallel::Node::init(), peano4::parallel::SpacetreeSet::init(), peano4::initParallelEnvironment(), tarch::multicore::initSmartMPI(), toolbox::loadbalancing::AbstractLoadBalancing::isInterRankBalancingBad(), tarch::mpi::BooleanSemaphore::BooleanSemaphoreService::lockSemaphoreOnGlobalMaster(), main(), swift2::parseCommandLineArguments(), peano4::datamanagement::CellMarker::receiveAndPollDanglingMessages(), peano4::grid::AutomatonState::receiveAndPollDanglingMessages(), peano4::grid::GridControlEvent::receiveAndPollDanglingMessages(), peano4::grid::GridStatistics::receiveAndPollDanglingMessages(), peano4::grid::GridTraversalEvent::receiveAndPollDanglingMessages(), peano4::grid::GridVertex::receiveAndPollDanglingMessages(), peano4::parallel::StartTraversalMessage::receiveAndPollDanglingMessages(), peano4::parallel::TreeEntry::receiveAndPollDanglingMessages(), peano4::parallel::TreeManagementMessage::receiveAndPollDanglingMessages(), tarch::mpi::DoubleMessage::receiveAndPollDanglingMessages(), tarch::mpi::IntegerMessage::receiveAndPollDanglingMessages(), tarch::mpi::StringMessage::receiveAndPollDanglingMessages(), exahype2::RefinementControlService::receiveDanglingMessages(), 
peano4::parallel::SpacetreeSet::receiveDanglingMessages(), tarch::mpi::BooleanSemaphore::BooleanSemaphoreService::receiveDanglingMessages(), reduce(), peano4::grid::reduceGridControlEvents(), toolbox::particles::ParticleSet< T >::reduceParticleStateStatistics(), toolbox::particles::ParticleSet< T >::reduceReassignmentStatistics(), tarch::mpi::BooleanSemaphore::BooleanSemaphoreService::releaseLock(), swift2::statistics::reportSearchRadiusVTDt(), swift2::dastgenTest::reportStep(), peano4::parallel::Node::reserveId(), runBenchmarks(), runParallel(), runParallel(), peano4::datamanagement::CellMarker::sendAndPollDanglingMessages(), peano4::grid::AutomatonState::sendAndPollDanglingMessages(), peano4::grid::GridControlEvent::sendAndPollDanglingMessages(), peano4::grid::GridStatistics::sendAndPollDanglingMessages(), peano4::grid::GridTraversalEvent::sendAndPollDanglingMessages(), peano4::grid::GridVertex::sendAndPollDanglingMessages(), peano4::parallel::StartTraversalMessage::sendAndPollDanglingMessages(), peano4::parallel::TreeEntry::sendAndPollDanglingMessages(), peano4::parallel::TreeManagementMessage::sendAndPollDanglingMessages(), tarch::mpi::DoubleMessage::sendAndPollDanglingMessages(), tarch::mpi::IntegerMessage::sendAndPollDanglingMessages(), tarch::mpi::StringMessage::sendAndPollDanglingMessages(), tarch::mpi::BooleanSemaphore::BooleanSemaphoreService::serveLockRequests(), tarch::logging::ChromeTraceFileLogger::setOutputFile(), tarch::logging::CommandLineLogger::setOutputFile(), peano4::parallel::Node::shutdown(), peano4::shutdownParallelEnvironment(), peano4::parallel::SpacetreeSet::split(), step(), peano4::parallel::SpacetreeSet::streamDataFromSplittingTreeToNewTree(), mghype::matrixfree::solvers::Solver::synchroniseGlobalResidualAndSolutionUpdate(), peano4::parallel::tests::PingPongTest::testBuiltInType(), peano4::parallel::tests::PingPongTest::testDaStGenArray(), peano4::parallel::tests::PingPongTest::testDaStGenArrayTreeManagementMessage(), peano4::parallel::tests::PingPongTest::testDaStGenTypeIntegerMessage(), peano4::parallel::tests::PingPongTest::testDaStGenTypeStartTraversalMessage(), peano4::parallel::tests::PingPongTest::testMultithreadedPingPongWithBlockingReceives(), peano4::parallel::tests::PingPongTest::testMultithreadedPingPongWithBlockingSends(), peano4::parallel::tests::PingPongTest::testMultithreadedPingPongWithBlockingSendsAndReceives(), peano4::parallel::tests::PingPongTest::testMultithreadedPingPongWithNonblockingReceives(), peano4::parallel::tests::PingPongTest::testMultithreadedPingPongWithNonblockingSends(), peano4::parallel::tests::PingPongTest::testMultithreadedPingPongWithNonblockingSendsAndReceives(), tarch::mpi::tests::StringTest::testSendReceive(), peano4::parallel::tests::NodeTest::testTagCalculation(), peano4::parallel::SpacetreeSet::traverse(), tarch::triggerNonCriticalAssertion(), exahype2::RefinementControlService::triggerSendOfCopyOfCommittedEvents(), toolbox::loadbalancing::strategies::SpreadOut::triggerSplit(), toolbox::loadbalancing::strategies::SpreadOutOnceGridStagnates::triggerSplit(), tarch::mpi::BooleanSemaphore::BooleanSemaphoreService::tryLockSemaphoreOnGlobalMaster(), tarch::mpi::BooleanSemaphore::BooleanSemaphoreService::unlockSemaphoreOnGlobalMaster(), updateDomainDecomposition(), toolbox::loadbalancing::CostMetrics::updateGlobalView(), toolbox::loadbalancing::Statistics::updateGlobalView(), toolbox::loadbalancing::strategies::RecursiveBipartition::updateLoadBalancing(), 
toolbox::loadbalancing::strategies::SplitOversizedTree::updateLoadBalancing(), toolbox::loadbalancing::strategies::SpreadOut::updateLoadBalancing(), toolbox::loadbalancing::strategies::SpreadOutHierarchically::updateLoadBalancing(), toolbox::loadbalancing::strategies::SpreadOutOnceGridStagnates::updateLoadBalancing(), toolbox::loadbalancing::strategies::RecursiveBipartition::updateState(), toolbox::loadbalancing::strategies::SplitOversizedTree::updateState(), toolbox::loadbalancing::strategies::SpreadOutHierarchically::updateState(), tarch::logging::Log::warning(), peano4::writeCopyrightMessage(), tarch::logging::Statistics::writeToCSV(), and peano4::parallel::Node::~Node().

◆ getNumberOfRanks()

int tarch::mpi::Rank::getNumberOfRanks ( ) const
Returns
Number of Nodes Available

Definition at line 552 of file Rank.cpp.

References assertion.

Referenced by peano4::parallel::Node::continueToRun(), toolbox::particles::SieveParticles< T >::exchangeSieveListsGlobally(), toolbox::loadbalancing::strategies::SpreadOutHierarchically::getAction(), peano4::parallel::Node::getGlobalTreeId(), peano4::parallel::Node::getId(), toolbox::loadbalancing::CostMetrics::getLightestRank(), peano4::parallel::Node::getLocalTreeId(), exahype2::LoadBalancingConfiguration::getMinTreeSize(), toolbox::loadbalancing::strategies::SpreadOut::getNumberOfTreesPerRank(), toolbox::loadbalancing::strategies::SpreadOutOnceGridStagnates::getNumberOfTreesPerRank(), peano4::parallel::Node::getRank(), toolbox::loadbalancing::strategies::SplitOversizedTree::getTargetTreeCost(), toolbox::loadbalancing::AbstractLoadBalancing::isInterRankBalancingBad(), main(), peano4::grid::reduceGridControlEvents(), runBenchmarks(), runParallel(), exahype2::RefinementControlService::triggerSendOfCopyOfCommittedEvents(), updateDomainDecomposition(), toolbox::loadbalancing::strategies::SpreadOut::updateLoadBalancing(), toolbox::loadbalancing::strategies::SpreadOutHierarchically::updateLoadBalancing(), toolbox::loadbalancing::strategies::SpreadOutOnceGridStagnates::updateLoadBalancing(), toolbox::loadbalancing::strategies::RecursiveBipartition::updateState(), toolbox::loadbalancing::strategies::SplitOversizedTree::updateState(), and toolbox::loadbalancing::strategies::SpreadOutHierarchically::updateState().


◆ getRank()

int tarch::mpi::Rank::getRank ( ) const

Return rank of this node.

In the serial version, i.e. without MPI, this operation always returns 0.

Returns
Rank of this node
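
For illustration only (Peano itself decomposes work via its spacetrees, so this is not how the framework partitions its domain), the two getters can be combined to derive a rank-local slice of some independent, globally known workload; all names and sizes below are made up:

    #include <algorithm>
    #include <Rank.h>   // include as listed at the top of this page; the path may differ in your build

    void processMySlice() {
      const int rank  = tarch::mpi::Rank::getInstance().getRank();
      const int ranks = tarch::mpi::Rank::getInstance().getNumberOfRanks();

      const int totalItems   = 1024;                              // made-up problem size
      const int itemsPerRank = (totalItems + ranks - 1) / ranks;  // ceiling division
      const int first        = rank * itemsPerRank;
      const int lastExcl     = std::min(totalItems, first + itemsPerRank);

      for (int i = first; i < lastExcl; i++) {
        // ... work on item i ...
      }
    }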

Definition at line 529 of file Rank.cpp.

Referenced by toolbox::blockstructured::GlobalDatabase::addGlobalSnapshot(), toolbox::blockstructured::GlobalDatabase::addGlobalSnapshot(), toolbox::particles::TrajectoryDatabase::addParticleSnapshot(), toolbox::particles::TrajectoryDatabase::addParticleSnapshot(), peano4::grid::TraversalVTKPlotter::beginTraversal(), peano4::parallel::SpacetreeSet::cleanUpTrees(), toolbox::blockstructured::GlobalDatabase::dumpCSVFile(), toolbox::particles::TrajectoryDatabase::dumpCSVFile(), peano4::parallel::SpacetreeSet::exchangeAllHorizontalDataExchangeStacks(), toolbox::particles::SieveParticles< T >::exchangeSieveListsGlobally(), tarch::logging::LogFilter::filterOut(), toolbox::loadbalancing::strategies::Hardcoded::finishStep(), peano4::maps::HierarchicalStackMap< T >::getForPush(), peano4::parallel::Node::getGlobalTreeId(), tarch::logging::Log::getMachineInformation(), peano4::grid::reduceGridControlEvents(), runBenchmarks(), tarch::logging::ChromeTraceFileLogger::setOutputFile(), tarch::logging::CommandLineLogger::setOutputFile(), tarch::triggerNonCriticalAssertion(), exahype2::RefinementControlService::triggerSendOfCopyOfCommittedEvents(), toolbox::loadbalancing::CostMetrics::updateGlobalView(), toolbox::loadbalancing::strategies::RecursiveBipartition::updateLoadBalancing(), and toolbox::loadbalancing::strategies::SplitOversizedTree::updateLoadBalancing().


◆ init()

bool tarch::mpi::Rank::init ( int * argc,
char *** argv )

This operation initializes the MPI environment and the program instance.

Note that the argv and argc parameters are both in and out parameters. Before you pass them to the operation, they are set by the MPI environment. Afterwards, the original parameters from the user are stored within them.

Implementation details

init() never uses the log device to report any errors, as the log device usually in turn uses Node's getters. Furthermore, the _initIsCalled flag thus has to be set before the log state operation is invoked.

Upper tag bound

We try to find out how many tags the system does support. Here, the MPI standard is a little bit weird, as it requires a pointer to a void pointer and then we can read the data from there. So it seems we get a pointer to a system variable back rather than the system variable.

Thanks to Andrew Mallinson (Intel) for pointing this out.

Returns
true if initialisation has been successful
See also
shutdown
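
A minimal sketch of the intended life cycle, based only on the signatures documented on this page (not copied from an actual Peano application):

    #include <Rank.h>   // include as listed at the top of this page; the path may differ in your build

    int main(int argc, char** argv) {
      // argc/argv are passed by address because the MPI environment may rewrite them.
      if (!tarch::mpi::Rank::getInstance().init(&argc, &argv)) {
        return -1;
      }

      // ... set up and run the actual application ...

      // shutdown() has to be called explicitly before main() returns (see shutdown()).
      tarch::mpi::Rank::getInstance().shutdown();
      return 0;
    }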

Definition at line 472 of file Rank.cpp.

References tarch::mpi::DoubleMessage::initDatatype(), tarch::mpi::IntegerMessage::initDatatype(), and tarch::mpi::MPIReturnValueToString().


◆ isGlobalMaster()

bool tarch::mpi::Rank::isGlobalMaster ( ) const

Is this node the global master process, i.e. does its rank equal getGlobalMasterRank()?

This operation always returns true if the code is not compiled with -DParallel.

Returns
Is this node the global master process?
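
A typical guard, sketched here with a made-up body, restricts work such as terminal output or global bookkeeping to the global master; without -DParallel the guard is always true:

    if (tarch::mpi::Rank::getInstance().isGlobalMaster()) {
      // ... master-only work, e.g. write a global report ...
    }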

Definition at line 420 of file Rank.cpp.

References assertion.

Referenced by applications::exahype2::euler::sphericalaccretion::MassAccumulator::finishAccumulation(), and main().


◆ isInitialised()

bool tarch::mpi::Rank::isInitialised ( ) const

Definition at line 69 of file Rank.cpp.

◆ isMessageInQueue()

bool tarch::mpi::Rank::isMessageInQueue ( int tag) const

In older DaStGen versions, I tried to find out whether a particular message type is in the MPI queue.

That is, I looked whether a message on this tag exists, and then I checked whether the memory footprint matches via the count. I think this is invalid: MPI really only looks at the number of bytes, so you have to know which type drops in once there is a message on a tag.

Definition at line 382 of file Rank.cpp.

◆ logStatus()

void tarch::mpi::Rank::logStatus ( ) const

Logs the status of the process onto the log device.

Definition at line 430 of file Rank.cpp.

References assertion, and logInfo.

◆ plotMessageQueues()

void tarch::mpi::Rank::plotMessageQueues ( )

Definition at line 86 of file Rank.cpp.

References logError.

◆ receiveDanglingMessagesFromReceiveBuffers()

void tarch::mpi::Rank::receiveDanglingMessagesFromReceiveBuffers ( )
private

Receive any Message Pending in the MPI/Receive Buffers.

We poll MPI only every k iterations. Once we have found a message, we wait another k iterations before polling again. If the threshold is exceeded and no message is found, we do, however, not reset k (I have tried this and it gives worse runtime): receiveDanglingMessages is typically called when we enter a critical phase of the simulation and urgently wait for incoming MPI messages. So once we are in that critical regime and have already exceeded k, it would be harakiri to reset the counter to 0 again - we only do so if data drops in, and we may thus assume that we will leave the critical phase anyway.
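
The throttle described above can be pictured roughly as follows. This is a free-standing sketch with made-up names (pollThreshold, probeAndReceivePendingMessages) and is not the actual private implementation:

    bool probeAndReceivePendingMessages();   // hypothetical helper, not part of Rank

    void pollReceiveBuffersSketch() {
      static int       callCounter   = 0;
      static const int pollThreshold = 16;   // the "k" from the description; value made up

      if (callCounter < pollThreshold) {
        callCounter++;            // skip this poll; only probe MPI every k calls
        return;
      }

      const bool messageFound = probeAndReceivePendingMessages();
      if (messageFound) {
        callCounter = 0;          // only reset once data has actually dropped in
      }
      // otherwise keep polling on every call, as the description above demands
    }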

◆ reduce()

void tarch::mpi::Rank::reduce ( const void * sendbuf,
void * recvbuf,
int count,
MPI_Datatype datatype,
MPI_Op op,
int root,
std::function< void()> waitor = []() -> void {} )

◆ releaseTag()

void tarch::mpi::Rank::releaseTag ( int tag)
static

Definition at line 32 of file Rank.cpp.

References _tagCounter.

Referenced by tarch::shutdownNonCriticalAssertionEnvironment().


◆ reserveFreeTag()

int tarch::mpi::Rank::reserveFreeTag ( const std::string & fullQualifiedMessageName,
int numberOfTags = 1 )
static

Return a Free Tag.

Returns a free tag to be used for a new datatype. Each result is delivered exactly once. The string argument is just for logging.

Details

This operation should write something to the log devices. However, it is static and the class' log devices are static, too. C++ has no mechanism to define which static entity has to be instantiated first. On some systems, it hence happened that this tag registration got called before the logging device had been up and running. The problem is known as the static initialization order problem:

    https://isocpp.org/wiki/faq/ctors#static-init-order

So what I do is log to std::cout only. This eliminated the problems here.
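
A small sketch of the intended use, with a made-up message name; tags are typically reserved once at start-up and handed back via releaseTag():

    // Reserve one tag for a new message type; the string is only used for logging.
    static int myMessageTag = tarch::mpi::Rank::reserveFreeTag("myproject::MyMessage");

    // ... use myMessageTag in the sends/receives of that message type ...

    // hand the tag back during tear-down
    tarch::mpi::Rank::releaseTag(myMessageTag);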

Definition at line 39 of file Rank.cpp.

References assertion2.

Referenced by peano4::parallel::Node::init(), tarch::mpi::BooleanSemaphore::BooleanSemaphoreService::init(), peano4::parallel::SpacetreeSet::init(), and tarch::initNonCriticalAssertionEnvironment().


◆ setCommunicator()

void tarch::mpi::Rank::setCommunicator ( MPI_Comm communicator,
bool recomputeRankAndWorld = true )

Set communicator to be used by Peano.

Definition at line 576 of file Rank.cpp.

References logError, and tarch::mpi::MPIReturnValueToString().

Referenced by tarch::multicore::initSmartMPI().


◆ setDeadlockTimeOut()

void tarch::mpi::Rank::setDeadlockTimeOut ( int valueInSeconds)

Set deadlock time out.

Set after how much time a node waiting for an MPI message shall quit and shut down the whole application with an error report. If you pass 0, that switches off this feature.
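
A sketch of a typical configuration right after start-up; the two values are arbitrary and only illustrate the interplay with setTimeOutWarning():

    // Warn after one minute of waiting for a message, abort the run after five minutes.
    // Passing 0 to either call switches the respective feature off.
    tarch::mpi::Rank::getInstance().setTimeOutWarning(60);
    tarch::mpi::Rank::getInstance().setDeadlockTimeOut(300);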

Definition at line 566 of file Rank.cpp.

References assertion, and logInfo.

Referenced by peano4::initParallelEnvironment(), and swift2::parseCommandLineArguments().


◆ setDeadlockTimeOutTimeStamp()

void tarch::mpi::Rank::setDeadlockTimeOutTimeStamp ( )
Returns
Time stamp when the application should next terminate because of a time out if no message has been received in the meantime.

Definition at line 198 of file Rank.cpp.

Referenced by tarch::mpi::BooleanSemaphore::BooleanSemaphoreService::acquireLock(), allReduce(), barrier(), toolbox::particles::SieveParticles< T >::exchangeSieveListsGlobally(), peano4::parallel::SpacetreeSet::finishAllOutstandingSendsAndReceives(), tarch::mpi::BooleanSemaphore::BooleanSemaphoreService::lockSemaphoreOnGlobalMaster(), peano4::datamanagement::CellMarker::receiveAndPollDanglingMessages(), peano4::grid::AutomatonState::receiveAndPollDanglingMessages(), peano4::grid::GridControlEvent::receiveAndPollDanglingMessages(), peano4::grid::GridStatistics::receiveAndPollDanglingMessages(), peano4::grid::GridTraversalEvent::receiveAndPollDanglingMessages(), peano4::grid::GridVertex::receiveAndPollDanglingMessages(), peano4::parallel::StartTraversalMessage::receiveAndPollDanglingMessages(), peano4::parallel::TreeEntry::receiveAndPollDanglingMessages(), peano4::parallel::TreeManagementMessage::receiveAndPollDanglingMessages(), tarch::mpi::DoubleMessage::receiveAndPollDanglingMessages(), tarch::mpi::IntegerMessage::receiveAndPollDanglingMessages(), tarch::mpi::StringMessage::receiveAndPollDanglingMessages(), reduce(), peano4::datamanagement::CellMarker::sendAndPollDanglingMessages(), peano4::grid::AutomatonState::sendAndPollDanglingMessages(), peano4::grid::GridControlEvent::sendAndPollDanglingMessages(), peano4::grid::GridStatistics::sendAndPollDanglingMessages(), peano4::grid::GridTraversalEvent::sendAndPollDanglingMessages(), peano4::grid::GridVertex::sendAndPollDanglingMessages(), peano4::parallel::StartTraversalMessage::sendAndPollDanglingMessages(), peano4::parallel::TreeEntry::sendAndPollDanglingMessages(), peano4::parallel::TreeManagementMessage::sendAndPollDanglingMessages(), tarch::mpi::DoubleMessage::sendAndPollDanglingMessages(), tarch::mpi::IntegerMessage::sendAndPollDanglingMessages(), and tarch::mpi::StringMessage::sendAndPollDanglingMessages().


◆ setDeadlockWarningTimeStamp()

void tarch::mpi::Rank::setDeadlockWarningTimeStamp ( )

Memorise global timeout.

Definition at line 193 of file Rank.cpp.

Referenced by tarch::mpi::BooleanSemaphore::BooleanSemaphoreService::acquireLock(), allReduce(), barrier(), toolbox::particles::SieveParticles< T >::exchangeSieveListsGlobally(), peano4::parallel::SpacetreeSet::finishAllOutstandingSendsAndReceives(), tarch::mpi::BooleanSemaphore::BooleanSemaphoreService::lockSemaphoreOnGlobalMaster(), peano4::datamanagement::CellMarker::receiveAndPollDanglingMessages(), peano4::grid::AutomatonState::receiveAndPollDanglingMessages(), peano4::grid::GridControlEvent::receiveAndPollDanglingMessages(), peano4::grid::GridStatistics::receiveAndPollDanglingMessages(), peano4::grid::GridTraversalEvent::receiveAndPollDanglingMessages(), peano4::grid::GridVertex::receiveAndPollDanglingMessages(), peano4::parallel::StartTraversalMessage::receiveAndPollDanglingMessages(), peano4::parallel::TreeEntry::receiveAndPollDanglingMessages(), peano4::parallel::TreeManagementMessage::receiveAndPollDanglingMessages(), tarch::mpi::DoubleMessage::receiveAndPollDanglingMessages(), tarch::mpi::IntegerMessage::receiveAndPollDanglingMessages(), tarch::mpi::StringMessage::receiveAndPollDanglingMessages(), reduce(), peano4::datamanagement::CellMarker::sendAndPollDanglingMessages(), peano4::grid::AutomatonState::sendAndPollDanglingMessages(), peano4::grid::GridControlEvent::sendAndPollDanglingMessages(), peano4::grid::GridStatistics::sendAndPollDanglingMessages(), peano4::grid::GridTraversalEvent::sendAndPollDanglingMessages(), peano4::grid::GridVertex::sendAndPollDanglingMessages(), peano4::parallel::StartTraversalMessage::sendAndPollDanglingMessages(), peano4::parallel::TreeEntry::sendAndPollDanglingMessages(), peano4::parallel::TreeManagementMessage::sendAndPollDanglingMessages(), tarch::mpi::DoubleMessage::sendAndPollDanglingMessages(), tarch::mpi::IntegerMessage::sendAndPollDanglingMessages(), and tarch::mpi::StringMessage::sendAndPollDanglingMessages().


◆ setTimeOutWarning()

void tarch::mpi::Rank::setTimeOutWarning ( int valueInSeconds)

Set time out warning.

Set after how much time a node waiting for an MPI message shall write a warning that it is likely that it ran into a deadlock. If you pass 0, that switches off this feature.

Definition at line 560 of file Rank.cpp.

References assertion.

Referenced by peano4::initParallelEnvironment().


◆ shutdown()

void tarch::mpi::Rank::shutdown ( )

Shuts down the application.

Should be the last operation called by the overall application.

Rationale

Originally, I put the shutdown operation into the destructor of Node. The MPI environment consequently was shut down as soon as the operating system terminated the application. However, Scalasca complained on the BlueGene/P that the destruction happened after the return statement of the main method. To make Peano work with Scalasca, I hence moved the MPI shutdown into a method that is called explicitly before the final return statement.

This seems to be a CLX effect, i.e. the Intel and the GNU compilers worked fine with Scalasca. Hence, I assume that Intel and GNU executables destroy all static objects (singletons) before they return from the main function. CLX destroys the static objects after the return statement and thus makes Scalasca's instrumentation report an error.

Definition at line 395 of file Rank.cpp.

References assertion, logError, logTraceIn, logTraceOut, tarch::mpi::MPIReturnValueToString(), and tarch::mpi::IntegerMessage::shutdownDatatype().

Referenced by peano4::shutdownParallelEnvironment().


◆ suspendTimeouts()

void tarch::mpi::Rank::suspendTimeouts ( bool timeoutsDisabled)

Definition at line 203 of file Rank.cpp.

◆ triggerDeadlockTimeOut()

void tarch::mpi::Rank::triggerDeadlockTimeOut ( const std::string & className,
const std::string & methodName,
int communicationPartnerRank,
int tag,
int numberOfExpectedMessages = 1,
const std::string & comment = "" )

Triggers a time out and shuts down the cluster if a timeout is violated.

The implementation does not use MPI_Abort, since it seems that this operation requires all nodes to be up and running. Instead, getDeadlockWarningTimeStamp() uses the system exit function, passing it DEADLOCK_EXIT_CODE as the exit code.

The operation should be called only if the deadlock time-out is switched on ( isTimeOutDeadlockEnabled() ) and the deadlock time-out has expired. Use getDeadlockWarningTimeStamp() and the system operation clock() to check the second requirement.

Parameters
className: Name of the class that triggers the deadlock shutdown.
methodName: Name of the method that triggers the deadlock shutdown.
communicationPartnerRank: Rank of the node that should have sent a message but did not.
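
The routines on this page are usually combined into the following polling pattern around a blocking receive. This is a sketch assembled from the documented signatures; messageHasArrived(), partnerRank, tag and the class/method strings are placeholders:

    tarch::mpi::Rank::getInstance().setDeadlockWarningTimeStamp();
    tarch::mpi::Rank::getInstance().setDeadlockTimeOutTimeStamp();

    while (!messageHasArrived()) {   // placeholder predicate, e.g. an MPI_Test on a request
      if (tarch::mpi::Rank::getInstance().exceededTimeOutWarningThreshold()) {
        tarch::mpi::Rank::getInstance().writeTimeOutWarning(
          "myproject::MyClass", "receive(...)", partnerRank, tag);
      }
      if (tarch::mpi::Rank::getInstance().exceededDeadlockThreshold()) {
        tarch::mpi::Rank::getInstance().triggerDeadlockTimeOut(
          "myproject::MyClass", "receive(...)", partnerRank, tag);
      }
      tarch::services::ServiceRepository::getInstance().receiveDanglingMessages();
    }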

Definition at line 124 of file Rank.cpp.

References logError.

Referenced by tarch::mpi::BooleanSemaphore::BooleanSemaphoreService::acquireLock(), allReduce(), barrier(), toolbox::particles::SieveParticles< T >::exchangeSieveListsGlobally(), peano4::parallel::SpacetreeSet::finishAllOutstandingSendsAndReceives(), peano4::datamanagement::CellMarker::receiveAndPollDanglingMessages(), peano4::grid::AutomatonState::receiveAndPollDanglingMessages(), peano4::grid::GridControlEvent::receiveAndPollDanglingMessages(), peano4::grid::GridStatistics::receiveAndPollDanglingMessages(), peano4::grid::GridTraversalEvent::receiveAndPollDanglingMessages(), peano4::grid::GridVertex::receiveAndPollDanglingMessages(), peano4::parallel::StartTraversalMessage::receiveAndPollDanglingMessages(), peano4::parallel::TreeEntry::receiveAndPollDanglingMessages(), peano4::parallel::TreeManagementMessage::receiveAndPollDanglingMessages(), tarch::mpi::DoubleMessage::receiveAndPollDanglingMessages(), tarch::mpi::IntegerMessage::receiveAndPollDanglingMessages(), tarch::mpi::StringMessage::receiveAndPollDanglingMessages(), reduce(), peano4::datamanagement::CellMarker::sendAndPollDanglingMessages(), peano4::grid::AutomatonState::sendAndPollDanglingMessages(), peano4::grid::GridControlEvent::sendAndPollDanglingMessages(), peano4::grid::GridStatistics::sendAndPollDanglingMessages(), peano4::grid::GridTraversalEvent::sendAndPollDanglingMessages(), peano4::grid::GridVertex::sendAndPollDanglingMessages(), peano4::parallel::StartTraversalMessage::sendAndPollDanglingMessages(), peano4::parallel::TreeEntry::sendAndPollDanglingMessages(), peano4::parallel::TreeManagementMessage::sendAndPollDanglingMessages(), tarch::mpi::DoubleMessage::sendAndPollDanglingMessages(), tarch::mpi::IntegerMessage::sendAndPollDanglingMessages(), and tarch::mpi::StringMessage::sendAndPollDanglingMessages().


◆ validateMaxTagIsSupported()

bool tarch::mpi::Rank::validateMaxTagIsSupported ( )
static

Just try to find out if a tag is actually supported.

This works after we have called the init() routine. If we call this routine before, it will always pass. If the tag is not supported, we write a warning and return false.

Definition at line 454 of file Rank.cpp.

References logWarning.

◆ writeTimeOutWarning()

void tarch::mpi::Rank::writeTimeOutWarning ( const std::string & className,
const std::string & methodName,
int communicationPartnerRank,
int tag,
int numberOfExpectedMessages = 1 )

Writes a warning if relevant.

This operation writes a warning if the code might assume that it runs into a timeout (or excessive wait). The routine assumes that timeouts are enabled and that the user has called

setDeadlockTimeOutTimeStamp()

before. To avoid an excessive list of timeout warnings, a warning is always followed by moving the next warning timestamp forward (so you don't get dozens of warnings), and the code also increases the time span that it uses from here on to report on timeouts.

Definition at line 148 of file Rank.cpp.

References logWarning.

Referenced by tarch::mpi::BooleanSemaphore::BooleanSemaphoreService::acquireLock(), allReduce(), barrier(), toolbox::particles::SieveParticles< T >::exchangeSieveListsGlobally(), peano4::parallel::SpacetreeSet::finishAllOutstandingSendsAndReceives(), peano4::datamanagement::CellMarker::receiveAndPollDanglingMessages(), peano4::grid::AutomatonState::receiveAndPollDanglingMessages(), peano4::grid::GridControlEvent::receiveAndPollDanglingMessages(), peano4::grid::GridStatistics::receiveAndPollDanglingMessages(), peano4::grid::GridTraversalEvent::receiveAndPollDanglingMessages(), peano4::grid::GridVertex::receiveAndPollDanglingMessages(), peano4::parallel::StartTraversalMessage::receiveAndPollDanglingMessages(), peano4::parallel::TreeEntry::receiveAndPollDanglingMessages(), peano4::parallel::TreeManagementMessage::receiveAndPollDanglingMessages(), tarch::mpi::DoubleMessage::receiveAndPollDanglingMessages(), tarch::mpi::IntegerMessage::receiveAndPollDanglingMessages(), tarch::mpi::StringMessage::receiveAndPollDanglingMessages(), reduce(), peano4::datamanagement::CellMarker::sendAndPollDanglingMessages(), peano4::grid::AutomatonState::sendAndPollDanglingMessages(), peano4::grid::GridControlEvent::sendAndPollDanglingMessages(), peano4::grid::GridStatistics::sendAndPollDanglingMessages(), peano4::grid::GridTraversalEvent::sendAndPollDanglingMessages(), peano4::grid::GridVertex::sendAndPollDanglingMessages(), peano4::parallel::StartTraversalMessage::sendAndPollDanglingMessages(), peano4::parallel::TreeEntry::sendAndPollDanglingMessages(), peano4::parallel::TreeManagementMessage::sendAndPollDanglingMessages(), tarch::mpi::DoubleMessage::sendAndPollDanglingMessages(), tarch::mpi::IntegerMessage::sendAndPollDanglingMessages(), and tarch::mpi::StringMessage::sendAndPollDanglingMessages().


Field Documentation

◆ _areTimeoutsEnabled

bool tarch::mpi::Rank::_areTimeoutsEnabled
private

Global toggle to enable/disable timeouts.

This flag helps user codes to enable/disable timeouts (temporarily). Note that the flag is not the only way to disable the timeouts. Alternatively, you can set _deadlockTimeOut to zero.

Definition at line 130 of file Rank.h.

◆ _communicator

MPI_Comm tarch::mpi::Rank::_communicator
private

MPI Communicator this process belongs to.

Definition at line 89 of file Rank.h.

◆ _deadlockTimeOut

std::chrono::seconds tarch::mpi::Rank::_deadlockTimeOut
private

Time to timeout.

This counter specifies the seconds until a timeout is triggered. If the field equals zero, no timeouts will ever happen.

Definition at line 118 of file Rank.h.

◆ _globalTimeOutDeadlock

std::chrono::system_clock::time_point tarch::mpi::Rank::_globalTimeOutDeadlock
private

Definition at line 121 of file Rank.h.

◆ _globalTimeOutWarning

std::chrono::system_clock::time_point tarch::mpi::Rank::_globalTimeOutWarning
private

Definition at line 120 of file Rank.h.

◆ _initIsCalled

bool tarch::mpi::Rank::_initIsCalled
private

Is set true if init() is called.

Definition at line 73 of file Rank.h.

◆ _log

tarch::logging::Log tarch::mpi::Rank::_log
staticprivate

Logging device.

For the machine name.

If it doesn't work, switch it off in the file CompilerSpecificSettings.h.

Definition at line 66 of file Rank.h.

◆ _maxTags

int tarch::mpi::Rank::_maxTags = -1
staticprivate

Set by init() and actually stores the number of valid tags.

Definition at line 135 of file Rank.h.

◆ _numberOfProcessors

int tarch::mpi::Rank::_numberOfProcessors
private

Number of processors available.

Definition at line 83 of file Rank.h.

◆ _rank

int tarch::mpi::Rank::_rank
private

Rank (id) of this process.

Definition at line 78 of file Rank.h.

◆ _singleton

tarch::mpi::Rank tarch::mpi::Rank::_singleton
staticprivate

Definition at line 68 of file Rank.h.

◆ _tagCounter

int tarch::mpi::Rank::_tagCounter = 0
staticprivate

Count the tags that have already been handed out.

Definition at line 140 of file Rank.h.

Referenced by releaseTag().

◆ _timeOutWarning

std::chrono::seconds tarch::mpi::Rank::_timeOutWarning
private

Timeout warning.

How long shall the application wait until it writes a time-out warning? In contrast to _deadlockTimeOut, this value changes over time: if we write a warning message, we increase this value. So, by default, you can make it rather small. The code will then increase it once a warning has been written, to avoid that the terminal is flooded with these warnings. Due to this growth it will, however, never become bigger than _deadlockTimeOut.

Despite the growth discussion, you can use a timeout warning that is greater than zero and set _deadlockTimeOut to zero. In this case, you will get warnings that the code thinks it runs into a deadlock, but you do not get a timeout ever.

See also
writeTimeOutWarning()
_deadlockTimeOut

Definition at line 110 of file Rank.h.

◆ DEADLOCK_EXIT_CODE

const int tarch::mpi::Rank::DEADLOCK_EXIT_CODE = -2
static

Definition at line 60 of file Rank.h.


The documentation for this class was generated from the following files: Rank.h and Rank.cpp.