exahype2::RefinementControlService Class Reference

#include <RefinementControlService.h>


Public Member Functions

virtual ~RefinementControlService ()
 
virtual void receiveDanglingMessages () override
 Receive refinement control messages as shared by other ranks.
 
std::vector< peano4::grid::GridControlEvent > getGridControlEvents () const
 
virtual void shutdown () override
 
void finishStep ()
 Should be called after each traversal per rank.
 
std::string toString () const
 
void merge (const RefinementControl &control)
 
- Public Member Functions inherited from tarch::services::Service
virtual ~Service ()
 

Static Public Member Functions

static RefinementControlService & getInstance ()
 

Private Member Functions

 RefinementControlService ()
 
void freeAllPendingSendRequests ()
 Complete pending sends from previous mesh traversal.
 
void triggerSendOfCopyOfCommittedEvents ()
 

Private Attributes

RefinementControl::NewEvents _localNewEvents
 Container to accumulate new events.
 
std::list< peano4::grid::GridControlEvent > _remoteNewEvents
 
std::vector< peano4::grid::GridControlEvent > _committedEvents
 Container with all the valid events.
 
std::vector< MPI_Request * > _sendRequests
 
std::vector< peano4::grid::GridControlEvent > _copyOfCommittedEvents
 

Static Private Attributes

static tarch::logging::Log _log
 
static tarch::multicore::RecursiveSemaphore _semaphore
 
static int _reductionTag
 I need a tag of my own to exchange control info after each step.
 

Additional Inherited Members

- Protected Attributes inherited from tarch::services::Service
tarch::multicore::RecursiveSemaphore _receiveDanglingMessagesSemaphore
 Recursive semaphores.
 

Detailed Description

MPI

After each mesh traversal, we expect the code to invoke finishStep(). Here, we exchange all local events with the other ranks. This is an important step for two reasons:

  • MPI ranks might fork off trees which end up on other ranks. Such a rank has to know whether there have been refinement events.
  • If domain boundaries between ranks are ragged, we might end up with many tiny refinement events which cannot be merged into one large one. By exchanging these small events, every rank can merge them and come up with one large region which has to be refined.

While merging for refinement controls is only an optimisation, merging coarsening commands is mandatory. Only refined regions completely contained within one huge coarsening instruction are actually coarsened.
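
The geometric idea behind such a merge can be sketched as follows. This is an illustrative stand-in, not Peano's actual merge routine; the Region type and enclosingRegion() helper are hypothetical:

  #include <algorithm>
  #include <array>

  // Hypothetical stand-in for a refinement region: an axis-aligned box in 3D.
  struct Region {
    std::array<double, 3> offset;  // lower-left corner
    std::array<double, 3> size;    // extent per coordinate axis
  };

  // Fuse two adjacent regions into the smallest enclosing region, so that many
  // tiny events along a ragged rank boundary collapse into one large one.
  Region enclosingRegion(const Region& a, const Region& b) {
    Region result{};
    for (int d = 0; d < 3; d++) {
      const double lower = std::min(a.offset[d], b.offset[d]);
      const double upper = std::max(a.offset[d] + a.size[d], b.offset[d] + b.size[d]);
      result.offset[d] = lower;
      result.size[d]   = upper - lower;
    }
    return result;
  }

The actual merge in Peano additionally has to decide when two events are close enough to be fused; that logic is omitted here.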

Lifecycle

Whenever a code inserts a new event, it has to specify the event's lifetime. For Runge-Kutta 4, for example, you might want to pass in a four, as one step might trigger a refinement but you want to realise the refinement at the end of the time step, i.e. four steps later.

Events received via MPI have no lifecycle. We therefore assign them the maximum lifetime plus an offset.
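
As a rough sketch of this lifetime bookkeeping (the TrackedEvent type and addNewEvent() helper are purely illustrative, not the actual RefinementControl API):

  #include <cassert>

  // Purely illustrative event with an expiry counter.
  struct TrackedEvent { int remainingLifetime; };

  // Purely illustrative helper: register an event with a given lifetime.
  TrackedEvent addNewEvent(int lifetime) { return TrackedEvent{lifetime}; }

  int main() {
    // Event raised in the first of the four Runge-Kutta steps.
    TrackedEvent event = addNewEvent(4);

    // One decrement per traversal: after the fourth step the counter has just
    // reached zero, i.e. the event is still alive when the time step ends.
    for (int step = 0; step < 4; step++) {
      event.remainingLifetime--;
    }
    assert(event.remainingLifetime == 0);
    return 0;
  }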

Definition at line 48 of file RefinementControlService.h.

Constructor & Destructor Documentation

◆ ~RefinementControlService()

exahype2::RefinementControlService::~RefinementControlService ( )
virtual

Definition at line 24 of file RefinementControlService.cpp.

References tarch::services::ServiceRepository::getInstance(), and tarch::services::ServiceRepository::removeService().


◆ RefinementControlService()

exahype2::RefinementControlService::RefinementControlService ( )
private

Definition at line 20 of file RefinementControlService.cpp.

References tarch::services::ServiceRepository::addService(), and tarch::services::ServiceRepository::getInstance().


Member Function Documentation

◆ finishStep()

void exahype2::RefinementControlService::finishStep ( )

Should be called after each traversal per rank.

This routine rolls over _newEvents into the _committedEvents if they are still alive. That is, _newEvents holds all the events including an "expiry date", while _committedEvents holds only those which are committed for this mesh traversal.

We clear the committed events and then copy the new events over one by one unless they have expired. If they have expired, we delete them within _newEvents. In this context, we also reduce the maximum lifetime, but this value is used solely for reporting purposes. The global maximum lifetime is stored within _maxLifetime; it grows monotonically, while the locally reduced value here might decrease over time if no new events are added.

Once we have committed all non-expired events, we exchange them via MPI. For this, we first wait for all previous non-blocking communication to terminate (freeAllPendingSendRequests()), then create a copy of the committed events and delegate the data exchange to triggerSendOfCopyOfCommittedEvents().

Finally, we append the remote events to our set of committed events. This has to happen after the actual MPI exchange, so we can be sure that remote events are not immediately sent back again.

I found it to be very important to merge the committed events immediately: Peano's spacetrees will merge, too, but if we work with 100 trees per node, each and every one of them will merge locally. That is time better spent on something else.
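
A minimal sketch of this roll-over, with a simplified stand-in for the real event type (the PendingEvent type and rollOver() helper are illustrative, not the actual implementation):

  #include <list>
  #include <vector>

  // Illustrative stand-in for a pending event with an expiry counter.
  struct PendingEvent {
    int remainingLifetime;  // decremented once per mesh traversal
    int payload;            // stands in for the actual GridControlEvent data
  };

  // Rebuild the committed set from the still-alive pending events and drop
  // the expired ones.
  std::vector<int> rollOver(std::list<PendingEvent>& newEvents) {
    std::vector<int> committedEvents;
    for (auto it = newEvents.begin(); it != newEvents.end(); ) {
      it->remainingLifetime--;
      if (it->remainingLifetime >= 0) {
        committedEvents.push_back(it->payload);  // still alive: hand out this traversal
        ++it;
      } else {
        it = newEvents.erase(it);                // expired: remove from the pending set
      }
    }
    return committedEvents;
  }

The subsequent MPI exchange and the merging of committed events are not part of this sketch.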

Definition at line 92 of file RefinementControlService.cpp.

References logInfo, and peano4::grid::merge().


◆ freeAllPendingSendRequests()

void exahype2::RefinementControlService::freeAllPendingSendRequests ( )
private

Complete pending sends from previous mesh traversal.

See also
triggerSendOfCopyOfCommittedEvents()
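
A sketch of how such a completion loop could look, with a local stand-in for the _sendRequests member; the actual routine additionally answers dangling messages via the service repository while it waits, which is only hinted at in a comment:

  #include <mpi.h>
  #include <vector>

  // Illustrative completion loop for the non-blocking sends of the previous
  // traversal. Assumes MPI has been initialised.
  void freePendingSends(std::vector<MPI_Request*>& sendRequests) {
    while (not sendRequests.empty()) {
      for (auto it = sendRequests.begin(); it != sendRequests.end(); ) {
        int flag = 0;
        MPI_Test(*it, &flag, MPI_STATUS_IGNORE);
        if (flag) {
          delete *it;                  // send completed: release the request object
          it = sendRequests.erase(it);
        } else {
          ++it;                        // still in flight: check again on the next pass
        }
      }
      // The real code would call receiveDanglingMessages() here so that partner
      // ranks can make progress while we wait.
    }
  }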

Definition at line 137 of file RefinementControlService.cpp.

References tarch::services::ServiceRepository::getInstance(), and tarch::services::ServiceRepository::receiveDanglingMessages().


◆ getGridControlEvents()

std::vector< peano4::grid::GridControlEvent > exahype2::RefinementControlService::getGridControlEvents ( ) const
See also
RefinementControl::clear()

Definition at line 182 of file RefinementControlService.cpp.

◆ getInstance()

exahype2::RefinementControlService & exahype2::RefinementControlService::getInstance ( )
static

Definition at line 15 of file RefinementControlService.cpp.

◆ merge()

void exahype2::RefinementControlService::merge ( const RefinementControl & control)

◆ receiveDanglingMessages()

void exahype2::RefinementControlService::receiveDanglingMessages ( )
overridevirtual

Receive refinement control messages as shared by other ranks.

Whenever we receive such messages, we know that they stem from partner services and hence from committed events that are active right now. Therefore, we add them to our committed set. We also add them to the new set, as they might have dropped in slightly too late, i.e. we might already have handed out our events to "our" trees (those hosted on this rank). Since the messages are also added to the new events, they will be delivered in a subsequent sweep in this case.

Realisation

  • We first do a general probe on MPI to see if there are any messages. If there are no messages at all, it makes no sense to continue.
  • If the guard check says "yes, there is a message", we next have to block the threads. From hereon, we work in a strict serial sense.
  • Now we check again (!) if there are messages. This is important as receiveDanglingMessages() might be called by multiple threads at the same time: we might have entered the routine and seen a message, but while we are about to receive it, some other thread might have checked as well and grabbed "our" message.
  • If the check is, once again, successful, we know the rank the message comes from. So far, all checks use MPI_ANY_SOURCE, but the returned status object is now filled with the correct rank info. We use this rank from here on, as we will receive multiple messages in a row from it.
  • Next, we work out how many messages we have been sent.
  • Finally, we receive the messages and add them to _committedEvents and _remoteNewEvents.

We note that most of these complex multithreading operations are rarely necessary, as receiveDanglingMessages(), when invoked via tarch::services::ServiceRepository::receiveDanglingMessages(), is a priori thread-safe. Nevertheless, I decided to be on the safe side in case someone wants to call this receiveDanglingMessages() version explicitly.
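
The probe-lock-probe-again pattern from the list above could be sketched as follows. It uses a plain std::mutex and int payloads instead of the recursive semaphore and Peano's event datatype, and pollForRemoteEvents() is an illustrative name:

  #include <mpi.h>
  #include <mutex>
  #include <vector>

  void pollForRemoteEvents(MPI_Comm comm, int tag, std::mutex& mutex) {
    int flag = 0;
    MPI_Status status;
    // Cheap first check: is there anything at all from any rank?
    MPI_Iprobe(MPI_ANY_SOURCE, tag, comm, &flag, &status);
    if (flag) {
      // From here on we work strictly serially.
      std::lock_guard<std::mutex> lock(mutex);
      // Check again: another thread might have grabbed "our" message meanwhile.
      flag = 0;
      MPI_Iprobe(MPI_ANY_SOURCE, tag, comm, &flag, &status);
      if (flag) {
        // The status now carries the concrete sender; stick to this rank.
        const int source = status.MPI_SOURCE;
        // Work out how many (stand-in) entries were sent, then receive them.
        int count = 0;
        MPI_Get_count(&status, MPI_INT, &count);
        std::vector<int> buffer(count);
        MPI_Recv(buffer.data(), count, MPI_INT, source, tag, comm, MPI_STATUS_IGNORE);
        // The real code appends the received events to _committedEvents and
        // _remoteNewEvents at this point.
      }
    }
  }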

Implements tarch::services::Service.

Definition at line 30 of file RefinementControlService.cpp.

References tarch::mpi::Rank::getCommunicator(), peano4::grid::GridControlEvent::getGlobalCommunciationDatatype(), tarch::mpi::Rank::getInstance(), and logDebug.


◆ shutdown()

void exahype2::RefinementControlService::shutdown ( )
overridevirtual

Implements tarch::services::Service.

Definition at line 28 of file RefinementControlService.cpp.

◆ toString()

std::string exahype2::RefinementControlService::toString ( ) const

Definition at line 78 of file RefinementControlService.cpp.

◆ triggerSendOfCopyOfCommittedEvents()

void exahype2::RefinementControlService::triggerSendOfCopyOfCommittedEvents ( )
private

MPI handling

The MPI data exchange here is non-trivial, as we do not know which rank has how many messages. So we proceed in three steps: First, we send the committed events out to all partners. These sends are non-blocking and logically form a broadcast. Second, we loop over all the arising MPI_Requests and poll until they are finished, so we know all data has gone out. This is a while loop with the predicate

  not sendRequests.empty()

Third, we hook into this while loop and probe whether any of the other ranks have sent us anything. So while we wait for our sends to go out, we poll the other ranks for their data, integrate this information into our data structures, and thereby also free up the MPI queues.

As we build up the incoming data while we wait for our sends to go out, we have to send from a copy of the actual data (copyOfCommittedEvents): we must not alter the outgoing data while sends are still pending, yet the polls for incoming messages would do exactly that if we sent from the original container.

It is important to work with non-blocking calls here, as the code otherwise tends to deadlock if we have a lot of events. This is due to MPI falling back to rendezvous message passing once the amount of data in flight becomes too large.
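
The three steps could be sketched as follows, with int payloads instead of GridControlEvent and the interleaved polling for incoming data reduced to a comment; broadcastCommittedEvents() is an illustrative name:

  #include <mpi.h>
  #include <vector>

  void broadcastCommittedEvents(const std::vector<int>& copyOfCommittedEvents,
                                MPI_Comm comm, int tag) {
    int myRank = 0, numberOfRanks = 0;
    MPI_Comm_rank(comm, &myRank);
    MPI_Comm_size(comm, &numberOfRanks);

    // Step 1: non-blocking sends to all other ranks, i.e. a logical broadcast.
    std::vector<MPI_Request> sendRequests(numberOfRanks, MPI_REQUEST_NULL);
    for (int rank = 0; rank < numberOfRanks; rank++) {
      if (rank != myRank) {
        MPI_Isend(copyOfCommittedEvents.data(),
                  static_cast<int>(copyOfCommittedEvents.size()),
                  MPI_INT, rank, tag, comm, &sendRequests[rank]);
      }
    }

    // Steps 2 and 3: poll until all sends are out. The real code additionally
    // probes for and receives the other ranks' events inside this loop, which
    // is what keeps the exchange deadlock-free once MPI switches to rendezvous
    // message passing.
    int allDone = 0;
    while (not allDone) {
      MPI_Testall(numberOfRanks, sendRequests.data(), &allDone, MPI_STATUSES_IGNORE);
      // ... probe for and receive incoming committed events here ...
    }
  }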

Definition at line 156 of file RefinementControlService.cpp.

References tarch::mpi::Rank::getCommunicator(), peano4::grid::GridControlEvent::getGlobalCommunciationDatatype(), tarch::mpi::Rank::getInstance(), tarch::mpi::Rank::getNumberOfRanks(), tarch::mpi::Rank::getRank(), and logInfo.


Field Documentation

◆ _committedEvents

std::vector<peano4::grid::GridControlEvent> exahype2::RefinementControlService::_committedEvents
private

Container with all the valid events.

This is an extract from _newEvents which is built up in finishStep() and then handed out to Peano once it asks for it.

Event lifetime

The individual events are not simply copied over: each event is annotated with its lifetime, i.e. events might remain active over multiple iterations. This operation decrements the lifespan and then copies those events that are still alive into the result.

Optimisation

The raw events might become really large over time. However, I decided to keep this routine clear of any optimisation. It is the grid which has to clean up the refinement events appropriately.

MPI data exchange

I originally wanted to use an MPI reduction to have a consistent view of all refinement controls. However, this seems not to work, as some ranks can be out of sync. So what I do instead now is a delayed broadcast: Every rank sends out its committed events to all others and appends all incoming ones to its own set.

This simple implementation also works for our dynamic event sets, where we do not know a priori how many events are triggered by a rank. Still, it can happen that event views are out-of-sync. This is not a problem here:

The actual grid changes are all triggered via vertices, so we never obtain an invalid grid. The individual grid events have a lifetime and thus are active over multiple iterations. Hence, they will be merged at some point.

We have to split up the send and receive loop such that we first send out everything and then receive.

Definition at line 194 of file RefinementControlService.h.

◆ _copyOfCommittedEvents

std::vector<peano4::grid::GridControlEvent> exahype2::RefinementControlService::_copyOfCommittedEvents
private

Definition at line 198 of file RefinementControlService.h.

◆ _localNewEvents

RefinementControl::NewEvents exahype2::RefinementControlService::_localNewEvents
private

Container to accumulate new events.

This is a list as we may assume that a lot of inserts are done per iteration.

Definition at line 152 of file RefinementControlService.h.

◆ _log

tarch::logging::Log exahype2::RefinementControlService::_log
staticprivate

Definition at line 137 of file RefinementControlService.h.

◆ _reductionTag

int exahype2::RefinementControlService::_reductionTag
staticprivate

I need a tag of my own to exchange control info after each step.

Definition at line 146 of file RefinementControlService.h.

◆ _remoteNewEvents

std::list<peano4::grid::GridControlEvent> exahype2::RefinementControlService::_remoteNewEvents
private

Definition at line 154 of file RefinementControlService.h.

◆ _semaphore

tarch::multicore::RecursiveSemaphore exahype2::RefinementControlService::_semaphore
staticprivate

Definition at line 139 of file RefinementControlService.h.

◆ _sendRequests

std::vector<MPI_Request*> exahype2::RefinementControlService::_sendRequests
private

Definition at line 197 of file RefinementControlService.h.

