Peano
tarch::multicore::taskfusion Namespace Reference

Task fusion means that a set of tasks are grabbed and mapped onto one large physical task instead of being processed one by one. More...

Data Structures

class  FusableTasksQueue
 Represents a queue of fusable tasks. More...
class  LogReadyTask
 Wrapper task around a fusable task that is not ready yet. More...
class  ProcessOneReadyTask
 Process one ready task. More...
class  ProcessSetOfReadyTasks
 Process a set of ready tasks. More...

Functions

void handleFusableTask (Task *task, std::set< TaskNumber > inDependencies, const TaskNumber &taskNumber)
 Handle one fusable task.
void processAllReadyTasks ()
 Run over all known queues and invoke drainReadyTasks() on each one.
void processAndFuseAllReadyTasks ()
 Drain the queues but use task fusion where possible.
void handleFusableTask_ProducerConsumerTaskSequence (Task *task, FusableTasksQueue &taskQueue, std::set< TaskNumber > inDependencies, const TaskNumber &taskNumber, const tarch::multicore::orchestration::Strategy::FuseInstruction &fuseInstruction)
 Realisation for handleFusableTask() for ProducerConsumerTaskSequence.
void handleFusableTask_LazyEvaluation (Task *task, FusableTasksQueue &taskQueue, std::set< TaskNumber > inDependencies, const TaskNumber &taskNumber, const tarch::multicore::orchestration::Strategy::FuseInstruction &fuseInstruction)
 Implementation of handleFusableTask() if we use LazyEvaluation.
void handleFusableTask_EagerConsumerTasks (Task *task, FusableTasksQueue &taskQueue, std::set< TaskNumber > inDependencies, const TaskNumber &taskNumber, const tarch::multicore::orchestration::Strategy::FuseInstruction &fuseInstruction)
 Handle a task in the eager consumer pattern.
bool ensureHeldBackTasksAreMappedOntoNativeTasks (const std::set< TaskNumber > &taskNumbers)
 Ensure that a certain set of tasks is mapped onto native tasks already.
bool ensureHeldBackTasksAreMappedOntoNativeTasks_LazyEvaluation (const std::set< TaskNumber > &inDependencies)
 Preamble of waitForTasks() and task spawns if we use LazyEvaluation.
bool ensureHeldBackTasksAreMappedOntoNativeTasks_EagerConsumerTasks (const std::set< TaskNumber > &inDependencies)
 Preamble of waitForTasks() if we use EagerConsumerTasks.

Detailed Description

Task fusion means that a set of tasks are grabbed and mapped onto one large physical task instead of being processed one by one.

This can reduce task administration overheads, but it also allows the routine handling a set of tasks to exploit commonalities or the larger intra-task concurrency. Due to the latter argument, task fusion is Peano's favourite technique to keep GPUs busy.

There are many ways in which task fusion can be implemented. In Peano, the different strategies are controlled through the multicore orchestration: the enum tarch::multicore::orchestration::FusionImplementation there provides high-level overviews of how the different strategies work and when they are appropriate. More information on how to use and tailor task fusion is provided in the context of Task fusion strategies. On this page, we do a deep dive into the implementation of the various strategies.

Task queues

To ensure that fusion is only applied to tasks of the same type, we store the fusable tasks in a map indexed by the task type. The map maps each task type onto an instance of tarch::multicore::taskfusion::FusableTasksQueue. When you spawn a task through handleFusableTask(), the first thing this routine has to do is check whether there's a task queue in place already. Consult the routine's description for more details.

The two variables that are used are therefore

  • fusableTaskQueue, a map of task types onto queue instances on the heap, and
  • pendingTasksSemaphore, a semaphore which protects access to this map.

The two variables are only found in the implementation file, as they are not visible from outside. As our map works with pointers to the actual queues, it is sufficient to have one semaphore that protects access: Once we have got the pointer from the map, we can always release the semaphore and work against the queue object on the heap. Meanwhile, other routines might add new queues to the map. We don't really care anymore.

The important routines

There are only two important routines from a user's point of view:

Consult the documentation of these routines for details on when they are used. The documentation also provides information on how the underlying implementations work.

Realisation of different task fusion strategies

The different task fusion strategies are sketched out in tarch::multicore::orchestration::FusionImplementation. The text there discusses how the task fusion acts and reacts. At their core, however, all realisations build upon the FusableTasksQueue, i.e. the task queue offers a certain set of operations, and the different fusion strategies call these operations in different ways.

Function Documentation

◆ ensureHeldBackTasksAreMappedOntoNativeTasks()

bool tarch::multicore::taskfusion::ensureHeldBackTasksAreMappedOntoNativeTasks ( const std::set< TaskNumber > & taskNumbers)

Ensure that a certain set of tasks is mapped onto native tasks already.

Our task fusion holds back tasks in bespoke queues such that they can be bundled together into a larger meta task. However, this implies that we cannot do two things:

  1. We cannot wait for them with the built-in waits. If a task, for example, has not yet been passed on to OpenMP, OpenMP's wait cannot, by definition, wait for this task.
  2. If a new task has in-dependencies, these in-dependencies cannot be modelled with on-board features if the predecessor is not yet mapped onto a task. We would create an anti-dependency.

In both cases, we therefore have to invoke this routine first to ensure that the tasks that we wait for or depend on, respectively, are indeed already native tasks and not still parked in a queue somewhere.

The routine delegates to implementation-specific variants depending on which task fusion strategy the orchestration picks.

The function is used by tarch::multicore::waitForTasks() and tarch::multicore::spawnTask(). The latter calls it over all incoming dependencies.

Parameters
taskNumbers   Set of task numbers we depend on. May not be empty.
Returns
Whether the routine has created new native tasks; if so, the calling routine has to wait for these native tasks.

Definition at line 237 of file taskfusion.cpp.

References assertion1, tarch::multicore::orchestration::DisableFusionAndMapOntoNativeTask, tarch::multicore::orchestration::DisableFusionAndProcessImmediately, tarch::multicore::orchestration::EagerConsumerTasks, ensureHeldBackTasksAreMappedOntoNativeTasks_EagerConsumerTasks(), ensureHeldBackTasksAreMappedOntoNativeTasks_LazyEvaluation(), tarch::multicore::getOrchestration(), tarch::multicore::orchestration::LazyEvaluation, logDebug, tarch::multicore::orchestration::ProducerConsumerTaskSequence, tarch::multicore::orchestration::TimedExecution, and toString().

Referenced by handleFusableTask(), and tarch::multicore::waitForTasks().


◆ ensureHeldBackTasksAreMappedOntoNativeTasks_EagerConsumerTasks()

bool tarch::multicore::taskfusion::ensureHeldBackTasksAreMappedOntoNativeTasks_EagerConsumerTasks ( const std::set< TaskNumber > & inDependencies)

Preamble of waitForTasks() if we use EagerConsumerTasks.

This routine is very similar to ensureHeldBackTasksAreMappedOntoNativeTasks_LazyEvaluation(), except that it never spawns a fusion task: if the eager scheme results in fusion tasks, then these fusion tasks have already been created by the time the tasks were spawned. We do not create them at the point where we wait for a task. One could say it is a degenerate version of ensureHeldBackTasksAreMappedOntoNativeTasks_LazyEvaluation().

Definition at line 273 of file taskfusion.cpp.

References anyTasksOfInDendenciesMightBeStillReadyOrRunning(), assertion1, tarch::multicore::taskfusion::FusableTasksQueue::LockQueueAndReturnReliableResult, logTraceInWith1Argument, logTraceOutWith1Argument, and tarch::multicore::spawnTask().

Referenced by ensureHeldBackTasksAreMappedOntoNativeTasks().


◆ ensureHeldBackTasksAreMappedOntoNativeTasks_LazyEvaluation()

bool tarch::multicore::taskfusion::ensureHeldBackTasksAreMappedOntoNativeTasks_LazyEvaluation ( const std::set< TaskNumber > & inDependencies)

Preamble of waitForTasks() and task spawns if we use LazyEvaluation.

The routine ensureHeldBackTasksAreMappedOntoNativeTasks() delegates to this one if we have chosen lazy evaluation. Its job is to ensure that any pending work is properly mapped onto native tasks or that all tasks have finished. In the latter case, any follow-up wait degenerates to an anti-dependency.

The spawning of our task of interest has either created a LogReadyTask instance, or it has directly dropped the task that we wait for into the task queue. We don't know if this LogReadyTask is still out there, so first we have to wait for it. We know it had the same number as the original task, as we had swapped it out for this one.

Next, we check whether any of the tasks we wait for are still in this queue, i.e. ready or running. We can do so through anyTasksOfInDendenciesMightBeStillReadyOrRunning(). If they are not there, we can return false, as the calling routine does not have to wait for us. We are ready to go.

If one of our tasks that we wait for is still in the queue, we loop over each individual in-dependency:

  • If the in-dependency is not found, skip it and go to the next one.
  • If the in-dependency's task is in the queue, first issue a task which does all the task fusion (task A).
  • Next, trigger another task which picks only the one task we are actually interested in (task B).

We ensure that A and B carry exactly the same number as the in-dependency in question. This way, A will execute before B, and the user will wait for B. Please read the text below to understand why the split-up into A and B is not just for fun: it is required to ensure that we do not run into deadlocks.

Realisation

We don't know, at this point, what the type of each in-dependency is. Therefore, we loop over all queues, i.e. all task types that we are aware of, and check whether any of the tasks we are interested in is still in there, labelled as ready or running.

First, we check in each queue whether it hosts a task that is ready or running. If this is the case, we invoke the task fusion and afterwards ensure that the task is really completed by spawning a ProcessOneReadyTask.

Potential deadlocks

It is important that ProcessOneReadyTask and ProcessSetOfReadyTasks are modelled independently to avoid deadlocks: In most cases, the two tasks just follow each other. However, it could also be that we hit this routine on thread A, being asked to process task t1. At the same time, thread B hits the same routine, triggered for task t2. As it happens, B is faster and grabs t1. Now, if thread A checks for t1 to be ready or running, the routine processTask() will, rightly so, wait for the task responsible for t1 to finish its work. That means we have to embed the whole handling into a separate task, so we can wait for it.

Concurrency level

At first glance, the routine merely creates two tasks that run one after the other and tells the invoking code to wait for them, so it seems not to alter the concurrency level. However, it actually increases it: if we wait for multiple input tasks, it issues one such pair of tasks per unfulfilled in-dependency.

Despite this observation, it remains clear that the eager invocation yields a higher concurrency level. See handleFusableTask_EagerConsumerTasks() and ensureHeldBackTasksAreMappedOntoNativeTasks_EagerConsumerTasks() for details.

A flaw of the present implementation is the additional

native::waitForTasks(inDependencies);

We need this call, as we don't know if there are still pending instances of LogReadyTask out there. If we knew that this were not the case, we could skip this step and eliminate one synchronisation point.

See also
ProcessSetOfReadyTasks which triggers the actual lazy fusion of tasks.
ProcessOneReadyTask which is used to check if a particular task has completed already.

Definition at line 305 of file taskfusion.cpp.

References anyTasksOfInDendenciesMightBeStillReadyOrRunning(), assertion1, tarch::multicore::taskfusion::FusableTasksQueue::LockQueueAndReturnReliableResult, logDebug, logTraceInWith1Argument, logTraceOutWith1Argument, tarch::multicore::NoInDependencies, tarch::multicore::spawnTask(), and toString().

Referenced by ensureHeldBackTasksAreMappedOntoNativeTasks().


◆ handleFusableTask()

void tarch::multicore::taskfusion::handleFusableTask ( Task * task,
std::set< TaskNumber > inDependencies,
const TaskNumber & taskNumber )

Handle one fusable task.

This operation handles one fusable task in the sense that it first ensures that all infrastructure for the fusion is in place and that we can, in principle, fuse. After that, it delegates to the implementation selected by the orchestration:

  1. Ensure that all incoming tasks (predecessors) are mapped onto native tasks. Our implementation leaves all dependency tracking to the native runtime, and therefore, we invoke taskfusion::ensureHeldBackTasksAreMappedOntoNativeTasks() first.
  2. Find out if task fusion is active. If not, ensure that the task is either executed directly or spawned into a native task. Which variant to pick depends on the orchestration's getFusionImplementation() result, which returns an instance of tarch::multicore::orchestration::FusionImplementation.
  3. Lock the map over all queues through pendingTasksSemaphore.
  4. Ensure that a queue does exist for the particular task type.
  5. Invoke the task handling that's specified through getFusionImplementation().

The routine is invoked by tarch::multicore::spawnTask() if this routine is given a task that can, in theory, be fused. Below are some remarks on particular strategies:

DisableFusionAndProcessImmediately and other degenerated strategies

This variant from tarch::multicore::orchestration::FusionImplementation implies that we directly call the task's run() operation. However, we first have to ensure that we wait for all incoming tasks. We do so through a call to waitForTasks(). This wait, in turn, is valid as we have previously called taskfusion::ensureHeldBackTasksAreMappedOntoNativeTasks().

LazyEvaluation

Almost trivial realisation of the core functionality through handleFusableTask_LazyEvaluation(). The interesting stuff, i.e. the fusion and task processing, all happens in ensureHeldBackTasksAreMappedOntoNativeTasks(), which in turn calls ensureHeldBackTasksAreMappedOntoNativeTasks_LazyEvaluation().

See also
Page Task fusion strategies provides a high-level overview
tarch::multicore::orchestration::FusionImplementation enlists the different implementation variants

Definition at line 48 of file taskfusion.cpp.

References assertion, assertion1, tarch::multicore::orchestration::Strategy::FuseInstruction::device, tarch::multicore::orchestration::DisableFusionAndMapOntoNativeTask, tarch::multicore::orchestration::DisableFusionAndProcessImmediately, tarch::multicore::orchestration::EagerConsumerTasks, ensureHeldBackTasksAreMappedOntoNativeTasks(), tarch::multicore::MultiReadSingleWriteLock::free(), tarch::multicore::orchestration::Strategy::fuse(), tarch::logging::Statistics::getInstance(), tarch::multicore::getOrchestration(), tarch::multicore::Task::getTaskType(), handleFusableTask_EagerConsumerTasks(), handleFusableTask_LazyEvaluation(), handleFusableTask_ProducerConsumerTaskSequence(), tarch::multicore::Task::Host, tarch::logging::Statistics::inc(), tarch::multicore::orchestration::LazyEvaluation, logDebug, tarch::multicore::orchestration::Strategy::FuseInstruction::maxTasks, tarch::multicore::orchestration::Strategy::FuseInstruction::minTasks, tarch::multicore::orchestration::ProducerConsumerTaskSequence, tarch::multicore::Task::run(), tarch::multicore::orchestration::TimedExecution, tarch::multicore::orchestration::Strategy::FuseInstruction::toString(), tarch::multicore::waitForTasks(), and tarch::multicore::MultiReadSingleWriteLock::Write.

Referenced by tarch::multicore::spawnTask().


◆ handleFusableTask_EagerConsumerTasks()

void tarch::multicore::taskfusion::handleFusableTask_EagerConsumerTasks ( Task * task,
FusableTasksQueue & taskQueue,
std::set< TaskNumber > inDependencies,
const TaskNumber & taskNumber,
const tarch::multicore::orchestration::Strategy::FuseInstruction & fuseInstruction )

Handle a task in the eager consumer pattern.

This routine has to be used in combination with handleFusableTask_LazyEvaluation(). Contrary to what the name suggests, it does not enqueue any task. Instead, it simply checks whether the task queue is ready for a fusion task. If so, it spawns one.

Definition at line 145 of file taskfusion.cpp.

References tarch::multicore::taskfusion::FusableTasksQueue::getNumberOfReadyTasks(), tarch::multicore::orchestration::Strategy::FuseInstruction::minTasks, tarch::multicore::NoInDependencies, and tarch::multicore::spawnTask().

Referenced by handleFusableTask().


◆ handleFusableTask_LazyEvaluation()

void tarch::multicore::taskfusion::handleFusableTask_LazyEvaluation ( Task * task,
FusableTasksQueue & taskQueue,
std::set< TaskNumber > inDependencies,
const TaskNumber & taskNumber,
const tarch::multicore::orchestration::Strategy::FuseInstruction & fuseInstruction )

Implementation of handleFusableTask() if we use LazyEvaluation.

Lazy evaluation means that we take a task but swap it out for a proxy task, an instance of LogReadyTask. The run() of the LogReadyTask instance dumps the actual task into the respective task queue. If there are no incoming dependencies, we can dump the actual task directly.

After that, we can return. If someone waits for our task (which we replaced with the LogReadyTask), we have to ensure that the task has been removed from the queue and processed. This is realised through ensureHeldBackTasksAreMappedOntoNativeTasks_LazyEvaluation().

See also
Task fusion strategies for some context for this routine.

Definition at line 120 of file taskfusion.cpp.

References tarch::multicore::Task::getPriority(), tarch::multicore::taskfusion::FusableTasksQueue::insertReadyTask(), logDebug, logTraceInWith1Argument, logTraceOutWith1Argument, tarch::multicore::NoOutDependencies, and tarch::multicore::Task::setPriority().

Referenced by handleFusableTask().


◆ handleFusableTask_ProducerConsumerTaskSequence()

void tarch::multicore::taskfusion::handleFusableTask_ProducerConsumerTaskSequence ( Task * task,
FusableTasksQueue & taskQueue,
std::set< TaskNumber > inDependencies,
const TaskNumber & taskNumber,
const tarch::multicore::orchestration::Strategy::FuseInstruction & fuseInstruction )

Realisation for handleFusableTask() for ProducerConsumerTaskSequence.

Translate a single task into a sequence of (fusable) tasks.

This routine decides whether to spawn a task as a native task or to treat it as a fusable task; it therefore contains all the fusion logic. If we fuse, it first ensures that there's a queue for this task type and then creates the two helper tasks that will actually handle the queue insertion and the task progression.

Algorithmic steps

The code first of all locks the map over all queues through pendingTasksSemaphore and then ensures that there is a task queue for this particular task type. After that, the pointer taskQueue is guaranteed to point to the correct queue.

Tasks without task numbers (out-dependencies)

We cannot split the whole scheduling into two parts, as we do not have a dependency number. So we insert the task straight into the out queue. After that, we issue a postprocessing task to handle the fused tasks. This will have no outgoing dependencies either.

Tasks without in-dependencies

We know that we have out-dependencies (see previous case distinction), so we can directly insert the task into the queue and then spawn a ProcessOneReadyTask which handles this task and models the out-dependency.

Tasks with in-dependencies

If the new fusable task has in-dependencies, we have to create a new instance of LogReadyTask and spawn it. We then immediately create a ProcessOneReadyTask instance with a dependency.

Decision logic

If the maximum number of tasks that should be fused is smaller than one, the user has, by definition, switched off all fusion. We therefore immediately spawn the task as a native one.

Partial serialisation

The whole idea of the fusion is that multiple tasks are bundled into one. If there are too many instances of ProcessOneReadyTask, then they will all grab from the fusion queue and there will never be enough tasks left to fuse. It therefore makes sense to serialise the ProcessOneReadyTask instances if there are fewer than two times the maximum number of fusable tasks in the queue.

Definition at line 163 of file taskfusion.cpp.

References tarch::multicore::Task::getPriority(), tarch::multicore::taskfusion::FusableTasksQueue::insertReadyTask(), logDebug, tarch::multicore::orchestration::Strategy::FuseInstruction::maxTasks, tarch::multicore::orchestration::Strategy::FuseInstruction::minTasks, tarch::multicore::NoInDependencies, tarch::multicore::NoOutDependencies, tarch::multicore::Task::setPriority(), and tarch::multicore::spawnTask().

Referenced by handleFusableTask().


◆ processAllReadyTasks()

void tarch::multicore::taskfusion::processAllReadyTasks ( )

Run over all known queues and invoke drainReadyTasks() on each one.

Once we know that all fusion queues are drained, we know that their tasks are either complete or successfully mapped onto native tasks. That is, we can invoke a task barrier.

This routine is used by the global synchronisation, i.e. by tarch::multicore::waitForAllTasks(), but we also have to invoke it before we yield: The whole idea behind a yield() is that we give the runtime the opportunity to bring in some other tasks, i.e. to disrupt the calculation. However, the runtime cannot bring in other tasks if the tasks are hidden away in a queue.

This routine invokes tarch::multicore::taskfusion::FusableTasksQueue::drainReadyTasks(). That is, this routine does not(!) issue any task fusion.

Definition at line 209 of file taskfusion.cpp.

References tarch::multicore::MultiReadSingleWriteLock::Read.

Referenced by tarch::multicore::waitForAllTasks(), and tarch::multicore::Core::yield().


◆ processAndFuseAllReadyTasks()

void tarch::multicore::taskfusion::processAndFuseAllReadyTasks ( )

Drain the queues but use task fusion where possible.

This is the cousin of processAllReadyTasks(). It ensures that the task queues are drained eventually, but it relies on task fusion rather than a manual drain. Different to processAllReadyTasks(), you do not have to issue a manual barrier after this routine has returned. You may assume that all tasks are literally complete.

This blocking behaviour implies that you stall your program execution. However, nothing stops you, subject to correct dependency management, from deploying the call to processAndFuseAllReadyTasks() to yet another task of its own.

The routine can be used to set off timed execution as discussed in tarch::multicore::orchestration::FusionImplementation.

Definition at line 217 of file taskfusion.cpp.

References tarch::multicore::MultiReadSingleWriteLock::free(), tarch::multicore::NoOutDependencies, tarch::multicore::MultiReadSingleWriteLock::Read, and tarch::multicore::waitForTask().
