Peano
Task fusion means that a set of tasks is grabbed and mapped onto one large physical task instead of being processed one by one. More...
Data Structures | |
| class | FusableTasksQueue |
| Represents a queue of fusable tasks. More... | |
| class | LogReadyTask |
| Wrapper task around a fusable task that is not ready yet. More... | |
| class | ProcessOneReadyTask |
| Process one ready task. More... | |
| class | ProcessSetOfReadyTasks |
| Process a set of ready tasks. More... | |
Functions | |
| void | handleFusableTask (Task *task, std::set< TaskNumber > inDependencies, const TaskNumber &taskNumber) |
| Handle one fusable task. | |
| void | processAllReadyTasks () |
| Run over all known queues and invoke drainReadyTasks() on each one. | |
| void | processAndFuseAllReadyTasks () |
| Drain the queues but use task fusion where possible. | |
| void | handleFusableTask_ProducerConsumerTaskSequence (Task *task, FusableTasksQueue &taskQueue, std::set< TaskNumber > inDependencies, const TaskNumber &taskNumber, const tarch::multicore::orchestration::Strategy::FuseInstruction &fuseInstruction) |
| Realisation for handleFusableTask() for ProducerConsumerTaskSequence. | |
| void | handleFusableTask_LazyEvaluation (Task *task, FusableTasksQueue &taskQueue, std::set< TaskNumber > inDependencies, const TaskNumber &taskNumber, const tarch::multicore::orchestration::Strategy::FuseInstruction &fuseInstruction) |
| Implementation of handleFusableTask() if we use LazyEvaluation. | |
| void | handleFusableTask_EagerConsumerTasks (Task *task, FusableTasksQueue &taskQueue, std::set< TaskNumber > inDependencies, const TaskNumber &taskNumber, const tarch::multicore::orchestration::Strategy::FuseInstruction &fuseInstruction) |
| Handle a task in the eager consumer pattern. | |
| bool | ensureHeldBackTasksAreMappedOntoNativeTasks (const std::set< TaskNumber > &taskNumbers) |
| Ensure that a certain set of tasks is mapped onto native tasks already. | |
| bool | ensureHeldBackTasksAreMappedOntoNativeTasks_LazyEvaluation (const std::set< TaskNumber > &inDependencies) |
| Preamble of waitForTasks() and task spawns if we use LazyEvaluation. | |
| bool | ensureHeldBackTasksAreMappedOntoNativeTasks_EagerConsumerTasks (const std::set< TaskNumber > &inDependencies) |
| Preamble of waitForTasks() if we use EagerConsumerTasks. | |
Task fusion means that a set of tasks is grabbed and mapped onto one large physical task instead of being processed one by one.
This can reduce task administration overhead, but it also allows the routine handling a set of tasks to exploit commonalities or the larger intra-task concurrency. Due to the latter, task fusion is Peano's favourite technique to keep GPUs busy.
There are many ways in which task fusion can be implemented. In Peano, the different strategies are controlled through the multicore orchestration. The enum tarch::multicore::orchestration::FusionImplementation there also contains high-level overviews of how the different strategies work and when they are appropriate. More information on how to use task fusion and how to tailor it is provided in the context of Task fusion strategies. On this page, we take a deep dive into the implementation of the various strategies.
To ensure that fusion is applied only to tasks of the same type, we store the fusable tasks in a map indexed by the task type. The map maps each task type onto an instance of tarch::multicore::taskfusion::FusableTasksQueue. When you spawn a task through handleFusableTask(), the first thing this routine has to do is check whether a task queue is already in place. Consult the routine's description for more details.
Two variables realise this bookkeeping: the map of queues and a semaphore that guards access to it.
The two variables are found only in the implementation file, as they are not visible from outside. As our map works with pointers to the actual queues, it is sufficient to have one semaphore that protects access: once we have obtained the pointer from the map, we can release the semaphore and work against the queue object on the heap. Meanwhile, other routines might add new queues to the map; at this point, we do not care anymore.
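The locking discipline described above can be sketched in a few lines. The following is a minimal, self-contained illustration; SketchQueue, getQueueForTaskType() and parkTask() are hypothetical stand-ins for illustration only, not Peano API (the real code uses FusableTasksQueue and Peano's own semaphore types, and manages queue lifetimes properly):

```cpp
#include <map>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for a fusable task queue: each queue protects
// its own content with its own lock.
struct SketchQueue {
  std::mutex       ownLock;
  std::vector<int> parkedTaskNumbers;
};

// File-local state in the real implementation: the map from task type
// to queue, plus one semaphore that protects only the map itself.
std::map<std::string, SketchQueue*> queues;
std::mutex                          queuesSemaphore;

// Lock the map only long enough to fetch (or create) the queue pointer.
// Queues are never removed in this sketch, so the pointer stays valid
// even while other threads add further queues to the map.
SketchQueue* getQueueForTaskType(const std::string& taskType) {
  std::lock_guard<std::mutex> lock(queuesSemaphore);
  auto it = queues.find(taskType);
  if (it == queues.end()) {
    it = queues.emplace(taskType, new SketchQueue()).first;
  }
  return it->second;
}

void parkTask(const std::string& taskType, int taskNumber) {
  SketchQueue* queue = getQueueForTaskType(taskType);  // map lock released here
  std::lock_guard<std::mutex> lock(queue->ownLock);    // now lock only this queue
  queue->parkedTaskNumbers.push_back(taskNumber);
}
```

The short critical section on the map is the point: contention arises only while resolving a task type to a queue, never while working on the queue's content.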
There are only two important routines from a user's point of view:
Consult the documentation of these routines for details on when they are used. The documentation also explains how the underlying implementations work.
The different task fusion strategies are sketched out in tarch::multicore::orchestration::FusionImplementation. The text there discusses how the task fusion acts and reacts. At their core, however, all realisations build upon the FusableTasksQueue, i.e. the task queue offers a certain set of operations, and the different fusion strategies call these operations in different ways.
| bool tarch::multicore::taskfusion::ensureHeldBackTasksAreMappedOntoNativeTasks | ( | const std::set< TaskNumber > & | taskNumbers | ) |
Ensure that a certain set of tasks is mapped onto native tasks already.
Our task fusion holds back tasks in bespoke queues such that they can be bundled into one larger meta task. However, this implies that there are two things we cannot do:
In both cases, we therefore have to invoke this routine first to ensure that the tasks we wait for or depend on, respectively, are indeed already native tasks and not still parked in a queue somewhere.
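The contract of this preamble can be sketched as follows. This is a simplified illustration under assumed semantics, not the actual Peano implementation: the global sets and the Sketch suffix are made up, and the real routine dispatches to strategy-specific variants instead of converting tasks in place:

```cpp
#include <set>

using TaskNumber = int;  // simplified stand-in for tarch::multicore::TaskNumber

// Hypothetical global state: task numbers still parked in fusion queues
// versus those already mapped onto native runtime tasks.
std::set<TaskNumber> parkedTasks;
std::set<TaskNumber> nativeTasks;

// Before we may wait for taskNumbers (or declare them as in-dependencies),
// every held-back task among them has to become a native task. Returns true
// if at least one task had to be converted, i.e. the caller still has to wait.
bool ensureHeldBackTasksAreMappedOntoNativeTasksSketch(
  const std::set<TaskNumber>& taskNumbers
) {
  bool convertedAny = false;
  for (TaskNumber number : taskNumbers) {
    if (parkedTasks.count(number) > 0) {
      parkedTasks.erase(number);
      nativeTasks.insert(number);   // in Peano, this spawns a real task
      convertedAny = true;
    }
  }
  return convertedAny;
}
```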
The routine delegates to implementation-specific variants depending on which task fusion strategy the orchestration picks.
The function is used by tarch::multicore::waitForTasks() and tarch::multicore::spawnTask(). The latter calls it over all incoming dependencies.
| taskNumbers | Set of task numbers we depend on. May not be empty. |
Definition at line 237 of file taskfusion.cpp.
References assertion1, tarch::multicore::orchestration::DisableFusionAndMapOntoNativeTask, tarch::multicore::orchestration::DisableFusionAndProcessImmediately, tarch::multicore::orchestration::EagerConsumerTasks, ensureHeldBackTasksAreMappedOntoNativeTasks_EagerConsumerTasks(), ensureHeldBackTasksAreMappedOntoNativeTasks_LazyEvaluation(), tarch::multicore::getOrchestration(), tarch::multicore::orchestration::LazyEvaluation, logDebug, tarch::multicore::orchestration::ProducerConsumerTaskSequence, tarch::multicore::orchestration::TimedExecution, and toString().
Referenced by handleFusableTask(), and tarch::multicore::waitForTasks().


| bool tarch::multicore::taskfusion::ensureHeldBackTasksAreMappedOntoNativeTasks_EagerConsumerTasks | ( | const std::set< TaskNumber > & | inDependencies | ) |
Preamble of waitForTasks() if we use EagerConsumerTasks.
This routine is very similar to ensureHeldBackTasksAreMappedOntoNativeTasks_LazyEvaluation(), besides the fact that it never spawns a fusion task. If the eager scheme results in fusion tasks, then these fusion tasks have already been created at the time we spawned the tasks; we do not create them at the time we wait for a task. In this sense, it is a degenerate version of ensureHeldBackTasksAreMappedOntoNativeTasks_LazyEvaluation().
Definition at line 273 of file taskfusion.cpp.
References anyTasksOfInDendenciesMightBeStillReadyOrRunning(), assertion1, tarch::multicore::taskfusion::FusableTasksQueue::LockQueueAndReturnReliableResult, logTraceInWith1Argument, logTraceOutWith1Argument, and tarch::multicore::spawnTask().
Referenced by ensureHeldBackTasksAreMappedOntoNativeTasks().


| bool tarch::multicore::taskfusion::ensureHeldBackTasksAreMappedOntoNativeTasks_LazyEvaluation | ( | const std::set< TaskNumber > & | inDependencies | ) |
Preamble of waitForTasks() and task spawns if we use LazyEvaluation.
The routine ensureHeldBackTasksAreMappedOntoNativeTasks() delegates to this one if we have chosen lazy evaluation. Its job is to ensure that any pending work is mapped properly onto native tasks or that all tasks have finished. In the latter case, any follow-up wait degenerates to an anti-dependency.
Spawning our task of interest has either created a LogReadyTask instance, or it has directly dropped the task that we wait for into the task queue. We do not know whether this LogReadyTask is still out there, so we first have to wait for it. We know it carries the same number as the original task, as we had swapped the original out for it.
Next, we check whether our task of interest is still in this queue, i.e. ready or running. We can do so through anyTasksOfInDendenciesMightBeStillReadyOrRunning(). If the tasks are not there, we can return false, as the calling routine does not have to wait for us: we are ready to go.
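Since we cannot know in which queue a given in-dependency might be parked, the check has to scan all queues. A minimal sketch of that scan, with hypothetical names (the real check is anyTasksOfInDendenciesMightBeStillReadyOrRunning() against FusableTasksQueue instances):

```cpp
#include <set>
#include <vector>

using TaskNumber = int;  // simplified

// Each entry models one fusion queue holding the numbers of its
// ready/running tasks; one queue per known task type.
std::vector<std::set<TaskNumber>> allQueuesSketch;

// We do not know the type of each in-dependency, so we scan every queue,
// i.e. every task type we are aware of.
bool anyInDependencyStillParked(const std::set<TaskNumber>& inDependencies) {
  for (const auto& queue : allQueuesSketch) {
    for (TaskNumber dependency : inDependencies) {
      if (queue.count(dependency) > 0) {
        return true;   // caller has to trigger fusion and wait
      }
    }
  }
  return false;        // nothing parked: the wait degenerates to a no-op
}
```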
If one of our tasks that we wait for is still in the queue, we loop over each individual in-dependency:
We ensure that A and B carry exactly the same number as one of the in-dependencies. This way, A will execute before B, and the user will wait for B. Please read the text below to understand why the split into A and B is not just for fun: it is required to ensure that we do not run into deadlocks.
We do not know, at this point, what the type of each in-dependency is. Therefore, we loop over all queues, i.e. all task types that we are aware of, and check whether any of the tasks we are interested in is still in there, labelled as ready or running.
First, we check each queue for a task that is ready or running. If we find one, we invoke the task fusion and afterwards ensure that the task really completes by spawning a ProcessOneReadyTask.
It is important that ProcessOneReadyTask and ProcessSetOfReadyTasks are modelled independently to avoid deadlocks: in most cases, the two tasks just follow each other. However, it could also be that we hit this routine on thread A and are asked to process task t1, while thread B hits the same routine, triggered for task t2. As it happens, B is faster and grabs t1. Now, if thread A checks whether t1 is ready or running, the routine processTask() will, rightly so, wait for the task responsible for t1 to finish its work. That means we have to embed the whole handling into a separate task, so we can wait for it.
The routine seems to merely create two tasks that run after each other and then tell the invoking code to wait for them, i.e. it seems not to alter the concurrency level. However, it actually increases it: if we wait for multiple input tasks, it issues one such pair of tasks per unfulfilled in-dependency.
Despite this observation, it remains clear that the eager invocation yields a higher concurrency level. See handleFusableTask_EagerConsumerTasks() and ensureHeldBackTasksAreMappedOntoNativeTasks_EagerConsumerTasks() for details.
A flaw of the present implementation is that it introduces an additional synchronisation point.
We need this one, as we don't know if there are still instances of LogReadyTask out there pending. If we knew that this is not the case, we could skip this step and eliminate one synchronisation point.
Definition at line 305 of file taskfusion.cpp.
References anyTasksOfInDendenciesMightBeStillReadyOrRunning(), assertion1, tarch::multicore::taskfusion::FusableTasksQueue::LockQueueAndReturnReliableResult, logDebug, logTraceInWith1Argument, logTraceOutWith1Argument, tarch::multicore::NoInDependencies, tarch::multicore::spawnTask(), and toString().
Referenced by ensureHeldBackTasksAreMappedOntoNativeTasks().


| void tarch::multicore::taskfusion::handleFusableTask | ( | Task * | task, |
| std::set< TaskNumber > | inDependencies, | ||
| const TaskNumber & | taskNumber ) |
Handle one fusable task.
This operation handles one fusable task in the sense that it first ensures that all infrastructure for the fusion is in place and that we can, in principle, fuse. After that, it delegates to the implementation selected by the orchestration:
The routine is invoked by tarch::multicore::spawnTask() if it is given a task that can, in theory, be fused. Below are some remarks on particular strategies:
This variant from tarch::multicore::orchestration::FusionImplementation implies that we directly call the task's run() operation. However, we first have to ensure that we wait for all incoming tasks. We do so through a call to waitForTasks(). This wait, in turn, is valid as we have previously called taskfusion::ensureHeldBackTasksAreMappedOntoNativeTasks().
Almost trivial realisation of the core functionality through handleFusableTask_LazyEvaluation(). The interesting stuff, i.e. the fusion and the task processing, all happens in ensureHeldBackTasksAreMappedOntoNativeTasks(), which in turn calls ensureHeldBackTasksAreMappedOntoNativeTasks_LazyEvaluation().
Definition at line 48 of file taskfusion.cpp.
References assertion, assertion1, tarch::multicore::orchestration::Strategy::FuseInstruction::device, tarch::multicore::orchestration::DisableFusionAndMapOntoNativeTask, tarch::multicore::orchestration::DisableFusionAndProcessImmediately, tarch::multicore::orchestration::EagerConsumerTasks, ensureHeldBackTasksAreMappedOntoNativeTasks(), tarch::multicore::MultiReadSingleWriteLock::free(), tarch::multicore::orchestration::Strategy::fuse(), tarch::logging::Statistics::getInstance(), tarch::multicore::getOrchestration(), tarch::multicore::Task::getTaskType(), handleFusableTask_EagerConsumerTasks(), handleFusableTask_LazyEvaluation(), handleFusableTask_ProducerConsumerTaskSequence(), tarch::multicore::Task::Host, tarch::logging::Statistics::inc(), tarch::multicore::orchestration::LazyEvaluation, logDebug, tarch::multicore::orchestration::Strategy::FuseInstruction::maxTasks, tarch::multicore::orchestration::Strategy::FuseInstruction::minTasks, tarch::multicore::orchestration::ProducerConsumerTaskSequence, tarch::multicore::Task::run(), tarch::multicore::orchestration::TimedExecution, tarch::multicore::orchestration::Strategy::FuseInstruction::toString(), tarch::multicore::waitForTasks(), and tarch::multicore::MultiReadSingleWriteLock::Write.
Referenced by tarch::multicore::spawnTask().


| void tarch::multicore::taskfusion::handleFusableTask_EagerConsumerTasks | ( | Task * | task, |
| FusableTasksQueue & | taskQueue, | ||
| std::set< TaskNumber > | inDependencies, | ||
| const TaskNumber & | taskNumber, | ||
| const tarch::multicore::orchestration::Strategy::FuseInstruction & | fuseInstruction ) |
Handle a task in the eager consumer pattern.
This routine has to be used in combination with handleFusableTask_LazyEvaluation(). Contrary to what the name suggests, it does not enqueue any task. Instead, it simply evaluates whether the task queue is ready for a fusion task. If so, it spawns one.
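The eager check boils down to a threshold test. A minimal sketch, assuming a hypothetical QueueSketch and counter in place of FusableTasksQueue::getNumberOfReadyTasks(), FuseInstruction::minTasks and spawnTask():

```cpp
// Simplified stand-in for a fusion queue's ready-task counter.
struct QueueSketch { int numberOfReadyTasks = 0; };

int spawnedFusionTasks = 0;   // counts calls that would go to spawnTask()

// The eager variant does not enqueue the incoming task itself: it only
// checks whether the queue justifies a fusion task and, if so, spawns one.
void maybeSpawnFusionTaskEagerly(const QueueSketch& queue, int minTasks) {
  if (queue.numberOfReadyTasks >= minTasks) {
    ++spawnedFusionTasks;     // stands in for spawning a ProcessSetOfReadyTasks
  }
}
```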
Definition at line 145 of file taskfusion.cpp.
References tarch::multicore::taskfusion::FusableTasksQueue::getNumberOfReadyTasks(), tarch::multicore::orchestration::Strategy::FuseInstruction::minTasks, tarch::multicore::NoInDependencies, and tarch::multicore::spawnTask().
Referenced by handleFusableTask().


| void tarch::multicore::taskfusion::handleFusableTask_LazyEvaluation | ( | Task * | task, |
| FusableTasksQueue & | taskQueue, | ||
| std::set< TaskNumber > | inDependencies, | ||
| const TaskNumber & | taskNumber, | ||
| const tarch::multicore::orchestration::Strategy::FuseInstruction & | fuseInstruction ) |
Implementation of handleFusableTask() if we use LazyEvaluation.
Lazy evaluation means that we take a task but swap it out for a proxy task, an instance of LogReadyTask. The run() of the LogReadyTask instance dumps the actual task into the respective task queue. If there are no incoming dependencies, we can dump the actual task directly.
After that, we can return. If someone waits for our task (which we replaced with the LogReadyTask), we have to ensure that the task has been removed from the queue and processed. This is realised through ensureHeldBackTasksAreMappedOntoNativeTasks_LazyEvaluation().
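The two cases above can be sketched with a closure standing in for the LogReadyTask proxy. This is an illustration under assumed semantics, not Peano code; the containers and the Sketch suffix are hypothetical:

```cpp
#include <functional>
#include <set>
#include <vector>

using TaskNumber = int;  // simplified

std::vector<int>                   readyQueue;     // stands in for the fusion queue
std::vector<std::function<void()>> deferredTasks;  // stands in for spawned LogReadyTask proxies

// With in-dependencies, we swap the task for a proxy whose run() enqueues
// the real task once the dependencies are fulfilled; without
// in-dependencies, we enqueue the real task directly.
void handleFusableTaskLazySketch(int task, const std::set<TaskNumber>& inDependencies) {
  if (inDependencies.empty()) {
    readyQueue.push_back(task);
  } else {
    deferredTasks.push_back([task]() { readyQueue.push_back(task); });
  }
}
```

Running a deferred proxy (here, invoking the stored closure) corresponds to the runtime executing the LogReadyTask after its in-dependencies are met.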
Definition at line 120 of file taskfusion.cpp.
References tarch::multicore::Task::getPriority(), tarch::multicore::taskfusion::FusableTasksQueue::insertReadyTask(), logDebug, logTraceInWith1Argument, logTraceOutWith1Argument, tarch::multicore::NoOutDependencies, and tarch::multicore::Task::setPriority().
Referenced by handleFusableTask().


| void tarch::multicore::taskfusion::handleFusableTask_ProducerConsumerTaskSequence | ( | Task * | task, |
| FusableTasksQueue & | taskQueue, | ||
| std::set< TaskNumber > | inDependencies, | ||
| const TaskNumber & | taskNumber, | ||
| const tarch::multicore::orchestration::Strategy::FuseInstruction & | fuseInstruction ) |
Realisation for handleFusableTask() for ProducerConsumerTaskSequence.
Translate a single task into a sequence of (fusable) tasks.
This routine decides whether to spawn a task as a native task or to treat it as a fused task; it therefore contains all the fusion logic. If we fuse, it first ensures that there is a queue for this task type, and then creates the two helper tasks that actually handle the queue insertion and the task progression.
The code first of all locks all the tasks through pendingTasksSemaphore and then ensures that there is a task queue for this particular task type. After that, the pointer taskQueue is guaranteed to point to the correct queue.
We cannot split the whole scheduling into two parts, as we do not have a dependency number. So we insert the tasks straight into the out queue. After that, we issue a postprocessing task to handle the fused tasks. This task has no outgoing dependencies either.
We know that we have out-dependencies (see the previous case distinction), so we can directly insert the task into the queue and then spawn a ProcessOneReadyTask which handles this task and models the out-dependency.
If the new fusable task has in-dependencies, we have to create a new instance of LogReadyTask and spawn it. We then immediately create a ProcessOneReadyTask instance with a dependency.
If the maximum number of tasks that should be fused is smaller than one, the user has, by definition, switched off all fusion. We therefore immediately spawn the task as a native one.
The whole idea of the fusion is that multiple tasks are bundled into one. If there are too many instances of ProcessOneReadyTask, they will all grab from the fusion queue and there will never be enough tasks left to fuse. It therefore makes sense to serialise the ProcessOneReadyTask instances while there are fewer than two times the maximum number of tasks in the queue.
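The decision logic above can be condensed into a sketch. All names are hypothetical stand-ins; the real code works with FuseInstruction and spawns LogReadyTask and ProcessOneReadyTask instances:

```cpp
#include <vector>

std::vector<int> fusionQueue;    // parked tasks waiting to be fused
int              nativeSpawns = 0;

// If fusion is switched off (maxTasks < 1), spawn natively right away;
// otherwise, park the task in the fusion queue.
void handleTaskProducerConsumerSketch(int task, int maxTasks) {
  if (maxTasks < 1) {
    ++nativeSpawns;
    return;
  }
  fusionQueue.push_back(task);
}

// Too many parallel consumers drain the queue before enough tasks
// accumulate, so we chain (serialise) them while the queue holds fewer
// than two times maxTasks entries.
bool shouldSerialiseConsumers(int maxTasks) {
  return static_cast<int>(fusionQueue.size()) < 2 * maxTasks;
}
```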
Definition at line 163 of file taskfusion.cpp.
References tarch::multicore::Task::getPriority(), tarch::multicore::taskfusion::FusableTasksQueue::insertReadyTask(), logDebug, tarch::multicore::orchestration::Strategy::FuseInstruction::maxTasks, tarch::multicore::orchestration::Strategy::FuseInstruction::minTasks, tarch::multicore::NoInDependencies, tarch::multicore::NoOutDependencies, tarch::multicore::Task::setPriority(), and tarch::multicore::spawnTask().
Referenced by handleFusableTask().


| void tarch::multicore::taskfusion::processAllReadyTasks | ( | ) |
Run over all known queues and invoke drainReadyTasks() on each one.
Once we know that all fusion queues are drained, we know that the held-back tasks are either complete or successfully mapped onto native tasks. That is, we can invoke a task barrier.
This routine is used by the global synchronisation, i.e. by tarch::multicore::waitForAllTasks(), but we also have to invoke it before we yield: The whole idea behind a yield() is that we give the runtime the opportunity to bring in some other tasks, i.e. to disrupt the calculation. However, the runtime cannot bring in other tasks if the tasks are hidden away in a queue.
This routine invokes tarch::multicore::taskfusion::FusableTasksQueue::drainReadyTasks(). That is, this routine does not(!) issue any task fusion.
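The drain itself is a plain loop over all known queues. A minimal sketch, with hypothetical containers standing in for the queue map and the runtime's task pool (the real loop calls FusableTasksQueue::drainReadyTasks() per queue):

```cpp
#include <vector>

std::vector<std::vector<int>> allQueues;           // one entry per task type
std::vector<int>              spawnedNativeTasks;  // tasks handed to the runtime

// Visit every known queue and drain it into native tasks, without(!)
// fusing anything: every parked task becomes an individual native task.
void processAllReadyTasksSketch() {
  for (auto& queue : allQueues) {
    for (int task : queue) {
      spawnedNativeTasks.push_back(task);
    }
    queue.clear();
  }
}
```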
Definition at line 209 of file taskfusion.cpp.
References tarch::multicore::MultiReadSingleWriteLock::Read.
Referenced by tarch::multicore::waitForAllTasks(), and tarch::multicore::Core::yield().

| void tarch::multicore::taskfusion::processAndFuseAllReadyTasks | ( | ) |
Drain the queues but use task fusion where possible.
This is the cousin of processAllReadyTasks(). It ensures that the task queues are drained eventually, but it relies on task fusion rather than a manual drain. Different from processAllReadyTasks(), you do not have to issue a manual barrier after this routine has returned: you may assume that all tasks are literally complete.
This blocking behaviour implies that you stop your program execution. However, nothing stops you, subject to correct dependency management, from deploying the call to processAndFuseAllReadyTasks() into yet another task of its own.
The routine can be used to set off timed execution as discussed in tarch::multicore::orchestration::FusionImplementation.
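The blocking fuse-and-drain behaviour can be sketched as a loop that processes fused batches until nothing remains. This is an illustration under assumed semantics, not the actual implementation; pendingTasks and maxTasksPerBatch are made-up names:

```cpp
#include <algorithm>
#include <vector>

std::vector<int> pendingTasks;      // tasks still parked across all queues
int              fusedBatches = 0;  // fused meta tasks that were processed

// Drain the queues through fused meta tasks, each handling up to
// maxTasksPerBatch tasks in one go, and block until everything is
// complete, so no extra barrier is needed afterwards.
void processAndFuseAllReadyTasksSketch(int maxTasksPerBatch) {
  while (!pendingTasks.empty()) {
    int batchSize = std::min(maxTasksPerBatch,
                             static_cast<int>(pendingTasks.size()));
    // One fused meta task processes batchSize tasks; we wait for it
    // before fusing the next batch.
    pendingTasks.erase(pendingTasks.begin(), pendingTasks.begin() + batchSize);
    ++fusedBatches;
  }
}
```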
Definition at line 217 of file taskfusion.cpp.
References tarch::multicore::MultiReadSingleWriteLock::free(), tarch::multicore::NoOutDependencies, tarch::multicore::MultiReadSingleWriteLock::Read, and tarch::multicore::waitForTask().
