Peano
tarch::multicore::orchestration::Hardcoded Class Reference

A hard coded strategy that can realise a few standard tasking patterns. More...

#include <Hardcoded.h>

Inherits tarch::multicore::orchestration::Strategy.

Public Member Functions

 Hardcoded (int numberOfTasksToHoldBack, int minTasksToFuse, int maxTasksToFuse, int deviceForFusedTasks, bool fuseTasksImmediatelyWhenSpawned, int maxNestedConcurrency)
 Construct hardcoded scheme.
 
virtual ~Hardcoded ()=default
 
virtual void startBSPSection (int nestedParallelismLevel) override
 Notifies the strategy that we enter a BSP section.
 
virtual void endBSPSection (int nestedParallelismLevel) override
 Notifies the strategy that we leave a BSP (fork-join) section.
 
virtual FuseInstruction fuse (int taskType) override
 How many tasks the system shall hold back from the tasking runtime in user-defined queues.
 
virtual ExecutionPolicy paralleliseForkJoinSection (int nestedParallelismLevel, int numberOfTasks, int taskType) override
 Determine how to handle/realise parallelisation within fork/join region.
 
- Public Member Functions inherited from tarch::multicore::orchestration::Strategy
virtual ~Strategy ()=default
 

Static Public Member Functions

static Hardcoded * createBSP ()
 If you want to use BSP only, you effectively switch off the tasking.
 
static Hardcoded * createNative ()
 Fall back to native tasking.
 
static Hardcoded * createBackfill ()
 Backfill strategy from the IWOMP paper.
 
static Hardcoded * createFuseAll (int numberOfTasksToFuse, bool fuseImmediately, bool processTasksWhileWaitingInBSPArea, int targetDevice)
 Create a strategy where tasks are always fused if possible given the configuration constraints.
 

Private Attributes

int _numberOfTasksToHoldBack
 
int _minTasksToFuse
 
int _maxTasksToFuse
 
int _deviceForFusedTasks
 
bool _fuseTasksImmediatelyWhenSpawned
 
int _maxNestedConcurrency
 

Additional Inherited Members

- Public Types inherited from tarch::multicore::orchestration::Strategy
enum class ExecutionPolicy { RunSerially, RunParallel }
 Provide hint of execution policy. More...
 
- Static Public Attributes inherited from tarch::multicore::orchestration::Strategy
static constexpr int EndOfBSPSection = -1
 

Detailed Description

A hard coded strategy that can realise a few standard tasking patterns.

Definition at line 21 of file Hardcoded.h.
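
As a usage illustration, the following minimal sketch plugs a pre-canned Hardcoded strategy into the multicore layer. The registration call tarch::multicore::setOrchestration() is an assumption about the surrounding API; consult the Peano sources for the actual routine and for the ownership convention of the passed pointer.

    #include "tarch/multicore/multicore.h"
    #include "tarch/multicore/orchestration/Hardcoded.h"

    void configureTasking() {
      // Pick one of the pre-canned patterns (see the static factory
      // methods below); createBackfill() realises the IWOMP backfill scheme.
      auto* strategy = tarch::multicore::orchestration::Hardcoded::createBackfill();

      // Hypothetical registration call; the actual routine may differ.
      tarch::multicore::setOrchestration(strategy);
    }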

Constructor & Destructor Documentation

◆ Hardcoded()

tarch::multicore::orchestration::Hardcoded::Hardcoded ( int numberOfTasksToHoldBack,
int minTasksToFuse,
int maxTasksToFuse,
int deviceForFusedTasks,
bool fuseTasksImmediatelyWhenSpawned,
int maxNestedConcurrency )

Construct hardcoded scheme.

Todo
Docu

Definition at line 56 of file Hardcoded.cpp.

Referenced by createBSP().
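
Since the constructor documentation is still a todo, here is a hedged sketch of a direct construction. The parameter semantics are inferred from the attribute names and the fuse() documentation below, not from authoritative docu; the concrete values are illustrative only.

    #include "tarch/multicore/orchestration/Hardcoded.h"
    #include "tarch/multicore/Tasks.h"  // assumed location of tarch::multicore::Task

    tarch::multicore::orchestration::Hardcoded myStrategy(
      16,                            // numberOfTasksToHoldBack: keep up to 16 ready tasks in user-defined queues
      4,                             // minTasksToFuse: don't fuse if fewer than 4 tasks are queued
      16,                            // maxTasksToFuse: never batch more than 16 tasks into one meta task
      tarch::multicore::Task::Host,  // deviceForFusedTasks: run fused tasks on the host
      false,                         // fuseTasksImmediatelyWhenSpawned: fuse lazily from the local queue
      1                              // maxNestedConcurrency: keep nested fork/join sections serial
    );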

◆ ~Hardcoded()

virtual tarch::multicore::orchestration::Hardcoded::~Hardcoded ( )
virtual default

Member Function Documentation

◆ createBackfill()

tarch::multicore::orchestration::Hardcoded * tarch::multicore::orchestration::Hardcoded::createBackfill ( )
static

Backfill strategy from the IWOMP paper.

Definition at line 29 of file Hardcoded.cpp.

References tarch::multicore::Task::Host.

◆ createBSP()

tarch::multicore::orchestration::Hardcoded * tarch::multicore::orchestration::Hardcoded::createBSP ( )
static

If you want to use BSP only, you effectively switch off the tasking.

Technically, this is realised by a strategy which enqueues all spawned tasks into the pending task queue. No tasks are handed over to the actual back-end; therefore, the tasks will be done lazily upon demand within processPendingTasks().

Definition at line 7 of file Hardcoded.cpp.

References Hardcoded(), and tarch::multicore::Task::Host.
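
A minimal sketch of switching off tasking this way, again assuming the hypothetical tarch::multicore::setOrchestration() registration call from above:

    // Every spawned task now ends up in the pending task queue and is
    // processed lazily via processPendingTasks().
    tarch::multicore::setOrchestration(
      tarch::multicore::orchestration::Hardcoded::createBSP()
    );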


◆ createFuseAll()

tarch::multicore::orchestration::Hardcoded * tarch::multicore::orchestration::Hardcoded::createFuseAll ( int numberOfTasksToFuse,
bool fuseImmediately,
bool processTasksWhileWaitingInBSPArea,
int targetDevice )
static

Create a strategy where tasks are always fused if possible given the configuration constraints.

Parameters
numberOfTasksToFuse The remaining tasks (< numberOfTasksToFuse) will remain stuck in the background queue and stay there until they are processed lazily.
fuseImmediately Fuse tasks right when they are spawned. Otherwise, tasks end up in a local queue; if a thread runs out of work, it looks into this queue and then fuses. So the fusion happens later, but it does not hold back any task-producing thread.
targetDevice A non-negative device number, or tarch::multicore::Task::Host.

Definition at line 40 of file Hardcoded.cpp.
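
A hedged usage sketch; the argument meanings follow the parameter list above, and the concrete values are illustrative only:

    // Fuse batches of up to 8 tasks and ship each fused batch to device 0.
    auto* fuseStrategy = tarch::multicore::orchestration::Hardcoded::createFuseAll(
      8,      // numberOfTasksToFuse
      true,   // fuseImmediately: fuse right when tasks are spawned
      false,  // processTasksWhileWaitingInBSPArea
      0       // targetDevice: first accelerator; tarch::multicore::Task::Host for the host
    );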

◆ createNative()

tarch::multicore::orchestration::Hardcoded * tarch::multicore::orchestration::Hardcoded::createNative ( )
static

Fall back to native tasking.

◆ endBSPSection()

void tarch::multicore::orchestration::Hardcoded::endBSPSection ( int nestedParallelismLevel)
override virtual

Notifies the strategy that we leave a BSP (fork-join) section.

Implements tarch::multicore::orchestration::Strategy.

Definition at line 73 of file Hardcoded.cpp.

◆ fuse()

tarch::multicore::orchestration::Hardcoded::FuseInstruction tarch::multicore::orchestration::Hardcoded::fuse ( int taskType)
override virtual

How many tasks the system shall hold back from the tasking runtime in user-defined queues.

Todo
Docu

Tell the runtime system how many tasks to hold back: if there are more tasks than this result, the tasking system will map the excess onto native tasks. As long as we have fewer tasks than this number, the runtime system will store tasks in its internal queue and not pass them on. Holding tasks back gives us the opportunity to fuse tasks, and it reduces pressure on the underlying task system. It also is an implicit prioritisation: tasks that we hold back are ready, but as we do not pass them on to the tasking runtime, they implicitly have ultra-low priority.

My data suggest that holding back tasks is a very delicate decision, as you constantly run the risk of starving threads even though work would be available. In line with the text above, I recommend holding back tasks iff

  • your tasking runtime struggles to handle many jobs. This happens with OpenMP, for example, as you have implicit scheduling points, i.e. OpenMP is allowed to process spawned tasks immediately. GNU does this once the number of tasks exceeds a certain threshold. You don't want this in Peano, so you can hold tasks back.
  • you want to fuse tasks and offload them en bloc to the GPU. This is a reasonable motivation, as kernel launches are extremely expensive. In this case, hold up to N tasks back if you fuse N tasks in one batch, but do not hold back any tasks if you are outside of a BSP section, as no new tasks will be created anymore and you want the tasks to be done.

Realisation

The routine is not const, as I want to give strategies the opportunity to adapt their decisions after each call.

Invocation pattern

This routine is called once per spawned task (to know if we should perhaps immediately map it onto a native task), and then at the end of each BSP section. When it is called for a particular task, we pass in a proper task type, i.e. the decision of the strategy may depend on the type of the task we are asked about. At the end of a BSP section, we pass in tarch::multicore::orchestration::Strategy::EndOfBSPSection instead of a particular task type.

tarch::multicore::spawnAndWait() is the routine which triggers the query at the end of a BSP section. If we have N tasks and N is bigger than the result of this routine, it will map tasks onto native tasks through internal::mapPendingTasksOntoNativeTasks().

tarch::multicore::spawnTask() is the routine which queries this routine for each and every task.

Parameters
taskType Either the actual task type if we are asked about a particular task, or EndOfBSPSection if the query is not for a particular task type, i.e. at the end of a fork-join part.

Returns
How many tasks to fuse and to which device to deploy them: a triple modelled via a FuseInstruction object.

  • The first entry specifies which device to deploy the tasks to. You can also return tarch::multicore::Task::Host to indicate that the fused task shall run on the host rather than a device.
  • The second entry is the minimal number of tasks to fuse. If there are fewer than that many tasks in the queue, don't fuse.
  • The third entry is the maximum number of tasks to fuse. Never batch more than this count into one (meta) task.

Implements tarch::multicore::orchestration::Strategy.

Definition at line 75 of file Hardcoded.cpp.
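
To make the triple concrete, here is a sketch of how Hardcoded might assemble the FuseInstruction from its private attributes. It assumes FuseInstruction can be brace-initialised in the order described above (device, minimal count, maximal count); the actual implementation at line 75 of Hardcoded.cpp may differ.

    tarch::multicore::orchestration::Strategy::FuseInstruction
    tarch::multicore::orchestration::Hardcoded::fuse(int taskType) {
      // Order of entries follows the triple documented above; this is an
      // assumption, not the authoritative definition of FuseInstruction.
      return { _deviceForFusedTasks, _minTasksToFuse, _maxTasksToFuse };
    }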

◆ paralleliseForkJoinSection()

tarch::multicore::orchestration::Strategy::ExecutionPolicy tarch::multicore::orchestration::Hardcoded::paralleliseForkJoinSection ( int nestedParallelismLevel,
int numberOfTasks,
int taskType )
override virtual

Determine how to handle/realise parallelisation within fork/join region.

Peano models its execution with multiple parallel, nested fork/join sections. You could also think of these as mini-BSP sections. This routine guides the orchestration in how to map these BSP sections onto tasks.

The decision can be guided by basically arbitrary contextual factors. The most important one for me is the nesting level: as we work mainly with OpenMP, where tasks are tied to one core, nested parallel fors make limited sense. Notably, they make things slower. So I usually return ExecutionPolicy::RunSerially for anything with a nesting level greater than 1.

Parameters
nestedParallelismLevel Please compare with tarch::multicore::spawnAndWait(), which ensures that this flag equals 1 on the top level. A value of 0 would mean that no fork/join region has been opened; for such a value, the code would not query this function.
taskType If we enter a fork-join section, this section logically spawns a set of tasks which are all of the same type, so the task type here is given implicitly by the code location. Each BSP section has a unique identifier.

Implements tarch::multicore::orchestration::Strategy.

Definition at line 81 of file Hardcoded.cpp.

References tarch::multicore::orchestration::Strategy::RunParallel, and tarch::multicore::orchestration::Strategy::RunSerially.
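
The nesting heuristic described above boils down to a comparison against _maxNestedConcurrency. Here is a sketch under that assumption; whether the actual code at line 81 of Hardcoded.cpp uses < or <= is not documented here.

    tarch::multicore::orchestration::Strategy::ExecutionPolicy
    tarch::multicore::orchestration::Hardcoded::paralleliseForkJoinSection(
      int nestedParallelismLevel, int numberOfTasks, int taskType
    ) {
      // Run nested mini-BSP sections serially once the nesting level
      // exceeds the configured concurrency budget.
      return nestedParallelismLevel <= _maxNestedConcurrency
        ? ExecutionPolicy::RunParallel
        : ExecutionPolicy::RunSerially;
    }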

◆ startBSPSection()

void tarch::multicore::orchestration::Hardcoded::startBSPSection ( int nestedParallelismLevel)
override virtual

Notifies the strategy that we enter a BSP section.

Implements tarch::multicore::orchestration::Strategy.

Definition at line 71 of file Hardcoded.cpp.

Field Documentation

◆ _deviceForFusedTasks

int tarch::multicore::orchestration::Hardcoded::_deviceForFusedTasks
private

Definition at line 26 of file Hardcoded.h.

◆ _fuseTasksImmediatelyWhenSpawned

bool tarch::multicore::orchestration::Hardcoded::_fuseTasksImmediatelyWhenSpawned
private

Definition at line 27 of file Hardcoded.h.

◆ _maxNestedConcurrency

int tarch::multicore::orchestration::Hardcoded::_maxNestedConcurrency
private

Definition at line 28 of file Hardcoded.h.

◆ _maxTasksToFuse

int tarch::multicore::orchestration::Hardcoded::_maxTasksToFuse
private

Definition at line 25 of file Hardcoded.h.

◆ _minTasksToFuse

int tarch::multicore::orchestration::Hardcoded::_minTasksToFuse
private

Definition at line 24 of file Hardcoded.h.

◆ _numberOfTasksToHoldBack

int tarch::multicore::orchestration::Hardcoded::_numberOfTasksToHoldBack
private

Definition at line 23 of file Hardcoded.h.


The documentation for this class was generated from the following files:

  • Hardcoded.h
  • Hardcoded.cpp