Peano 4
peano4::utils Namespace Reference

Typedefs

typedef std::bitset< Dimensions > LoopDirection
 Is used by the z-loop.
 

Enumerations

enum class  LoopPlacement { Serial , Nested , SpreadOut }
 Guide loop-level parallelism. More...
 

Functions

CPUGPUMethod void dInc (tarch::la::Vector< Dimensions, int > &counter, int max)
 d-dimensional counterpart of increment operator
 
void dDec (tarch::la::Vector< Dimensions, int > &counter, int max)
 d-dimensional counterpart of decrement operator
 
void dInc (tarch::la::Vector< Dimensions, int > &counter, const tarch::la::Vector< Dimensions, int > &max)
 d-dimensional counterpart of increment, where individual vector components have different max values
 
void dIncByVector (tarch::la::Vector< Dimensions, int > &counter, int max, int increment)
 Perform a d-dimensional increment by value increment: The first component of the counter is incremented by increment.
 
void dIncByScalar (tarch::la::Vector< Dimensions, int > &counter, int max, int increment)
 Perform a scalar increment of a vector: The operation equals a sequence of increment calls to dInc().
 
void dInc (tarch::la::Vector< Dimensions, int > &counter, int max, int doNotExamine)
 Same operation as dInc(tarch::la::Vector<Dimensions,int>,int), but now one dimension is not taken into consideration.
 
void dInc (tarch::la::Vector< Dimensions, int > &counter, int max, LoopDirection &direction)
 Operation similar to dInc, but is given a direction bitset that identifies whether the counter has to be incremented or decremented.
 
CPUGPUMethod int dCmp (const tarch::la::Vector< Dimensions, int > &counter, int max)
 Element-wise comparison for the d-dimensional for loops.
 
int dCmp (const tarch::la::Vector< Dimensions, int > &counter, const tarch::la::Vector< Dimensions, int > &max)
 Element-wise comparison for the d-dimensional for loops.
 
bool dCmpLinearOrder (const tarch::la::Vector< Dimensions, int > &counter, const tarch::la::Vector< Dimensions, int > &max)
 Compares two vectors with regards to their linearised value.
 
CPUGPUMethod int dLinearised (const tarch::la::Vector< Dimensions, int > &counter, int max)
 Map d-dimensional vector onto integer index.
 
CPUGPUMethod int dLinearised (const std::bitset< Dimensions > &counter)
 
int d2Linearised (const tarch::la::Vector< 2, int > &counter, int max)
 Special 2d variant of dLinearised that works also if you compile with other dimensions.
 
int d3Linearised (const tarch::la::Vector< 3, int > &counter, int max)
 Special 3d variant of dLinearised that works also if you compile with other dimensions.
 
int dLinearisedWithoutLookup (const tarch::la::Vector< Dimensions, int > &counter, int max)
 Linearisation not Optimised.
 
tarch::la::Vector< Dimensions, int > dDelinearised (int value, int max)
 Counterpart of dLinearised().
 
tarch::la::Vector< Dimensions, int > dDelinearisedWithoutLookup (int value, int max)
 Delinearization not optimised.
 
void setupLookupTableForDLinearised ()
 
void setupLookupTableForDDelinearised ()
 
CPUGPUMethod tarch::la::Vector< Dimensions, int > dStartVector ()
 Construct start vector (0,0,....) for d-dimensional loop.
 
tarch::la::Vector< Dimensions, int > dStartVector (int dim, int value)
 
tarch::la::Vector< Dimensions, int > dStartVector (int max, const LoopDirection &direction)
 Creates a start vector.
 

Typedef Documentation

◆ LoopDirection

typedef std::bitset<Dimensions> peano4::utils::LoopDirection

Is used by the z-loop.

See macro dforz.

Definition at line 65 of file Loop.h.

Enumeration Type Documentation

◆ LoopPlacement

enum class peano4::utils::LoopPlacement
strong

Guide loop-level parallelism.

Peano's loop macros allow users to define the logical concurrency of loops: A loop can be serial (with dependencies), it can run in parallel (though there might be dependencies, atomics or critical sections), or it can be of SIMT/SIMD type, i.e. without any dependencies.

Further to that, codes can guide the placement of the loops:

| Value | Semantics for parallel for | Semantics for simt loop |
| Serial | Keep it on one core. | Keep it on one GPU thread or don't use AVX. |
| Nested | If an outer parallel loop grabs a core or n cores, stick to these n cores. | Stick to one SM or one core's AVX units. |
| SpreadOut | Try to grab additional cores outside of an enclosing parallel region. | Try to use multiple SMs. |

The flags help performance engineers to balance between overheads and the maximum concurrency that's made available to a parallel or SIMT loop.

Enumerator
Serial 
Nested 
SpreadOut 

Definition at line 60 of file Loop.h.

Function Documentation

◆ d2Linearised()

int peano4::utils::d2Linearised ( const tarch::la::Vector< 2, int > & counter,
int max )

Special 2d variant of dLinearised that works also if you compile with other dimensions.

Definition at line 116 of file Loop.cpp.

References assertion2, and assertion3.

Referenced by toolbox::finiteelements::BSplinesStencilFactory::getElementWiseAssemblyMatrix().


◆ d3Linearised()

int peano4::utils::d3Linearised ( const tarch::la::Vector< 3, int > & counter,
int max )

Special 3d variant of dLinearised that works also if you compile with other dimensions.

Definition at line 129 of file Loop.cpp.

References assertion2.

Referenced by toolbox::finiteelements::BSplinesStencilFactory::getElementWiseAssemblyMatrix().


◆ dCmp() [1/2]

int peano4::utils::dCmp ( const tarch::la::Vector< Dimensions, int > & counter,
const tarch::la::Vector< Dimensions, int > & max )

Element-wise comparison for the d-dimensional for loops.

Unlike the other dCmp() operation, this one works even if the max range differs per vector entry.

Returns
true if all entries of counter are smaller than their corresponding entries in max
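
The following is a minimal, self-contained sketch of how such a comparison can serve as the termination test of a d-dimensional loop with a different range per dimension. It does not use Peano's headers: `sketch::Dimensions`, `sketch::IntVector` and `countLoopIterations()` are hypothetical stand-ins, and the assumption that the increment leaves the last component at its maximum once the loop is exhausted is mine, not taken from the text above.

```cpp
#include <array>
#include <cassert>

namespace sketch {
// Hypothetical stand-ins for Peano's Dimensions and tarch::la::Vector<Dimensions,int>.
constexpr int Dimensions = 3;
using IntVector = std::array<int, Dimensions>;

// Returns 1 if every entry of counter is smaller than the matching entry of max.
inline int dCmp(const IntVector& counter, const IntVector& max) {
  for (int d = 0; d < Dimensions; d++) {
    if (counter[d] >= max[d]) return 0;
  }
  return 1;
}

// Increment with a per-dimension range. Assumption: the last component is
// not reset on overflow, so dCmp() fails once all entries have been visited.
inline void dInc(IntVector& counter, const IntVector& max) {
  for (int d = 0; d < Dimensions; d++) {
    counter[d]++;
    if (counter[d] < max[d]) return;         // no overflow: done
    if (d < Dimensions - 1) counter[d] = 0;  // overflow: reset and carry
  }
}

// Counts how many index vectors a loop over the box [0,max) visits.
inline int countLoopIterations(const IntVector& max) {
  int visited = 0;
  for (IntVector i{}; dCmp(i, max); dInc(i, max)) visited++;
  return visited;
}
}  // namespace sketch
```

With ranges 2, 3 and 1, the loop in `countLoopIterations()` visits 2·3·1 = 6 index vectors before the comparison fails.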

Definition at line 281 of file Loop.cpp.

References assertion2.

◆ dCmp() [2/2]

int peano4::utils::dCmp ( const tarch::la::Vector< Dimensions, int > & counter,
int max )

Element-wise comparison for the d-dimensional for loops.

Consult dLinearised() for a discussion of the GPU aspects. Further to the discussion therein, we have to disable assertions within a SYCL context.

Returns
true if all entries of counter are smaller than max

Definition at line 292 of file Loop.cpp.

References assertion2.

◆ dCmpLinearOrder()

bool peano4::utils::dCmpLinearOrder ( const tarch::la::Vector< Dimensions, int > & counter,
const tarch::la::Vector< Dimensions, int > & max )

Compares two vectors with regards to their linearised value.

Returns
true, if dLinearised(counter, XXX) < dLinearised(max, XXX)
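
A minimal sketch of this comparison, assuming that XXX denotes a base that is larger than every vector component: in that case, comparing the linearised values is the same as comparing the vectors component by component, starting from the last (most significant) entry. The names `sketch::Dimensions` and `sketch::IntVector` are hypothetical stand-ins for Peano's types.

```cpp
#include <array>
#include <cassert>

namespace sketch {
// Hypothetical stand-ins for Peano's Dimensions and tarch::la::Vector<Dimensions,int>.
constexpr int Dimensions = 3;
using IntVector = std::array<int, Dimensions>;

// Returns true if counter precedes max in the linearised order.
// Assumption: all components are non-negative and smaller than the
// (unspecified) linearisation base, so the last entry is most significant.
inline bool dCmpLinearOrder(const IntVector& counter, const IntVector& max) {
  for (int d = Dimensions - 1; d >= 0; d--) {
    if (counter[d] < max[d]) return true;
    if (counter[d] > max[d]) return false;
  }
  return false;  // both vectors are equal
}
}  // namespace sketch
```

Note the difference to an element-wise test: (2,0,0) precedes (0,1,0) in linear order although its first entry is larger.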

Definition at line 305 of file Loop.cpp.

◆ dDec()

void peano4::utils::dDec ( tarch::la::Vector< Dimensions, int > & counter,
int max )

d-dimensional counterpart of decrement operator

This operation performs a d-dimensional decrement on a given integer vector: The first component of the vector is decremented. If the first component is smaller than 0, the component is set to max and the next component is decremented by one.

Definition at line 169 of file Loop.cpp.

References assertion.

◆ dDelinearised()

tarch::la::Vector< Dimensions, int > peano4::utils::dDelinearised ( int value,
int max )

Counterpart of dLinearised().

This operation's semantics equal those of dDelinearised(), but the operation is not optimised at all. It therefore accepts arbitrary argument values, but it might become a bottleneck.

Definition at line 146 of file Loop.cpp.

References assertionEquals2.

Referenced by toolbox::finiteelements::mapElementMatrixEntryOntoStencilEntry().


◆ dDelinearisedWithoutLookup()

tarch::la::Vector< Dimensions, int > peano4::utils::dDelinearisedWithoutLookup ( int value,
int max )

Delinearization not optimised.

Definition at line 142 of file Loop.cpp.

◆ dInc() [1/4]

void peano4::utils::dInc ( tarch::la::Vector< Dimensions, int > & counter,
const tarch::la::Vector< Dimensions, int > & max )

d-dimensional counterpart of increment, where individual vector components have different max values

This operation performs a d-dimensional increment on a given integer vector: The first component of the vector is incremented. If the first component is greater than max(0)-1, the component is set to zero and the next component is incremented by one. This operation is used often by d-dimensional for-loops.

Definition at line 184 of file Loop.cpp.

◆ dInc() [2/4]

void peano4::utils::dInc ( tarch::la::Vector< Dimensions, int > & counter,
int max )

d-dimensional counterpart of increment operator

This operation performs a d-dimensional increment on a given integer vector: The first component of the vector is incremented. If the first component is greater than max-1, the component is set to zero and the next component is incremented by one. This operation is used often by d-dimensional for-loops.

See also
dLinearised() for a discussion of GPU annotation

Definition at line 158 of file Loop.cpp.
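
The carry rule described for this overload can be sketched as follows. This is a stand-alone illustration rather than the code in Loop.cpp: `sketch::Dimensions` and `sketch::IntVector` are hypothetical stand-ins for Peano's types, and leaving the last component at max once it overflows is an assumption, as the description does not cover that case.

```cpp
#include <array>
#include <cassert>

namespace sketch {
// Hypothetical stand-ins for Peano's Dimensions and tarch::la::Vector<Dimensions,int>.
constexpr int Dimensions = 3;
using IntVector = std::array<int, Dimensions>;

// d-dimensional increment: bump the first component and carry into the
// next one whenever a component exceeds max-1.
inline void dInc(IntVector& counter, int max) {
  for (int d = 0; d < Dimensions; d++) {
    counter[d]++;
    if (counter[d] < max) return;            // no overflow: done
    if (d < Dimensions - 1) counter[d] = 0;  // overflow: reset and carry
    // Assumption: the last component stays at max to flag the end of the loop.
  }
}
}  // namespace sketch
```

Starting from (0,0,0) with max=2, successive calls yield (1,0,0), (0,1,0), (1,1,0), (0,0,1) and so forth, which is the order in which a nested loop with the first index running fastest visits its entries.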

◆ dInc() [3/4]

void peano4::utils::dInc ( tarch::la::Vector< Dimensions, int > & counter,
int max,
int doNotExamine )

Same operation as dInc(tarch::la::Vector<Dimensions,int>,int), but now one dimension is not taken into consideration.
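
One way to read "not taken into consideration" is that the increment skips the component doNotExamine entirely, which is handy for loops over a (d-1)-dimensional face. The sketch below follows that reading; it is an assumption, not the code from Loop.cpp, and `sketch::Dimensions` and `sketch::IntVector` are hypothetical stand-ins for Peano's types.

```cpp
#include <array>
#include <cassert>

namespace sketch {
// Hypothetical stand-ins for Peano's Dimensions and tarch::la::Vector<Dimensions,int>.
constexpr int Dimensions = 3;
using IntVector = std::array<int, Dimensions>;

// Increment that leaves counter[doNotExamine] untouched.
inline void dInc(IntVector& counter, int max, int doNotExamine) {
  // Last component that actually takes part in the increment.
  const int last = (doNotExamine == Dimensions - 1) ? Dimensions - 2 : Dimensions - 1;
  for (int d = 0; d < Dimensions; d++) {
    if (d == doNotExamine) continue;  // skip the fixed dimension
    counter[d]++;
    if (counter[d] < max) return;     // no overflow: done
    if (d != last) counter[d] = 0;    // overflow: reset and carry
    // Assumption: the last examined component stays at max to flag the end.
  }
}
}  // namespace sketch
```

With doNotExamine=1 and max=2, a counter starting at (0,7,0) runs through (1,7,0), (0,7,1) and (1,7,1), so the middle entry keeps its value throughout.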

Definition at line 220 of file Loop.cpp.

References assertion.

◆ dInc() [4/4]

void peano4::utils::dInc ( tarch::la::Vector< Dimensions, int > & counter,
int max,
LoopDirection & direction )

Operation similar to dInc, but is given a direction bitset that identifies whether the counter has to be incremented or decremented.

See the dforz macro for an example of how to use dInc.

Definition at line 249 of file Loop.cpp.

◆ dIncByScalar()

void peano4::utils::dIncByScalar ( tarch::la::Vector< Dimensions, int > & counter,
int max,
int increment )

Perform a scalar increment of a vector: The operation equals a sequence of increment calls to dInc().

Definition at line 209 of file Loop.cpp.

◆ dIncByVector()

void peano4::utils::dIncByVector ( tarch::la::Vector< Dimensions, int > & counter,
int max,
int increment )

Perform a d-dimensional increment by value increment: The first component of the counter is incremented by increment.

Afterwards, the operation checks the first entry: If it exceeds max, it is set to its modulo value, the next component is incremented by increment, and the check continues.

Definition at line 196 of file Loop.cpp.

◆ dLinearised() [1/2]

int peano4::utils::dLinearised ( const std::bitset< Dimensions > & counter)

Definition at line 95 of file Loop.cpp.

◆ dLinearised() [2/2]

int peano4::utils::dLinearised ( const tarch::la::Vector< Dimensions, int > & counter,
int max )

Map d-dimensional vector onto integer index.

This operation is called pretty often and, thus, might cause a significant slowdown in the overall performance. Therefore, I introduced an aggressive optimization based on lookup tables. This optimization is switched on if DLOOP_AGGRESSIVE is specified (default in the Peano project). Two preconditions have to be fulfilled in this case: All parameters have to stay within certain boundaries (all positive, max smaller than or equal to 5), and one has to call both setupLookupTableForDLinearised() and setupLookupTableForDDelinearised() before using dLinearised() or dDelinearised().

Obviously, creating a lookup table for these two operations is not that simple, since the parameter space has to be mapped onto a unique key. To end up with a simple mapping, all the constraints from above are added. Although the mapping might be slow, it is still faster than computing the partial sums of a to the power of b.

GPU programming

The d-linear loops are used mainly on the host, but I provide a SYCL implementation of d-linear, too. SYCL does not really distinguish the host queue from the accelerator queue at the moment. I therefore have to mark the dLinearised() routine as GPU offloadable. See parallelDfor for a discussion of when this is required.

Once annotated, we might still get errors if we use SYCL on the host and have no GPU offloading (via SYCL) enabled: The macro GPUCallableMethod in this case is not defined. Actually, it is explicitly defined as empty, so I cannot even overwrite it. Therefore, I introduce the new macro

     CPUGPUMethod

It delegates to SYCL_EXTERNAL, once SYCL is enabled on the CPU. Consult SYCL support as well for further SYCL details.

Returns
the linearisation of the counter, i.e. the k-th component is multiplied by max^k and the results are accumulated.
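
The return value can be illustrated by a plain, unoptimised sketch without lookup tables, together with its inverse. This is not the implementation in Loop.cpp; `sketch::Dimensions` and `sketch::IntVector` are hypothetical stand-ins for Peano's types, and the sketch assumes non-negative entries smaller than max.

```cpp
#include <array>
#include <cassert>

namespace sketch {
// Hypothetical stand-ins for Peano's Dimensions and tarch::la::Vector<Dimensions,int>.
constexpr int Dimensions = 3;
using IntVector = std::array<int, Dimensions>;

// Sum over k of counter[k] * max^k.
inline int dLinearised(const IntVector& counter, int max) {
  int result = 0;
  int base   = 1;  // holds max^k
  for (int k = 0; k < Dimensions; k++) {
    result += counter[k] * base;
    base *= max;
  }
  return result;
}

// Inverse mapping: peel off one base-max digit per dimension.
inline IntVector dDelinearised(int value, int max) {
  IntVector result{};
  for (int k = 0; k < Dimensions; k++) {
    result[k] = value % max;
    value /= max;
  }
  return result;
}
}  // namespace sketch
```

For max=3, the vector (1,2,0) maps onto 1·1 + 2·3 + 0·9 = 7, and delinearising 7 recovers (1,2,0).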

Definition at line 106 of file Loop.cpp.

References assertionEquals3.

Referenced by peano4::datamanagement::VertexMarker::areAdjacentCellsLocal(), exahype2::saintvenant::calculateDerivatives(), applications::exahype2::CompressibleNavierStokes::NavierStokesSolver::calculateDerivatives(), toolbox::blockstructured::computeGradient(), toolbox::blockstructured::computeGradientAndReturnMaxDifference(), exahype2::dg::internal::copySolution(), peano4::grid::GridTraversalEventGenerator::createGenericCellTraversalEvent(), toolbox::blockstructured::internal::createPiecewiseConstantInterpolationMatrix(), peano4::grid::Spacetree::descend(), exahype2::dg::evaluatePolynomial(), toolbox::finiteelements::extractElementStencil(), exahype2::saintvenant::extrapolateHalo(), applications::exahype2::CompressibleNavierStokes::NavierStokesSolver::extrapolateHalo(), toolbox::blockstructured::extrapolatePatchSolutionAndProjectExtrapolatedHaloOntoFaces(), peano4::grid::GridTraversalEventGenerator::getAdjacentRanksOfFace(), toolbox::finiteelements::getElementWiseAssemblyMatrix(), toolbox::finiteelements::getElementWiseAssemblyMatrix(), toolbox::finiteelements::getElementWiseAssemblyMatrix(), toolbox::finiteelements::getElementWiseAssemblyMatrix(), peano4::parallel::Node::getOutputStackForPeriodicBoundaryExchange(), peano4::parallel::Node::getPeriodicBoundaryNumber(), swift2::getVertexNumbersOfParentVertices(), peano4::grid::GridTraversalEventGenerator::getVertexType(), toolbox::blockstructured::interpolateCellDataAssociatedToVolumesIntoOverlappingCell_fourthOrder(), toolbox::blockstructured::interpolateCellDataAssociatedToVolumesIntoOverlappingCell_linear(), peano4::grid::Spacetree::loadVertices(), exahype2::fv::plotPatch(), toolbox::finiteelements::preprocessBoundaryStencil(), toolbox::blockstructured::internal::projectInterpolatedFineCellsOnHaloLayer_AoS(), toolbox::blockstructured::projectPatchHaloOntoFaces(), toolbox::blockstructured::projectPatchSolutionOntoFaces(), exahype2::fv::internal::projectValueOntoParticle_piecewiseConstant(), 
exahype2::fv::internal::projectValueOntoParticle_piecewiseLinear(), peano4::grid::Spacetree::refineState(), toolbox::blockstructured::restrictCell_AoS_averaging(), toolbox::blockstructured::restrictCell_AoS_inject(), toolbox::blockstructured::restrictCellIntoOverlappingCell_inject(), toolbox::blockstructured::restrictCellIntoOverlappingCell_inject_and_average(), exahype2::dg::tests::CellIntegralTest::runEulerOrder2OnStationarySetup(), peano4::grid::Spacetree::shouldEraseAdjacencyInformation(), peano4::grid::Spacetree::storeVertices(), exahype2::fv::rusanov::tests::CopyPatchTest::testCopyPatch(), peano4::grid::Spacetree::traverse(), exahype2::fv::validatePatch(), exahype2::aderdg::validatePatch(), and exahype2::aderdg::validateSpacetimePatch().


◆ dLinearisedWithoutLookup()

int peano4::utils::dLinearisedWithoutLookup ( const tarch::la::Vector< Dimensions, int > & counter,
int max )

Linearisation not Optimised.

This operation's semantics equal those of dLinearised(), but the operation is not optimised at all. It therefore accepts arbitrary argument values, but it might become a bottleneck.

Definition at line 90 of file Loop.cpp.

◆ dStartVector() [1/3]

tarch::la::Vector< Dimensions, int > peano4::utils::dStartVector ( )

Construct start vector (0,0,....) for d-dimensional loop.

See also
dLinearised() for a discussion of GPU impact.
Returns
a vector containing zero values only.

Definition at line 316 of file Loop.cpp.

◆ dStartVector() [2/3]

tarch::la::Vector< Dimensions, int > peano4::utils::dStartVector ( int dim,
int value )
Returns
a vector containing only zero values besides the dim-th entry. This entry is set to value.

Definition at line 322 of file Loop.cpp.

References assertion2.

◆ dStartVector() [3/3]

tarch::la::Vector< Dimensions, int > peano4::utils::dStartVector ( int max,
const LoopDirection & direction )

Creates a start vector.

Each component is set to either 0 or max-1 depending on direction: If the direction bit of a component is true, its value is zero.

Returns
a start vector for an oscillating loop.
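
A minimal sketch of such a start vector, reading the description as "a set direction bit yields 0, a cleared bit yields max-1". That reading, as well as the names `sketch::Dimensions` and `sketch::IntVector`, are assumptions made for this illustration; only the LoopDirection typedef mirrors the one documented above.

```cpp
#include <array>
#include <bitset>
#include <cassert>

namespace sketch {
// Hypothetical stand-ins for Peano's Dimensions and tarch::la::Vector<Dimensions,int>.
constexpr int Dimensions = 3;
using IntVector     = std::array<int, Dimensions>;
using LoopDirection = std::bitset<Dimensions>;

// Start at 0 along dimensions that are traversed upwards (bit set) and at
// max-1 along dimensions that are traversed downwards (bit cleared).
inline IntVector dStartVector(int max, const LoopDirection& direction) {
  IntVector result{};
  for (int d = 0; d < Dimensions; d++) {
    result[d] = direction[d] ? 0 : max - 1;
  }
  return result;
}
}  // namespace sketch
```

With max=4 and only bits 0 and 2 set, the start vector is (0,3,0): the loop would run upwards along the first and third axis and downwards along the second.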

Definition at line 332 of file Loop.cpp.

References assertion2.

◆ setupLookupTableForDDelinearised()

void peano4::utils::setupLookupTableForDDelinearised ( )

Definition at line 79 of file Loop.cpp.

Referenced by peano4::fillLookupTables().


◆ setupLookupTableForDLinearised()

void peano4::utils::setupLookupTableForDLinearised ( )

Definition at line 68 of file Loop.cpp.

References dfor, and k.

Referenced by peano4::fillLookupTables().
