Peano 4

Typedefs  
typedef std::bitset< Dimensions >  LoopDirection 
Is used by the zloop.  
Enumerations  
enum class  LoopPlacement { Serial , Nested , SpreadOut } 
Guide loop-level parallelism.
Functions  
CPUGPUMethod void  dInc (tarch::la::Vector< Dimensions, int > &counter, int max) 
d-dimensional counterpart of the increment operator
void  dDec (tarch::la::Vector< Dimensions, int > &counter, int max) 
d-dimensional counterpart of the decrement operator
void  dInc (tarch::la::Vector< Dimensions, int > &counter, const tarch::la::Vector< Dimensions, int > &max) 
d-dimensional counterpart of increment, where individual vector components have different max values
void  dIncByVector (tarch::la::Vector< Dimensions, int > &counter, int max, int increment) 
Perform a d-dimensional increment by value increment: The first component of the counter is incremented by increment.
void  dIncByScalar (tarch::la::Vector< Dimensions, int > &counter, int max, int increment) 
Perform a scalar increment of a vector: The operation equals a sequence of increment calls to dInc().  
void  dInc (tarch::la::Vector< Dimensions, int > &counter, int max, int doNotExamine) 
Same operation as dInc(tarch::la::Vector<Dimensions,int>,int), but now one dimension is not taken into consideration.  
void  dInc (tarch::la::Vector< Dimensions, int > &counter, int max, LoopDirection &direction) 
Operation similar to dInc(), but it is given a direction bitset that identifies whether each counter component has to be incremented or decremented.
CPUGPUMethod int  dCmp (const tarch::la::Vector< Dimensions, int > &counter, int max) 
Element-wise comparison for d-dimensional for-loops.
int  dCmp (const tarch::la::Vector< Dimensions, int > &counter, const tarch::la::Vector< Dimensions, int > &max) 
Element-wise comparison for d-dimensional for-loops.
bool  dCmpLinearOrder (const tarch::la::Vector< Dimensions, int > &counter, const tarch::la::Vector< Dimensions, int > &max) 
Compares two vectors with regard to their linearised value.
CPUGPUMethod int  dLinearised (const tarch::la::Vector< Dimensions, int > &counter, int max) 
Map d-dimensional vector onto integer index.
CPUGPUMethod int  dLinearised (const std::bitset< Dimensions > &counter) 
int  d2Linearised (const tarch::la::Vector< 2, int > &counter, int max) 
Special 2d variant of dLinearised that works also if you compile with other dimensions.  
int  d3Linearised (const tarch::la::Vector< 3, int > &counter, int max) 
Special 3d variant of dLinearised that works also if you compile with other dimensions.  
int  dLinearisedWithoutLookup (const tarch::la::Vector< Dimensions, int > &counter, int max) 
Linearisation without lookup table; not optimised.
tarch::la::Vector< Dimensions, int >  dDelinearised (int value, int max) 
Counterpart of dLinearised().  
tarch::la::Vector< Dimensions, int >  dDelinearisedWithoutLookup (int value, int max) 
Delinearisation without lookup table; not optimised.
void  setupLookupTableForDLinearised () 
void  setupLookupTableForDDelinearised () 
CPUGPUMethod tarch::la::Vector< Dimensions, int >  dStartVector () 
Construct start vector (0,0,...) for d-dimensional loop.
tarch::la::Vector< Dimensions, int >  dStartVector (int dim, int value) 
tarch::la::Vector< Dimensions, int >  dStartVector (int max, const LoopDirection &direction) 
Creates a start vector.  
typedef std::bitset<Dimensions> peano4::utils::LoopDirection 

enum class peano4::utils::LoopPlacement
Guide loop-level parallelism.
Peano's loop macros allow users to define the logical concurrency of loops: Is the loop serial (with dependencies), can it run in parallel (though there might be dependencies/atomics/critical sections), or is it a loop of SIMT/SIMD type, i.e. without any dependencies?
Further to that, codes can guide the placement of the loops:
Value      Semantics for parallel for                                                  Semantics for simt loop
Serial     Keep it on one core.                                                        Keep it on one GPU thread or don't use AVX.
Nested     If an outer parallel loop grabs a core or n cores, stick to these n cores.  Stick to one SM or one core's AVX units.
SpreadOut  Try to grab additional cores outside of an enclosing parallel region.       Try to use multiple SMs.
The flags help performance engineers to balance between overheads and the maximum concurrency that's made available to a parallel or SIMT loop.
Enumerator  

Serial  
Nested  
SpreadOut 
int peano4::utils::d2Linearised  (  const tarch::la::Vector< 2, int > &  counter, 
int  max ) 
Special 2d variant of dLinearised that works also if you compile with other dimensions.
Definition at line 116 of file Loop.cpp.
References assertion2, and assertion3.
Referenced by toolbox::finiteelements::BSplinesStencilFactory::getElementWiseAssemblyMatrix().
int peano4::utils::d3Linearised  (  const tarch::la::Vector< 3, int > &  counter, 
int  max ) 
Special 3d variant of dLinearised that works also if you compile with other dimensions.
Definition at line 129 of file Loop.cpp.
References assertion2.
Referenced by toolbox::finiteelements::BSplinesStencilFactory::getElementWiseAssemblyMatrix().
int peano4::utils::dCmp  (  const tarch::la::Vector< Dimensions, int > &  counter, 
const tarch::la::Vector< Dimensions, int > &  max ) 
Element-wise comparison for d-dimensional for-loops.
Unlike the other dCmp() operation, this one works even if the max range differs per vector entry.
Definition at line 281 of file Loop.cpp.
References assertion2.
int peano4::utils::dCmp  (  const tarch::la::Vector< Dimensions, int > &  counter, 
int  max ) 
Element-wise comparison for d-dimensional for-loops.
Consult dLinearised() for a discussion of the GPU aspects. Further to the discussion therein, we have to disable assertions within a SYCL context.
Definition at line 292 of file Loop.cpp.
References assertion2.
bool peano4::utils::dCmpLinearOrder  (  const tarch::la::Vector< Dimensions, int > &  counter, 
const tarch::la::Vector< Dimensions, int > &  max ) 
void peano4::utils::dDec  (  tarch::la::Vector< Dimensions, int > &  counter, 
int  max ) 
d-dimensional counterpart of the decrement operator
This operation performs a d-dimensional decrement on a given integer vector: The first component of the vector is decremented. If the first component is smaller than 0, the component is set to max and the next component is decremented by one.
Definition at line 169 of file Loop.cpp.
References assertion.
tarch::la::Vector< Dimensions, int > peano4::utils::dDelinearised  (  int  value, 
int  max ) 
Counterpart of dLinearised().
This operation's semantics equal those of dDeLinearised, but the operation is not optimised at all. It therefore accepts arbitrary argument values, yet it might become a bottleneck.
Definition at line 146 of file Loop.cpp.
References assertionEquals2.
Referenced by toolbox::finiteelements::mapElementMatrixEntryOntoStencilEntry().
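The inverse mapping from an integer index back to a vector can be sketched without any lookup table. The snippet below is a minimal illustration, not the Loop.cpp code: Dim, IntVector and delineariseSketch are hypothetical names, std::array replaces tarch::la::Vector, and the ordering (first component varies fastest) is an assumption.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the compile-time constant Dimensions.
constexpr std::size_t Dim = 3;
using IntVector = std::array<int, Dim>;

// Split value into base-max digits. Assumption: component 0 holds the
// least significant digit, i.e. the first component varies fastest.
IntVector delineariseSketch(int value, int max) {
  assert(max > 0 && value >= 0);
  IntVector result{};
  for (std::size_t i = 0; i < Dim; i++) {
    result[i] = value % max;  // digit of dimension i
    value /= max;             // shift to the next dimension
  }
  return result;
}
```

With max = 3, the index 7 = 1 + 2*3 maps onto the vector (1,2,0) in this sketch.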
tarch::la::Vector< Dimensions, int > peano4::utils::dDelinearisedWithoutLookup  (  int  value, 
int  max ) 
void peano4::utils::dInc  (  tarch::la::Vector< Dimensions, int > &  counter, 
const tarch::la::Vector< Dimensions, int > &  max ) 
d-dimensional counterpart of increment, where individual vector components have different max values
This operation performs a d-dimensional increment on a given integer vector: The first component of the vector is incremented. If the first component is greater than max(0)-1, the component is set to zero and the next component is incremented by one. This operation is used often by d-dimensional for-loops.
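The carry rule described above behaves like an odometer whose digits have different ranges. The following is a minimal sketch under stated assumptions rather than the library implementation: Dim, IntVector, incrementSketch and countIterations are hypothetical names, std::array replaces tarch::la::Vector, and the last component is assumed never to wrap so that a loop can detect termination.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the compile-time constant Dimensions.
constexpr std::size_t Dim = 3;
using IntVector = std::array<int, Dim>;

// Increment counter; component i runs over 0..max[i]-1.
// Assumption: the last component is not reset, it simply overflows.
void incrementSketch(IntVector& counter, const IntVector& max) {
  counter[0]++;
  for (std::size_t i = 0; i + 1 < Dim; i++) {
    assert(max[i] > 0);
    if (counter[i] > max[i] - 1) {
      counter[i] = 0;    // wrap this component
      counter[i + 1]++;  // carry into the next one
    } else {
      return;            // no carry, nothing more to do
    }
  }
}

// Count the increments needed until the last component leaves its range.
int countIterations(const IntVector& max) {
  IntVector counter{};  // zero-initialised start vector
  int steps = 0;
  while (counter[Dim - 1] < max[Dim - 1]) {
    incrementSketch(counter, max);
    steps++;
  }
  return steps;
}
```

For max = (2,3,4) the sketch visits 2*3*4 = 24 index tuples before the last component overflows.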
void peano4::utils::dInc  (  tarch::la::Vector< Dimensions, int > &  counter, 
int  max ) 
d-dimensional counterpart of the increment operator
This operation performs a d-dimensional increment on a given integer vector: The first component of the vector is incremented. If the first component is greater than max-1, the component is set to zero and the next component is incremented by one. This operation is used often by d-dimensional for-loops.
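How such an increment drives a d-dimensional for-loop can be illustrated with a small stand-alone sketch. It does not reproduce Peano's loop macros: Dim, IntVector, incrementSketch, allBelow and visitAll are hypothetical names, std::array replaces tarch::la::Vector, and the termination test (every component below max) is an assumption about how a comparison such as dCmp() would be used.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the compile-time constant Dimensions.
constexpr std::size_t Dim = 2;
using IntVector = std::array<int, Dim>;

// Scalar-max increment following the description above.
// Assumption: the last component is not wrapped.
void incrementSketch(IntVector& counter, int max) {
  assert(max > 0);
  counter[0]++;
  for (std::size_t i = 0; i + 1 < Dim; i++) {
    if (counter[i] > max - 1) {
      counter[i] = 0;
      counter[i + 1]++;
    } else {
      return;
    }
  }
}

// Hypothetical loop condition: true while every component is below max.
bool allBelow(const IntVector& counter, int max) {
  for (int c : counter) {
    if (c >= max) return false;
  }
  return true;
}

// Collect all index tuples in the order the loop visits them.
std::vector<IntVector> visitAll(int max) {
  std::vector<IntVector> visited;
  for (IntVector counter{}; allBelow(counter, max); incrementSketch(counter, max)) {
    visited.push_back(counter);
  }
  return visited;
}
```

For max = 2 in two dimensions, the sketch visits (0,0), (1,0), (0,1), (1,1), i.e. the first component varies fastest.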
void peano4::utils::dInc  (  tarch::la::Vector< Dimensions, int > &  counter, 
int  max,  
int  doNotExamine ) 
void peano4::utils::dInc  (  tarch::la::Vector< Dimensions, int > &  counter, 
int  max,  
LoopDirection &  direction ) 
void peano4::utils::dIncByScalar  (  tarch::la::Vector< Dimensions, int > &  counter, 
int  max,  
int  increment ) 
void peano4::utils::dIncByVector  (  tarch::la::Vector< Dimensions, int > &  counter, 
int  max,  
int  increment ) 
Perform a d-dimensional increment by value increment: The first component of the counter is incremented by increment.
Afterwards, the operation checks the first entry: If it exceeds max, it is replaced by its value modulo max, the next component is incremented by increment, and the check continues.
int peano4::utils::dLinearised  (  const std::bitset< Dimensions > &  counter  ) 
int peano4::utils::dLinearised  (  const tarch::la::Vector< Dimensions, int > &  counter, 
int  max ) 
Map d-dimensional vector onto integer index.
This operation is called pretty often and, thus, might cause a significant slowdown in the overall performance. Therefore, I introduced an aggressive optimisation based on lookup tables. This optimisation is switched on if DLOOP_AGGRESSIVE is specified (default in Peano project). Two preconditions have to be fulfilled in this case: All parameters have to stay within certain boundaries (all positive, max smaller than or equal to 5) and one has to call both setupLookupTableForDLinearised() and setupLookupTableForDDelinearised() before using dLinearised() or dDelinearised().
Obviously, creating a lookup table for these two operations is not that simple, since the parameter space has to be mapped onto a unique key. To end up with a simple mapping, all the constraints from above are added. Although the mapping might be slow, it is still faster than computing the partial sums of a to the power of b.
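The arithmetic that the lookup table replaces can be sketched as follows. This is an illustration only and does not show the table layout: Dim, IntVector and lineariseSketch are hypothetical names, std::array replaces tarch::la::Vector, and the ordering (first component varies fastest) is an assumption. The sketch uses Horner's scheme, so it needs no explicit powers of max.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the compile-time constant Dimensions.
constexpr std::size_t Dim = 3;
using IntVector = std::array<int, Dim>;

// Compute counter[0] + counter[1]*max + counter[2]*max^2 + ...
// by walking from the last component down to the first.
int lineariseSketch(const IntVector& counter, int max) {
  int result = 0;
  for (std::size_t i = Dim; i-- > 0;) {
    assert(counter[i] >= 0 && counter[i] < max);
    result = result * max + counter[i];
  }
  return result;
}
```

With max = 3, the vector (1,2,0) yields 1 + 2*3 + 0*9 = 7 in this sketch.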
The dlinear loops are used mainly on the host, but I provide a SYCL implementation of dlinear, too. SYCL does not really distinguish the host queue from the accelerator queue at the moment. I therefore have to mark the dLinearised() routine as GPU offloadable. See parallelDfor for a discussion of when this is required.
Once annotated, we might still get errors if we use SYCL on the host and have no GPU offloading (via SYCL) enabled: The macro GPUCallableMethod in this case is not defined. Actually, it is explicitly defined as empty, so I cannot even overwrite it. Therefore, I introduce the new macro CPUGPUMethod. It delegates to SYCL_EXTERNAL once SYCL is enabled on the CPU. Consult SYCL support as well for further SYCL details.
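A conditional macro of this kind might look roughly like the sketch below. The guard SKETCH_SYCL_ON_CPU, the macro name CPUGPUMethodSketch and the function square are hypothetical; the flag that Peano actually tests is not stated here. Only SYCL_EXTERNAL and the header sycl/sycl.hpp come from the SYCL 2020 standard.

```cpp
// Hypothetical build flag; the real project may use a different name.
#if defined(SKETCH_SYCL_ON_CPU)
  #include <sycl/sycl.hpp>
  // With SYCL enabled, mark the function as callable from kernels.
  #define CPUGPUMethodSketch SYCL_EXTERNAL
#else
  // Without SYCL, the annotation expands to nothing.
  #define CPUGPUMethodSketch
#endif

// The declaration carries the annotation; the definition follows as usual.
CPUGPUMethodSketch int square(int x);

int square(int x) { return x * x; }
```

Without the flag, the annotation vanishes and square() is an ordinary host function.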
Definition at line 106 of file Loop.cpp.
References assertionEquals3.
Referenced by peano4::datamanagement::VertexMarker::areAdjacentCellsLocal(), exahype2::saintvenant::calculateDerivatives(), applications::exahype2::CompressibleNavierStokes::NavierStokesSolver::calculateDerivatives(), toolbox::blockstructured::computeGradient(), toolbox::blockstructured::computeGradientAndReturnMaxDifference(), exahype2::dg::internal::copySolution(), peano4::grid::GridTraversalEventGenerator::createGenericCellTraversalEvent(), toolbox::blockstructured::internal::createPiecewiseConstantInterpolationMatrix(), peano4::grid::Spacetree::descend(), exahype2::dg::evaluatePolynomial(), toolbox::finiteelements::extractElementStencil(), exahype2::saintvenant::extrapolateHalo(), applications::exahype2::CompressibleNavierStokes::NavierStokesSolver::extrapolateHalo(), toolbox::blockstructured::extrapolatePatchSolutionAndProjectExtrapolatedHaloOntoFaces(), peano4::grid::GridTraversalEventGenerator::getAdjacentRanksOfFace(), toolbox::finiteelements::getElementWiseAssemblyMatrix(), toolbox::finiteelements::getElementWiseAssemblyMatrix(), toolbox::finiteelements::getElementWiseAssemblyMatrix(), toolbox::finiteelements::getElementWiseAssemblyMatrix(), peano4::parallel::Node::getOutputStackForPeriodicBoundaryExchange(), peano4::parallel::Node::getPeriodicBoundaryNumber(), swift2::getVertexNumbersOfParentVertices(), peano4::grid::GridTraversalEventGenerator::getVertexType(), toolbox::blockstructured::interpolateCellDataAssociatedToVolumesIntoOverlappingCell_fourthOrder(), toolbox::blockstructured::interpolateCellDataAssociatedToVolumesIntoOverlappingCell_linear(), peano4::grid::Spacetree::loadVertices(), exahype2::fv::plotPatch(), toolbox::finiteelements::preprocessBoundaryStencil(), toolbox::blockstructured::internal::projectInterpolatedFineCellsOnHaloLayer_AoS(), toolbox::blockstructured::projectPatchHaloOntoFaces(), toolbox::blockstructured::projectPatchSolutionOntoFaces(), exahype2::fv::internal::projectValueOntoParticle_piecewiseConstant(), 
exahype2::fv::internal::projectValueOntoParticle_piecewiseLinear(), peano4::grid::Spacetree::refineState(), toolbox::blockstructured::restrictCell_AoS_averaging(), toolbox::blockstructured::restrictCell_AoS_inject(), toolbox::blockstructured::restrictCellIntoOverlappingCell_inject(), toolbox::blockstructured::restrictCellIntoOverlappingCell_inject_and_average(), exahype2::dg::tests::CellIntegralTest::runEulerOrder2OnStationarySetup(), peano4::grid::Spacetree::shouldEraseAdjacencyInformation(), peano4::grid::Spacetree::storeVertices(), exahype2::fv::rusanov::tests::CopyPatchTest::testCopyPatch(), peano4::grid::Spacetree::traverse(), exahype2::fv::validatePatch(), exahype2::aderdg::validatePatch(), and exahype2::aderdg::validateSpacetimePatch().
int peano4::utils::dLinearisedWithoutLookup  (  const tarch::la::Vector< Dimensions, int > &  counter, 
int  max ) 
tarch::la::Vector< Dimensions, int > peano4::utils::dStartVector  (  ) 
Construct start vector (0,0,...) for d-dimensional loop.
tarch::la::Vector< Dimensions, int > peano4::utils::dStartVector  (  int  dim, 
int  value ) 
Definition at line 322 of file Loop.cpp.
References assertion2.
tarch::la::Vector< Dimensions, int > peano4::utils::dStartVector  (  int  max, 
const LoopDirection &  direction ) 
Creates a start vector.
Each component is set to either 0 or max-1 depending on direction: If the direction entry is true, the component is set to 0; otherwise it is set to max-1.
Definition at line 332 of file Loop.cpp.
References assertion2.
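The direction-dependent start vector can be sketched in a few lines. The names Dim, IntVector, Direction and startVectorSketch are hypothetical, std::array replaces tarch::la::Vector, and the mapping (bit set means start at 0, bit cleared means start at max-1) follows the description above rather than the Loop.cpp source.

```cpp
#include <array>
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the compile-time constant Dimensions.
constexpr std::size_t Dim = 3;
using IntVector = std::array<int, Dim>;
using Direction = std::bitset<Dim>;  // mirrors the LoopDirection typedef

// A set bit means the loop runs upwards along that axis and starts at 0;
// a cleared bit means it runs downwards and starts at max-1.
IntVector startVectorSketch(int max, const Direction& direction) {
  assert(max > 0);
  IntVector result{};
  for (std::size_t i = 0; i < Dim; i++) {
    result[i] = direction[i] ? 0 : max - 1;
  }
  return result;
}
```

With max = 4 and bits 0 and 2 set, the sketch returns (0,3,0).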
void peano4::utils::setupLookupTableForDDelinearised  (  ) 
Definition at line 79 of file Loop.cpp.
Referenced by peano4::fillLookupTables().
void peano4::utils::setupLookupTableForDLinearised  (  ) 
Definition at line 68 of file Loop.cpp.
Referenced by peano4::fillLookupTables().