Peano
Namespaces | |
namespace | tests |
Typedefs | |
typedef std::bitset< Dimensions > | LoopDirection |
Is used by the z-loop. | |
Enumerations | |
enum class | LoopPlacement { Serial , Nested , SpreadOut } |
Guide loop-level parallelism. | |
Functions | |
CPUGPUMethod void | dInc (tarch::la::Vector< Dimensions, int > &counter, int max) |
d-dimensional counterpart of increment operator | |
void | dDec (tarch::la::Vector< Dimensions, int > &counter, int max) |
d-dimensional counterpart of decrement operator | |
void | dInc (tarch::la::Vector< Dimensions, int > &counter, const tarch::la::Vector< Dimensions, int > &max) |
d-dimensional counterpart of increment, where individual vector components have different max values | |
void | dIncByVector (tarch::la::Vector< Dimensions, int > &counter, int max, int increment) |
Perform a d-dimensional increment by value increment: The first component of the counter is incremented by increment. | |
void | dIncByScalar (tarch::la::Vector< Dimensions, int > &counter, int max, int increment) |
Perform a scalar increment of a vector: The operation equals a sequence of increment calls to dInc(). | |
void | dInc (tarch::la::Vector< Dimensions, int > &counter, int max, int doNotExamine) |
Same operation as dInc(tarch::la::Vector<Dimensions,int>,int), but now one dimension is not taken into consideration. | |
void | dInc (tarch::la::Vector< Dimensions, int > &counter, int max, LoopDirection &direction) |
Operation similar to dInc, but it is given a direction bitset that identifies whether the counter has to be incremented or decremented. | |
CPUGPUMethod int | dCmp (const tarch::la::Vector< Dimensions, int > &counter, int max) |
Element-wise comparison for the d-dimensional for loops. | |
int | dCmp (const tarch::la::Vector< Dimensions, int > &counter, const tarch::la::Vector< Dimensions, int > &max) |
Element-wise comparison for the d-dimensional for loops. | |
bool | dCmpLinearOrder (const tarch::la::Vector< Dimensions, int > &counter, const tarch::la::Vector< Dimensions, int > &max) |
Compares two vectors with regard to their linearised value. | |
CPUGPUMethod int | dLinearised (const tarch::la::Vector< Dimensions, int > &counter, int max) |
Map d-dimensional vector onto integer index. | |
CPUGPUMethod int | dLinearised (const std::bitset< Dimensions > &counter) |
int | d2Linearised (const tarch::la::Vector< 2, int > &counter, int max) |
Special 2d variant of dLinearised that works also if you compile with other dimensions. | |
int | d3Linearised (const tarch::la::Vector< 3, int > &counter, int max) |
Special 3d variant of dLinearised that works also if you compile with other dimensions. | |
int | dLinearisedWithoutLookup (const tarch::la::Vector< Dimensions, int > &counter, int max) |
Linearisation not optimised. | |
tarch::la::Vector< Dimensions, int > | dDelinearised (int value, int max) |
Counterpart of dLinearised(). | |
tarch::la::Vector< Dimensions, int > | dDelinearisedWithoutLookup (int value, int max) |
Delinearization not optimised. | |
void | setupLookupTableForDLinearised () |
void | setupLookupTableForDDelinearised () |
CPUGPUMethod tarch::la::Vector< Dimensions, int > | dStartVector () |
Construct start vector (0,0,....) for d-dimensional loop. | |
tarch::la::Vector< Dimensions, int > | dStartVector (int dim, int value) |
tarch::la::Vector< Dimensions, int > | dStartVector (int max, const LoopDirection &direction) |
Creates a start vector. | |
typedef std::bitset<Dimensions> peano4::utils::LoopDirection |
|
enum class peano4::utils::LoopPlacement
Guide loop-level parallelism.
Peano's loop macros allow users to define the logical concurrency of loops: Is the loop serial (with dependencies), can it run in parallel (though there might be dependencies/atomics/critical sections), or is it a loop of SIMT/SIMD type, i.e. without any dependencies?
Further to that, codes can guide the placement of the loops:
Value | Semantics for parallel for | Semantics for simt loop |
---|---|---|
Serial | Keep it on one core. | Keep it on one GPU thread or don't use AVX. |
Nested | If an outer parallel loop grabs a core or n cores, stick to these n cores. | Stick to one SM or one core's AVX units. |
SpreadOut | Try to grab additional cores outside of an enclosing parallel region. | Try to use multiple SMs. |
The flags help performance engineers to balance between overheads and the maximum concurrency that's made available to a parallel or SIMT loop.
Enumerator | |
---|---|
Serial | |
Nested | |
SpreadOut |
int peano4::utils::d2Linearised | ( | const tarch::la::Vector< 2, int > & | counter, |
int | max ) |
Special 2d variant of dLinearised that works also if you compile with other dimensions.
Definition at line 116 of file Loop.cpp.
References assertion2, and assertion3.
Referenced by toolbox::finiteelements::BSplinesStencilFactory::getElementWiseAssemblyMatrix().
int peano4::utils::d3Linearised | ( | const tarch::la::Vector< 3, int > & | counter, |
int | max ) |
Special 3d variant of dLinearised that works also if you compile with other dimensions.
Definition at line 129 of file Loop.cpp.
References assertion2.
Referenced by toolbox::finiteelements::BSplinesStencilFactory::getElementWiseAssemblyMatrix().
int peano4::utils::dCmp | ( | const tarch::la::Vector< Dimensions, int > & | counter, |
const tarch::la::Vector< Dimensions, int > & | max ) |
Element-wise comparison for the d-dimensional for loops.
Unlike the other dCmp() operation, this one also works if the max range differs per vector entry.
Definition at line 281 of file Loop.cpp.
References assertion2.
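The per-entry comparison can be illustrated with a small sketch. It assumes that the comparison holds as long as every counter entry stays below its own upper bound; the page does not spell out the return value, so this is an assumption. `Dim`, `Counter` and `allBelowSketch` are hypothetical names, and `std::array` stands in for `tarch::la::Vector< Dimensions, int >`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for Peano's Dimensions constant and vector type.
constexpr std::size_t Dim = 3;
using Counter = std::array<int, Dim>;

// Assumed semantics: true while every entry is below its own upper bound.
bool allBelowSketch(const Counter& counter, const Counter& max) {
  for (std::size_t d = 0; d < Dim; d++) {
    assert(max[d] > 0);
    if (counter[d] >= max[d]) {
      return false;
    }
  }
  return true;
}
```

A d-dimensional loop over a non-cubic index range would keep iterating while such a predicate holds.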
int peano4::utils::dCmp | ( | const tarch::la::Vector< Dimensions, int > & | counter, |
int | max ) |
Element-wise comparison for the d-dimensional for loops.
Consult dLinearised() for a discussion of the GPU aspects. Further to the discussion therein, we have to disable assertions within a SYCL context.
Definition at line 292 of file Loop.cpp.
References assertion2.
bool peano4::utils::dCmpLinearOrder | ( | const tarch::la::Vector< Dimensions, int > & | counter, |
const tarch::la::Vector< Dimensions, int > & | max ) |
void peano4::utils::dDec | ( | tarch::la::Vector< Dimensions, int > & | counter, |
int | max ) |
d-dimensional counterpart of decrement operator
This operation performs a d-dimensional decrement on a given integer vector: The first component of the vector is decremented. If the first component is smaller than 0, the component is set to max and the next component is decremented by one.
Definition at line 169 of file Loop.cpp.
References assertion.
tarch::la::Vector< Dimensions, int > peano4::utils::dDelinearised | ( | int | value, |
int | max ) |
Counterpart of dLinearised().
This operation's semantics equal those of dDelinearised(), but the operation is not optimised at all. It therefore accepts arbitrary argument values, yet it might become a bottleneck.
Definition at line 146 of file Loop.cpp.
References assertionEquals2.
Referenced by toolbox::finiteelements::mapElementMatrixEntryOntoStencilEntry().
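A minimal sketch of an unoptimised delinearisation follows. It assumes that the linearised index equals counter(0) + counter(1)*max + counter(2)*max^2 + ..., i.e. that the first component runs fastest; this ordering is an assumption and is not stated on this page. `Dim`, `Counter` and `delineariseSketch` are hypothetical names, and `std::array` stands in for `tarch::la::Vector< Dimensions, int >`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for Peano's Dimensions constant and vector type.
constexpr std::size_t Dim = 3;
using Counter = std::array<int, Dim>;

// Peel off one base-max digit per dimension, lowest dimension first.
Counter delineariseSketch(int value, int max) {
  assert(max > 0 && value >= 0);
  Counter result{};
  for (std::size_t d = 0; d < Dim; d++) {
    result[d] = value % max;
    value /= max;
  }
  return result;
}
```

Division and modulo on every call are the kind of work a lookup table can avoid for small max.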
tarch::la::Vector< Dimensions, int > peano4::utils::dDelinearisedWithoutLookup | ( | int | value, |
int | max ) |
void peano4::utils::dInc | ( | tarch::la::Vector< Dimensions, int > & | counter, |
const tarch::la::Vector< Dimensions, int > & | max ) |
d-dimensional counterpart of increment, where individual vector components have different max values
This operation performs a d-dimensional increment on a given integer vector: The first component of the vector is incremented. If the first component is greater than max(0)-1, the component is set to zero and the next component is incremented by one. This operation is used often by d-dimensional for-loops.
void peano4::utils::dInc | ( | tarch::la::Vector< Dimensions, int > & | counter, |
int | max ) |
d-dimensional counterpart of increment operator
This operation performs a d-dimensional increment on a given integer vector: The first component of the vector is incremented. If the first component is greater than max-1, the component is set to zero and the next component is incremented by one. This operation is used often by d-dimensional for-loops.
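The carry behaviour described above can be sketched as follows. The sketch assumes that the carry is propagated through all components but the last one, so that the last component can reach max and thereby end a loop; that detail and the loop condition are assumptions, not taken from this page. `Dim`, `Counter`, `incSketch`, `insideSketch` and `countIterationsSketch` are hypothetical names, and `std::array` stands in for `tarch::la::Vector< Dimensions, int >`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for Peano's Dimensions constant and vector type.
constexpr std::size_t Dim = 3;
using Counter = std::array<int, Dim>;

// Increment the first component and carry any overflow into the next one.
// The last component is never wrapped (assumption), so it can reach max.
void incSketch(Counter& counter, int max) {
  assert(max > 0);
  counter[0]++;
  for (std::size_t d = 0; d + 1 < Dim; d++) {
    if (counter[d] > max - 1) {
      counter[d] = 0;
      counter[d + 1]++;
    }
  }
}

// Assumed loop condition: every component is still below max.
bool insideSketch(const Counter& counter, int max) {
  for (int c : counter) {
    if (c >= max) {
      return false;
    }
  }
  return true;
}

// Count the iterations of a full d-dimensional loop over [0,max)^Dim.
int countIterationsSketch(int max) {
  int iterations = 0;
  for (Counter k{}; insideSketch(k, max); incSketch(k, max)) {
    iterations++;
  }
  return iterations;
}
```

With three dimensions, such a loop visits max^3 index tuples, with the first component changing fastest.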
void peano4::utils::dInc | ( | tarch::la::Vector< Dimensions, int > & | counter, |
int | max, | ||
int | doNotExamine ) |
void peano4::utils::dInc | ( | tarch::la::Vector< Dimensions, int > & | counter, |
int | max, | ||
LoopDirection & | direction ) |
void peano4::utils::dIncByScalar | ( | tarch::la::Vector< Dimensions, int > & | counter, |
int | max, | ||
int | increment ) |
void peano4::utils::dIncByVector | ( | tarch::la::Vector< Dimensions, int > & | counter, |
int | max, | ||
int | increment ) |
Perform a d-dimensional increment by value increment: The first component of the counter is incremented by increment.
Afterwards, the operation checks the first entry: If it exceeds max, it is replaced by its value modulo max, the next component is incremented by increment, and the check continues.
int peano4::utils::dLinearised | ( | const std::bitset< Dimensions > & | counter | ) |
int peano4::utils::dLinearised | ( | const tarch::la::Vector< Dimensions, int > & | counter, |
int | max ) |
Map d-dimensional vector onto integer index.
This operation is called very often and thus might slow down the overall performance significantly. Therefore, I introduced an aggressive optimisation based on lookup tables. This optimisation is switched on if DLOOP_AGGRESSIVE is specified (the default in the Peano project). Two preconditions have to be fulfilled in this case: All parameters have to stay within certain boundaries (all positive, max smaller than or equal to 5), and one has to call both setupLookupTableForDLinearised() and setupLookupTableForDDelinearised() before using dLinearised() or dDelinearised().
Obviously, creating a lookup table for these two operations is not that simple, since the parameter space has to be mapped onto a unique key. To end up with a simple mapping, all the constraints from above are added. Although the mapping might be slow, it is still faster than computing the partial sums of a to the power of b.
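The sum over powers mentioned above can be sketched without a lookup table. The sketch assumes the index is counter(0) + counter(1)*max + counter(2)*max^2 + ..., with the first component running fastest; this ordering is an assumption. It uses Horner's scheme, so no explicit powers are computed. `Dim`, `Counter` and `lineariseSketch` are hypothetical names, and `std::array` stands in for `tarch::la::Vector< Dimensions, int >`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for Peano's Dimensions constant and vector type.
constexpr std::size_t Dim = 3;
using Counter = std::array<int, Dim>;

// Evaluate counter[0] + counter[1]*max + counter[2]*max^2 + ...
// from the highest dimension downwards (Horner's scheme).
int lineariseSketch(const Counter& counter, int max) {
  assert(max > 0);
  int result = 0;
  for (std::size_t d = Dim; d-- > 0;) {
    assert(counter[d] >= 0 && counter[d] < max);
    result = result * max + counter[d];
  }
  return result;
}
```

A table-based variant would precompute these results for all admissible (counter, max) pairs, which is only feasible because max is bounded.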
The d-linear loops are used mainly on the host, but I provide a SYCL implementation of d-linear, too. SYCL does not really distinguish the host queue from the accelerator queue at the moment. I therefore have to mark the dLinearised() routine as GPU offloadable. See parallelDfor for a discussion of when this is required.
Once annotated, we might still get errors if we use SYCL on the host and have no GPU offloading (via SYCL) enabled: The macro GPUCallableMethod in this case is not defined. Actually, it is explicitly defined as empty, so I cannot even overwrite it. Therefore, I introduce the new macro CPUGPUMethod. It delegates to SYCL_EXTERNAL once SYCL is enabled on the CPU. Consult SYCL support as well for further SYCL details.
Definition at line 106 of file Loop.cpp.
References assertionEquals3.
Referenced by exahype2::fd::tests::CCZ4KernelTest::AppleWithAppleTest(), peano4::datamanagement::VertexMarker::areAdjacentCellsLocal(), applications::exahype2::CompressibleNavierStokes::NavierStokesSolver::calculateDerivatives(), toolbox::blockstructured::computeGradient(), toolbox::blockstructured::computeGradientAndReturnMaxDifference(), exahype2::dg::internal::copySolution(), peano4::grid::GridTraversalEventGenerator::createGenericCellTraversalEvent(), toolbox::blockstructured::internal::createPiecewiseConstantInterpolationMatrix(), peano4::grid::Spacetree::descend(), exahype2::dg::evaluatePolynomial(), toolbox::finiteelements::extractElementStencil(), applications::exahype2::CompressibleNavierStokes::NavierStokesSolver::extrapolateHalo(), toolbox::blockstructured::extrapolatePatchSolutionAndProjectExtrapolatedHaloOntoFaces(), peano4::grid::GridTraversalEventGenerator::getAdjacentRanksOfFace(), toolbox::finiteelements::getElementWiseAssemblyMatrix(), toolbox::finiteelements::getElementWiseAssemblyMatrix(), toolbox::finiteelements::getElementWiseAssemblyMatrix(), toolbox::finiteelements::getElementWiseAssemblyMatrix(), peano4::parallel::Node::getOutputStackForPeriodicBoundaryExchange(), peano4::parallel::Node::getPeriodicBoundaryNumber(), swift2::getVertexNumbersOfParentVertices(), peano4::grid::GridTraversalEventGenerator::getVertexType(), toolbox::blockstructured::interpolateCellDataAssociatedToVolumesIntoOverlappingCell_fourthOrder(), toolbox::blockstructured::interpolateCellDataAssociatedToVolumesIntoOverlappingCell_linear(), toolbox::blockstructured::interpolateCellDataAssociatedToVolumesIntoOverlappingCell_secondOrder(), peano4::grid::Spacetree::loadVertices(), exahype2::fv::plotPatch(), toolbox::finiteelements::preprocessBoundaryStencil(), toolbox::blockstructured::internal::projectInterpolatedFineCellsOnHaloLayer_AoS(), toolbox::blockstructured::projectPatchHaloOntoFaces(), toolbox::blockstructured::projectPatchSolutionOntoFaces(), 
exahype2::fv::internal::projectValueOntoParticle_piecewiseConstant(), exahype2::fv::internal::projectValueOntoParticle_piecewiseLinear(), mghype::matrixfree::solvers::cgmultigrid::prolongate(), peano4::grid::Spacetree::refineState(), toolbox::blockstructured::restrictCell_AoS_averaging(), toolbox::blockstructured::restrictCell_AoS_inject(), toolbox::blockstructured::restrictCellIntoOverlappingCell_inject(), toolbox::blockstructured::restrictCellIntoOverlappingCell_inject_and_average(), mghype::matrixfree::solvers::cgmultigrid::restrictToNextLevel(), exahype2::dg::tests::CellIntegralTest::runEulerOrder2OnStationarySetup(), peano4::grid::Spacetree::shouldEraseAdjacencyInformation(), peano4::grid::Spacetree::storeVertices(), exahype2::fv::rusanov::tests::CopyPatchTest::testCopyPatch(), peano4::utils::tests::ParallelDForTest::testParallelDFor(), peano4::grid::Spacetree::traverse(), and exahype2::fv::validatePatch().
int peano4::utils::dLinearisedWithoutLookup | ( | const tarch::la::Vector< Dimensions, int > & | counter, |
int | max ) |
tarch::la::Vector< Dimensions, int > peano4::utils::dStartVector | ( | ) |
Construct start vector (0,0,....) for d-dimensional loop.
tarch::la::Vector< Dimensions, int > peano4::utils::dStartVector | ( | int | dim, |
int | value ) |
Definition at line 322 of file Loop.cpp.
References assertion2.
tarch::la::Vector< Dimensions, int > peano4::utils::dStartVector | ( | int | max, |
const LoopDirection & | direction ) |
Creates a start vector.
Each component is set to either 0 or max-1 depending on direction: If the direction entry is true, the value is 0; otherwise it is max-1.
Definition at line 332 of file Loop.cpp.
References assertion2.
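The direction-dependent start vector can be sketched as below, assuming that bit d of the direction bitset controls component d. `Dim`, `Counter`, `Direction` and `startVectorSketch` are hypothetical names; `std::array` stands in for `tarch::la::Vector< Dimensions, int >` and `Direction` mirrors the `LoopDirection` typedef.

```cpp
#include <array>
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for Peano's Dimensions constant and types.
constexpr std::size_t Dim = 3;
using Counter = std::array<int, Dim>;
using Direction = std::bitset<Dim>;

// A set bit means the loop runs upwards along that axis and starts at 0;
// a cleared bit means it runs downwards and starts at max-1.
Counter startVectorSketch(int max, const Direction& direction) {
  assert(max > 0);
  Counter result{};
  for (std::size_t d = 0; d < Dim; d++) {
    result[d] = direction[d] ? 0 : max - 1;
  }
  return result;
}
```

For example, with max equal to 4 and only bits 0 and 2 set, the sketch yields the start vector (0,3,0).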
void peano4::utils::setupLookupTableForDDelinearised | ( | ) |
Definition at line 79 of file Loop.cpp.
Referenced by peano4::fillLookupTables().
void peano4::utils::setupLookupTableForDLinearised | ( | ) |
Definition at line 68 of file Loop.cpp.
References dfor.
Referenced by peano4::fillLookupTables().