Peano
Namespaces | |
| namespace | tests |
Typedefs | |
| typedef std::bitset< Dimensions > | LoopDirection |
| Is used by the z-loop. | |
Enumerations | |
| enum class | LoopPlacement { Serial , Nested , SpreadOut } |
| Guide loop-level parallelism. | |
Functions | |
| CPUGPUMethod void | dInc (tarch::la::Vector< Dimensions, int > &counter, int max) |
| d-dimensional counterpart of increment operator | |
| void | dDec (tarch::la::Vector< Dimensions, int > &counter, int max) |
| d-dimensional counterpart of decrement operator | |
| void | dInc (tarch::la::Vector< Dimensions, int > &counter, const tarch::la::Vector< Dimensions, int > &max) |
| d-dimensional counterpart of increment, where individual vector components have different max values | |
| void | dIncByVector (tarch::la::Vector< Dimensions, int > &counter, int max, int increment) |
| Perform a d-dimensional increment by value increment: The first component of the counter is incremented by increment. | |
| void | dIncByScalar (tarch::la::Vector< Dimensions, int > &counter, int max, int increment) |
| Perform a scalar increment of a vector: The operation equals a sequence of increment calls to dInc(). | |
| void | dInc (tarch::la::Vector< Dimensions, int > &counter, int max, int doNotExamine) |
| Same operation as dInc(tarch::la::Vector<Dimensions,int>,int), but now one dimension is not taken into consideration. | |
| void | dInc (tarch::la::Vector< Dimensions, int > &counter, int max, LoopDirection &direction) |
| Operation similar to dInc, but is given a direction bitset that identifies whether the counters have to be incremented or decremented. | |
| CPUGPUMethod int | dCmp (const tarch::la::Vector< Dimensions, int > &counter, int max) |
| Element-wise comparison for d-dimensional for-loops. | |
| int | dCmp (const tarch::la::Vector< Dimensions, int > &counter, const tarch::la::Vector< Dimensions, int > &max) |
| Element-wise comparison for d-dimensional for-loops. | |
| bool | dCmpLinearOrder (const tarch::la::Vector< Dimensions, int > &counter, const tarch::la::Vector< Dimensions, int > &max) |
| Compares two vectors with regard to their linearised value. | |
| CPUGPUMethod int | dLinearised (const tarch::la::Vector< Dimensions, int > &counter, int max) |
| Map d-dimensional vector onto integer index. | |
| CPUGPUMethod int | dLinearised (const std::bitset< Dimensions > &counter) |
| int | d2Linearised (const tarch::la::Vector< 2, int > &counter, int max) |
| Special 2d variant of dLinearised that works also if you compile with other dimensions. | |
| int | d3Linearised (const tarch::la::Vector< 3, int > &counter, int max) |
| Special 3d variant of dLinearised that works also if you compile with other dimensions. | |
| int | dLinearisedWithoutLookup (const tarch::la::Vector< Dimensions, int > &counter, int max) |
| Linearisation without lookup table (not optimised). | |
| tarch::la::Vector< Dimensions, int > | dDelinearised (int value, int max) |
| Counterpart of dLinearised(). | |
| tarch::la::Vector< Dimensions, int > | dDelinearisedWithoutLookup (int value, int max) |
| Delinearisation without lookup table (not optimised). | |
| void | setupLookupTableForDLinearised () |
| void | setupLookupTableForDDelinearised () |
| CPUGPUMethod tarch::la::Vector< Dimensions, int > | dStartVector () |
| Construct start vector (0,...,0) for a d-dimensional loop. | |
| tarch::la::Vector< Dimensions, int > | dStartVector (int dim, int value) |
| tarch::la::Vector< Dimensions, int > | dStartVector (int max, const LoopDirection &direction) |
| Creates a start vector. | |
| typedef std::bitset<Dimensions> peano4::utils::LoopDirection |
| enum class peano4::utils::LoopPlacement |
Guide loop-level parallelism.
Peano's loop macros let users specify the logical concurrency of a loop: is it serial (with dependencies), can it run in parallel (possibly with dependencies handled through atomics or critical sections), or is it a SIMT/SIMD loop, i.e. one without any dependencies?
In addition, codes can guide where loops are placed:
| Value | Semantics for parallel for | Semantics for SIMT loop |
|---|---|---|
| Serial | Keep it on one core. | Keep it on one GPU thread or don't use AVX. |
| Nested | If an outer parallel loop grabs a core or n cores, stick to these n cores. | Stick to one SM or to one core's AVX units. |
| SpreadOut | Try to grab additional cores outside of an enclosing parallel region. | Try to use multiple SMs. |
The flags help performance engineers balance overheads against the maximum concurrency that is made available to a parallel or SIMT loop.
| Enumerator | |
|---|---|
| Serial | |
| Nested | |
| SpreadOut | |
| int peano4::utils::d2Linearised | ( | const tarch::la::Vector< 2, int > & | counter, |
| int | max ) |
Special 2d variant of dLinearised that works also if you compile with other dimensions.
| int peano4::utils::d3Linearised | ( | const tarch::la::Vector< 3, int > & | counter, |
| int | max ) |
Special 3d variant of dLinearised that works also if you compile with other dimensions.
| int peano4::utils::dCmp | ( | const tarch::la::Vector< Dimensions, int > & | counter, |
| const tarch::la::Vector< Dimensions, int > & | max ) |
Element-wise comparison for d-dimensional for-loops.
Unlike the other dCmp() overload, this one also works if the max range differs between vector entries.
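A minimal C++ sketch of the per-component comparison, assuming that dCmp() serves as the continuation test of a d-dimensional for-loop, i.e. it returns a non-zero value while every component stays below its own bound. std::array stands in for tarch::la::Vector, Dimensions is fixed to 3, and the name dCmpSketch is hypothetical; this is not Peano's implementation.

```cpp
#include <array>
#include <cassert>

// Assumptions for illustration: Dimensions fixed to 3, std::array replaces tarch::la::Vector.
constexpr int Dimensions = 3;
using IntVector = std::array<int, Dimensions>;

// Hypothetical counterpart of dCmp(counter, max) with a separate bound per component:
// returns 1 while counter[d] < max[d] holds for every d, and 0 otherwise.
int dCmpSketch(const IntVector& counter, const IntVector& max) {
  int result = 1;
  for (int d = 0; d < Dimensions; d++) {
    assert(max[d] > 0);
    if (counter[d] >= max[d]) {
      result = 0;
    }
  }
  return result;
}
```

With bounds such as max = (2,3,4), a loop guarded by this test and advanced by a matching per-component increment would run its body 2*3*4 = 24 times.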
| CPUGPUMethod int peano4::utils::dCmp | ( | const tarch::la::Vector< Dimensions, int > & | counter, |
| int | max ) |
Element-wise comparison for d-dimensional for-loops.
Consult dLinearised() for a discussion of GPU aspects. In addition to the points made there, assertions have to be disabled within a SYCL context.
| bool peano4::utils::dCmpLinearOrder | ( | const tarch::la::Vector< Dimensions, int > & | counter, |
| const tarch::la::Vector< Dimensions, int > & | max ) |
Compares two vectors with regard to their linearised value.
| void peano4::utils::dDec | ( | tarch::la::Vector< Dimensions, int > & | counter, |
| int | max ) |
d-dimensional counterpart of decrement operator
This operation performs a d-dimensional decrement on a given integer vector: The first component of the vector is decremented. If the first component is smaller than 0, the component is set to max and the next component is decremented by one.
| tarch::la::Vector< Dimensions, int > peano4::utils::dDelinearised | ( | int | value, |
| int | max ) |
Counterpart of dLinearised().
| tarch::la::Vector< Dimensions, int > peano4::utils::dDelinearisedWithoutLookup | ( | int | value, |
| int | max ) |
Delinearisation without lookup table (not optimised).
This operation's semantics equal those of dDelinearised(), but the operation is not optimised at all. It thus accepts arbitrary argument values. However, it might become a bottleneck.
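The text does not spell out the index ordering. Assuming dLinearised() treats component 0 as the least significant digit (matching dInc(), which advances component 0 fastest), delinearisation without a lookup table amounts to repeated division by max. The sketch below uses a hypothetical name and std::array in place of tarch::la::Vector; it is not Peano's code.

```cpp
#include <array>
#include <cassert>

constexpr int Dimensions = 3;
using IntVector = std::array<int, Dimensions>;

// Hypothetical sketch: inverts value = counter[0] + counter[1]*max + counter[2]*max^2
// by peeling off base-max digits, least significant component (d = 0) first.
IntVector dDelinearisedSketch(int value, int max) {
  assert(value >= 0 && max > 0);
  IntVector result{};
  for (int d = 0; d < Dimensions - 1; d++) {
    result[d] = value % max;
    value /= max;
  }
  // The most significant component takes whatever remains, so values
  // beyond max^Dimensions - 1 are still inverted exactly.
  result[Dimensions - 1] = value;
  return result;
}
```

For example, with max = 3 the value 5 maps back onto (2,1,0), since 2 + 1*3 = 5.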
| void peano4::utils::dInc | ( | tarch::la::Vector< Dimensions, int > & | counter, |
| const tarch::la::Vector< Dimensions, int > & | max ) |
d-dimensional counterpart of increment, where individual vector components have different max values
This operation performs a d-dimensional increment on a given integer vector: The first component of the vector is incremented. If it is greater than max(0)-1, it is set to zero and the next component is incremented by one; every further component i is carried over in the same way once it exceeds max(i)-1. This operation is frequently used by d-dimensional for-loops.
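The carry rule with per-component bounds is a mixed-radix counter. A minimal sketch under the same assumptions as elsewhere on this page (std::array for tarch::la::Vector, Dimensions = 3, hypothetical name); it additionally assumes that the last component is never wrapped, so that reaching max[Dimensions-1] can terminate a loop.

```cpp
#include <array>
#include <cassert>

constexpr int Dimensions = 3;
using IntVector = std::array<int, Dimensions>;

// Hypothetical sketch of a mixed-radix increment: component 0 runs fastest and
// component d wraps to zero once it exceeds max[d]-1, carrying one into d+1.
// The last component is never wrapped, so reaching max[Dimensions-1] ends a loop.
void dIncSketch(IntVector& counter, const IntVector& max) {
  counter[0]++;
  for (int d = 0; d < Dimensions - 1; d++) {
    assert(max[d] > 0);
    if (counter[d] > max[d] - 1) {
      counter[d] = 0;
      counter[d + 1]++;
    }
  }
}
```

Starting from (1,2,0) with max = (2,3,4), one increment carries through two components and yields (0,0,1).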
| CPUGPUMethod void peano4::utils::dInc | ( | tarch::la::Vector< Dimensions, int > & | counter, |
| int | max ) |
d-dimensional counterpart of increment operator
This operation performs a d-dimensional increment on a given integer vector: The first component of the vector is incremented. If the first component is greater than max-1, the component is set to zero and the next component is incremented by one; the same carry rule applies to all further components. This operation is frequently used by d-dimensional for-loops.
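Putting a dInc()-style increment and a dCmp()-style test together gives the usual d-dimensional loop. The sketch below assumes that dCmp() is the continuation test and that the loop starts at dStartVector() = (0,...,0); the names are hypothetical, std::array replaces tarch::la::Vector, and the for-header only mimics what a dfor-style macro might expand to.

```cpp
#include <array>
#include <cassert>

constexpr int Dimensions = 3;
using IntVector = std::array<int, Dimensions>;

// d-dimensional increment with the same bound max in every dimension.
void dIncSketch(IntVector& counter, int max) {
  counter[0]++;
  for (int d = 0; d < Dimensions - 1; d++) {
    if (counter[d] > max - 1) {
      counter[d] = 0;
      counter[d + 1]++;
    }
  }
}

// Loop-continuation test: true while all components are below max.
bool dCmpSketch(const IntVector& counter, int max) {
  for (int d = 0; d < Dimensions; d++) {
    if (counter[d] >= max) {
      return false;
    }
  }
  return true;
}

// Visits all max^Dimensions index tuples, component 0 fastest; returns the count.
int countDForIterations(int max) {
  int visited = 0;
  for (IntVector counter{}; dCmpSketch(counter, max); dIncSketch(counter, max)) {
    visited++;
  }
  return visited;
}
```

With max = 3 and Dimensions = 3, the loop body runs 3^3 = 27 times.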
| void peano4::utils::dInc | ( | tarch::la::Vector< Dimensions, int > & | counter, |
| int | max, | ||
| int | doNotExamine ) |
Same operation as dInc(tarch::la::Vector<Dimensions,int>,int), but now one dimension is not taken into consideration.
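The text leaves open what "not taken into consideration" means precisely. Assuming that component doNotExamine is simply frozen (never incremented, reset or carried into) while the remaining components are enumerated as in dInc(), a sketch could look as follows. The name dIncSkippingSketch, Dimensions = 3 and the use of std::array are assumptions for illustration.

```cpp
#include <array>
#include <cassert>

constexpr int Dimensions = 3;
using IntVector = std::array<int, Dimensions>;

// Hypothetical sketch: increments like dInc(counter, max) but treats component
// doNotExamine as frozen, i.e. it is never incremented, reset or used for a carry.
void dIncSkippingSketch(IntVector& counter, int max, int doNotExamine) {
  static_assert(Dimensions > 1, "at least one dimension must remain");
  assert(0 <= doNotExamine && doNotExamine < Dimensions);
  int d = (doNotExamine == 0) ? 1 : 0;  // fastest-running examined component
  counter[d]++;
  while (counter[d] > max - 1) {
    int next = d + 1;
    if (next == doNotExamine) {
      next++;  // jump over the frozen component
    }
    if (next >= Dimensions) {
      break;  // last examined component: leave it at max so a loop can stop
    }
    counter[d] = 0;
    counter[next]++;
    d = next;
  }
}
```

Under this reading, such a variant would enumerate a (d-1)-dimensional slice, e.g. all index tuples with a fixed entry along one axis.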
| void peano4::utils::dInc | ( | tarch::la::Vector< Dimensions, int > & | counter, |
| int | max, | ||
| LoopDirection & | direction ) |
Operation similar to dInc, but is given a direction bitset that identifies whether the counters have to be incremented or decremented.
See the dforz macro for an example of how to use dInc.
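The text does not define the exact update rule. The sketch below assumes that a set direction bit means "count up", that a component which runs out of range is clamped back and reverses its direction, and that the step then carries into the next component. This yields a zig-zag (boustrophedon) traversal in which consecutive index tuples are neighbours; reading the z-loop this way is an assumption, not confirmed by the text. The start-vector rule follows the description of dStartVector(int, const LoopDirection&). All names are hypothetical, and Dimensions = 2 keeps the example small.

```cpp
#include <array>
#include <bitset>
#include <cassert>

constexpr int Dimensions = 2;
using IntVector = std::array<int, Dimensions>;
using LoopDirection = std::bitset<Dimensions>;

// Start vector: 0 where the direction bit is set (counting up), max-1 otherwise.
IntVector dStartVectorSketch(int max, const LoopDirection& direction) {
  IntVector result{};
  for (int d = 0; d < Dimensions; d++) {
    result[d] = direction[d] ? 0 : max - 1;
  }
  return result;
}

// Zig-zag increment: move component d one step in its direction. If it leaves
// [0, max-1], clamp it back, flip its direction bit and carry the step into d+1.
// The last component is not clamped; leaving the range terminates the loop.
void dIncSketch(IntVector& counter, int max, LoopDirection& direction) {
  for (int d = 0; d < Dimensions; d++) {
    counter[d] += direction[d] ? 1 : -1;
    const bool leftRange = counter[d] < 0 || counter[d] > max - 1;
    if (!leftRange || d == Dimensions - 1) {
      return;
    }
    counter[d] = direction[d] ? max - 1 : 0;
    direction.flip(d);
  }
}
```

With max = 3 and all bits set, the sketch visits (0,0), (1,0), (2,0), (2,1), (1,1), (0,1), (0,2), (1,2), (2,2).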
| void peano4::utils::dIncByScalar | ( | tarch::la::Vector< Dimensions, int > & | counter, |
| int | max, | ||
| int | increment ) |
Perform a scalar increment of a vector: The operation equals a sequence of increment calls to dInc().
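Taken literally, the description corresponds to the following hypothetical sketch: a loop that calls a dInc()-style increment increment times. Dimensions = 3, std::array in place of tarch::la::Vector and the Sketch names are assumptions for illustration.

```cpp
#include <array>
#include <cassert>

constexpr int Dimensions = 3;
using IntVector = std::array<int, Dimensions>;

// Single d-dimensional increment with carry (component 0 runs fastest).
void dIncSketch(IntVector& counter, int max) {
  counter[0]++;
  for (int d = 0; d < Dimensions - 1; d++) {
    if (counter[d] > max - 1) {
      counter[d] = 0;
      counter[d + 1]++;
    }
  }
}

// Scalar increment: equivalent to calling dIncSketch() increment times.
void dIncByScalarSketch(IntVector& counter, int max, int increment) {
  assert(increment >= 0);
  for (int i = 0; i < increment; i++) {
    dIncSketch(counter, max);
  }
}
```

The cost of this formulation grows linearly with increment; starting from (0,0,0) with max = 3, an increment of 5 yields (2,1,0).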
| void peano4::utils::dIncByVector | ( | tarch::la::Vector< Dimensions, int > & | counter, |
| int | max, | ||
| int | increment ) |
Perform a d-dimensional increment by value increment: The first component of the counter is incremented by increment.
Afterwards, the operation checks the first entry: If it exceeds max, it is replaced by its value modulo max and the next component is incremented by increment. The check then continues with the next component.
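Read literally, the description yields a strided d-dimensional loop. The hypothetical sketch below follows that reading; it assumes that "exceeds max" means reaching max or more (otherwise the value max itself would be visited) and that the last component is left unwrapped so it can terminate a loop, as with dInc(). std::array replaces tarch::la::Vector and Dimensions is fixed to 3.

```cpp
#include <array>
#include <cassert>

constexpr int Dimensions = 3;
using IntVector = std::array<int, Dimensions>;

// Hypothetical literal reading of dIncByVector(): add increment to component 0;
// whenever a component reaches max or more, replace it by its value modulo max
// and add increment to the next component. The last component is not wrapped.
void dIncByVectorSketch(IntVector& counter, int max, int increment) {
  assert(max > 0 && increment > 0);
  counter[0] += increment;
  for (int d = 0; d < Dimensions - 1; d++) {
    if (counter[d] >= max) {
      counter[d] %= max;
      counter[d + 1] += increment;
    }
  }
}
```

With max = 4 and increment = 2, a loop starting at (0,0,0) visits the 2*2*2 = 8 points with even coordinates, i.e. it behaves like a strided loop whenever increment divides max.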
| CPUGPUMethod int peano4::utils::dLinearised | ( | const std::bitset< Dimensions > & | counter | ) |
| CPUGPUMethod int peano4::utils::dLinearised | ( | const tarch::la::Vector< Dimensions, int > & | counter, |
| int | max ) |
Map d-dimensional vector onto integer index.
This operation is called very frequently and thus might cause a significant slowdown of the overall code. Therefore, I introduced an aggressive optimisation based on lookup tables. This optimisation is switched on if DLOOP_AGGRESSIVE is specified (the default in the Peano project). Two preconditions have to be fulfilled in this case: All parameters have to stay within certain bounds (all non-negative, max at most 5), and one has to call both setupLookupTableForDLinearised() and setupLookupTableForDDelinearised() before using dLinearised() or dDelinearised().
Creating a lookup table for these two operations is not trivial, since the parameter space has to be mapped onto a unique key. The constraints above are imposed to keep this mapping simple. Although computing the key might be slow, it is still faster than evaluating the sum of powers of max explicitly.
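The text does not document Peano's actual key scheme. The following hypothetical sketch only illustrates the idea under the stated constraints: because max is at most 5, max and every counter entry fit into disjoint 3-bit fields, so packing them with shifts gives a cheap, unique key into a precomputed table. Component 0 is assumed to be the least significant digit; Dimensions = 3, std::array and all names are assumptions.

```cpp
#include <array>
#include <cassert>
#include <vector>

constexpr int Dimensions = 3;
using IntVector = std::array<int, Dimensions>;

constexpr int MaxSupported = 5;  // mirrors the "max at most 5" precondition
constexpr int BitsPerField = 3;  // 3 bits hold 0..7, enough for max and all entries
constexpr int FieldMask = (1 << BitsPerField) - 1;
constexpr int TableSize = 1 << (BitsPerField * (Dimensions + 1));

std::vector<int> lookupTable;  // hypothetical global table

// Reference computation: counter[0] + counter[1]*max + counter[2]*max^2 (Horner).
int linearisedDirect(const IntVector& counter, int max) {
  int result = 0;
  for (int d = Dimensions - 1; d >= 0; d--) {
    result = result * max + counter[d];
  }
  return result;
}

// Hypothetical unique key: max and each entry in its own 3-bit field.
int lookupKey(const IntVector& counter, int max) {
  int key = max;
  for (int d = 0; d < Dimensions; d++) {
    key |= counter[d] << (BitsPerField * (d + 1));
  }
  return key;
}

// Plays the role of setupLookupTableForDLinearised(): fill every admissible key once.
void setupLookupTableSketch() {
  lookupTable.assign(TableSize, -1);
  for (int key = 0; key < TableSize; key++) {
    const int max = key & FieldMask;
    IntVector counter{};
    bool admissible = max >= 1 && max <= MaxSupported;
    for (int d = 0; d < Dimensions; d++) {
      counter[d] = (key >> (BitsPerField * (d + 1))) & FieldMask;
      admissible = admissible && counter[d] < max;
    }
    if (admissible) {
      lookupTable[key] = linearisedDirect(counter, max);
    }
  }
}

// Fast path: one key computation and one table read instead of the power sum.
int dLinearisedSketch(const IntVector& counter, int max) {
  assert(!lookupTable.empty() && "call setupLookupTableSketch() first");
  assert(max >= 1 && max <= MaxSupported);
  for (int d = 0; d < Dimensions; d++) {
    assert(counter[d] >= 0 && counter[d] < max);
  }
  return lookupTable[lookupKey(counter, max)];
}
```

The table has 2^12 = 4096 entries for Dimensions = 3; the bound on max is what keeps every field, and hence the table, this small.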
The d-linear loops are used mainly on the host, but I provide a SYCL implementation of the d-linear routines, too. At the moment, SYCL does not really distinguish the host queue from the accelerator queue. I therefore have to mark the dLinearised() routine as GPU offloadable. See parallelDfor for a discussion of when this is required.
Even once the routine is annotated, we might still get errors if we use SYCL on the host without GPU offloading (via SYCL) enabled: In this case, the macro GPUCallableMethod is explicitly defined as empty, so I cannot overwrite it. Therefore, I introduce the new macro
CPUGPUMethod
It delegates to SYCL_EXTERNAL once SYCL is enabled on the CPU. Consult SYCL support for further SYCL details.
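A hypothetical illustration of the macro pattern, not Peano's actual header: it keys off SYCL_LANGUAGE_VERSION, which SYCL 2020 compilers such as DPC++ predefine, whereas Peano presumably uses its own build-configuration switch. SYCL_EXTERNAL is the standard SYCL 2020 marker for functions that device code in another translation unit may call. Compiled without SYCL, the macro expands to nothing and the function is an ordinary host function. The names CPUGPUMethodSketch and d2LinearisedSketch are made up.

```cpp
#include <cassert>

// Hypothetical sketch of the CPUGPUMethod idea (not Peano's header).
#if defined(SYCL_LANGUAGE_VERSION)
#include <sycl/sycl.hpp>
#define CPUGPUMethodSketch SYCL_EXTERNAL
#else
#define CPUGPUMethodSketch
#endif

// 2d linearisation, marked so that a SYCL build may call it from device code.
CPUGPUMethodSketch int d2LinearisedSketch(int x, int y, int max) {
#if !defined(SYCL_LANGUAGE_VERSION)
  // Assertions are left out of SYCL builds, as required for code in a SYCL context.
  assert(x >= 0 && x < max && y >= 0);
#endif
  return x + y * max;
}
```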
Referenced by applications::exahype2::CompressibleNavierStokes::NavierStokesSolver::calculateDerivatives(), and applications::exahype2::CompressibleNavierStokes::NavierStokesSolver::extrapolateHalo().

| int peano4::utils::dLinearisedWithoutLookup | ( | const tarch::la::Vector< Dimensions, int > & | counter, |
| int | max ) |
Linearisation not optimised.
This operation's semantics equal those of dLinearised(), but the operation is not optimised at all. It thus accepts arbitrary argument values. However, it might become a bottleneck.
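Without a lookup table, linearisation is a plain positional-number evaluation. Assuming component 0 is the least significant digit (consistent with dInc() advancing component 0 fastest), Horner's scheme evaluates counter[0] + counter[1]*max + counter[2]*max^2 without explicit powers and accepts any max > 0. The name, Dimensions = 3 and std::array in place of tarch::la::Vector are assumptions; this is a sketch, not Peano's code.

```cpp
#include <array>
#include <cassert>

constexpr int Dimensions = 3;
using IntVector = std::array<int, Dimensions>;

// value = counter[0] + counter[1]*max + counter[2]*max^2 + ...
// Horner's scheme avoids explicit powers; no table, so any max > 0 works.
int dLinearisedWithoutLookupSketch(const IntVector& counter, int max) {
  assert(max > 0);
  int result = 0;
  for (int d = Dimensions - 1; d >= 0; d--) {
    result = result * max + counter[d];
  }
  return result;
}
```

For instance, (1,2,3) with max = 10 linearises to 321, which makes the digit ordering easy to check by eye.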
| CPUGPUMethod tarch::la::Vector< Dimensions, int > peano4::utils::dStartVector | ( | ) |
Construct start vector (0,...,0) for a d-dimensional loop.
| tarch::la::Vector< Dimensions, int > peano4::utils::dStartVector | ( | int | dim, |
| int | value ) |
| tarch::la::Vector< Dimensions, int > peano4::utils::dStartVector | ( | int | max, |
| const LoopDirection & | direction ) |
Creates a start vector.
Each component is set to either 0 or max-1 depending on direction: If the direction bit of a component is set (true), the component is set to 0; otherwise, it is set to max-1.
| void peano4::utils::setupLookupTableForDDelinearised | ( | ) |
| void peano4::utils::setupLookupTableForDLinearised | ( | ) |