15 Tree walker to realise particle-particle interactions with particles held by different adjacent cells or on different mesh levels
17 This code snippet creates a tree walker, i.e. an action set that runs
18 through the underlying spacetree top down and builds up
19 @ref page_toolbox_particles_mesh_traversal "the active set and the local set".
20 These sets can then be used to trigger mesh events.
21 It works if and only if you use continuous storage per vertex as introduced
22 with the @ref toolbox_particles_memorypool "memory pools".
24 Within touchCellFirstTime(), the routine exposes the following data
25 besides the generic information of any action set:
27 - ***marker*** The CellMarker is avaialable to a compute kernel and allows you
28 to query spatial information. This one is actually one of the generic objects
29 which every action set has available, but we highlight that it is there, as
30 the @ref page_toolbox_particles_mesh_traversal "discussion on uniqueness for particle-particle interactions"
31 explicitly refers to the marker.
32 - ***activeParticles*** The list of active particles is a list of pointers. It
33 follows the @ref page_toolbox_particles_mesh_storage "generic definition of active sets".
34 - ***localParticles*** This is a list of local particles, i.e. particles which
35 are tied to the cell's adjacent vertices. While they reside on the cell's
36 level, we have no guarantee that they reside within the current cell. They
37 might also be contained within its halo of marker.h()/2.
39 The additional field _activeParticles is an alias for activeParticles.
42 The code is slightly more sophisticated compared than its cousin, the
43 UpdateParticle_MultiLevelInteraction_Sets, as it exposes more information:
46 - ***numberOfActiveParticlesPerVertex*** This alias for
47 _numberOfActiveParticlesPerVertex is a sequence of integers
48 which tells you how many particles within activeParticles are added by one
50 - ***numberOfCoalescedLocalParticlesPerVertex*** is a sequence of integers which
51 tells you how many particles within localParticles are
52 added by each adjacent vertex.
54 Both pieces of information are important if you work with memory pools
59 ## Different particles species
61 Both active and local set are of the same type if you stick
62 to the default value of None for active_particle_set. But you can also
63 let particles of different type interact by selecting a local particle
64 set that is different to the active one. Users can inject code to modify
65 the local set per cell and hence to alter particle states.
68 ## Realisation and data consistency
70 This tree walker realises the two sets via two lists: Per cell, we
71 append the local set to the active set and memorise on a stack how
72 many of these particles have been added. When we backtrace, i.e.
73 ascend within the tree again, this memorised size is used to remove
74 the last n elements from the active set.
76 So the active set grows while we descend in the tree, and it is
77 truncated when we ascend again.
79 This is a very efficient implementation yet fails if particles move
80 up and down within the tree while we walk. It should work perfectly
81 fine if particles do not move. If particles move up and down, the
82 present algorithms might have redundant pointers within the active set.
83 It is therefore absolutely key that this action set is not used in
84 combination with any other action set which lifts data, as a lift
85 destroys contributions made by coarser mesh cells. If particles drop,
86 we also cannot use this action set, as, again, it might corrupt the
87 active sets constructed by coarser meshes. The issue with the lifts
88 notably becomes apparent once you take into account that lifts trigger
89 a scatter on gathered vertices. That is, we descend within the tree and
90 build up our continuous index spaces. Then, later, we decide that we
91 lift one particle. This lift means that particles associated with a
92 coarser vertex are scattered. However, our active set assumes that
93 we had a valid, continous memory for them. So all of our active sets
94 are now invalid. Sieves in contrast work, as long as their action
95 set is used before the current one. They merely add data to the
96 active set that's constructed next.
100 - Sieves are fine as long as they happen before touchVertexFirstTime()
102 - Lifts into a global sieve list are fine, as long as they happen in
103 touchVertexLastTime() called after this action set.
105 In the case of a dynamic setup, i.e. in combinatino with the action
106 set UpdateParticleGridAssociation, you might be better of with using
107 the set-based implementation of the particle-particle interaction.
111 ## Further plug-in points
113 Besides the cell kernel which is there to realise particle-to-particle
114 interactions, we also have a vertex kernel which we call whenever a
115 vertex is loaded for the first time. That is, you can assume that the
116 vertex kernel has been launched for all @f$ 2^d @f$ vertices of a cell before
117 its cell kernel is triggered. The vertex kernel is also passed on the
118 active set (from the coarser level). Its local set is all the particles
119 whose centre lies within the square/cube around the vertex with mesh size
120 h. So this area goes h/2 along each each coordinate axis into the
124 ## Flattening of particle arrays
126@todo Some of the syntax here might be outdated and recipes might be redundant. Compare general toolbox docu.
128 The routine not only keeps track of how many particles we add per iteration.
129 It also keeps track of the number of particles per vertex that have been
130 added. If you know that you work with a @ref page_toolbox_particles_mesh_storage "memory pool", i.e. a consecutive
131 storage of particles, you can use this additional meta information within
132 your kernel invocations to work with loops over consecutive arrays (AoS)
133 rather than sole link lists. You can exploit that you know something
134 about individual chunks indexed by the pointers.
136 To exploit this additional information, you cannot use localParticles anymore,
137 i.e. you have to use localParticles. This means, you
138 might have to check if a particle is contained within a cell manually.
139 Once you rewrite your kernels that way, you can exploit the integers
140 _numberOfActiveParticlesPerVertex and numberOfCoalescedLocalParticlesPerVertex:
142 @image html UpdateParticle_MultiLevelInteraction_StackOfLists_ContiguousParticles_continuous-memory.png
144 Assume you enter a cell and you get the localParticles
145 list hosting 19 pointers. Further to that, you receive the array
146 numberOfLocalParticlesPerVertex with four integer entries. You know that the
147 first five entries point to one continuous chunk of particles in memory, the
148 next seven do so as well, and so forth. This holds if and only if you work
149 with a memory pool which maintains vertex-local or globally continuous
152 I do recommend to call toolbox::particles::ensureThatParticleListsEncodeContinuousMemoryChunks()
153 for your data to ensure that the memory layout above is preserved. Simply
156 ~~~~~~~~~~~~~~~~~~~~~~~~~~
157 ::toolbox::particles::ensureThatParticleListsEncodeContinuousMemoryChunks(
159 _numberOfActiveParticlesPerVertex
161 ::toolbox::particles::ensureThatParticleListsEncodeContinuousMemoryChunks(
163 numberOfLocalParticlesPerVertex
165 ~~~~~~~~~~~~~~~~~~~~~~~~~~
167 to your cell-wise compute kernel. Furthermore, you might want to check if all
168 vertices are properly gathered:
170 ~~~~~~~~~~~~~~~~~~~~~~~~~~
171 ::toolbox::particles::ensureAllParticleListsAreGatheredOrEmpty(
172 fineGridVerticesMyParticleSetName
174 ~~~~~~~~~~~~~~~~~~~~~~~~~~
176 The actual compute kernels then typically look similar to the following code
180 ~~~~~~~~~~~~~~~~~~~~~~~~~~
181 typename std::list<T*>::const_iterator localParticlesIterator = localParticles.begin();
183 for (auto localParticlesChunkSize: numberOfLocalParticlesPerVertex) {
184 T* localParticlesChunk = *localParticlesIterator;
185 std::advance( localParticlesIterator, localParticlesChunkSize );
187 for (int localParticleNumberInThisChunk=0; localParticleNumberInThisChunk<localParticlesChunkSize; localParticleNumberInThisChunk++) {
189 marker.isContained( localParticlesChunk[localParticleNumberInThisChunk].getX() )
191 // ensure that particle is not updated twice
197 ~~~~~~~~~~~~~~~~~~~~~~~~~~
200 Iterating over the active set follows the same pattern. Most codes will
201 have to couple local and active sets and hence use a kernel structure
205 ~~~~~~~~~~~~~~~~~~~~~~~~~~
206 typename std::list< typename ParticleContainer::value_type >::const_iterator localParticlesIterator = localParticles.begin();
208 for (auto localParticlesChunkSize: numberOfLocalParticlesPerVertex) {
209 typename ParticleContainer::value_type localParticlesChunk = *localParticlesIterator;
210 std::advance( localParticlesIterator, localParticlesChunkSize );
212 typename std::list< typename ParticleContainer::value_type >::const_iterator activeParticlesIterator = activeParticles.begin();
213 for (auto activeParticlesChunkSize: numberOfActiveParticlesPerVertex) {
214 typename ParticleContainer::value_type activeParticlesChunk = *activeParticlesIterator;
215 std::advance( activeParticlesIterator, activeParticlesChunkSize );
220 localParticlesChunkSize,
221 activeParticlesChunk,
222 activeParticlesChunkSize
226 ~~~~~~~~~~~~~~~~~~~~~~~~~~
228 This inner loop is where the magic happens, i.e. where we can exploit the
229 fact that localParticlesChunk and activeParticlesChunk both point to coninuous
230 memory regions. This means that computeInnerLoop is structurally simple:
233 ~~~~~~~~~~~~~~~~~~~~~~~~~~
234 void computeHydroForceInnerLoop(
235 const peano4::datamanagement::CellMarker& marker,
236 Particle* particlesAssociatedWithLocalVertex, // das ist falsch
237 int numberOfLocalParticles,
238 Particle* activeParticles,
239 int numberOfActiveParticles
241 for (int activeParticleNumberInThisChunk=0; activeParticleNumberInThisChunk<numberOfActiveParticles; activeParticleNumberInThisChunk++) {
242 for (int localParticleNumberInThisChunk=0; localParticleNumberInThisChunk<numberOfLocalParticles; localParticleNumberInThisChunk++) {
243 // read from activeParticles[activeParticleNumberInThisChunk]
244 // write to particlesAssociatedWithLocalVertex[localParticleNumberInThisChunk]
248 ~~~~~~~~~~~~~~~~~~~~~~~~~~
251 ## Dynamic adaptive mesh refinement
253 In the second sweep of any adaptive mesh refinement, we might drop particles
254 and hence destroy our nicely arranged memory layout.
255 Once particles move up and down the mesh hierarchy, the entries in
256 _numberOfActiveParticlesPerVertex are all invalid.
257 In such a case, we have to switch off any coalesced memory access.
260 The @ref page_toolbox_particles_mesh_traversal "generic mesh traversal description" provides
261 further context and details.
262 The important detail here is:
263 As soon as we might suspect that the memory layout is corrupted, we have to
264 assume that our pointers are all ov the place and do not point to continuous
265 data anymore. Even worse, we know that previously built-up active sets might be
266 invalid. We have built them up top-down, but when we drop data out of the
267 active set, they change on-the-fly.
271 ## Optimising coalesced compute kernels
273 Compute kernels can be optimised quite aggressively, but fiddling out how they
274 can be optimised is tedious. I recommend to use MAQAO and Intel's compiler
275 optimisation feedback. The page @ref toolbox_particles_memorypool holds
276 additional information. It is also worth studying @ref page_swift_performance_optimisation "Swift's performance optimisation"
277 remarks with respect to vecorisation as well as Swift's discussion of @ref swift_particle_lifecycle Particle Lifecycle "vectorised kernels".
279 Compute kernels can be optimised quite aggressively, but fiddling out how they
280 can be optimised is tedious. I recommend to use MAQAO and Intel's compiler
281 optimisation feedback. Here are some lessons learned:
283 - You cannot collapse the two for loops. That would lead to a scattered
284 iteration space in memory and hence make any vectorisation impossible.
285 - The active loop has to be the inner one. I do not understand why, but
286 Intel's icpx refuses to vectorise if the local loop is the inner one.
287 - The compiler has to be able to inline. According to https://clang.llvm.org/docs/AttributeReference.html#always-inline-force-inline
288 this is something one should be able to control via the annotation
290 [[clang::always_inline]]
292 but this led to multiple compiler crashes (again with Intel). To some
293 degree this makes sense: The function has to be in the header! I played
294 around with ipo, but that did not resolve the fundamental issues.
295 - You might have to recursively study all inlined methods and make the
296 stuff they use inline as well.
297 - Typically, you have to embed the inner-most loop into a statement alike
298 ~~~~~~~~~~~~~~~~~~~~~~~
300 marker.isContained(particlesAssociatedWithLocalVertex[localParticleNumberInThisChunk].getX())
302 not particlesAssociatedWithLocalVertex[localParticleNumberInThisChunk].getCellHasUpdatedParticle()
304 // read from activeParticles[activeParticleNumberInThisChunk]
305 // write to particlesAssociatedWithLocalVertex[localParticleNumberInThisChunk]
307 ~~~~~~~~~~~~~~~~~~~~~~~
309 The @ref page_peano_performance_optimisation "generic Peano optimisation" page
310 provides some useful hints how to guide these optimisation through the
311 compiler. It also makes sense to study swift2::kernels::computeHydroForce()
312 which was one of the first routines that we optimised along these ideas.
319 particle_particle_interaction_kernel,
320 touch_vertex_first_time_compute_particle_update_kernel=None,
321 touch_vertex_last_time_compute_particle_update_kernel=None,
322 prepare_traversal_kernel="",
323 unprepare_traversal_kernel="",
324 additional_includes="",
325 active_particle_set=None,
333 particle_set: ParticleSet
335 particle_particle_interaction_kernel: String holding C++ code
337 active_particle_set: ParticleSet or None
338 You can compare different particles species with this argument. It
339 allows you to make the active particles stem from another species than
340 the local ones that you actually update. Pick None if both sets are of
343 touch_vertex_first_time_compute_particle_update_kernel: String or None
344 Can be empty, but if you wanna move particles, then a minimal example
347 for (auto* localParticle: localParticles ) {
348 localParticle->setMoveState( globaldata::Particle::MoveState::NotMovedYet );
351 i.e. you will have a variable localParticle available in this kernel
357 UpdateParticle_MultiLevelInteraction_StackOfLists_ContiguousParticles, self
358 ).
__init__(descend_invocation_order=1, parallel=
False)
362 self.
d[
"LOCAL_PARTICLE"] = particle_set.particle_model.name
363 self.
d[
"LOCAL_PARTICLES_CONTAINER"] = particle_set.name
364 if active_particle_set ==
None:
365 self.
d[
"ACTIVE_PARTICLE"] = particle_set.particle_model.name
366 self.
d[
"ACTIVE_PARTICLES_CONTAINER"] = particle_set.name
368 self.
d[
"ACTIVE_PARTICLE"] = active_particle_set.particle_model.name
369 self.
d[
"ACTIVE_PARTICLES_CONTAINER"] = active_particle_set.name
371 "PARTICLE_PARTICLE_INTERACTION_KERNEL"
372 ] = particle_particle_interaction_kernel
374 "TOUCH_VERTEX_FIRST_COMPUTE_KERNEL"
375 ] = touch_vertex_first_time_compute_particle_update_kernel
377 "TOUCH_VERTEX_LAST_COMPUTE_KERNEL"
378 ] = touch_vertex_last_time_compute_particle_update_kernel
379 self.
d[
"ADDITIONAL_INCLUDES"] = additional_includes
380 self.
d[
"PREPARE_TRAVERSAL_KERNEL"] = prepare_traversal_kernel
381 self.
d[
"UNPREPARE_TRAVERSAL_KERNEL"] = unprepare_traversal_kernel
383 __Template_TouchVertexFirstTime = jinja2.Template(
385 {% if TOUCH_VERTEX_FIRST_COMPUTE_KERNEL!=None %}
386 auto& assignedParticles = fineGridVertex{{LOCAL_PARTICLES_CONTAINER}};
387 int numberOfCoalescedAssignedParticles = fineGridVertex{{LOCAL_PARTICLES_CONTAINER}}.isGathered() ? fineGridVertex{{LOCAL_PARTICLES_CONTAINER}}.size() : -1;
389 {{TOUCH_VERTEX_FIRST_COMPUTE_KERNEL}}
394 __Template_TouchVertexLastTime = jinja2.Template(
396 {% if TOUCH_VERTEX_LAST_COMPUTE_KERNEL!=None %}
397 auto& assignedParticles = fineGridVertex{{LOCAL_PARTICLES_CONTAINER}};
398 int numberOfCoalescedAssignedParticles = fineGridVertex{{LOCAL_PARTICLES_CONTAINER}}.isGathered() ? fineGridVertex{{LOCAL_PARTICLES_CONTAINER}}.size() : -1;
400 {{TOUCH_VERTEX_LAST_COMPUTE_KERNEL}}
405 __Template_TouchCellFirstTime = jinja2.Template(
407 std::list< globaldata::{{LOCAL_PARTICLE}}* > localParticles;
408 std::vector< int > numberOfCoalescedLocalParticlesPerVertex( TwoPowerD );
410 _numberOfActiveParticlesAdded.push_back(0);
411 for (int i=0; i<TwoPowerD; i++) {
412 // Keep track of number of particles
413 _numberOfActiveParticlesPerVertex.push_back( fineGridVertices{{ACTIVE_PARTICLES_CONTAINER}}(i).size() );
414 _numberOfActiveParticlesAdded.back() += fineGridVertices{{ACTIVE_PARTICLES_CONTAINER}}(i).size();
415 numberOfCoalescedLocalParticlesPerVertex[i] = fineGridVertices{{ACTIVE_PARTICLES_CONTAINER}}(i).size();
417 // Add particles to active particle set
418 for (auto* p: fineGridVertices{{ACTIVE_PARTICLES_CONTAINER}}(i) ) {
419 _activeParticles.push_back( p );
422 // Construct the two local particle sets
423 for (auto* p: fineGridVertices{{LOCAL_PARTICLES_CONTAINER}}(i) ) {
424 localParticles.push_back( p );
428 logDebug( "touchCellFirstTime(...)", "size of local/active particles=" << localParticles.size() << "/" << _activeParticles.size() << " at " << marker.toString() );
430 {% if PARTICLE_CELL_UPDATE_KERNEL!=None %}
431 std::list< globaldata::{{ACTIVE_PARTICLE}}* >& activeParticles = _activeParticles;
432 auto& numberOfCoalescedActiveParticlesPerVertex = _numberOfActiveParticlesPerVertex;
434 not numberOfCoalescedActiveParticlesPerVertex.empty()
436 (numberOfCoalescedActiveParticlesPerVertex.size() % TwoPowerD == 0),
437 numberOfCoalescedActiveParticlesPerVertex.size()
439 {{PARTICLE_PARTICLE_INTERACTION_KERNEL}}
444 __Template_TouchCellLastTime = jinja2.Template(
446 assertion( not _numberOfActiveParticlesAdded.empty() );
447 assertion( _activeParticles.size()>=_numberOfActiveParticlesAdded.back() );
448 assertion( _numberOfActiveParticlesPerVertex.size()>=TwoPowerD );
451 // More elegant way to write
452 // -------------------------
453 // for (int i=0; i<_numberOfActiveParticlesAdded.back(); i++) {
454 // _activeParticles.pop_back();
456 _activeParticles.erase(
457 std::prev(_activeParticles.end(),_numberOfActiveParticlesAdded.back() ),
458 _activeParticles.end()
461 _numberOfActiveParticlesPerVertex.erase(
462 std::prev(_numberOfActiveParticlesPerVertex.end(),TwoPowerD ),
463 _numberOfActiveParticlesPerVertex.end()
466 _numberOfActiveParticlesAdded.pop_back();
470 __Template_EndTraversal = jinja2.Template(
472 assertion1( _activeParticles.empty(), (*_activeParticles.begin())->toString() );
480 if operation_name == ActionSet.OPERATION_TOUCH_CELL_FIRST_TIME:
482 if operation_name == ActionSet.OPERATION_TOUCH_CELL_LAST_TIME:
484 if operation_name == ActionSet.OPERATION_TOUCH_VERTEX_FIRST_TIME:
486 if operation_name == ActionSet.OPERATION_TOUCH_VERTEX_LAST_TIME:
488 if operation_name == ActionSet.OPERATION_END_TRAVERSAL:
493 return " return std::vector< peano4::grid::GridControlEvent >();\n"
496 return self.
d[
"PREPARE_TRAVERSAL_KERNEL"]
499 return self.
d[
"UNPREPARE_TRAVERSAL_KERNEL"]
502 return __name__.replace(
".py",
"").replace(
".",
"_")
508 result = jinja2.Template(
510#include "tarch/multicore/Lock.h"
511#include "toolbox/particles/particles.h"
512#include "vertexdata/{{LOCAL_PARTICLES_CONTAINER}}.h"
513#include "globaldata/{{LOCAL_PARTICLE}}.h"
514#include "vertexdata/{{ACTIVE_PARTICLES_CONTAINER}}.h"
515#include "globaldata/{{ACTIVE_PARTICLE}}.h"
517{{ADDITIONAL_INCLUDES}}
523 return result.render(**self.
d)
526 result = jinja2.Template(
528 std::list< globaldata::{{ACTIVE_PARTICLE}}* > _activeParticles;
529 std::vector< int > _numberOfActiveParticlesAdded;
530 std::vector< int > _numberOfActiveParticlesPerVertex;
534 return result.render(**self.
d)
538 _spacetreeId = treeNumber;
Default superclass for any data model in Peano which is stored within the grid.
Action set (reactions to events)
Tree walker to realise particle-particle interactions with particles held by different adjacent cells...
__Template_TouchVertexLastTime
get_body_of_getGridControlEvents(self)
get_includes(self)
Return include statements that you need.
get_body_of_unprepareTraversal(self)
__Template_TouchVertexFirstTime
get_action_set_name(self)
Return unique action set name.
get_body_of_prepareTraversal(self)
__Template_TouchCellLastTime
get_constructor_body(self)
Define a tailored constructor body.
__Template_TouchCellFirstTime
get_attributes(self)
Return attributes as copied and pasted into the generated class.
get_body_of_operation(self, operation_name)
Return actual C++ code snippets to be inserted into C++ code.
user_should_modify_template(self)
Is the user allowed to modify the output.
__init__(self, particle_set, particle_particle_interaction_kernel, touch_vertex_first_time_compute_particle_update_kernel=None, touch_vertex_last_time_compute_particle_update_kernel=None, prepare_traversal_kernel="", unprepare_traversal_kernel="", additional_includes="", active_particle_set=None)
Initialise object.