Create new MPI communication routines for DG implementation
This MR replaces the existing `Gs` implementation of the DG parallel exchange routines with a custom class `AssemblyCommDG`. This works in a similar way to `Gs`, timing a number of different parallel implementation strategies at startup:
- `AllToAll` and `AllToAllV` perform two variants of all-to-all communication (without and with variable offsets, respectively). The latter is almost always preferable, but in cases where the amount of data exchanged with each rank is very similar (e.g. structured grids) the former may perform better.
- `Pairwise` implements asynchronous send/recv calls, which appears to be the generally preferred implementation in current tests (and essentially replicates the `Gs` pairwise routines).
- `Neighbour_Alltoallv` uses an MPI 3.0 extension that constructs a distributed graph representing the mesh partition topology, and appears to be the preferred exchange routine at low element-to-core counts (a minimal sketch of this approach is given after this list).
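The sketch below illustrates the distributed-graph idea behind the `Neighbour_Alltoallv` strategy; it is not the actual `AssemblyCommDG` code, and the names (`ExchangeTrace`, `neighbourRanks`, `sendCounts`, ...) are illustrative assumptions only.

```cpp
// Minimal sketch of a neighbourhood all-to-all exchange over a distributed
// graph communicator whose edges mirror the mesh partition adjacency.
// In practice the communicator would be built once at setup, not per call.
#include <mpi.h>
#include <vector>

void ExchangeTrace(MPI_Comm comm,
                   const std::vector<int> &neighbourRanks, // ranks sharing a partition boundary
                   const std::vector<int> &sendCounts,     // values sent to each neighbour
                   const std::vector<double> &sendBuf,
                   const std::vector<int> &recvCounts,     // values expected from each neighbour
                   std::vector<double> &recvBuf)
{
    int nNbr = static_cast<int>(neighbourRanks.size());

    // Build a distributed graph communicator; the adjacency is symmetric,
    // so sources and destinations are the same list of neighbouring ranks.
    MPI_Comm graphComm;
    MPI_Dist_graph_create_adjacent(comm,
                                   nNbr, neighbourRanks.data(), MPI_UNWEIGHTED,
                                   nNbr, neighbourRanks.data(), MPI_UNWEIGHTED,
                                   MPI_INFO_NULL, 1 /* allow reordering */, &graphComm);

    // Displacements are the exclusive prefix sums of the counts.
    std::vector<int> sendDispls(nNbr, 0), recvDispls(nNbr, 0);
    for (int i = 1; i < nNbr; ++i)
    {
        sendDispls[i] = sendDispls[i - 1] + sendCounts[i - 1];
        recvDispls[i] = recvDispls[i - 1] + recvCounts[i - 1];
    }

    // One neighbourhood collective replaces per-rank point-to-point bookkeeping.
    MPI_Neighbor_alltoallv(sendBuf.data(), sendCounts.data(), sendDispls.data(), MPI_DOUBLE,
                           recvBuf.data(), recvCounts.data(), recvDispls.data(), MPI_DOUBLE,
                           graphComm);

    MPI_Comm_free(&graphComm);
}
```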
The strategy adopted is to declare edges lying on partition boundaries to always be left-adjacent (i.e. `fwd` edges). The `AssemblyCommDG` routines then take care of communicating the `fwd` space to the `bwd` space of the corresponding element on the remote processor. This has the advantage that the normal negation routines are no longer needed, and they have all been removed.
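To make the `fwd`-to-`bwd` exchange concrete, here is a minimal pairwise sketch under assumed names (`Neighbour`, `PairwiseExchange`, the index arrays); it is not the Nektar++ implementation, only an illustration of the convention described above combined with the `Pairwise` strategy.

```cpp
// Each rank posts non-blocking receives into its Bwd trace storage and sends
// the matching entries of its Fwd trace, so Fwd values on a partition
// boundary become the Bwd values of the adjacent element on the remote rank.
#include <mpi.h>
#include <vector>

struct Neighbour
{
    int rank;                  // remote process owning the adjacent elements
    std::vector<int> fwdIndex; // local Fwd trace indices to send
    std::vector<int> bwdIndex; // local Bwd trace indices to fill on receive
};

void PairwiseExchange(MPI_Comm comm, const std::vector<Neighbour> &nbrs,
                      const std::vector<double> &fwd, std::vector<double> &bwd)
{
    std::vector<MPI_Request> reqs;
    std::vector<std::vector<double>> sendBufs(nbrs.size()), recvBufs(nbrs.size());

    // Post all receives first, then pack and send the Fwd data.
    for (std::size_t n = 0; n < nbrs.size(); ++n)
    {
        recvBufs[n].resize(nbrs[n].bwdIndex.size());
        reqs.emplace_back();
        MPI_Irecv(recvBufs[n].data(), static_cast<int>(recvBufs[n].size()), MPI_DOUBLE,
                  nbrs[n].rank, 0, comm, &reqs.back());
    }
    for (std::size_t n = 0; n < nbrs.size(); ++n)
    {
        for (int idx : nbrs[n].fwdIndex)
        {
            sendBufs[n].push_back(fwd[idx]);
        }
        reqs.emplace_back();
        MPI_Isend(sendBufs[n].data(), static_cast<int>(sendBufs[n].size()), MPI_DOUBLE,
                  nbrs[n].rank, 0, comm, &reqs.back());
    }
    MPI_Waitall(static_cast<int>(reqs.size()), reqs.data(), MPI_STATUSES_IGNORE);

    // Scatter received values into the local Bwd trace space.
    for (std::size_t n = 0; n < nbrs.size(); ++n)
    {
        for (std::size_t i = 0; i < recvBufs[n].size(); ++i)
        {
            bwd[nbrs[n].bwdIndex[i]] = recvBufs[n][i];
        }
    }
}
```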
Timing test
1.6M quad-cell `ADRSolver` case at P = 8, run for 10 s, measuring CPU time every 0.1 s and recording the average of the 100 measurements.
| Number of nodes (each with 20 cores) | 1 | 2 | 4 | 8 | 16 | 32 | 48 | 64 |
|---|---|---|---|---|---|---|---|---|
| Avg. master time (s) | 58.5694 | 28.7446 | 13.2829 | 6.1681 | 2.9595 | 1.5473 | 0.9953 | 0.7530 |
| Avg. new branch time (s) | 58.4441 | 28.5092 | 12.9718 | 6.0861 | 2.8885 | 1.4673 | 0.9557 | 0.7230 |
| Difference (%) | -0.21 | -0.82 | -2.34 | -1.33 | -2.40 | -5.17 | -3.99 | -3.98 |