Draft: Performance improvements in the block diagonal operator of the preconditioner
Issue/feature addressed
This merge request proposes a vectorised version of the routine PreconCfsBRJ::PreconBlkDiag, which is related to the D^{-1} operator (block diagonal operator) of the preconditioner. The block diagonal operator was found to take approximately 30% of the runtime in a three-dimensional simulation, depending on the element type. The change proposed here improves the performance of this routine and consequently speeds up the code.
Proposed solution
The routine PreconCfsBRJ::PreconBlkDiag was modified to make use of SIMD instructions.
Implementation
The main bottleneck of this routine is a multiplication (outVect = (*PreconMatVars) * tmpVect) in the PreconCfsBRJ::PreconBlkDiag method. In the new version of this routine, the code makes use of SIMD instructions to perform this multiplication.
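For readers unfamiliar with the idea, the following is a minimal sketch of one common way to vectorise a block diagonal matrix-vector product. It is not the code in this merge request. The function names, the lane width kLanes and the interleaved data layout are all assumptions made for illustration. The sketch stores the blocks of kLanes elements side by side so that the innermost loop runs over contiguous lanes, which a compiler can map to SIMD instructions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical lane width: number of elements processed together.
constexpr std::size_t kLanes = 4;

// Reference version. Each element owns one dense n x n block (row-major),
// and the blocks are stored one after another.
void ApplyBlockDiagScalar(const std::vector<double> &blocks,
                          const std::vector<double> &in,
                          std::vector<double> &out,
                          std::size_t nElmt, std::size_t n)
{
    assert(blocks.size() == nElmt * n * n);
    assert(in.size() == nElmt * n && out.size() == nElmt * n);
    for (std::size_t e = 0; e < nElmt; ++e)
    {
        for (std::size_t i = 0; i < n; ++i)
        {
            double sum = 0.0;
            for (std::size_t j = 0; j < n; ++j)
            {
                sum += blocks[(e * n + i) * n + j] * in[e * n + j];
            }
            out[e * n + i] = sum;
        }
    }
}

// Interleaved version. Elements are grouped in batches of kLanes and the
// lane index is the fastest-running one (assumed layout):
//   blocks: [group][i][j][lane],  in/out: [group][i][lane]
void ApplyBlockDiagInterleaved(const std::vector<double> &blocks,
                               const std::vector<double> &in,
                               std::vector<double> &out,
                               std::size_t nGroups, std::size_t n)
{
    assert(blocks.size() == nGroups * n * n * kLanes);
    assert(in.size() == nGroups * n * kLanes && out.size() == in.size());
    for (std::size_t g = 0; g < nGroups; ++g)
    {
        const double *A = blocks.data() + g * n * n * kLanes;
        const double *x = in.data() + g * n * kLanes;
        double *y       = out.data() + g * n * kLanes;
        for (std::size_t i = 0; i < n; ++i)
        {
            double acc[kLanes] = {};
            for (std::size_t j = 0; j < n; ++j)
            {
                // Contiguous, dependency-free loop over lanes: this is the
                // part a compiler can turn into SIMD multiply-adds.
                for (std::size_t l = 0; l < kLanes; ++l)
                {
                    acc[l] += A[(i * n + j) * kLanes + l] * x[j * kLanes + l];
                }
            }
            for (std::size_t l = 0; l < kLanes; ++l)
            {
                y[i * kLanes + l] = acc[l];
            }
        }
    }
}
```

Both functions compute the same product. The interleaved one only changes the memory layout, so the input has to be repacked once (or stored in that layout from the start) before it is applied.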
Tests
There is no need for new test cases.
Notes
The vectorised version of the D^{-1} operator (new implementation) was tested on three element types (hex, prism and tet) to check whether the speed-up is significant for each of them. For hexahedral elements, the method became nearly 4 times faster. For prismatic elements, it became approximately 3 times faster, and for tetrahedral elements we observed a 25% improvement. These tests used box cases with around 5000 DOF and were run on the nektar compute nodes with 12 cores.
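As a rough guide to what these per-routine figures could mean for the whole run, Amdahl's law gives an estimate. Here p is the fraction of runtime spent in the block diagonal operator and s is the speed-up of that routine alone. The values below are illustrative assumptions: they combine the approximate 30% share with the hexahedral result and assume the rest of the runtime is unchanged. No overall speed-up was measured for this merge request.

```latex
S_{\text{total}} = \frac{1}{(1 - p) + \dfrac{p}{s}},
\qquad
p \approx 0.3,\; s \approx 4
\;\Rightarrow\;
S_{\text{total}} \approx \frac{1}{0.7 + 0.075} \approx 1.29
```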
Checklist
- Functions and classes, or changes to them, are documented.
- User guide/documentation is updated.
- Changelog is updated.
- Suitable tests added for new functionality.
- Contributed code is correctly formatted. (See the contributing guidelines).
- License added to any new files.
- No extraneous files have been added (e.g. compiler output or test data files).
Warning
On 19.07 the code formatting (code style) was standardised using clang-format across the whole Nektar++ code. This means changes in your branch will conflict with formatting changes on the master branch. To resolve these conflicts, see #295 (closed).