
Performance improvements in the block diagonal operator of the preconditioner

João Isler requested to merge feature/CfsVecBlkDiagOperator into master

Issue/feature addressed

This merge request proposes a vectorised version of the routine PreconCfsBRJ::PreconBlkDiag. This routine corresponds to the D^{-1} operator (the block diagonal operator) of the preconditioner. It was identified that, in a three-dimensional simulation, the block diagonal operator may take approximately 30% of the runtime, depending on the element type. Improving the performance of this routine should therefore speed up the code as a whole.

Proposed solution

The routine PreconCfsBRJ::PreconBlkDiag was modified to make use of SIMD instructions.

Implementation

The main bottleneck of the PreconCfsBRJ::PreconBlkDiag method is the multiplication outVect = (*PreconMatVars) * tmpVect. In the new version of the routine, this multiplication is performed with SIMD instructions.
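For readers unfamiliar with the idea, the following is a minimal, generic sketch of one common way to vectorise a block diagonal matrix-vector product. It does not reproduce the code in this merge request. It assumes one dense n x n block per element and interleaves the data of several elements so that each SIMD lane handles one element. The names BlkDiagScalar, BlkDiagInterleaved, Interleave, Deinterleave and kLanes are hypothetical, and the inner lane loop is left to the compiler to vectorise rather than written with explicit SIMD types.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical lane count: 4 doubles fill one 256-bit (AVX2) register.
constexpr std::size_t kLanes = 4;

// Scalar reference: out_e = M_e * in_e for each element e.
// blocks holds nElmt dense n x n matrices (row-major); in holds nElmt * n values.
void BlkDiagScalar(const std::vector<double> &blocks,
                   const std::vector<double> &in, std::vector<double> &out,
                   std::size_t nElmt, std::size_t n)
{
    assert(blocks.size() == nElmt * n * n && in.size() == nElmt * n);
    out.assign(nElmt * n, 0.0);
    for (std::size_t e = 0; e < nElmt; ++e)
        for (std::size_t i = 0; i < n; ++i)
        {
            double sum = 0.0;
            for (std::size_t j = 0; j < n; ++j)
                sum += blocks[(e * n + i) * n + j] * in[e * n + j];
            out[e * n + i] = sum;
        }
}

// Pack per-element data (len values per element) so that the same entry of
// kLanes consecutive elements is contiguous. The last group is zero-padded.
std::vector<double> Interleave(const std::vector<double> &a,
                               std::size_t nElmt, std::size_t len)
{
    const std::size_t nGroups = (nElmt + kLanes - 1) / kLanes;
    std::vector<double> il(nGroups * len * kLanes, 0.0);
    for (std::size_t e = 0; e < nElmt; ++e)
        for (std::size_t k = 0; k < len; ++k)
            il[((e / kLanes) * len + k) * kLanes + e % kLanes] = a[e * len + k];
    return il;
}

// Inverse of Interleave; the padding lanes are dropped.
std::vector<double> Deinterleave(const std::vector<double> &il,
                                 std::size_t nElmt, std::size_t len)
{
    std::vector<double> a(nElmt * len);
    for (std::size_t e = 0; e < nElmt; ++e)
        for (std::size_t k = 0; k < len; ++k)
            a[e * len + k] = il[((e / kLanes) * len + k) * kLanes + e % kLanes];
    return a;
}

// Interleaved version: each group of kLanes elements is processed together.
// The innermost loops run over contiguous lanes with no dependency between
// lanes, which is the access pattern SIMD instructions need.
void BlkDiagInterleaved(const std::vector<double> &blocksIL,
                        const std::vector<double> &inIL,
                        std::vector<double> &outIL, std::size_t nGroups,
                        std::size_t n)
{
    assert(blocksIL.size() == nGroups * n * n * kLanes);
    assert(inIL.size() == nGroups * n * kLanes);
    outIL.assign(nGroups * n * kLanes, 0.0);
    for (std::size_t g = 0; g < nGroups; ++g)
        for (std::size_t i = 0; i < n; ++i)
        {
            double acc[kLanes] = {};
            for (std::size_t j = 0; j < n; ++j)
            {
                const double *m = &blocksIL[((g * n + i) * n + j) * kLanes];
                const double *v = &inIL[(g * n + j) * kLanes];
                for (std::size_t l = 0; l < kLanes; ++l)
                    acc[l] += m[l] * v[l];
            }
            for (std::size_t l = 0; l < kLanes; ++l)
                outIL[(g * n + i) * kLanes + l] = acc[l];
        }
}
```

In this sketch the blocks would be interleaved once, when the preconditioner matrices are built, so only the input and output vectors need repacking on each application. Whether that matches the data layout used in this merge request is an assumption.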

Tests

No new test cases are needed. The existing tests were sufficient to catch a number of bugs during development.

Notes

The vectorised version of the D^{-1} operator (the new implementation) was tested extensively to check whether it gives a significant speed-up for all element types. Three element types were tested: hexahedra, prisms and tetrahedra. For the hexahedral elements, the method became nearly 4 times faster with the new implementation. For the prismatic elements, it became approximately 3 times faster, and for the tetrahedral elements we obtained a 25% improvement. These tests were performed on the nektar compute nodes using box cases with around 5000 DOF, and the simulations were run on 12 cores.
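As a rough illustration of what these figures could mean for total runtime, Amdahl's law can be applied to the hexahedral case. This is an estimate, not a measurement: it assumes that the operator accounts for the full p = 0.3 of the runtime quoted above and that the routine speed-up is exactly s = 4.

```math
S_{\text{overall}} = \frac{1}{(1 - p) + p/s} = \frac{1}{0.7 + 0.3/4} = \frac{1}{0.775} \approx 1.29
```

Under those assumptions the whole simulation would run roughly 1.3 times faster. The actual gain depends on the share of runtime the operator takes for the element type in question.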

Checklist

  • Functions and classes, or changes to them, are documented.
  • N/A User guide/documentation is updated.
  • Changelog is updated.
  • N/A Suitable tests added for new functionality.
  • Contributed code is correctly formatted. (See the contributing guidelines).
  • License added to any new files.
  • No extraneous files have been added (e.g. compiler output or test data files).
Edited by João Isler
