Performance improvements for the diffusion operator and preconditioners
Issue/feature addressed
After profiling the code, we identified three methods (operators) that consume most of the runtime when we run numerical simulations with the compressible solver. This merge request addresses two of the routines related to these operators. The first is related to the preconditioner (PreconCfsBRJ::MinusOffDiag2Rhs) and the second to the diffusion operator (DiffusionIP::AddSecondDerivToTrace).
Proposed solution
Optimise the two routines identified above in order to speed up the code.
Implementation
For the PreconCfsBRJ::MinusOffDiag2Rhs routine (L+U operator), we optimised the code in two ways. First, we made it possible for the m_trace expansion list to use any of the auto-tuning routines. Second, we replaced the Vmath operators with for loops, so that all calculations are performed locally and the variables no longer have to be passed as arguments through the Vmath operators. With this implementation, the routine is approximately 12% faster on a large three-dimensional case with hex and prism elements. For DiffusionIP::AddSecondDerivToTrace, we took the same approach and replaced the Vmath operators with for loops. Note that this routine is not currently used by the compressible solver; nevertheless, the new implementation achieved a 2.5% speed-up for a two-dimensional cylinder case.
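The replacement of Vmath operators with for loops can be illustrated with a minimal, self-contained sketch. It is not the code from this merge request and does not use the Nektar++ API: `vmul`, `vvtvp`, `vsub`, `subtractWithKernels` and `subtractFused` are hypothetical names, and the update `rhs -= a*u + b*v` is only an assumed stand-in for the kind of arithmetic such a routine performs. The first variant chains generic array kernels, each making a full pass over the data and writing to a temporary; the second does the same arithmetic in one local loop.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for generic array kernels (not the Nektar++ Vmath API).
// z[i] = x[i] * y[i]
void vmul(std::size_t n, const double *x, const double *y, double *z)
{
    for (std::size_t i = 0; i < n; ++i)
        z[i] = x[i] * y[i];
}

// z[i] = w[i] * x[i] + y[i]
void vvtvp(std::size_t n, const double *w, const double *x, const double *y,
           double *z)
{
    for (std::size_t i = 0; i < n; ++i)
        z[i] = w[i] * x[i] + y[i];
}

// z[i] = x[i] - y[i]
void vsub(std::size_t n, const double *x, const double *y, double *z)
{
    for (std::size_t i = 0; i < n; ++i)
        z[i] = x[i] - y[i];
}

// Variant 1: three kernel calls, three passes over memory, one temporary array.
void subtractWithKernels(const std::vector<double> &a,
                         const std::vector<double> &u,
                         const std::vector<double> &b,
                         const std::vector<double> &v,
                         std::vector<double> &rhs)
{
    const std::size_t n = rhs.size();
    std::vector<double> tmp(n);
    vmul(n, a.data(), u.data(), tmp.data());               // tmp = a * u
    vvtvp(n, b.data(), v.data(), tmp.data(), tmp.data());  // tmp = b * v + tmp
    vsub(n, rhs.data(), tmp.data(), rhs.data());           // rhs = rhs - tmp
}

// Variant 2: the same arithmetic in a single local loop, with no temporary
// array and no arguments forwarded to helper kernels.
void subtractFused(const std::vector<double> &a,
                   const std::vector<double> &u,
                   const std::vector<double> &b,
                   const std::vector<double> &v,
                   std::vector<double> &rhs)
{
    const std::size_t n = rhs.size();
    for (std::size_t i = 0; i < n; ++i)
    {
        rhs[i] -= a[i] * u[i] + b[i] * v[i];
    }
}
```

Both variants compute the same result up to floating-point rounding. Whether the fused loop is actually faster depends on the compiler, the array sizes and the hardware, so a change of this kind should be confirmed by profiling, as was done for the cases reported here.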
Tests
No new test case is needed, unless we would like to add a test for the DiffusionIP::AddSecondDerivToTrace routine.
Notes
Additional checks were performed using three-dimensional meshes with hex, prism and tet elements to confirm that replacing the Vmath operators with for loops gives a speed-up. In all cases, the code performed better with for loops than with the Vmath operators.
Checklist
- Functions and classes, or changes to them, are documented.
- User guide/documentation is updated.
- Changelog is updated.
- Suitable tests added for new functionality.
- Contributed code is correctly formatted. (See the contributing guidelines).
- License added to any new files.
- No extraneous files have been added (e.g. compiler output or test data files).
Warning
On 19.07, the code formatting (code style) was standardised across the whole Nektar++ code base using clang-format. This means that changes in your branch will conflict with the formatting changes on the master branch. To resolve these conflicts, see #295 (closed).