
Remove copying of the entire input and output to padded aligned arrays in MatrixFree operators

BOYANG XIA requested to merge xby2233/nektar:feature/remove_matfree_copy into master

Issue/feature addressed

In the previous design, the matrix-free operator Helmholtz_MatrixFree copies the entire input array into the internal padded, aligned storage MatrixFreeOneInOneOut::m_input before the actual computation starts. This large, redundant copy significantly slows the operator down.
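For reference, the copy being removed can be sketched as follows. This is a simplified illustration, not the Nektar++ code: padded_elements and copy_to_padded are hypothetical names, a std::vector stands in for the aligned storage held in m_input, and the element count is assumed to be rounded up to a multiple of the SIMD width, with the padding zero-filled. The point is that it costs a full pass over all nElmt * nq input values before any useful work is done.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: element count rounded up to a multiple of the
// SIMD width, so that the last vector block is always full.
std::size_t padded_elements(std::size_t nElmt, std::size_t width)
{
    assert(width > 0);
    return ((nElmt + width - 1) / width) * width;
}

// Sketch of the legacy behaviour: copy the whole input (nElmt elements of
// nq values each) into a padded buffer. std::vector is used for brevity;
// the real storage is additionally aligned. Padding entries stay zero.
std::vector<double> copy_to_padded(const double *in, std::size_t nElmt,
                                   std::size_t nq, std::size_t width)
{
    std::vector<double> buf(padded_elements(nElmt, width) * nq, 0.0);
    std::copy(in, in + nElmt * nq, buf.begin());
    return buf;
}
```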

Proposed solution

Remove the copy operation and make the operators in MatrixFreeOps support unaligned memory. After this change, the legacy collection matrix-free operators should achieve performance comparable to that of the redesign matrix-free operators.
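Supporting unaligned memory means a kernel can no longer assume that a caller-supplied pointer meets the alignment that aligned vector loads require. A minimal check is sketched below, assuming a 32-byte requirement (four doubles, as for AVX); is_aligned is a hypothetical helper, not part of MatrixFreeOps.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if p satisfies the given byte alignment
// (e.g. 32 bytes for a 4 x double AVX register). Aligned vector loads
// need this to hold; unaligned loads do not.
bool is_aligned(const void *p, std::size_t alignment)
{
    // Alignments are powers of two.
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

Even if an array's base address is aligned, a pointer offset into it by one double (8 bytes) is not 32-byte aligned, so code that accepts arbitrary input pointers has to use unaligned access.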

Implementation

The initial design is:

  • Use an aligned local array as intermediate storage.
  • Always copy data through the aligned local storage: into it before load_interleave, and out of it after deinterleave.
  • Process the last sub-block, which may contain padding, in a separate code block.
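The initial design can be sketched with a scalar emulation of a 4-lane vector type. Everything here is hypothetical and simplified: W, Vec, load_interleave, deinterleave_store, scale_by_two and apply_initial are illustrative names and signatures, not the actual MatrixFreeOps or SIMD library API; a std::vector stands in for the aligned local array; and doubling each value stands in for the real operator kernel. Each block of W elements (nq quadrature values each) is copied into local storage, interleaved so that each vector holds one quadrature point from W elements, processed, deinterleaved and copied back out. The final partial block is zero-padded and handled separately.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical SIMD width and a scalar emulation of a 4 x double vector.
constexpr std::size_t W = 4;
struct Vec
{
    double lane[W];
};

// Interleave W consecutive elements (nq values each) so that out[q] holds
// quadrature point q of all W elements, one element per lane.
void load_interleave(const double *in, std::size_t nq, std::vector<Vec> &out)
{
    for (std::size_t q = 0; q < nq; ++q)
        for (std::size_t l = 0; l < W; ++l)
            out[q].lane[l] = in[l * nq + q];
}

// Inverse of load_interleave.
void deinterleave_store(const std::vector<Vec> &in, std::size_t nq,
                        double *out)
{
    for (std::size_t q = 0; q < nq; ++q)
        for (std::size_t l = 0; l < W; ++l)
            out[l * nq + q] = in[q].lane[l];
}

// Placeholder for the real operator kernel.
void scale_by_two(std::vector<Vec> &v)
{
    for (Vec &x : v)
        for (double &d : x.lane)
            d *= 2.0;
}

// Initial design: every block passes through local storage. The last
// sub-block, which may hold fewer than W elements, is zero-padded and
// handled in its own code path.
void apply_initial(const double *in, double *out, std::size_t nElmt,
                   std::size_t nq)
{
    assert(nq > 0);
    std::vector<double> local(W * nq);
    std::vector<Vec> v(nq);
    const std::size_t nFull = nElmt / W;
    for (std::size_t b = 0; b < nFull; ++b)
    {
        const double *src = in + b * W * nq;
        std::copy(src, src + W * nq, local.begin()); // copy in
        load_interleave(local.data(), nq, v);
        scale_by_two(v);
        deinterleave_store(v, nq, local.data());
        std::copy(local.begin(), local.end(), out + b * W * nq); // copy out
    }
    const std::size_t rem = nElmt % W;
    if (rem > 0)
    {
        const double *src = in + nFull * W * nq;
        std::fill(local.begin(), local.end(), 0.0); // zero the padding lanes
        std::copy(src, src + rem * nq, local.begin());
        load_interleave(local.data(), nq, v);
        scale_by_two(v);
        deinterleave_store(v, nq, local.data());
        // Copy back only the valid elements, never the padding.
        std::copy(local.begin(), local.begin() + rem * nq,
                  out + nFull * W * nq);
    }
}
```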

An updated design is:

  • Add a new function, load_unalign_interleave, which reads directly from unaligned memory, so no copy into an aligned local array is needed.
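Under the same simplifying assumptions, the updated design can be sketched as below. The load_unalign_interleave here is a hypothetical scalar stand-in for the new function, and its real signature may differ; a SIMD version would presumably use unaligned loads or gathers. It reads straight from the caller's array at any offset and zero-fills the lanes of a partial last block, while deinterleave_unalign_store (also hypothetical) writes back only the valid lanes. In this sketch no local copy is made, and the last block no longer needs its own code path.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical SIMD width and a scalar emulation of a 4 x double vector.
constexpr std::size_t W = 4;
struct Vec
{
    double lane[W];
};

// Hypothetical stand-in for load_unalign_interleave: reads nValid (1..W)
// elements of nq values straight from the caller's array, making no
// alignment assumption; lanes >= nValid are zero-filled.
void load_unalign_interleave(const double *in, std::size_t nq,
                             std::size_t nValid, std::vector<Vec> &out)
{
    assert(nValid >= 1 && nValid <= W);
    for (std::size_t q = 0; q < nq; ++q)
        for (std::size_t l = 0; l < W; ++l)
            out[q].lane[l] = (l < nValid) ? in[l * nq + q] : 0.0;
}

// Writes back only the nValid real elements, so padding never reaches out.
void deinterleave_unalign_store(const std::vector<Vec> &in, std::size_t nq,
                                std::size_t nValid, double *out)
{
    for (std::size_t q = 0; q < nq; ++q)
        for (std::size_t l = 0; l < nValid; ++l)
            out[l * nq + q] = in[q].lane[l];
}

// Placeholder for the real operator kernel.
void scale_by_two(std::vector<Vec> &v)
{
    for (Vec &x : v)
        for (double &d : x.lane)
            d *= 2.0;
}

// Updated design: no intermediate copy; the last block uses the same path.
void apply_updated(const double *in, double *out, std::size_t nElmt,
                   std::size_t nq)
{
    std::vector<Vec> v(nq);
    for (std::size_t e = 0; e < nElmt; e += W)
    {
        const std::size_t nValid = std::min(W, nElmt - e);
        load_unalign_interleave(in + e * nq, nq, nValid, v);
        scale_by_two(v);
        deinterleave_unalign_store(v, nq, nValid, out + e * nq);
    }
}
```

Compared with a copy-through-local-storage approach, each input value here is read from memory once rather than being copied to a buffer and read again, which matches the reduced memory traffic the updated design aims for.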

Here are the benchmark results:

[Benchmark result figures: image.png, image.png]

Although the measurements contain some noise, a few general conclusions can still be drawn:

  • The legacy collection matrix-free operator is much slower than the redesign, except for hex-7, which by coincidence requires no padding (and hence no copy).
  • With this MR, the collection matrix-free operator comes much closer to the redesign matrix-free operator and vec-testing.
  • The updated design is slightly faster than the initial design because it further reduces memory loads.

Tests

No new tests are required.

Suggested reviewers

@dmoxey @ccantwel

Notes

Changing all operators at once takes a lot of effort, so the changes are applied to Helmholtz first and then to the other operators.

Everyone, especially MatrixFreeOps contributors, is welcome to contribute to this MR.

The MatrixFreeBase derived class can be removed once all operators have been updated.

Checklist

  • Functions and classes, or changes to them, are documented.
  • User guide/documentation is updated.
  • Changelog is updated.
  • Suitable tests added for new functionality.
  • Contributed code is correctly formatted. (See the contributing guidelines).
  • License added to any new files.
  • No extraneous files have been added (e.g. compiler output or test data files).
Edited by BOYANG XIA
