Remove copying of the entire input and output to padded aligned arrays in MatrixFree operators
Issue/feature addressed
In the previous design, the matrix-free operator Helmholtz_MatrixFree
copies the input array into the internal padded, aligned storage MatrixFreeOneInOneOut::m_input
before the actual work starts. This redundant large copy significantly slows the operator down.
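To make the cost concrete, here is a minimal sketch of such a padded copy. It is not the Nektar++ code: the width `W`, the helper names `padded_elements` and `copy_to_padded`, and the use of a plain `std::vector` (instead of an aligned allocator) are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t W = 4; // hypothetical SIMD width (elements per block)

// Round the element count up to a multiple of the block width.
std::size_t padded_elements(std::size_t nElmt)
{
    return (nElmt + W - 1) / W * W;
}

// Copy the whole input into zero-padded storage, as the previous design
// does before the operator runs. Real code would use aligned storage.
std::vector<double> copy_to_padded(const std::vector<double> &in,
                                   std::size_t nq, std::size_t nElmt)
{
    std::vector<double> padded(padded_elements(nElmt) * nq, 0.0);
    std::copy_n(in.begin(), nElmt * nq, padded.begin());
    return padded;
}
```

The copy touches every input value once more than the operator itself needs, which is the overhead this MR removes.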
Proposed solution
Remove the copy operation and let the operators in MatrixFreeOps support unaligned memory. After this change, the legacy collection matrix-free operators should achieve performance comparable to the redesigned matrix-free operators.
Implementation
The initial design was:
- Use an aligned local array as intermediate storage.
- Always copy data to the aligned local storage before load_interleave and deinterleave.
- The last sub-block, which may contain padding, is processed in a separate code block.
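The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual implementation: `W`, `Lanes`, `interleave_block` (a scalar stand-in for load_interleave) and `interleave_staged` are hypothetical names, and a `std::vector` stands in for the aligned local array.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t W = 4;         // hypothetical SIMD width
using Lanes = std::array<double, W>; // stand-in for a SIMD vector type

// Stand-in for load_interleave: reads W contiguous elements of nq points
// each from buf and writes point i of every element into out[i].
void interleave_block(const double *buf, std::size_t nq, Lanes *out)
{
    for (std::size_t i = 0; i < nq; ++i)
        for (std::size_t lane = 0; lane < W; ++lane)
            out[i][lane] = buf[lane * nq + i];
}

// Initial design: stage every block through a local buffer first.
std::vector<Lanes> interleave_staged(const std::vector<double> &in,
                                     std::size_t nq, std::size_t nElmt)
{
    const std::size_t nFull = nElmt / W; // blocks without padding
    const std::size_t nRem  = nElmt % W; // elements in the last block
    std::vector<Lanes> out((nFull + (nRem ? 1 : 0)) * nq);
    std::vector<double> local(W * nq);   // real code: aligned storage

    for (std::size_t b = 0; b < nFull; ++b)
    {
        std::copy_n(in.data() + b * W * nq, W * nq, local.begin());
        interleave_block(local.data(), nq, out.data() + b * nq);
    }
    if (nRem > 0) // last sub-block, zero-padded, handled separately
    {
        std::fill(local.begin(), local.end(), 0.0);
        std::copy_n(in.data() + nFull * W * nq, nRem * nq, local.begin());
        interleave_block(local.data(), nq, out.data() + nFull * nq);
    }
    return out;
}
```

Compared with copying the whole array up front, the staging buffer here is only one block in size, but every value is still loaded twice.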
The updated design is:
- Add a new function load_unalign_interleave, which can directly access unaligned memory, so we don't need to copy to an aligned local array.
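A rough sketch of the idea behind reading directly from unaligned memory is given below. It does not reproduce the real signature of load_unalign_interleave: `W`, `Lanes`, `interleave_direct` and `interleave_all` are hypothetical, scalar loops stand in for unaligned SIMD loads, and handling the last block through a valid-lane count is an assumption of this sketch.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t W = 4;         // hypothetical SIMD width
using Lanes = std::array<double, W>; // stand-in for a SIMD vector type

// Interleave straight from the source pointer, with no alignment
// requirement. Lanes at or beyond nValid are filled with zeros.
void interleave_direct(const double *src, std::size_t nq,
                       std::size_t nValid, Lanes *out)
{
    for (std::size_t i = 0; i < nq; ++i)
        for (std::size_t lane = 0; lane < W; ++lane)
            out[i][lane] = (lane < nValid) ? src[lane * nq + i] : 0.0;
}

// Updated design: no staging buffer, each block is read in place.
std::vector<Lanes> interleave_all(const std::vector<double> &in,
                                  std::size_t nq, std::size_t nElmt)
{
    const std::size_t nBlocks = (nElmt + W - 1) / W;
    std::vector<Lanes> out(nBlocks * nq);
    for (std::size_t b = 0; b < nBlocks; ++b)
    {
        const std::size_t nValid = std::min(W, nElmt - b * W);
        interleave_direct(in.data() + b * W * nq, nq, nValid,
                          out.data() + b * nq);
    }
    return out;
}
```

Each input value is loaded once, which is consistent with the reduction in memory loads this design aims for.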
Here are the benchmark results:
Although the measurements contain some noise, we can still draw some general conclusions:
- The legacy collection matrix-free is much slower than the redesign, except for hex-7, which by coincidence requires no padding (and therefore no copy).
- With this MR, the collection matrix-free is much closer to the redesigned matrix-free and vec-testing.
- The updated design is slightly faster than the initial design because it further reduces the number of memory loads.
Tests
No new tests are required.
Suggested reviewers
Notes
A complete change for all operators takes a lot of effort, so we first apply the changes to Helmholtz and then to the others.
Everyone, especially the MatrixFreeOps contributors, is welcome to join in this MR.
The MatrixFreeBase derived class can be removed after all the operators are updated.
Checklist
- Functions and classes, or changes to them, are documented.
- [ ] User guide/documentation is updated.
- Changelog is updated.
- [ ] Suitable tests added for new functionality.
- Contributed code is correctly formatted. (See the contributing guidelines).
- [ ] License added to any new files.
- No extraneous files have been added (e.g. compiler output or test data files).