Draft: Add AVXStdMat Implementation
Issue/feature addressed
The current SerialStdMat implementation does not make good use of the CPU cache or of the CPU's AVX capability.
Proposed solution
Change SerialStdMat to a cache-friendly implementation, SerialAVXStdMat, with a design similar to SerialAVXSumFac.
Implementation
- Change DataType to simd_t;
- Interleave the data layout in groups of simd_t::width;
- Replace BLAS calls with manual for loops;
- Maybe use BOOST PP to switch to a size-specific operator.
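As a rough illustration of the interleaving and manual-loop points above, here is a minimal, portable sketch. It is not the code in this MR: `kWidth` is a hypothetical stand-in for simd_t::width, plain `double` arrays stand in for simd_t, and the function names `Interleave` and `ApplyStdMat` are invented for this example. It assumes a single standard matrix shared by all elements.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for simd_t::width (4 doubles fit in one 256-bit AVX register).
constexpr std::size_t kWidth = 4;

// Interleave nElmt element-contiguous vectors of length n into blocks of kWidth
// elements, so that the same coefficient of kWidth elements is adjacent in memory:
//   out[(block * n + i) * kWidth + lane] = in[(block * kWidth + lane) * n + i]
// The last block is zero-padded when nElmt is not a multiple of kWidth.
std::vector<double> Interleave(const std::vector<double> &in,
                               std::size_t nElmt, std::size_t n)
{
    const std::size_t nBlocks = (nElmt + kWidth - 1) / kWidth;
    std::vector<double> out(nBlocks * n * kWidth, 0.0);
    for (std::size_t e = 0; e < nElmt; ++e)
    {
        const std::size_t block = e / kWidth;
        const std::size_t lane  = e % kWidth;
        for (std::size_t i = 0; i < n; ++i)
        {
            out[(block * n + i) * kWidth + lane] = in[e * n + i];
        }
    }
    return out;
}

// Apply one row-major nRows x nCols matrix to every element of an interleaved
// input using plain loops instead of BLAS. The innermost loop runs over the
// kWidth lanes with unit stride, which a compiler can map to vector instructions.
std::vector<double> ApplyStdMat(const std::vector<double> &mat,
                                std::size_t nRows, std::size_t nCols,
                                const std::vector<double> &xInter,
                                std::size_t nBlocks)
{
    std::vector<double> y(nBlocks * nRows * kWidth, 0.0);
    for (std::size_t b = 0; b < nBlocks; ++b)
    {
        for (std::size_t r = 0; r < nRows; ++r)
        {
            double acc[kWidth] = {0.0, 0.0, 0.0, 0.0};
            for (std::size_t c = 0; c < nCols; ++c)
            {
                const double m   = mat[r * nCols + c];
                const double *xv = &xInter[(b * nCols + c) * kWidth];
                for (std::size_t lane = 0; lane < kWidth; ++lane)
                {
                    acc[lane] += m * xv[lane];
                }
            }
            for (std::size_t lane = 0; lane < kWidth; ++lane)
            {
                y[(b * nRows + r) * kWidth + lane] = acc[lane];
            }
        }
    }
    return y;
}
```

For three elements with two coefficients each, `Interleave(in, 3, 2)` produces one block of eight values (the fourth lane is padding), and `ApplyStdMat(mat, 2, 2, x, 1)` then computes all three matrix-vector products in a single pass over the matrix. A real implementation would presumably use simd_t loads and fused multiply-adds in place of the lane loop.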
Tests
Suggested reviewers
Please suggest any people who would be appropriate to review your code.
Notes
Please add any other information that could be useful for reviewers.
Checklist
- Functions and classes, or changes to them, are documented.
- User guide/documentation is updated.
- Changelog is updated.
- Suitable tests added for new functionality.
- Contributed code is correctly formatted. (See the contributing guidelines).
- License added to any new files.
- No extraneous files have been added (e.g. compiler output or test data files).