Add CUDA backend implementation for AddTraceIntegral
Issue/feature addressed
The AddTraceIntegral operator was refactored into a general implementation, AddTraceIntegralImpl.hpp. For the CUDA backend, this introduces repeated device-to-host and host-to-device copies. The same problem will occur with a Kokkos-CUDA backend.
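To illustrate the cost described above, here is a minimal host-only C++ sketch. It is not the code in this merge request: `MockDeviceBuffer`, `applyViaHost`, and `applyOnDevice` are hypothetical names, the "device" is ordinary host memory, and the counters only record how many transfers a real backend would issue. The element-wise update is a placeholder, not the actual trace integral.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a device-resident buffer. A real backend would
// wrap device allocations and copies; here the counters only record how many
// transfers would be performed.
struct MockDeviceBuffer
{
    std::vector<double> data;     // pretend this lives on the GPU
    std::size_t hostToDevice = 0; // number of simulated host-to-device copies
    std::size_t deviceToHost = 0; // number of simulated device-to-host copies

    void upload(const std::vector<double> &host)
    {
        data = host;
        ++hostToDevice;
    }

    std::vector<double> download()
    {
        ++deviceToHost;
        return data;
    }
};

// Generic path (assumption): only a host implementation exists, so every
// application pulls the field to the host, updates it, and pushes it back.
// The trace values are passed as a plain vector to keep the sketch short.
void applyViaHost(MockDeviceBuffer &field, const std::vector<double> &trace)
{
    std::vector<double> host = field.download();
    assert(trace.size() == host.size());
    for (std::size_t i = 0; i < host.size(); ++i)
    {
        host[i] += trace[i]; // placeholder update
    }
    field.upload(host);
}

// Specialized path (assumption): the update runs where the data already
// lives, as a kernel launch would in a real CUDA backend, so no transfer
// is recorded.
void applyOnDevice(MockDeviceBuffer &field, const std::vector<double> &trace)
{
    assert(trace.size() == field.data.size());
    for (std::size_t i = 0; i < field.data.size(); ++i)
    {
        field.data[i] += trace[i]; // same placeholder update
    }
}
```

Applying each path three times to the same data gives identical values, but the host-staged path records three simulated downloads and three additional uploads, while the device-resident path records none beyond the initial upload.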
Proposed solution
- Fully implement the locTraceToTraceMap in AddTraceIntegralSerialStdMat.hpp.
- Add a specialized CUDA backend, AddTraceIntegralCUDASumFac.cuh, and a specialized Kokkos backend, AddTraceIntegralKokkosStdMat.hpp.
Implementation
Tests
- The existing CUDA test is used to test the proposed new implementation.
- The existing Kokkos test is now disabled, as a Kokkos IProductWRTBase operator backend is not yet implemented.
Suggested reviewers
Please suggest any people who would be appropriate to review your code.
Notes
Please add any other information that could be useful for reviewers.
Checklist
- Functions and classes, or changes to them, are documented.
- [ ] User guide/documentation is updated.
- [ ] Changelog is updated.
- Suitable tests added for new functionality.
- Contributed code is correctly formatted. (See the contributing guidelines).
- License added to any new files.
- No extraneous files have been added (e.g. compiler output or test data files).
Edited by Jacques Xing