Feature/redesign fix cuda kernel memory
Issue/feature addressed
This MR removes CUDA heap memory allocation inside kernel functions and instead pre-allocates the required memory from the host. Additionally, it:
- Consistently uses the `__restrict__` keyword
- Activates `clang-format` for CUDA files
- Adds a `reduceMax` CUDA kernel
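For reviewers, the general pattern described above can be sketched roughly as follows. This is an illustrative sketch only, not the code in this MR: the names `reduceMaxSketch`, `reduceMaxHost` and `kBlockSize`, the block count cap of 64, and the two-pass scheme are all assumptions made for the example. It shows a max-reduction kernel that writes into a scratch buffer allocated by the host with `cudaMalloc` (no `malloc`/`new` inside the kernel), with `__restrict__` on pointer arguments that do not alias. Error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <algorithm>
#include <cfloat>
#include <vector>

// Hypothetical block size for this sketch; must be a power of two.
constexpr int kBlockSize = 256;

// Each block reduces its share of `in` to one partial maximum and stores it
// in `partial`, a scratch buffer pre-allocated by the host.
// `in` and `partial` must not alias, hence __restrict__.
__global__ void reduceMaxSketch(const float* __restrict__ in,
                                float* __restrict__ partial,
                                int n)
{
    __shared__ float cache[kBlockSize];

    // Grid-stride loop: each thread tracks the maximum of its elements.
    float localMax = -FLT_MAX;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x) {
        localMax = fmaxf(localMax, in[i]);
    }
    cache[threadIdx.x] = localMax;
    __syncthreads();

    // Tree reduction in shared memory.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (threadIdx.x < stride) {
            cache[threadIdx.x] =
                fmaxf(cache[threadIdx.x], cache[threadIdx.x + stride]);
        }
        __syncthreads();
    }
    if (threadIdx.x == 0) {
        partial[blockIdx.x] = cache[0];
    }
}

// Hypothetical host wrapper: all device memory is allocated here, up front.
float reduceMaxHost(const std::vector<float>& data)
{
    const int n = static_cast<int>(data.size());
    if (n == 0) {
        return -FLT_MAX;  // convention chosen for this sketch
    }
    const int blocks = std::min(64, (n + kBlockSize - 1) / kBlockSize);

    float* dIn = nullptr;
    float* dPartial = nullptr;
    float* dOut = nullptr;
    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dPartial, blocks * sizeof(float));  // scratch, host-allocated
    cudaMalloc(&dOut, sizeof(float));
    cudaMemcpy(dIn, data.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    // Pass 1: one partial maximum per block.
    reduceMaxSketch<<<blocks, kBlockSize>>>(dIn, dPartial, n);
    // Pass 2: a single block reduces the partial maxima.
    reduceMaxSketch<<<1, kBlockSize>>>(dPartial, dOut, blocks);

    float result = 0.0f;
    cudaMemcpy(&result, dOut, sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(dIn);
    cudaFree(dPartial);
    cudaFree(dOut);
    return result;
}
```

With a CUDA-capable device available, a call such as `reduceMaxHost({1.0f, 7.5f, -3.0f})` should return `7.5f`. In real code the scratch buffers would more likely be allocated once and reused across calls rather than allocated per call as in this sketch.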
Proposed solution
Implementation
Tests
Suggested reviewers
Please suggest any people who would be appropriate to review your code.
Notes
Please add any other information that could be useful for reviewers.
Checklist
- Functions and classes, or changes to them, are documented.
- [ ] User guide/documentation is updated.
- [ ] Changelog is updated.
- [ ] Suitable tests added for new functionality.
- Contributed code is correctly formatted. (See the contributing guidelines).
- [ ] License added to any new files.
- No extraneous files have been added (e.g. compiler output or test data files).
Edited by Jacques Xing