Tidy-up CUDA launchers
Issue/feature addressed
Encapsulate CUDA-specific grid size parameters within the CUDA kernel launchers to allow implementation-specific optimization.
The CUDA math kernel implementations will be optimized in a separate MR. The purpose of this MR is to unify the API.
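As a rough illustration of the idea, and not the code in this MR, the sketch below shows a launcher that hides the grid and block dimensions from its callers. The names `add_kernel`, `blocks_for`, and `launch_add`, the signatures, and the block size of 256 are all hypothetical and chosen only for the example.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: element-wise addition, c[i] = a[i] + b[i].
__global__ void add_kernel(const double* a, const double* b, double* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];  // guard the partially filled last block
}

// Number of blocks needed to cover n elements (ceiling division).
inline int blocks_for(int n, int threads_per_block) {
    return (n + threads_per_block - 1) / threads_per_block;
}

// Hypothetical launcher: callers pass only the data and the problem size.
// Block and grid dimensions are chosen here, so they can be tuned later
// without touching any call site.
inline cudaError_t launch_add(const double* a, const double* b, double* c,
                              int n, cudaStream_t stream = 0) {
    if (n <= 0) return cudaSuccess;         // nothing to do, skip the launch
    constexpr int threads_per_block = 256;  // assumed default, not from the MR
    const int blocks = blocks_for(n, threads_per_block);
    add_kernel<<<blocks, threads_per_block, 0, stream>>>(a, b, c, n);
    return cudaGetLastError();              // reports launch-configuration errors
}
```

With a launcher of this shape, a call site reduces to something like `launch_add(d_a, d_b, d_c, n);` and contains no `<<<...>>>` execution configuration, which is what leaves room for a later MR to change how the grid is sized.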
Proposed solution
Implementation
Tests
Suggested reviewers
Notes
Checklist
- Functions and classes, or changes to them, are documented.
- [ ] User guide/documentation is updated.
- [ ] Changelog is updated.
- Suitable tests added for new functionality.
- Contributed code is correctly formatted. (See the contributing guidelines).
- License added to any new files.
- No extraneous files have been added (e.g. compiler output or test data files).
Edited by Jacques Xing