
Add/Update some CUDA math kernels

Issue/feature addressed

This MR adds the following CUDA kernels:

  • reduceMaxKernel
  • reduceMinKernel
  • l1normKernel
  • l2normKernel
  • lpnormKernel
  • linfnormKernel
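
For reference, the norm kernels above presumably compute the standard vector norms. The description does not spell out the definitions, so the formulas below are an assumption based on the kernel names, with x an input vector of n elements and p ≥ 1:

```latex
\begin{aligned}
\|x\|_1      &= \sum_{i=1}^{n} |x_i| \\
\|x\|_2      &= \Bigl(\sum_{i=1}^{n} |x_i|^2\Bigr)^{1/2} \\
\|x\|_p      &= \Bigl(\sum_{i=1}^{n} |x_i|^p\Bigr)^{1/p} \\
\|x\|_\infty &= \max_{1 \le i \le n} |x_i|
\end{aligned}
```

Under these definitions, the l1 and l2 norms are the p = 1 and p = 2 cases of the lp norm, and the linf norm is a max-reduction over absolute values.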

The reduction kernels have been improved to use cooperative groups, vectorized loads, and warp-level primitives.
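
To illustrate how these three techniques can fit together, here is a minimal sketch of a max-reduction. It is not the code from this MR: the names `reduceMaxSketch`, `atomicMaxFloat`, and `maxOnDevice` are hypothetical, and the launch configuration, the `float` element type, and the use of one atomic per warp are assumptions made for the example.

```cuda
#include <cooperative_groups.h>
#include <cuda_runtime.h>
#include <cfloat>
#include <vector>

namespace cg = cooperative_groups;

// Hypothetical helper: CUDA has no native atomicMax for float,
// so emulate it with a compare-and-swap loop.
__device__ float atomicMaxFloat(float* addr, float value)
{
    int* addrAsInt = reinterpret_cast<int*>(addr);
    int old = *addrAsInt;
    while (value > __int_as_float(old)) {
        const int assumed = old;
        old = atomicCAS(addrAsInt, assumed, __float_as_int(value));
        if (old == assumed) break;  // our value was stored
    }
    return __int_as_float(old);
}

// Illustrative sketch only. Assumes *out was initialised to -FLT_MAX,
// `in` is 16-byte aligned (true for cudaMalloc), and the block size
// is a multiple of 32.
__global__ void reduceMaxSketch(const float* in, float* out, size_t n)
{
    cg::thread_block block = cg::this_thread_block();
    cg::thread_block_tile<32> warp = cg::tiled_partition<32>(block);

    const size_t tid = blockIdx.x * static_cast<size_t>(blockDim.x) + threadIdx.x;
    const size_t stride = static_cast<size_t>(gridDim.x) * blockDim.x;

    float best = -FLT_MAX;

    // Vector loading: each thread reads four floats per memory transaction.
    const float4* in4 = reinterpret_cast<const float4*>(in);
    const size_t n4 = n / 4;
    for (size_t i = tid; i < n4; i += stride) {
        const float4 v = in4[i];
        best = fmaxf(best, fmaxf(fmaxf(v.x, v.y), fmaxf(v.z, v.w)));
    }
    // Scalar tail for the last n % 4 elements.
    for (size_t i = 4 * n4 + tid; i < n; i += stride) {
        best = fmaxf(best, in[i]);
    }

    // Warp primitive: shuffle-based tree reduction inside each 32-thread tile.
    for (unsigned offset = warp.size() / 2; offset > 0; offset /= 2) {
        best = fmaxf(best, warp.shfl_down(best, offset));
    }

    // Lane 0 of each warp holds that warp's maximum; merge it into the result.
    if (warp.thread_rank() == 0) {
        atomicMaxFloat(out, best);
    }
}

// Hypothetical host wrapper (error checking omitted for brevity).
float maxOnDevice(const std::vector<float>& h)
{
    float* dIn = nullptr;
    float* dOut = nullptr;
    float result = -FLT_MAX;
    cudaMalloc(&dIn, h.size() * sizeof(float));
    cudaMalloc(&dOut, sizeof(float));
    cudaMemcpy(dIn, h.data(), h.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dOut, &result, sizeof(float), cudaMemcpyHostToDevice);
    reduceMaxSketch<<<64, 256>>>(dIn, dOut, h.size());
    cudaMemcpy(&result, dOut, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return result;
}
```

The same skeleton would cover a min-reduction or a norm by swapping `fmaxf` for the relevant combine step, for example `fminf`, or a sum of `fabsf` values. The final atomic is a simplification; a two-pass scheme with per-block partial results is another common choice.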

Proposed solution

Implementation

Tests

Suggested reviewers

Please suggest any people who would be appropriate to review your code.

Notes

Please add any other information that could be useful for reviewers.

Checklist

  • Functions and classes, or changes to them, are documented.
  • [ ] User guide/documentation is updated.
  • [ ] Changelog is updated.
  • Suitable tests added for new functionality.
  • Contributed code is correctly formatted. (See the contributing guidelines).
  • License added to any new files.
  • No extraneous files have been added (e.g. compiler output or test data files).
Edited by Jacques Xing
