
Update PhysDeriv CUDA kernel implementation

Issue/feature addressed

This MR tidies up the PhysDeriv CUDA kernels in preparation for future benchmarking tests.

Here is a summary of the changes:

  • Tidy up the PhysDeriv CUDA kernel implementation
  • Fuse kernel functions for better efficiency
  • Simplify loops in kernel functions
  • Improve coalesced access to global memory
  • Add a templated option for shared memory
  • Add 1D-indexing versions of the QP (multilevel) kernels
  • Interleave data for non-QP kernels
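
The coalescing and interleaving items above come down to index arithmetic. The host-side C++ sketch below (not the MR's CUDA code) compares a blocked layout, where each element's quadrature points are contiguous, with an interleaved layout, where the element index varies fastest. It assumes that "interleaved" means the latter; the helpers `BlockedIndex`, `InterleavedIndex`, `BlockedStride` and `InterleavedStride` are hypothetical names. If each CUDA thread handles one element, neighbouring threads in a warp read addresses `nq` doubles apart in the blocked layout but adjacent addresses in the interleaved one, which is what lets the global-memory loads coalesce.

```cpp
#include <cstddef>

// Hypothetical helpers: offset of quadrature point q of element e,
// for nq points per element and nelmt elements in total.

// Blocked layout: all points of element 0, then all points of element 1, ...
constexpr std::size_t BlockedIndex(std::size_t e, std::size_t q,
                                   std::size_t nq, std::size_t /*nelmt*/)
{
    return e * nq + q;
}

// Interleaved layout (assumed meaning): point 0 of every element,
// then point 1 of every element, ...
constexpr std::size_t InterleavedIndex(std::size_t e, std::size_t q,
                                       std::size_t /*nq*/, std::size_t nelmt)
{
    return q * nelmt + e;
}

// Distance (in doubles) between the addresses read by two threads that
// handle neighbouring elements e = 0 and e = 1 at the same point q = 0.
constexpr std::size_t BlockedStride(std::size_t nq, std::size_t nelmt)
{
    return BlockedIndex(1, 0, nq, nelmt) - BlockedIndex(0, 0, nq, nelmt);
}

constexpr std::size_t InterleavedStride(std::size_t nq, std::size_t nelmt)
{
    return InterleavedIndex(1, 0, nq, nelmt) -
           InterleavedIndex(0, 0, nq, nelmt);
}
```

For example, with `nq = 9` and 1000 elements, `BlockedStride(9, 1000)` is 9 while `InterleavedStride(9, 1000)` is 1, so a warp of 32 threads touches one contiguous 256-byte span only in the interleaved case.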

To do (future MRs):

  • Interleave data for non-QP kernels
  • Consider templating the nq0, nq1, nq2 parameters
  • Optimize CUDA grid parameters
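
To illustrate both kernel fusion and the templating idea above, here is a host-side C++ sketch (again, not the MR's CUDA kernel) of a fused 2D PhysDeriv for a single quadrilateral element with `nq0`/`nq1` as template parameters. It assumes: quadrature point (i, j) is stored at `i + NQ0 * j`; `D0`, `D1` are row-major differentiation matrices; and the element is affine, with constant derivative factors ordered as {dxi0/dx, dxi0/dy, dxi1/dx, dxi1/dy}. `FusedPhysDeriv2D` and `CheckFusedPhysDeriv2D` are hypothetical names. The reference derivatives and the chain rule are evaluated in one pass, so no intermediate du/dxi arrays are written out, and the compile-time loop bounds let the compiler unroll the inner sums.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical fused PhysDeriv for ONE affine quadrilateral element.
// Assumed layout: point (i, j) at index i + NQ0 * j (xi0 fastest).
// Assumed factor order: df = {dxi0/dx, dxi0/dy, dxi1/dx, dxi1/dy}.
template <std::size_t NQ0, std::size_t NQ1>
void FusedPhysDeriv2D(const std::array<double, NQ0 * NQ0> &D0,
                      const std::array<double, NQ1 * NQ1> &D1,
                      const std::array<double, 4> &df,
                      const std::array<double, NQ0 * NQ1> &in,
                      std::array<double, NQ0 * NQ1> &outX,
                      std::array<double, NQ0 * NQ1> &outY)
{
    for (std::size_t j = 0; j < NQ1; ++j)
    {
        for (std::size_t i = 0; i < NQ0; ++i)
        {
            // Reference derivatives at (i, j); bounds are compile-time
            // constants, so the compiler can unroll these loops.
            double du0 = 0.0, du1 = 0.0;
            for (std::size_t k = 0; k < NQ0; ++k)
                du0 += D0[i * NQ0 + k] * in[k + NQ0 * j];
            for (std::size_t k = 0; k < NQ1; ++k)
                du1 += D1[j * NQ1 + k] * in[i + NQ0 * k];

            // Chain rule in the same pass: no intermediate arrays.
            outX[i + NQ0 * j] = df[0] * du0 + df[2] * du1;
            outY[i + NQ0 * j] = df[1] * du0 + df[3] * du1;
        }
    }
}

// Usage check: 3 points {-1, 0, 1} per direction, identity mapping,
// field u = xi0^2 + 2 * xi1, so du/dx = 2 * xi0 and du/dy = 2.
bool CheckFusedPhysDeriv2D()
{
    constexpr std::size_t N = 3;
    const std::array<double, N> xi = {-1.0, 0.0, 1.0};
    // Differentiation matrix of the 3-point Lagrange basis on xi
    // (exact for polynomials up to degree 2).
    const std::array<double, N * N> D = {-1.5, 2.0, -0.5,
                                         -0.5, 0.0, 0.5,
                                          0.5, -2.0, 1.5};
    const std::array<double, 4> df = {1.0, 0.0, 0.0, 1.0};
    std::array<double, N * N> u{}, ux{}, uy{};
    for (std::size_t j = 0; j < N; ++j)
        for (std::size_t i = 0; i < N; ++i)
            u[i + N * j] = xi[i] * xi[i] + 2.0 * xi[j];

    FusedPhysDeriv2D<N, N>(D, D, df, u, ux, uy);

    for (std::size_t j = 0; j < N; ++j)
        for (std::size_t i = 0; i < N; ++i)
            if (std::fabs(ux[i + N * j] - 2.0 * xi[i]) > 1e-12 ||
                std::fabs(uy[i + N * j] - 2.0) > 1e-12)
                return false;
    return true;
}
```

`CheckFusedPhysDeriv2D()` returns `true` because the 3-point differentiation matrix is exact for quadratics. In a CUDA version, one plausible mapping is one thread per element (or per quadrature point) running the same arithmetic, with `NQ0`/`NQ1` fixed at compile time exactly as here.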

Proposed solution

Implementation

Tests

Suggested reviewers

Please suggest any people who would be appropriate to review your code.

Notes

Please add any other information that could be useful for reviewers.

Checklist

  • Functions and classes, or changes to them, are documented.
  • [ ] User guide/documentation is updated.
  • [ ] Changelog is updated.
  • [ ] Suitable tests added for new functionality.
  • Contributed code is correctly formatted (see the contributing guidelines).
  • [ ] License added to any new files.
  • No extraneous files have been added (e.g. compiler output or test data files).
Edited by Jacques Xing
