Update MultiplyByElmtInvMass for block-based implementation

Issue/feature addressed

  • Block-based implementation of MultiplyByElmtInvMass
  • Implement an initial CUDA backend for the MultiplyByElmtInvMass operators
  • Avoid the copy to an Array in the Serial backend by calling BLAS dgemm/dgemv directly
  • Add a singleton for the cuBLAS handle
  • Use the batched cuBLAS dgemm for deformed blocks
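
As a rough illustration of the block-based idea in the list above, the sketch below applies an elemental inverse mass matrix block by block in plain C++. It is not the code in this merge request: `Block`, its fields, and `MultiplyByElmtInvMassSketch` are hypothetical names, the storage layout is assumed, and the sketch assumes for simplicity that every element of a regular block shares one matrix while each element of a deformed block has its own. The hand-written loops stand in for the dgemm/dgemv calls (Serial) and the batched cuBLAS dgemm (CUDA, deformed blocks) mentioned above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical block descriptor: a run of elements with the same shape.
struct Block
{
    std::size_t nElmt;  // number of elements in the block
    std::size_t nModes; // coefficients per element
    bool deformed;      // true: one inverse mass matrix per element
    // Row-major nModes x nModes matrices:
    // one matrix if regular (assumption), nElmt matrices if deformed.
    std::vector<double> invMass;
};

// Computes out = M^{-1} * in element by element, for every block.
// in/out store the coefficients of all elements contiguously.
void MultiplyByElmtInvMassSketch(const std::vector<Block> &blocks,
                                 const std::vector<double> &in,
                                 std::vector<double> &out)
{
    std::size_t offset = 0; // start of the current block in in/out
    for (const Block &b : blocks)
    {
        const std::size_t n = b.nModes;
        for (std::size_t e = 0; e < b.nElmt; ++e)
        {
            // Regular blocks reuse matrix 0; deformed blocks pick matrix e.
            const double *M =
                b.invMass.data() + (b.deformed ? e * n * n : 0);
            const double *x = in.data() + offset + e * n;
            double *y       = out.data() + offset + e * n;

            // Dense matrix-vector product (stand-in for a BLAS call).
            for (std::size_t i = 0; i < n; ++i)
            {
                double sum = 0.0;
                for (std::size_t j = 0; j < n; ++j)
                {
                    sum += M[i * n + j] * x[j];
                }
                y[i] = sum;
            }
        }
        offset += b.nElmt * n;
    }
}
```

Under these assumptions, a regular block could be handled by a single matrix-matrix product over all of its elements, whereas a deformed block needs one product per element, which is the case a batched routine is designed for.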

Proposed solution

Implementation

Tests

Suggested reviewers

Notes

Checklist

  • Functions and classes, or changes to them, are documented.
  • [ ] User guide/documentation is updated.
  • [ ] Changelog is updated.
  • Suitable tests added for new functionality.
  • Contributed code is correctly formatted. (See the contributing guidelines).
  • License added to any new files.
  • No extraneous files have been added (e.g. compiler output or test data files).
Edited by Jacques Xing