Update MultiplyByElmtInvMass for block-based implementation
requested to merge CFD-Xing/nektar:feature/redesign/MultiplyByElmtInvMass-cuda into feature/redesign
Issue/feature addressed
- Block-based implementation of MultiplyByElmtInvMass
- Implement an initial CUDA backend for the MultiplyByElmtInvMass operators
- Avoid copying to an Array in the Serial backend by calling BLAS dgemm/dgemv directly
- Add a singleton for the cuBLAS handle
- Use the batched cuBLAS dgemm for deformed blocks
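The bullets above describe two block-level kernels. The following is a minimal CPU sketch of how such kernels could be organised. It is not the Nektar++ code. It assumes a column-major layout in which one element's `nm` modal coefficients are contiguous. It also assumes that regular (undeformed) elements have a constant Jacobian `jac[e]`, so that each element's inverse mass matrix equals the shared reference inverse mass divided by `jac[e]`. The function names `gemmColMajor`, `multiplyByElmtInvMassRegular` and `multiplyByElmtInvMassDeformed` are hypothetical. `gemmColMajor` is a naive stand-in for BLAS `dgemm` so that the example has no external dependency.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical naive column-major GEMM: C (m x n) = A (m x k) * B (k x n).
// Stands in for dgemm('N', 'N', m, n, k, 1.0, A, m, B, k, 0.0, C, m).
void gemmColMajor(std::size_t m, std::size_t n, std::size_t k,
                  const double *A, const double *B, double *C)
{
    for (std::size_t j = 0; j < n; ++j)
    {
        for (std::size_t i = 0; i < m; ++i)
        {
            double sum = 0.0;
            for (std::size_t p = 0; p < k; ++p)
            {
                sum += A[i + p * m] * B[p + j * k];
            }
            C[i + j * m] = sum;
        }
    }
}

// Regular block (assumption: constant Jacobian per element). Every element
// shares invMassRef (nm x nm), so one GEMM covers the whole block. Each
// element's column is then divided by its Jacobian.
void multiplyByElmtInvMassRegular(std::size_t nm, std::size_t nElmt,
                                  const std::vector<double> &invMassRef,
                                  const std::vector<double> &jac,
                                  const std::vector<double> &in,
                                  std::vector<double> &out)
{
    assert(invMassRef.size() == nm * nm);
    assert(jac.size() == nElmt && in.size() == nm * nElmt);
    out.assign(nm * nElmt, 0.0);
    gemmColMajor(nm, nElmt, nm, invMassRef.data(), in.data(), out.data());
    for (std::size_t e = 0; e < nElmt; ++e)
    {
        for (std::size_t i = 0; i < nm; ++i)
        {
            out[i + e * nm] /= jac[e];
        }
    }
}

// Deformed block: each element stores its own nm x nm inverse mass matrix,
// packed with stride nm * nm. The loop issues one small product per element.
// On a GPU this maps onto a single batched call (for example
// cublasDgemmStridedBatched) instead of nElmt separate launches.
void multiplyByElmtInvMassDeformed(std::size_t nm, std::size_t nElmt,
                                   const std::vector<double> &invMass,
                                   const std::vector<double> &in,
                                   std::vector<double> &out)
{
    assert(invMass.size() == nm * nm * nElmt);
    assert(in.size() == nm * nElmt);
    out.assign(nm * nElmt, 0.0);
    for (std::size_t e = 0; e < nElmt; ++e)
    {
        gemmColMajor(nm, 1, nm, invMass.data() + e * nm * nm,
                     in.data() + e * nm, out.data() + e * nm);
    }
}
```

The design splits the work by storage: a regular block needs only one shared matrix and a single large GEMM, which is efficient on both CPU and GPU. A deformed block has one matrix per element, which is why a batched GEMM is the natural fit there.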
Proposed solution
Implementation
Tests
Suggested reviewers
Notes
Checklist
- Functions and classes, or changes to them, are documented.
- [ ] User guide/documentation is updated.
- [ ] Changelog is updated.
- Suitable tests added for new functionality.
- Contributed code is correctly formatted (see the contributing guidelines).
- License added to any new files.
- No extraneous files have been added (e.g. compiler output or test data files).
Edited by Jacques Xing