Tidy up the CUDA implementation of IProductWRTDerivBase and add CUDA kernels with additional parallelism
This MR tidies up the previous CUDA implementation of the IProductWRTDerivBase operator and introduces new CUDA kernels that add parallelism across quadrature points.
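The MR description does not show the kernels themselves. The sketch below illustrates the general idea of parallelising across quadrature points: one thread per (element, quadrature point) pair instead of one thread per element. It covers only the pointwise first stage of an inner product with respect to the derivative of the basis, where each physical-space input component f_i is folded into reference-space terms g_j = sum_i (d xi_j / d x_i) f_i, scaled by Jacobian times quadrature weight. Everything here is an assumption: the 2D setting, the element-major storage, the four derivative factors per point, and the names `MetricStageInput`, `metricStageThread` and `metricStage` are hypothetical, not the MR's code. The loop body is what a single CUDA thread would execute, written as plain C++ so it runs on the host.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout (assumption, not the MR's): point q of element e is
// stored at index e * nq + q for every per-point array.
struct MetricStageInput {
    std::size_t nElmt;          // number of elements
    std::size_t nq;             // quadrature points per element
    std::vector<double> f0, f1; // input field components at quadrature points
    std::vector<double> df;     // 4 factors per point: dxi0/dx0, dxi0/dx1, dxi1/dx0, dxi1/dx1
    std::vector<double> jacW;   // Jacobian times quadrature weight per point
};

// Work done by one "thread": exactly one (element, quadrature point) pair.
inline void metricStageThread(std::size_t tid, const MetricStageInput &in,
                              std::vector<double> &g0, std::vector<double> &g1)
{
    const double a = in.f0[tid];
    const double b = in.f1[tid];
    const double *d = &in.df[4 * tid];
    // g_j = sum_i (dxi_j / dx_i) * f_i, scaled by Jacobian * weight.
    g0[tid] = (d[0] * a + d[1] * b) * in.jacW[tid];
    g1[tid] = (d[2] * a + d[3] * b) * in.jacW[tid];
}

// Host emulation of a 1D launch with nElmt * nq threads. On a GPU, tid would
// be blockIdx.x * blockDim.x + threadIdx.x, guarded by tid < total.
inline void metricStage(const MetricStageInput &in,
                        std::vector<double> &g0, std::vector<double> &g1)
{
    const std::size_t total = in.nElmt * in.nq;
    assert(in.f0.size() == total && in.f1.size() == total);
    assert(in.df.size() == 4 * total && in.jacW.size() == total);
    g0.assign(total, 0.0);
    g1.assign(total, 0.0);
    for (std::size_t tid = 0; tid < total; ++tid) {
        metricStageThread(tid, in, g0, g1);
    }
}
```

Because every point is independent in this stage, the number of threads scales with nElmt * nq rather than nElmt alone, which is the kind of extra parallelism the title refers to. The subsequent contraction of g0 and g1 with the basis-derivative values (a sum over quadrature points for each mode) is not shown and would need its own thread mapping.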
Edited by Jacques Xing