Tidy up CUDA implementation of BwdTrans and update CUDA kernels with additional parallelism

This MR tidies up the previous CUDA implementation of the PhysDeriv operator.

Edited by Jacques Xing
