Tidy up CUDA implementation of BwdTrans and update CUDA kernels with additional parallelism
@@ -2,9 +2,11 @@
This MR tidies up the previous implementation of the CUDA version of the PhysDeriv operator