Tidy up CUDA implementation of BwdTrans and update CUDA kernels with additional parallelism
Showing
- Operators/BwdTrans/BwdTransCUDA.cu: 2 additions, 0 deletions
- Operators/BwdTrans/BwdTransCUDA.hpp: 134 additions, 167 deletions
- Operators/BwdTrans/BwdTransCUDAKernels.cuh: 949 additions, 652 deletions
- Operators/BwdTrans/BwdTransMatFree.hpp: 3 additions, 2 deletions
- Operators/BwdTrans/BwdTransMatFreeKernels.hpp: 1 addition, 0 deletions
- Operators/BwdTrans/BwdTransStdMat.hpp: 4 additions, 1 deletion
- Operators/BwdTrans/BwdTransSumFac.hpp: 2 additions, 0 deletions