Tidy up CUDA implementation of IProductWRTBase and add CUDA kernels with additional parallelism
This MR tidies up the previous CUDA implementation of the IProductWRTBase operator and introduces new CUDA kernels with additional parallelism across quadrature points.
Edited by Jacques Xing
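To illustrate what "additional parallelism across quadrature points" can mean, here is a minimal sketch for the 1D (segment) case. It is not the code from this MR: the kernel names `IProductSegKernel` and `IProductSegKernelQP`, the helper `RunIProduct`, the flat data layout, and the launch parameters are all assumptions made for illustration. The first kernel assigns one thread per element; the second assigns one block per element and spreads the threads of that block over quadrature points, combining partial sums with a shared-memory reduction.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Baseline sketch: one thread per element. Each thread loops over all modes
// and all quadrature points of its element.
// Assumed layout: basis[p * nq + q], jacw[e * nq + q] (weight times Jacobian),
// in[e * nq + q], out[e * nm + p].
__global__ void IProductSegKernel(const unsigned int nm, const unsigned int nq,
                                  const unsigned int nelmt, const double *basis,
                                  const double *jacw, const double *in,
                                  double *out)
{
    const unsigned int e = blockDim.x * blockIdx.x + threadIdx.x;
    if (e >= nelmt)
    {
        return;
    }
    for (unsigned int p = 0; p < nm; ++p)
    {
        double sum = 0.0;
        for (unsigned int q = 0; q < nq; ++q)
        {
            sum += basis[p * nq + q] * jacw[e * nq + q] * in[e * nq + q];
        }
        out[e * nm + p] = sum;
    }
}

// "QP"-style sketch: one block per element, threads strided over quadrature
// points. Requires blockDim.x to be a power of two and blockDim.x doubles of
// dynamic shared memory.
__global__ void IProductSegKernelQP(const unsigned int nm, const unsigned int nq,
                                    const unsigned int nelmt, const double *basis,
                                    const double *jacw, const double *in,
                                    double *out)
{
    extern __shared__ double partial[];
    for (unsigned int e = blockIdx.x; e < nelmt; e += gridDim.x)
    {
        for (unsigned int p = 0; p < nm; ++p)
        {
            // Each thread accumulates its own subset of quadrature points.
            double sum = 0.0;
            for (unsigned int q = threadIdx.x; q < nq; q += blockDim.x)
            {
                sum += basis[p * nq + q] * jacw[e * nq + q] * in[e * nq + q];
            }
            partial[threadIdx.x] = sum;
            __syncthreads(); // all partial sums written before reducing

            // Tree reduction in shared memory.
            for (unsigned int s = blockDim.x / 2; s > 0; s >>= 1)
            {
                if (threadIdx.x < s)
                {
                    partial[threadIdx.x] += partial[threadIdx.x + s];
                }
                __syncthreads();
            }
            if (threadIdx.x == 0)
            {
                out[e * nm + p] = partial[0];
            }
        }
    }
}

// Hypothetical host helper: copies data to the device, launches one of the
// two kernels, and returns the result. Error checking is omitted for brevity.
std::vector<double> RunIProduct(bool useQP, unsigned int nm, unsigned int nq,
                                unsigned int nelmt,
                                const std::vector<double> &basis,
                                const std::vector<double> &jacw,
                                const std::vector<double> &in)
{
    double *dBasis, *dJacw, *dIn, *dOut;
    cudaMalloc(&dBasis, basis.size() * sizeof(double));
    cudaMalloc(&dJacw, jacw.size() * sizeof(double));
    cudaMalloc(&dIn, in.size() * sizeof(double));
    cudaMalloc(&dOut, nelmt * nm * sizeof(double));
    cudaMemcpy(dBasis, basis.data(), basis.size() * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(dJacw, jacw.data(), jacw.size() * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(dIn, in.data(), in.size() * sizeof(double), cudaMemcpyHostToDevice);

    if (useQP)
    {
        const unsigned int threads = 32; // power of two, one block per element
        IProductSegKernelQP<<<nelmt, threads, threads * sizeof(double)>>>(
            nm, nq, nelmt, dBasis, dJacw, dIn, dOut);
    }
    else
    {
        const unsigned int threads = 128;
        IProductSegKernel<<<(nelmt + threads - 1) / threads, threads>>>(
            nm, nq, nelmt, dBasis, dJacw, dIn, dOut);
    }

    std::vector<double> out(nelmt * nm);
    // Blocking copy on the default stream; it waits for the kernel to finish.
    cudaMemcpy(out.data(), dOut, out.size() * sizeof(double), cudaMemcpyDeviceToHost);
    cudaFree(dBasis);
    cudaFree(dJacw);
    cudaFree(dIn);
    cudaFree(dOut);
    return out;
}
```

In this sketch the second variant exposes more threads when the number of elements is small, at the cost of a block-level reduction per mode. Which variant is faster in practice depends on the element count, the number of quadrature points, and the hardware, so it should be measured rather than assumed.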
Activity
added 6 commits
- 3ece86d8...d4341fdc - 4 commits from branch nektar:master
- a42321c1 - Merge remote-tracking branch 'upstream/master' into tidy-up-iproductwrtbase
- be0b231a - Implement "QP" version of IProductWRTBase CUDA kernels
added 3 commits
- be0b231a...47f84d4e - 2 commits from branch nektar:master
- b314eb4c - Merge remote-tracking branch 'upstream/master' into tidy-up-iproductwrtbase
added 2 commits
assigned to @ccantwel
requested review from @ccantwel
added 1 commit
- e49caeb8 - Variabe name changes and add missing synchronization
added 1 commit
- 19501ac2 - Switch size_t to unsigned int in CUDA kernels
added 1 commit
- 1a0dcdee - Switch size_t to unsigned int in CUDA kernels
mentioned in commit 09df250b