Implement CUDA IProductWRTBase sum-factorization kernels
CUDA kernels for the IProductWRTBase operation have been implemented for all element types (seg, quad, tri, hex, tet, prism, and pyr). All implementations are based on the SIMD-based matrix-free version.
To avoid repeated copies from CPU to GPU, geometric and basis data are copied to the GPU once, in the constructor.
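
For readers unfamiliar with the sum-factorization named in the title, the following is a minimal host-side C++ sketch of the idea for a single quad element. It is not the code from this merge request: the function names, argument order, and memory layouts are assumptions chosen for illustration, and a CUDA kernel would distribute these loops over threads rather than run them serially. The inner product with respect to a tensor-product basis is evaluated as two one-dimensional contractions instead of one sum over all quadrature points per mode.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helpers for illustration only (not Nektar++ API).
// Assumed layouts:
//   basis0[i * nm0 + p] : mode p of direction 0 at quadrature point i
//   basis1[j * nm1 + q] : mode q of direction 1 at quadrature point j
//   jw[j * nq0 + i], u[j * nq0 + i] : Jacobian-times-weight and input values
//   out[q * nm0 + p] : inner product against mode (p, q)

// Sum-factorized version: two 1D contractions.
std::vector<double> IProductQuadSumFac(
    const std::vector<double>& basis0, const std::vector<double>& basis1,
    const std::vector<double>& jw, const std::vector<double>& u,
    std::size_t nm0, std::size_t nm1, std::size_t nq0, std::size_t nq1)
{
    // Step 1: contract direction 0.
    // tmp[p, j] = sum_i basis0[i, p] * jw[i, j] * u[i, j]
    std::vector<double> tmp(nm0 * nq1, 0.0);
    for (std::size_t j = 0; j < nq1; ++j)
        for (std::size_t p = 0; p < nm0; ++p)
            for (std::size_t i = 0; i < nq0; ++i)
                tmp[p * nq1 + j] +=
                    basis0[i * nm0 + p] * jw[j * nq0 + i] * u[j * nq0 + i];

    // Step 2: contract direction 1.
    // out[p, q] = sum_j basis1[j, q] * tmp[p, j]
    std::vector<double> out(nm0 * nm1, 0.0);
    for (std::size_t q = 0; q < nm1; ++q)
        for (std::size_t p = 0; p < nm0; ++p)
            for (std::size_t j = 0; j < nq1; ++j)
                out[q * nm0 + p] += basis1[j * nm1 + q] * tmp[p * nq1 + j];
    return out;
}

// Reference version: full double sum per mode, used only to check the result.
std::vector<double> IProductQuadNaive(
    const std::vector<double>& basis0, const std::vector<double>& basis1,
    const std::vector<double>& jw, const std::vector<double>& u,
    std::size_t nm0, std::size_t nm1, std::size_t nq0, std::size_t nq1)
{
    std::vector<double> out(nm0 * nm1, 0.0);
    for (std::size_t q = 0; q < nm1; ++q)
        for (std::size_t p = 0; p < nm0; ++p)
            for (std::size_t j = 0; j < nq1; ++j)
                for (std::size_t i = 0; i < nq0; ++i)
                    out[q * nm0 + p] += basis0[i * nm0 + p] *
                                        basis1[j * nm1 + q] *
                                        jw[j * nq0 + i] * u[j * nq0 + i];
    return out;
}
```

Both functions return the same values up to floating-point rounding. The factorized form costs on the order of nm0·nq0·nq1 + nm0·nm1·nq1 multiply-adds per element, compared with nm0·nm1·nq0·nq1 for the naive double sum, which is why it is the usual basis for matrix-free kernels.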