Implement CUDA IProductWRTBase sum-factorization kernels
CUDA kernels for the IProductWRTBase operation have been implemented for all element types (seg, quad, tri, hex, tet, prism, and pyr). All implementations are based on the SIMD-based matrix-free version.
To avoid repeated copies from CPU to GPU, geometric and basis data are copied to the GPU once, in the constructor.
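For readers unfamiliar with the technique, the sketch below shows the sum-factorization idea behind IProductWRTBase on a single quad, written as plain CPU C++ rather than CUDA. It is an illustration only and not the code in this merge request: the function name `iProductWRTBaseQuad`, the row-major data layout, and the assumption that the Jacobian is already folded into the quadrature weights (`jacW`) are all hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not Nektar++ API): sum-factorized inner product with
// respect to a tensor-product basis on one quad.
//
// Assumed layouts:
//   basis0[i * nm0 + p] = value of mode p at quadrature point i (direction 0)
//   basis1[j * nm1 + q] = value of mode q at quadrature point j (direction 1)
//   jacW[j * nq0 + i]   = quadrature weight times Jacobian at point (i, j)
//   u[j * nq0 + i]      = physical values at point (i, j)
//   out[q * nm0 + p]    = sum_{i,j} basis0(i,p) basis1(j,q) jacW(i,j) u(i,j)
std::vector<double> iProductWRTBaseQuad(const std::vector<double> &basis0,
                                        const std::vector<double> &basis1,
                                        const std::vector<double> &jacW,
                                        const std::vector<double> &u,
                                        std::size_t nm0, std::size_t nm1,
                                        std::size_t nq0, std::size_t nq1)
{
    // Step 1: contract over direction 0 only, for each row j of points.
    // tmp[j * nm0 + p] = sum_i basis0(i,p) * jacW(i,j) * u(i,j)
    std::vector<double> tmp(nm0 * nq1, 0.0);
    for (std::size_t j = 0; j < nq1; ++j)
    {
        for (std::size_t p = 0; p < nm0; ++p)
        {
            double sum = 0.0;
            for (std::size_t i = 0; i < nq0; ++i)
            {
                sum += basis0[i * nm0 + p] * jacW[j * nq0 + i] * u[j * nq0 + i];
            }
            tmp[j * nm0 + p] = sum;
        }
    }

    // Step 2: contract the intermediate result over direction 1.
    // out[q * nm0 + p] = sum_j basis1(j,q) * tmp(j,p)
    std::vector<double> out(nm0 * nm1, 0.0);
    for (std::size_t q = 0; q < nm1; ++q)
    {
        for (std::size_t p = 0; p < nm0; ++p)
        {
            double sum = 0.0;
            for (std::size_t j = 0; j < nq1; ++j)
            {
                sum += basis1[j * nm1 + q] * tmp[j * nm0 + p];
            }
            out[q * nm0 + p] = sum;
        }
    }
    return out;
}
```

Splitting the double sum into two one-dimensional contractions is what reduces the cost compared with applying the full basis matrix at once. In a GPU version one would typically map elements or output modes to threads, but that mapping is not shown here.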
Edited by Jacques Xing