Implement CUDA BwdTrans sum-factorization kernels

CUDA kernels for the BwdTrans operation have been implemented for all element types (seg, quad, tri, hex, tet, prism, and pyr). Each implementation comes in two variants: one that threads across elements only (analogous to the SIMD-based matrix-free implementation), and one, denoted QP, that threads across both elements and quadrature points.
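As a rough illustration of the two threading variants, here is a minimal sketch for the simplest case, the segment. It is not the code from this MR: the kernel names, the `EvalPoint` helper, the data layouts, and the monomial stand-in basis are all assumptions made for the example, and CUDA error checking is omitted for brevity.

```cuda
#include <cmath>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Assumed (hypothetical) layouts:
//   basis[p * nq + q] = phi_p(xi_q), in[e * nm + p], out[e * nq + q].

// Evaluates the expansion of element e at quadrature point q.
__device__ double EvalPoint(int nm, int nq, int e, int q,
                            const double *basis, const double *in)
{
    double sum = 0.0;
    for (int p = 0; p < nm; ++p)
        sum += basis[p * nq + q] * in[e * nm + p];
    return sum;
}

// Variant 1: threads are distributed across elements only;
// each thread loops over every quadrature point of its element.
__global__ void BwdTransSegKernel(int nm, int nq, int nelmt,
                                  const double *basis, const double *in,
                                  double *out)
{
    for (int e = blockIdx.x * blockDim.x + threadIdx.x; e < nelmt;
         e += blockDim.x * gridDim.x)
        for (int q = 0; q < nq; ++q)
            out[e * nq + q] = EvalPoint(nm, nq, e, q, basis, in);
}

// Variant 2 (QP): blocks are distributed across elements and the
// threads of a block across the quadrature points of that element.
__global__ void BwdTransSegKernelQP(int nm, int nq, int nelmt,
                                    const double *basis, const double *in,
                                    double *out)
{
    for (int e = blockIdx.x; e < nelmt; e += gridDim.x)
        for (int q = threadIdx.x; q < nq; q += blockDim.x)
            out[e * nq + q] = EvalPoint(nm, nq, e, q, basis, in);
}

// Runs one variant on a small synthetic problem and returns the
// maximum absolute deviation from a plain CPU loop.
double MaxError(bool useQP)
{
    const int nm = 3, nq = 4, nelmt = 5;
    std::vector<double> basis(nm * nq), in(nelmt * nm), out(nelmt * nq);
    for (int p = 0; p < nm; ++p)
        for (int q = 0; q < nq; ++q)
            basis[p * nq + q] = std::pow(q / double(nq - 1), p); // stand-in basis
    for (int i = 0; i < nelmt * nm; ++i)
        in[i] = 0.5 * i + 1.0;

    double *dBasis, *dIn, *dOut;
    cudaMalloc(&dBasis, basis.size() * sizeof(double));
    cudaMalloc(&dIn, in.size() * sizeof(double));
    cudaMalloc(&dOut, out.size() * sizeof(double));
    cudaMemcpy(dBasis, basis.data(), basis.size() * sizeof(double),
               cudaMemcpyHostToDevice);
    cudaMemcpy(dIn, in.data(), in.size() * sizeof(double),
               cudaMemcpyHostToDevice);

    if (useQP)
        BwdTransSegKernelQP<<<nelmt, nq>>>(nm, nq, nelmt, dBasis, dIn, dOut);
    else
        BwdTransSegKernel<<<1, 32>>>(nm, nq, nelmt, dBasis, dIn, dOut);

    // The blocking copy also waits for the kernel to finish.
    cudaMemcpy(out.data(), dOut, out.size() * sizeof(double),
               cudaMemcpyDeviceToHost);
    cudaFree(dBasis);
    cudaFree(dIn);
    cudaFree(dOut);

    double err = 0.0;
    for (int e = 0; e < nelmt; ++e)
        for (int q = 0; q < nq; ++q)
        {
            double ref = 0.0;
            for (int p = 0; p < nm; ++p)
                ref += basis[p * nq + q] * in[e * nm + p];
            err = std::fmax(err, std::fabs(ref - out[e * nq + q]));
        }
    return err;
}

int main()
{
    std::printf("element-only: max error %g\n", MaxError(false));
    std::printf("QP:           max error %g\n", MaxError(true));
    return 0;
}
```

The only difference between the two kernels is how the loop indices map to the thread hierarchy. In a sketch like this, the QP mapping exposes more parallelism when the element count is small, while the element-only mapping keeps all work for an element in a single thread.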

To avoid repeated copies from CPU to GPU, basis data are copied to the GPU once, in the constructor.
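The idea of uploading the basis data in the constructor could look roughly like the sketch below. The class name `BwdTransDeviceData` and its members are hypothetical and do not reflect the actual operator classes touched by this MR; error checking is again omitted.

```cuda
#include <cstddef>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical holder for basis data that lives on the GPU for the
// lifetime of the operator object.
class BwdTransDeviceData
{
public:
    // Copies the host basis array to the device exactly once.
    explicit BwdTransDeviceData(const std::vector<double> &basis)
        : m_size(basis.size())
    {
        cudaMalloc(&m_basis, m_size * sizeof(double));
        cudaMemcpy(m_basis, basis.data(), m_size * sizeof(double),
                   cudaMemcpyHostToDevice);
    }

    ~BwdTransDeviceData()
    {
        cudaFree(m_basis);
    }

    // Non-copyable so the device buffer is never freed twice.
    BwdTransDeviceData(const BwdTransDeviceData &) = delete;
    BwdTransDeviceData &operator=(const BwdTransDeviceData &) = delete;

    // Device pointer that kernels can reuse on every call.
    const double *Basis() const { return m_basis; }
    std::size_t Size() const { return m_size; }

private:
    std::size_t m_size;
    double *m_basis = nullptr;
};

// Uploads a small array, reads it back, and checks it is unchanged.
bool RoundTripOk()
{
    const std::vector<double> host = {1.0, 0.5, 0.25, 0.125};
    BwdTransDeviceData data(host);

    std::vector<double> back(data.Size());
    cudaMemcpy(back.data(), data.Basis(), data.Size() * sizeof(double),
               cudaMemcpyDeviceToHost);
    return back == host;
}

int main()
{
    std::printf("round trip %s\n", RoundTripOk() ? "ok" : "failed");
    return 0;
}
```

With this pattern, each subsequent kernel launch only needs to transfer the input and output fields, since the basis pointer stays valid until the object is destroyed.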

This MR also removes the square2.xml files and introduces segment.xml, square_all_elements.xml, and cube_all_elements.xml, which together cover the seg, quad, tri, hex, tet, prism, and pyr element types. The current implementation has been partially validated for all element types.

Edited by Jacques Xing
