Implement CUDA BwdTrans sum-factorization kernels
CUDA kernels for the BwdTrans operation have been implemented for all element types (seg, quad, tri, hex, tet, prism, and pyr). Each implementation comes in two variants: one that threads over elements only (analogous to the SIMD-based matrix-free implementation), and one, denoted QP, that threads over both elements and quadrature points.
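For readers unfamiliar with sum-factorization, the following is a minimal host-side C++ sketch of a backward transform on a single quad, not the kernels from this MR. The function names and the data layouts are assumptions made for illustration only. In a CUDA kernel, the element-only variant would presumably give each thread one whole element (one call like this), while the QP variant would also distribute the quadrature-point loops over threads; that mapping is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed (hypothetical) layouts, chosen for this sketch only:
//   basis0[i * nm0 + p] : mode p in direction 0 at quadrature point i
//   basis1[j * nm1 + q] : mode q in direction 1 at quadrature point j
//   coeffs[q * nm0 + p] : coefficient of mode (p, q)
//   result[j * nq0 + i] : value at quadrature point (i, j)

// Direct evaluation: O(nq0 * nq1 * nm0 * nm1) work per element.
std::vector<double> bwdTransQuadNaive(const std::vector<double> &basis0,
                                      const std::vector<double> &basis1,
                                      const std::vector<double> &coeffs,
                                      std::size_t nm0, std::size_t nm1,
                                      std::size_t nq0, std::size_t nq1)
{
    assert(basis0.size() == nq0 * nm0 && basis1.size() == nq1 * nm1);
    assert(coeffs.size() == nm0 * nm1);
    std::vector<double> out(nq0 * nq1, 0.0);
    for (std::size_t j = 0; j < nq1; ++j)
        for (std::size_t i = 0; i < nq0; ++i)
        {
            double sum = 0.0;
            for (std::size_t q = 0; q < nm1; ++q)
                for (std::size_t p = 0; p < nm0; ++p)
                    sum += basis0[i * nm0 + p] * basis1[j * nm1 + q] *
                           coeffs[q * nm0 + p];
            out[j * nq0 + i] = sum;
        }
    return out;
}

// Sum-factorized evaluation: contract one direction at a time,
// O(nq0 * nm1 * (nm0 + nq1)) work per element.
std::vector<double> bwdTransQuadSumFac(const std::vector<double> &basis0,
                                       const std::vector<double> &basis1,
                                       const std::vector<double> &coeffs,
                                       std::size_t nm0, std::size_t nm1,
                                       std::size_t nq0, std::size_t nq1)
{
    assert(basis0.size() == nq0 * nm0 && basis1.size() == nq1 * nm1);
    assert(coeffs.size() == nm0 * nm1);

    // Step 1: sum over p, giving tmp[q * nq0 + i].
    std::vector<double> tmp(nq0 * nm1, 0.0);
    for (std::size_t q = 0; q < nm1; ++q)
        for (std::size_t i = 0; i < nq0; ++i)
        {
            double sum = 0.0;
            for (std::size_t p = 0; p < nm0; ++p)
                sum += basis0[i * nm0 + p] * coeffs[q * nm0 + p];
            tmp[q * nq0 + i] = sum;
        }

    // Step 2: sum over q, giving out[j * nq0 + i].
    std::vector<double> out(nq0 * nq1, 0.0);
    for (std::size_t j = 0; j < nq1; ++j)
        for (std::size_t i = 0; i < nq0; ++i)
        {
            double sum = 0.0;
            for (std::size_t q = 0; q < nm1; ++q)
                sum += basis1[j * nm1 + q] * tmp[q * nq0 + i];
            out[j * nq0 + i] = sum;
        }
    return out;
}
```

Both functions should agree up to rounding; the factorized form simply trades the four nested loops for two passes of three.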
To avoid repeated copies from CPU to GPU, the basis data are copied to the GPU once, in the constructor.
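The copy-once pattern can be illustrated with a small host-only C++ sketch. Everything here is hypothetical: `DeviceBuffer` merely stands in for device memory (a real build would allocate and copy with the CUDA runtime), and `BwdTransSketch` is not the operator class from this MR. The point is only that the upload happens in the constructor and `apply` reuses it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts how many host-to-device uploads were made (illustration only).
struct UploadCounter
{
    int count = 0;
};

// Hypothetical stand-in for a device allocation; it just keeps a host copy.
class DeviceBuffer
{
public:
    DeviceBuffer(const std::vector<double> &host, UploadCounter &counter)
        : m_data(host)
    {
        ++counter.count; // one upload per constructed buffer
    }
    const std::vector<double> &data() const
    {
        return m_data;
    }

private:
    std::vector<double> m_data;
};

// Hypothetical 1D (segment) operator: basis[i * nm + p] is mode p at point i.
class BwdTransSketch
{
public:
    BwdTransSketch(const std::vector<double> &basis, std::size_t nm,
                   std::size_t nq, UploadCounter &counter)
        : m_basis(basis, counter), m_nm(nm), m_nq(nq)
    {
        assert(basis.size() == nm * nq);
    }

    // Reuses the basis uploaded at construction; no further copies.
    std::vector<double> apply(const std::vector<double> &coeffs) const
    {
        assert(coeffs.size() == m_nm);
        std::vector<double> out(m_nq, 0.0);
        for (std::size_t i = 0; i < m_nq; ++i)
            for (std::size_t p = 0; p < m_nm; ++p)
                out[i] += m_basis.data()[i * m_nm + p] * coeffs[p];
        return out;
    }

private:
    DeviceBuffer m_basis;
    std::size_t m_nm;
    std::size_t m_nq;
};
```

However many times `apply` is called, the counter stays at one, which is the behaviour the description above aims for.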
This MR also removes the square2.xml XML file but introduces the segment.xml, square_all_elements.xml, and cube_all_elements.xml XML files, which collectively cover the seg, quad, tri, hex, tet, prism, and pyr element types. The current implementation has been partially validated for all element types.
Activity
assigned to @ccantwel
requested review from @dmoxey
requested review from @Ganlin
@CFD-Xing Can you merge in master and we will try and get this one merged next?
added 3 commits
- 43f9540c...142c6bd4 - 2 commits from branch nektar:master
- ada2c0e8 - Merge remote-tracking branch 'upstream/master' into bwd_trans_sum_fac_cuda_kernels
-        auto f_out = Field<double, stateOut>::create(blocks_out);
-        fixt_in    = new Field<double, stateIn>(std::move(f_in));
-        fixt_out   = new Field<double, stateOut>(std::move(f_out));
+        auto f_in  = Field<TData, stateIn>::create(blocks_in);
+        auto f_out = Field<TData, stateOut>::create(blocks_out);
+        fixt_in    = new Field<TData, stateIn>(std::move(f_in));
+        fixt_out   = new Field<TData, stateOut>(std::move(f_out));
+#ifdef NEKTAR_USE_CUDA
+        auto fcuda_in =
+            Field<TData, stateIn>::template create<MemoryRegionCUDA>(blocks_in);
+        auto fcuda_out =
+            Field<TData, stateOut>::template create<MemoryRegionCUDA>(
+                blocks_out);
+        fixtcuda_in  = new Field<TData, stateIn>(std::move(fcuda_in));
+        fixtcuda_out = new Field<TData, stateOut>(std::move(fcuda_out));
+#endif

  * https://www.boost.org/doc/libs/1_82_0/libs/test/doc/html/boost_test/tests_organization/fixtures/case.html
  */

-template <FieldState stateIn = FieldState::Coeff,
+template <typename TData, FieldState stateIn = FieldState::Coeff,

- Resolved by Jacques Xing
@ccantwel This should be ready to be merged. I have made some modifications to main.cpp to make the naming and formatting more consistent. I have also introduced scoping with braces {} to avoid name collisions whenever someone adds something new.
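As a small illustration of the brace scoping mentioned above (not the actual main.cpp; the function name and the labels are hypothetical), each block gets its own scope, so the same variable name can be reused without colliding:

```cpp
#include <string>
#include <vector>

// Hypothetical driver: each braced block is an independent scope.
std::vector<std::string> runAllSketch()
{
    std::vector<std::string> log;
    {
        // First block: 'label' exists only until the closing brace.
        std::string label = "BwdTrans (default)";
        log.push_back(label);
    }
    {
        // Second block: reusing the name 'label' is fine, since the
        // earlier variable is already out of scope.
        std::string label = "BwdTrans (CUDA)";
        log.push_back(label);
    }
    return log;
}
```

A new block can then be appended by copying an existing one, without first checking which names are already taken.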
added 3 commits
- 5149cef4...e83c7bc5 - 2 commits from branch nektar:master
- e44ea575 - Merge remote-tracking branch 'upstream/master' into bwd_trans_sum_fac_cuda_kernels
- tests/test_bwdtranscuda.cpp 0 → 100644
    }

    BwdTrans<>::create(fixt_explist, "CUDA")
        ->apply(*fixtcuda_in, *fixtcuda_out);

    static double *y =
        fixtcuda_out->template GetStorage<MemoryRegionCUDA>().GetCPUPtr();
    double TOL = 1e-12;
    BOOST_CHECK_CLOSE(y[0], 1.000000000000000, TOL);
    BOOST_CHECK_CLOSE(y[1], 0.808463389187877, TOL);
    BOOST_CHECK_CLOSE(y[2], 1.993385866728399, TOL);
    BOOST_CHECK_CLOSE(y[3], 1.312500000000000, TOL);
    BOOST_CHECK_CLOSE(y[4], 2.321846776942122, TOL);
    BOOST_CHECK_CLOSE(y[5], 4.082915537389534, TOL);
    BOOST_CHECK_CLOSE(y[6], 2.000000000000000, TOL);
}

@ccantwel I remerged, should be ready.
mentioned in commit 3bb37410
mentioned in issue #4 (closed)
mentioned in merge request !15 (merged)