Implement CUDA BwdTrans sum-factorization kernels

3 unresolved threads

CUDA kernels for the BwdTrans operation have been implemented for all element types (seg, quad, tri, hex, tet, prism, and pyr). Each implementation comes in two variants: one that threads across elements only (analogous to the SIMD-based matrix-free implementation), and one, denoted QP, that threads across both elements and quadrature points.
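
The difference between the two variants can be sketched on the CPU for the segment case. This is a minimal illustration, not the MR's kernels: the function names, the flat array layouts, and the serial loops standing in for kernel launches are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical flat layouts (not the MR's actual data structures):
//   basis[p * nq + q] : value of mode p at quadrature point q
//   in[e * nm + p]    : coefficient of mode p in element e
//   out[e * nq + q]   : physical value at quadrature point q of element e

// Element-only variant: one "thread" owns a whole element and loops over
// all of its quadrature points.
void bwdTransSegPerElement(std::size_t e, std::size_t nm, std::size_t nq,
                           const std::vector<double> &basis,
                           const std::vector<double> &in,
                           std::vector<double> &out)
{
    for (std::size_t q = 0; q < nq; ++q)
    {
        double sum = 0.0;
        for (std::size_t p = 0; p < nm; ++p)
        {
            sum += basis[p * nq + q] * in[e * nm + p];
        }
        out[e * nq + q] = sum;
    }
}

// QP variant: one "thread" owns a single (element, quadrature point) pair.
void bwdTransSegPerQuadPoint(std::size_t e, std::size_t q, std::size_t nm,
                             std::size_t nq, const std::vector<double> &basis,
                             const std::vector<double> &in,
                             std::vector<double> &out)
{
    double sum = 0.0;
    for (std::size_t p = 0; p < nm; ++p)
    {
        sum += basis[p * nq + q] * in[e * nm + p];
    }
    out[e * nq + q] = sum;
}

// Serial stand-in for a launch with one thread per element.
std::vector<double> runPerElement(std::size_t nelmt, std::size_t nm,
                                  std::size_t nq,
                                  const std::vector<double> &basis,
                                  const std::vector<double> &in)
{
    std::vector<double> out(nelmt * nq, 0.0);
    for (std::size_t e = 0; e < nelmt; ++e)
    {
        bwdTransSegPerElement(e, nm, nq, basis, in, out);
    }
    return out;
}

// Serial stand-in for a launch with one thread per (element, point) pair;
// the flat thread index is split into an element and a point index.
std::vector<double> runPerQuadPoint(std::size_t nelmt, std::size_t nm,
                                    std::size_t nq,
                                    const std::vector<double> &basis,
                                    const std::vector<double> &in)
{
    std::vector<double> out(nelmt * nq, 0.0);
    for (std::size_t tid = 0; tid < nelmt * nq; ++tid)
    {
        bwdTransSegPerQuadPoint(tid / nq, tid % nq, nm, nq, basis, in, out);
    }
    return out;
}
```

Both drivers compute the same output; the QP mapping simply exposes `nelmt * nq` independent work items instead of `nelmt`, which may help when the number of elements alone is too small to occupy the device.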

To avoid repeated copies from CPU to GPU, the basis data are copied to the GPU once, in the constructor.
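
The copy-once pattern can be sketched as follows. This is only an illustration under stated assumptions: `BwdTransSketch` and its members are hypothetical names, and a `std::vector` stands in for device memory so that the example runs without a GPU. A CUDA implementation would typically allocate with `cudaMalloc` and transfer with `cudaMemcpy(..., cudaMemcpyHostToDevice)` at the marked point.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical operator that uploads its basis once, at construction.
class BwdTransSketch
{
public:
    BwdTransSketch(std::size_t nm, std::size_t nq,
                   const std::vector<double> &hostBasis)
        : m_nm(nm), m_nq(nq)
    {
        uploadBasis(hostBasis); // the only host-to-device copy of the basis
    }

    // Reuses the already-resident basis; no basis transfer happens here.
    std::vector<double> apply(const std::vector<double> &coeffs) const
    {
        std::vector<double> phys(m_nq, 0.0);
        for (std::size_t q = 0; q < m_nq; ++q)
        {
            for (std::size_t p = 0; p < m_nm; ++p)
            {
                phys[q] += m_deviceBasis[p * m_nq + q] * coeffs[p];
            }
        }
        return phys;
    }

    int uploadCount() const
    {
        return m_uploads;
    }

private:
    void uploadBasis(const std::vector<double> &hostBasis)
    {
        // Assumption: a real version would call cudaMalloc/cudaMemcpy here.
        m_deviceBasis = hostBasis;
        ++m_uploads;
    }

    std::size_t m_nm;
    std::size_t m_nq;
    std::vector<double> m_deviceBasis; // stand-in for device memory
    int m_uploads = 0;
};
```

However many times `apply` is called, `uploadCount()` stays at 1, which is the property the constructor-based copy is meant to guarantee.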

This MR also removes the square2.xml file and introduces segment.xml, square_all_elements.xml, and cube_all_elements.xml, which collectively cover the seg, quad, tri, hex, tet, prism, and pyr element types. The current implementation has been partially validated for all element types.

Edited by Jacques Xing

Merged by Chris Cantwell 1 year ago (Jul 18, 2023 11:00am UTC)

Merge details

  • Changes merged into master with 3bb37410 (commits were squashed).
  • Deleted the source branch.

Activity

    - auto f_out = Field<double, stateOut>::create(blocks_out);
    - fixt_in = new Field<double, stateIn>(std::move(f_in));
    - fixt_out = new Field<double, stateOut>(std::move(f_out));
    + auto f_in = Field<TData, stateIn>::create(blocks_in);
    + auto f_out = Field<TData, stateOut>::create(blocks_out);
    + fixt_in = new Field<TData, stateIn>(std::move(f_in));
    + fixt_out = new Field<TData, stateOut>(std::move(f_out));
    + #ifdef NEKTAR_USE_CUDA
    + auto fcuda_in =
    +     Field<TData, stateIn>::template create<MemoryRegionCUDA>(blocks_in);
    + auto fcuda_out =
    +     Field<TData, stateOut>::template create<MemoryRegionCUDA>(
    +         blocks_out);
    + fixtcuda_in = new Field<TData, stateIn>(std::move(fcuda_in));
    + fixtcuda_out = new Field<TData, stateOut>(std::move(fcuda_out));
    + #endif
  •    * https://www.boost.org/doc/libs/1_82_0/libs/test/doc/html/boost_test/tests_organization/fixtures/case.html
       */

    - template <FieldState stateIn = FieldState::Coeff,
    + template <typename TData, FieldState stateIn = FieldState::Coeff,
  • Jacques Xing added 1 commit

  • Jacques Xing added 1 commit

  • Author Maintainer

    @ccantwel This should be ready to be merged. I have made some modifications to main.cpp to make the naming and formatting more consistent. I have also introduced scoping with braces {} to avoid name collisions whenever someone adds something.

  • Jacques Xing added 3 commits

    • 5149cef4...e83c7bc5 - 2 commits from branch nektar:master
    • e44ea575 - Merge remote-tracking branch 'upstream/master' into bwd_trans_sum_fac_cuda_kernels

  • Jacques Xing added 1 commit

  • }

    BwdTrans<>::create(fixt_explist, "CUDA")
        ->apply(*fixtcuda_in, *fixtcuda_out);

    static double *y =
        fixtcuda_out->template GetStorage<MemoryRegionCUDA>().GetCPUPtr();
    double TOL = 1e-12;
    BOOST_CHECK_CLOSE(y[0], 1.000000000000000, TOL);
    BOOST_CHECK_CLOSE(y[1], 0.808463389187877, TOL);
    BOOST_CHECK_CLOSE(y[2], 1.993385866728399, TOL);
    BOOST_CHECK_CLOSE(y[3], 1.312500000000000, TOL);
    BOOST_CHECK_CLOSE(y[4], 2.321846776942122, TOL);
    BOOST_CHECK_CLOSE(y[5], 4.082915537389534, TOL);
    BOOST_CHECK_CLOSE(y[6], 2.000000000000000, TOL);
    }
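
A note on the tolerance used in the checks above: the third argument of `BOOST_CHECK_CLOSE` is a relative tolerance expressed in percent, so `TOL = 1e-12` corresponds to a relative difference of about 1e-14. The sketch below approximates that comparison in plain C++. The helper name `closeInPercent` is hypothetical, and Boost's own implementation additionally guards against overflow and underflow, so treat this as an illustration of the semantics rather than a replacement.

```cpp
#include <cassert>
#include <cmath>

// Approximation of a "strong" relative check with the tolerance given in
// percent: the difference must be small relative to BOTH arguments.
bool closeInPercent(double a, double b, double tolPercent)
{
    const double diff = std::fabs(a - b);
    const double frac = tolPercent / 100.0; // percent -> fraction
    return diff <= frac * std::fabs(a) && diff <= frac * std::fabs(b);
}
```

For example, `closeInPercent(100.0, 100.5, 1.0)` holds because the values differ by 0.5%, whereas `closeInPercent(1.0, 1.0 + 1e-13, 1e-12)` does not, since 1e-13 exceeds the allowed relative difference of 1e-14.
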
  • Author Maintainer

    @ccantwel I remerged, should be ready.

  • Jacques Xing added 1 commit

  • Jacques Xing added 1 commit

    • 3301b4e4 - Make some functions non-member of the class

  • Jacques Xing added 1 commit

    • 6e521141 - Move some functionalities to a helper

  • Chris Cantwell approved this merge request

  • merged

  • Chris Cantwell mentioned in commit 3bb37410

  • mentioned in issue #4 (closed)

  • Jacques Xing mentioned in merge request !15 (merged)
