Basic implementation of Matrix-free BwdTrans and padding feature in Field

Merged BOYANG XIA requested to merge xby2233/redesign-prototypes:bwd_trans_matfree4 into master

This matrix-free BwdTrans implementation supports all shape types, any number of elements, and SIMD.

A new padding feature is also added to the Field class.

  • First, GetBlockAttributes() computes num_padding_elements from the vector width, and block_size includes the padding as well.
  • Then Field::create() allocates m_storage based on block_size, which is large enough for later use. The alignment is also set here.
  • By default, a newly created Field is marked with m_curVecWidth = 1, which means no interleaving. There will be unused memory (num_padding_elements * num_pts) at the end of each block; make sure you don't access it accidentally. For example, in the StdMat implementation, to move to the next block we should use inptr += block_size instead of inptr += num_pts * num_elements.
  • ReshapeStorage() changes the storage layout to other widths, but not to a width greater than the initial one. Usually, we only need to transform between scalar and vec_t::width.
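
The padding and layout steps above can be sketched roughly as follows. This is only an illustration, not the code from this merge request: BlockLayout, MakeBlockLayout and Interleave are hypothetical names, and the interleaved layout shown (groups of `w` elements, with the `w` values of each point stored contiguously) is an assumption about what a width-`w` storage layout might look like.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the per-block attributes described above.
struct BlockLayout
{
    std::size_t num_elements;         // real elements in the block
    std::size_t num_pts;              // points per element
    std::size_t num_padding_elements; // extra elements to fill the last SIMD group
    std::size_t block_size;           // (num_elements + padding) * num_pts
};

// Round the element count up to a multiple of the vector width and
// include the padding in block_size.
inline BlockLayout MakeBlockLayout(std::size_t num_elements,
                                   std::size_t num_pts,
                                   std::size_t vec_width)
{
    assert(vec_width > 0);
    const std::size_t rem = num_elements % vec_width;
    const std::size_t pad = (rem == 0) ? 0 : vec_width - rem;
    return BlockLayout{num_elements, num_pts, pad,
                       (num_elements + pad) * num_pts};
}

// Convert one block from the scalar layout (width 1, element-major:
// in[e * num_pts + p]) to an assumed interleaved layout of width w,
// where each group of w elements stores its w values for point p
// contiguously: out[(group * num_pts + p) * w + lane].
inline std::vector<double> Interleave(const std::vector<double> &in,
                                      const BlockLayout &b, std::size_t w)
{
    const std::size_t padded = b.num_elements + b.num_padding_elements;
    assert(w > 0 && padded % w == 0); // padding guarantees whole groups
    assert(in.size() == b.block_size);

    std::vector<double> out(b.block_size, 0.0);
    for (std::size_t e = 0; e < padded; ++e)
    {
        const std::size_t group = e / w;
        const std::size_t lane  = e % w;
        for (std::size_t p = 0; p < b.num_pts; ++p)
        {
            out[(group * b.num_pts + p) * w + lane] = in[e * b.num_pts + p];
        }
    }
    return out;
}
```

With 5 elements, 3 points per element and a vector width of 4, this sketch gives 3 padding elements and a block_size of 24. Advancing a pointer by num_pts * num_elements = 15 would therefore land inside the same block, which is why the stride has to be block_size.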

The simple CTest case (requiring padding) works.

Edited by BOYANG XIA

      // Generate a blocks definition from the expansion list for each state
      using vec_t = tinysimd::simd<double>;
    + #ifndef NEKTAR_USE_CUDA
      auto blocks_in =
          GetBlockAttributes(stateIn, fixt_explist, vec_t::width);
      auto blocks_out =
          GetBlockAttributes(stateOut, fixt_explist, vec_t::width);
    + #else
  • BOYANG XIA added 1 commit
  • BOYANG XIA added 1 commit
