
Add SYCL BwdTrans Operator to redesign

Issue/feature addressed

Integrating SYCL into the redesign and implementing SYCL kernels for the BwdTrans SumFac Operator.

Proposed solution

  • Add a SYCL queue singleton. A crude implementation is sufficient for the time being, as long as it ensures that only one queue is created.
  • Integrate the existing, now outdated, SYCL BwdTrans kernels into the updated redesign structure. These are based on an older version of the equivalent CUDA kernels.
  • Add unit tests matching those of the CUDA implementation.
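
For illustration, a crude queue singleton of the kind described above could look roughly like the sketch below. The names QueueSingleton, DeviceQueue and CheckSingleQueue are hypothetical and are not taken from the SYCLQueue files in this MR; DeviceQueue is a stand-in so the sketch builds without a SYCL toolchain, and in a SYCL build it would presumably be an alias for sycl::queue.

```cpp
#include <cassert>

// Hypothetical stand-in for sycl::queue (assumption: a SYCL build would use
// `using DeviceQueue = sycl::queue;` after including <sycl/sycl.hpp>).
struct DeviceQueue
{
};

// Crude singleton: the function-local static is constructed once, on first
// use, and C++11 guarantees that this initialisation is thread-safe.
class QueueSingleton
{
public:
    // Returns the one shared queue; every caller sees the same object.
    static DeviceQueue &Get()
    {
        static DeviceQueue queue; // default-constructed, i.e. default device
        return queue;
    }

    // Not instantiable: the class only provides the static accessor.
    QueueSingleton() = delete;
};

// Usage check: two lookups must refer to the same queue object.
inline void CheckSingleQueue()
{
    DeviceQueue &a = QueueSingleton::Get();
    DeviceQueue &b = QueueSingleton::Get();
    assert(&a == &b);
}
```

Returning a reference rather than a copy keeps every operator submitting work to the same queue, which is the only property the singleton needs to provide for now.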

Implementation

  • CMakeLists.txt
    • Added a barebones NEKTAR_ENABLE_SYCL option, which will probably need to move into a separate CMake file in future. No SYCL-specific compiler options (e.g. -fsycl) are added; for now the user can supply them via -DCMAKE_CXX_FLAGS="-fsycl -fsycl-targets=...".
  • BwdTransSYCLSumFacKernels.hpp
    • I tried to match the CUDA implementation as closely as possible. Some variables still use CUDA terminology (e.g. blocksize, gridsize) and have not been renamed.
    • Some kernels could not be integrated directly into the new structure, which uses wsp as a global memory buffer, without a complete rewrite. Since all the kernels will need updating anyway to match the newer CUDA kernels, I've included them for completeness. The affected kernels are not used by default, and each one is marked with a comment.
  • The SYCLQueue files have been left in library/Operators until we find a better place for them. There is no device selection logic beyond the default selector; set SYCL_DEVICE_SELECTOR=host, gpu, etc. when running.
  • Spaces.hpp needed to be modified because SYCL does not define the __device__ keyword.
  • Operators/CMakeLists.txt: I've used a manual workaround for including the SYCL files, because the ADD_OPERATOR macro generates SYCL source files even for operators that don't have a SYCL header. This causes compiler errors (at least with icpx), so I've added the correct files manually and created BwdTransSYCLSumFac.cpp for template instantiation.
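
To illustrate the Spaces.hpp point, one common way to handle the missing __device__ keyword is a macro that only expands to the CUDA qualifiers when a CUDA compiler is in use. This is a sketch under assumptions, not the actual change in Spaces.hpp: the macro SKETCH_HOST_DEVICE and the functions Index2D and CheckIndex2D are hypothetical names.

```cpp
#include <cassert>

// Hypothetical macro name; the real Spaces.hpp may be organised differently.
#if defined(__CUDACC__)
// CUDA compilers understand the execution-space qualifiers.
#define SKETCH_HOST_DEVICE __host__ __device__
#else
// SYCL and plain host compilers do not define them, so expand to nothing.
#define SKETCH_HOST_DEVICE
#endif

// A function annotated this way compiles unchanged under a CUDA compiler,
// a SYCL compiler such as icpx, or an ordinary host compiler.
SKETCH_HOST_DEVICE inline int Index2D(int i, int j, int nCols)
{
    return i * nCols + j; // row-major flattening of (i, j)
}

// Usage check on the host: row 2, column 3 of a 10-column array.
inline void CheckIndex2D()
{
    assert(Index2D(2, 3, 10) == 23);
}
```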

Tests

  • test_bwdtrans_sycl_sumfac.cpp added, with tests matching those in test_bwdtrans_cuda_sumfac.cpp.
  • init_fields.hpp updated with SYCL-specific variables.

Suggested reviewers

@ccantwel

Notes

I can't add the merge redesign label, so someone will need to add it for me.

Checklist

  • [x] Functions and classes, or changes to them, are documented.
  • [ ] User guide/documentation is updated.
  • [ ] Changelog is updated.
  • [x] Suitable tests added for new functionality.
  • [x] Contributed code is correctly formatted. (See the contributing guidelines).
  • [x] License added to any new files.
  • [x] No extraneous files have been added (e.g. compiler output or test data files).
Edited by Jacques Xing
