Add SYCL BwdTrans Operator to redesign
Issue/feature addressed
Integrating SYCL into the redesign and implementing SYCL kernels for the BwdTrans SumFac Operator.
Proposed solution
- Addition of a SYCL queue singleton. For the time being this can be a crude implementation that just ensures only one queue is created.
- Integrating what are now outdated SYCL BwdTrans kernels into the updated redesign structure. These are based on an older version of the equivalent CUDA kernels.
- Addition of unit tests matching those of the CUDA implementation
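As a rough illustration of the "only one queue" idea in the first bullet, a crude singleton could look something like the sketch below. This is not the SYCLQueue code from this MR: QueueSingleton, FakeQueue and Count are hypothetical names, and FakeQueue stands in for sycl::queue so the sketch compiles without a SYCL toolchain.

```cpp
// Stand-in for sycl::queue so that the sketch builds without a SYCL
// toolchain. FakeQueue and Count are hypothetical, illustration-only names.
struct FakeQueue
{
    FakeQueue() { ++Count(); }

    // Number of FakeQueue objects constructed so far.
    static int &Count()
    {
        static int n = 0;
        return n;
    }
};

// Crude Meyers-style singleton (assumed design, not the MR's actual code).
// The function-local static is constructed once, on first use, and its
// initialisation is thread-safe since C++11.
template <typename Queue>
class QueueSingleton
{
public:
    static Queue &Get()
    {
        static Queue instance; // default-constructed queue
        return instance;
    }

    // The class is only a namespace for Get(); it is never instantiated.
    QueueSingleton()                                  = delete;
    QueueSingleton(const QueueSingleton &)            = delete;
    QueueSingleton &operator=(const QueueSingleton &) = delete;
};
```

With a SYCL compiler, QueueSingleton<sycl::queue>::Get() would hand out one shared, default-constructed queue, which means the default device selection is used.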
Implementation
- CMakeLists.txt
  - Added a barebones NEKTAR_ENABLE_SYCL option. It will probably need to move out into a separate CMake file in future. No SYCL-specific compiler options (e.g. -fsycl) have been added, as those can be supplied by the user for now via -DCMAKE_CXX_FLAGS="-fsycl -fsycl-targets=...".
- BwdTransSYCLSumFacKernels.hpp
  - I tried to match the CUDA implementation as closely as possible. I've not renamed some variables which use CUDA terminology, e.g. blocksize, gridsize, etc.
  - Some kernels couldn't be integrated directly into the new structure, which uses wsp as a global memory buffer, without a complete rewrite. Since all the kernels will need to be updated anyway to match the newer CUDA kernels, I've included them for completeness. By default the affected kernels are not used, and each one is marked with a comment.
- SYCLQueue files have been left in library/Operators until we find a better place for them. No fancy device selection logic, just the default selector: use SYCL_DEVICE_SELECTOR=host, gpu, etc. when running.
- Spaces.hpp needed to be modified as SYCL does not define the __device__ keyword.
- Operators/CMakeLists.txt: I've had to use a manual workaround for including the SYCL files, as the ADD_OPERATOR macro generates SYCL source files for operators which don't have a SYCL header. This causes compiler errors (at least with icpx), so I've had to manually add the correct files and create BwdTransSYCLSumFac.cpp for template instantiation.
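For reviewers less familiar with the role of the wsp buffer mentioned above, the sketch below shows the general sum-factorisation idea for a quadrilateral in plain serial C++. It is not one of the SYCL kernels in this MR: the function name BwdTransQuadSketch and the storage layouts in the comments are assumptions made for illustration only.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical storage layout (an assumption, not taken from the kernels):
//   basis0[i * nm0 + p] : mode p at quadrature point i, direction 0
//   basis1[j * nm1 + q] : mode q at quadrature point j, direction 1
//   in[q * nm0 + p]     : coefficient of mode (p, q)
//   wsp[q * nq0 + i]    : intermediate workspace, size nq0 * nm1
//   out[j * nq0 + i]    : value at quadrature point (i, j)
void BwdTransQuadSketch(std::size_t nm0, std::size_t nm1,
                        std::size_t nq0, std::size_t nq1,
                        const std::vector<double> &basis0,
                        const std::vector<double> &basis1,
                        const std::vector<double> &in,
                        std::vector<double> &wsp,
                        std::vector<double> &out)
{
    wsp.assign(nq0 * nm1, 0.0);
    out.assign(nq0 * nq1, 0.0);

    // Step 1: contract direction 0 into the workspace,
    //   wsp(i, q) = sum_p basis0(i, p) * in(p, q)
    for (std::size_t q = 0; q < nm1; ++q)
    {
        for (std::size_t i = 0; i < nq0; ++i)
        {
            double sum = 0.0;
            for (std::size_t p = 0; p < nm0; ++p)
            {
                sum += basis0[i * nm0 + p] * in[q * nm0 + p];
            }
            wsp[q * nq0 + i] = sum;
        }
    }

    // Step 2: contract direction 1 from the workspace,
    //   out(i, j) = sum_q basis1(j, q) * wsp(i, q)
    for (std::size_t j = 0; j < nq1; ++j)
    {
        for (std::size_t i = 0; i < nq0; ++i)
        {
            double sum = 0.0;
            for (std::size_t q = 0; q < nm1; ++q)
            {
                sum += basis1[j * nm1 + q] * wsp[q * nq0 + i];
            }
            out[j * nq0 + i] = sum;
        }
    }
}
```

Splitting the double sum into two passes is what makes the intermediate buffer necessary. In a device kernel that buffer could live in global or local memory, which is the design choice the wsp comment above refers to.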
Tests
- test_bwdtrans_sycl_sumfac.cpp added, with tests matching test_bwdtrans_cuda_sumfac.cpp.
- init_fields.hpp updated with SYCL-specific variables.
Suggested reviewers
Notes
I can't add the merge redesign label, so someone will have to add it for me.
Checklist
- Functions and classes, or changes to them, are documented.
- [ ] User guide/documentation is updated.
- [ ] Changelog is updated.
- Suitable tests added for new functionality.
- Contributed code is correctly formatted. (See the contributing guidelines).
- License added to any new files.
- No extraneous files have been added (e.g. compiler output or test data files).
Edited by Jacques Xing