Fix padding and vector width for device backend and add initial multi-GPU implementation for ConjGrad
Issue/feature addressed
- Update to account for CUDA warp size of 32 and AMD wavefront size of 64.
- Add device (CUDA, Kokkos, and SYCL) backend for interleave/deinterleave
- Fix DirBndCondImpl.hpp for padding
- Fix HelmholtzImplShared.hpp for padding
- Fix HelmholtzSerialStdMat.hpp for padding
- Fix MassImpl.hpp for padding
- Fix AssmbScatrImplShared.hpp for padding
- Tidy MathKernels
- Tidy ConjGradImpl.hpp
- Tidy AssmbScatr operator implementation
- Remove Identity operators
- Remove Matrix operators
- Add CommDataType specialization based on MemoryRegion
- Initial multi-GPU implementation for ConjGrad
- Removed default argument value for alignment and vector width in Field and MemoryRegion object
Proposed solution
- Specify device vector width in NektarSpaces
class Serial
{
public:
using memory_space = NektarSpaces::HostSpace;
static constexpr size_t width = vec_t::width;
static constexpr size_t alignment = vec_t::alignment;
};
class AVX
{
public:
using memory_space = NektarSpaces::HostSpace;
static constexpr size_t width = vec_t::width;
static constexpr size_t alignment = vec_t::alignment;
};
class CUDA
{
public:
using memory_space = NektarSpaces::DeviceSpace;
static constexpr size_t width = 32;
static constexpr size_t alignment = __STDCPP_DEFAULT_NEW_ALIGNMENT__;
};
class HIP
{
public:
using memory_space = NektarSpaces::DeviceSpace;
static constexpr size_t width = 64;
static constexpr size_t alignment = __STDCPP_DEFAULT_NEW_ALIGNMENT__;
};
class SYCL
{
public:
using memory_space = NektarSpaces::DeviceSpace;
static constexpr size_t width = 64; //SYCLQueue::GetInstance().get_device().get_info<sycl::info::device::sub_group_sizes>();
static constexpr size_t alignment = __STDCPP_DEFAULT_NEW_ALIGNMENT__;
};
class KOKKOS
{
public:
using memory_space = NektarSpaces::DeviceSpace;
#if defined(KOKKOS_ENABLE_CUDA)
static constexpr size_t width = Kokkos::Impl::CudaTraits::WarpSize;
static constexpr size_t alignment = __STDCPP_DEFAULT_NEW_ALIGNMENT__;
#elif defined(KOKKOS_ENABLE_HIP)
static constexpr size_t width = Kokkos::Impl::HIPTraits::WarpSize;
static constexpr size_t alignment = __STDCPP_DEFAULT_NEW_ALIGNMENT__;
#elif defined(KOKKOS_ENABLE_SYCL)
static constexpr size_t width = 64;
static constexpr size_t alignment = __STDCPP_DEFAULT_NEW_ALIGNMENT__;
#else
static constexpr size_t width = vec_t::width;
static constexpr size_t alignment = __STDCPP_DEFAULT_NEW_ALIGNMENT__;
#endif
};
- Use backend-specific (instead of vec_t::width) vector width when calling GetBlockAttributes
GetBlockAttributes(FieldState::Coeff, expansionList, ExecSpace::width);
- Add CommDataType specialization based on MemoryRegion
namespace Nektar::LibUtilities
{
/**
* Partial specialisation for memory region
*/
template <class elemT> class CommDataTypeTraits<MemoryRegion<elemT>>
{
public:
static CommDataType &GetDataType()
{
return CommDataTypeTraits<elemT>::GetDataType();
}
static void *GetPointer(MemoryRegion<elemT> &val)
{
return val.template GetPtr<NektarSpaces::HostSpace, ReadWrite>();
}
static const void *GetPointer(const MemoryRegion<elemT> &val)
{
return val.template GetPtr<NektarSpaces::HostSpace, ReadWrite>();
}
static size_t GetCount(const MemoryRegion<elemT> &val)
{
return val.size();
}
const static bool IsVector = true;
};
} // namespace Nektar::LibUtilities
This allow MPI communicator to be called directly on MemoryRegion without copy to an Array
m_rowComm->AllReduce(m_vExchange, Nektar::LibUtilities::ReduceSum);
here, m_vExchange
is a MemoryRegion object.
Implementation
Tests
- FwdTrans working with 2 processors for Serial, Kokkos-Serial, and Kokkos-CUDA backend
- HelmSolve working with 2 processors for Serial, CUDA, Kokkos-Serial, and Kokkos-CUDA backend
Note: Current results suggest there are still some memory bugs somewhere in the code base but the basic infrastructure for MPI + GPU is there.
Suggested reviewers
Please suggest any people who would be appropriate to review your code.
Notes
Please add any other information that could be useful for reviewers.
Checklist
-
Functions and classes, or changes to them, are documented. [ ] User guide/documentation is updated.[ ] Changelog is updated.-
Suitable tests added for new functionality. -
Contributed code is correctly formatted. (See the contributing guidelines). -
License added to any new files. -
No extraneous files have been added (e.g. compiler output or test data files).
Edited by Jacques Xing