Tidy up the AssmbScatr, NeuBndCond, and DirBndCond CUDA implementations

Issue/feature addressed

  • The AssmbScatr kernels have been updated to use coalesced memory access.
  • CUDA grid size calculations are now encapsulated within the kernel launchers. This will eventually allow the grid size to be optimized for each individual kernel and for specific GPU devices, if desired.
  • Duplicate host memory allocations in operator constructors have been eliminated. Nektar Arrays are now used for temporary host storage and are deallocated once construction completes.
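The launcher pattern and the coalesced scatter described above can be sketched on the host as follows. This is not the code in this PR: LaunchConfig, makeLaunchConfig, scatterThread and scatterLauncher are hypothetical names, the default block size of 128 is an arbitrary assumption, the scatter is assumed to take the form local[i] = sign[i] * global[map[i]] with one thread per local coefficient, and the CUDA launch is emulated with two nested loops so the sketch runs without a GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical launch configuration (stands in for CUDA's dim3 grid/block).
struct LaunchConfig {
    std::size_t blockSize;  // threads per block
    std::size_t gridSize;   // number of blocks
};

// Grid size is derived inside the launcher from the problem size, so call
// sites never compute it. A per-kernel or per-device tuning hook could
// replace the fixed default block size here.
inline LaunchConfig makeLaunchConfig(std::size_t nWork,
                                     std::size_t blockSize = 128) {
    return {blockSize, (nWork + blockSize - 1) / blockSize};  // ceiling division
}

// Body of the scatter "kernel": one thread per local coefficient.
// Consecutive tids read consecutive entries of assmbMap and sign and write
// consecutive entries of local, so those accesses coalesce on a GPU. The
// gather from global is indirect and only as contiguous as the map allows.
inline void scatterThread(std::size_t tid, std::size_t nLocal,
                          const int* assmbMap, const double* sign,
                          const double* global, double* local) {
    if (tid < nLocal) {  // guard threads in the last, partially filled block
        local[tid] = sign[tid] * global[assmbMap[tid]];
    }
}

// Hypothetical launcher: owns the grid-size calculation and, in this sketch,
// emulates the kernel launch with nested loops over blocks and threads.
inline void scatterLauncher(const std::vector<int>& assmbMap,
                            const std::vector<double>& sign,
                            const std::vector<double>& global,
                            std::vector<double>& local) {
    const std::size_t nLocal = assmbMap.size();
    assert(sign.size() == nLocal);
    local.assign(nLocal, 0.0);

    const LaunchConfig cfg = makeLaunchConfig(nLocal);
    for (std::size_t block = 0; block < cfg.gridSize; ++block) {
        for (std::size_t thread = 0; thread < cfg.blockSize; ++thread) {
            scatterThread(block * cfg.blockSize + thread, nLocal,
                          assmbMap.data(), sign.data(), global.data(),
                          local.data());
        }
    }
}
```

On a GPU, scatterThread would become the body of a __global__ kernel with tid = blockIdx.x * blockDim.x + threadIdx.x, and the nested loops would become a single <<<cfg.gridSize, cfg.blockSize>>> launch. Because makeLaunchConfig is called only inside the launcher, the block size can later be tuned per kernel or per device without changing any call site.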

Edited by Jacques Xing

