Implement ConjGrad, FwdTrans, and HelmSolve operators for CUDA
This MR implements the CUDA versions of the ConjGrad, FwdTrans, and HelmSolve operators.
Note: This preliminary CUDA implementation is not yet efficient; performance will be improved in future MRs.
Note: Robin boundary conditions are not yet implemented.
Edited by Jacques Xing