This MR implements the CUDA version of the DiagPrecon operator.
The StdMat version of the DiagPrecon operator has also been updated to use the redesigned AssmbScat operator.
Note: The current CUDA implementation of the DiagPrecon operator applies the preconditioner on the GPU, but the preconditioner itself is still computed on the CPU during initialization (to be updated).
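
For reviewers less familiar with diagonal preconditioning, here is a minimal host-side C++ sketch of the split described in the note: a one-time setup step that inverts the diagonal, and an apply step that is a pure element-wise product. The function names `computeInverseDiagonal` and `applyDiagPrecon` are hypothetical and are not taken from this MR; the sketch assumes the preconditioner is the inverse of a non-zero matrix diagonal and does not reflect the actual DiagPrecon interface.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical setup step (not the MR's code): invert the diagonal once.
// This corresponds to the part that, per the note, still runs on the CPU.
// Assumes every diagonal entry is non-zero.
std::vector<double> computeInverseDiagonal(const std::vector<double>& diag)
{
    std::vector<double> invDiag(diag.size());
    for (std::size_t i = 0; i < diag.size(); ++i)
    {
        invDiag[i] = 1.0 / diag[i];
    }
    return invDiag;
}

// Hypothetical apply step (not the MR's code): z = D^{-1} r.
// Each entry is independent of the others, so in a GPU version the loop body
// would map naturally to one thread per entry.
void applyDiagPrecon(const std::vector<double>& invDiag,
                     const std::vector<double>& r,
                     std::vector<double>& z)
{
    z.resize(r.size());
    for (std::size_t i = 0; i < r.size(); ++i)
    {
        z[i] = invDiag[i] * r[i];
    }
}
```

Because the apply step has no dependencies between entries, it is the natural first piece to move to the GPU; the setup step runs only once, which is presumably why leaving it on the CPU is acceptable for now.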