Generalize mass matrix implementation to StdMat and CUDA
This MR generalizes the mass matrix operator from !17 (merged) to both the StdMat and CUDA implementations. The MatrixFree and SumFrac implementations will be added once the IProductWRTBase and BwdTrans operators are available.
Unit tests have also been added for seg, quad, tri, hex, prism, pyr, and tet elements. Results are compared with the Nektar++ solution.
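For readers unfamiliar with the operator, the following is a minimal sketch of what a StdMat-style mass matrix does on a single segment element: the matrix M_ij = ∫ φ_i φ_j dx is assembled once by quadrature and then applied as a dense matrix-vector product. It is an illustration only, not the code in this MR. The names `basis`, `buildSegMassMatrix`, and `applyMass` are hypothetical, and the linear basis on [-1, 1] with 2-point Gauss quadrature is an assumption chosen to keep the example small.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Hypothetical types and names for illustration; not the API from this MR.
using Mat2 = std::array<std::array<double, 2>, 2>;
using Vec2 = std::array<double, 2>;

// Linear basis functions on the reference segment [-1, 1] (an assumption).
inline double basis(std::size_t i, double x)
{
    return i == 0 ? 0.5 * (1.0 - x) : 0.5 * (1.0 + x);
}

// Assemble M_ij = sum_q w_q * phi_i(x_q) * phi_j(x_q) with 2-point Gauss
// quadrature, which integrates the quadratic integrand exactly.
Mat2 buildSegMassMatrix()
{
    const double g = 1.0 / std::sqrt(3.0);
    const std::array<double, 2> pts{-g, g};
    const std::array<double, 2> wts{1.0, 1.0};

    Mat2 M{}; // value-initialized to zero
    for (std::size_t i = 0; i < 2; ++i)
        for (std::size_t j = 0; j < 2; ++j)
            for (std::size_t q = 0; q < 2; ++q)
                M[i][j] += wts[q] * basis(i, pts[q]) * basis(j, pts[q]);
    return M;
}

// Apply the precomputed matrix to a coefficient vector: out = M * u.
Vec2 applyMass(const Mat2 &M, const Vec2 &u)
{
    Vec2 out{};
    for (std::size_t i = 0; i < 2; ++i)
        for (std::size_t j = 0; j < 2; ++j)
            out[i] += M[i][j] * u[j];
    return out;
}
```

Under these assumptions the assembled matrix is [[2/3, 1/3], [1/3, 2/3]], and applying it to the coefficients {1, 1} returns {1, 1}, since each row sums to 1. A unit test in the spirit of the ones described above would compare such output against a reference solution within a small tolerance.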
Edited by Jacques Xing