Serial SumFac for all shapes based on vec_t (AVX) implementation
Issue/feature addressed
Setup the IProductWRTBase Operator for Serial execution using the SumFac implementation based on the AVX code by forcing vec_t to be a scalar type. Also bring this code into StdRegions for IProductWRTBase and IProductWRTDerivBase usage.
Proposed solution
Replaced the existing IProductWRTBase code using dgemm and dgemv code with the handwritten vec_t based loops. This has therefore extended the method to all element shape type where as previously it was only set up for Tris. Have also reused the inner kernel for all shapes in the legacy codes in StdRegions and LocalRegions.In the process of this last step have made the use of these routine more consistent over all shapes and ended up removing _sumfac_kernel() methods that only did the basis multiplication without Jacobian or integration weights (seems this variation was not widely needed and meant we did not need no jacobian vec_t loop versions)
Implementation
Required moving the inner kernel code into a new filename IProductWRTBaseAVXSumFacStdKernels which is now located in StdRegions/Operators directory. This also was not wrapped in a namepsace since we can then use an include within different namespaces so that the code can be compiled with AVX turned on and the operator use the interlaced version whilst the StdRegions version uses a scalar version. To allow this the code has to be wrapped in different namespaces.
In re-using this code in the legacy StdRegion Tri and StdTri have also introduced a new manager which stores used reused factors such as the 2/(1-z), (1 +2)/2 in the triangle, tet, prism, pyr cases and the W1, W2 weights that are repeatedly calculated. Note the vec_t operators are only being called within StdRegions since LocalRegions methods call back to these methods. This has however requires quite a generalised lowest level method IProductWRTBaseKernel() method that does take a jacobian input array and a Deformed bool within StdRegions. Although this last step slightly breaks the StdRegions definition it does keep the better structured in my opinion since otherwise would have to move the vec_t calls into LocalRegions whereas in this approach it is compactly kept together including Boost_PP directives.
So the structure of the IProductWRTBase routines now has the following structure
- A virtual function v_IProductWRTBase(inarray,outarray) method as before (strangely Segments has a version which took the base and has been removed)
- A kernel method IProductWRTBaseKernel(base0,base1, .., inarray, outarray, jac, deformed) which can be specialised by shape to take different number of basis and information about the collocation nature of the method for quads and hexs. When used in StdRegions this method sets the jacobian to one and the Deformed bool to false.
- A vritual function v_IProductWRTBaseKernel(base0,base1, .., inarray, outarray, jac, deformed, CollDir0, CollDir1, ..) method which has the same arguments depending on dimension (since it inherits from StdExpansion2D and StdExpnsion3D) that calls back to the method IProductWRTBaseKernel. This virtual function is useful in setting up calls from IProductWRTDerivBase method to that the same underlying code is reused.
- I have removed the method v_IProductWRTBase_SumFac_Kernel and variants which called the method without the jacobian or weights. In most places this was being used there was a call to multiply by quadrature weights beforehand so separating the method was rather redundant. There was a specific case of setting up a matrix for the compressible code called v_GenStdMatBwdDeriv which I could not find an alternative for but have pre-multiplied the input by the inverse of the std quadrature weights to make this equivalent. Might be a more elegant solution but this is only adding one division and then can reuse the same method calls.
I think after all of this there may be minimal (or no) calls to LocalRegions::MultiplyByQuadratureMetric and so may allow us to remove this method and will hopefully stop regular elements from storing the Jacobian at all quadrature points in the legacy code.
Tests
Pass existing tests but also add a series of variable p test cases for Tri, Quad, Tet, Prism Pyr and Hex. Have had to tweak a few test tolerances but this is probably not surprising since we are changing the execution of such a fundamental operation. The tweaking has just doubled (rather than increasing by magnitude) the tolerances
Suggested reviewers
Jacques, Alexandra, Chi Hin
Checklist
-
Functions and classes, or changes to them, are documented. -
[ ] User guide/documentation is updated.-
[ ] Changelog is updated.
-
-
Suitable tests added for new functionality. -
Contributed code is correctly formatted. (See the contributing guidelines). -
License added to any new files. -
No extraneous files have been added (e.g. compiler output or test data files).