Commit 125b412e authored by Spencer Sherwin's avatar Spencer Sherwin

Merge branch 'master' into feature/SumFacPyr

parents f265759f 236cb5f9
......@@ -3,3 +3,8 @@
path = docs/tutorial
url =
ignore = all
[submodule "docs/developer-guide"]
branch = master
path = docs/developer-guide
url =
ignore = all
......@@ -4,11 +4,32 @@ Changelog
- Add periodic boundary condition meshing in 2D (!733)
- Adjust boundary layer thickness in corners in 2D (!739)
- Added in sum factorisation version for pyramid expnasions and orthogonal expansion in pyramids (!750)
- Added the developer-guide repository as a submodule (!751)
- Remove the duplicate output of errorutil (!756)
- Fix issue with FieldConvert when range flag used (!761)
- Fix memory consumption issue with Gmsh output (!747, !762)
- Rework meshing control so that if possible viewable meshes will be dumped
when some part of the system fails (!756)
- Add manifold meshing option (!756)
- Fix issue with field ordering in the interppointdatatofld module (!754)
......@@ -50,6 +71,7 @@ v4.4.0
- Fix bug in FieldUtils when using half mode expansions (!734)
- Do not read the same fld/pts files again for every variable (!670)
- Fix bug in CMake PETSc detection for Ubuntu 16.04/Debian 9 (!735)
- Fix warnings with Intel compiler (!742)
- Add a projection equation system for C^0 projections (!675)
......@@ -108,9 +130,9 @@ v4.4.0
- Change variable names in mcf file to make more sense (!736)
- Fix issues in varopti module so that in can be compiled without meshgen on
- Replace LAPACK Eigenvalue calculation with handwritten function in
- Replace LAPACK Eigenvalue calculation with handwritten function in
varopti (!738)
- Improved node-colouring algorithm for better load-balancing
- Improved node-colouring algorithm for better load-balancing
in varopti (!738)
- Simplified calculation of the energy functional in varopti for improved
performance (!738)
IF (EXISTS ${CMAKE_CURRENT_SOURCE_DIR}/developer-guide/CMakeLists.txt)
Subproject commit e128cfaffbbd37c734a667cdc2a07b6f06291615
SET(DEVGUIDE ${CMAKE_BINARY_DIR}/docs/developer-guide)
${HTLATEX} ${DEVGUIDESRC}/developer-guide.tex
# If tex4ht successful, create img dir and copy images across
FILE(GLOB_RECURSE imgfiles "img/*.png" "img/*.jpg" "*/img/*.png" "*/img/*.jpg")
ADD_CUSTOM_COMMAND(TARGET developer-guide-html
POST_BUILD COMMAND ${CMAKE_COMMAND} -E make_directory ${DEVGUIDE}/html/img)
FOREACH(img ${imgfiles})
ADD_CUSTOM_COMMAND(TARGET developer-guide-html
FILE(GLOB_RECURSE pdffiles "*/img/*.pdf")
FOREACH(pdf ${pdffiles})
ADD_CUSTOM_COMMAND(TARGET developer-guide-html
${CONVERT} ${pdf} ${DEVGUIDE}/html/img/${BASENAME}.png)
${PDFLATEX} --output-directory ${DEVGUIDE} ${DEVGUIDESRC}/developer-guide.tex
${PDFLATEX} --output-directory ${DEVGUIDE} ${DEVGUIDESRC}/developer-guide.tex
${PDFLATEX} --output-directory ${DEVGUIDE} ${DEVGUIDESRC}/developer-guide.tex
\chapter{Coding Standard}
The purpose of this page is to detail the coding standards of the project which
all contributers are requested to follow.
This page describes the coding style standard for C++. A coding style standard
defines the visual layout of source code. Presenting source code in a uniform
fashion facilitates the use of code by different developers. In addition,
following a standard prevents certain types of coding errors.
All of the items below, unless otherwise noted, are guidelines. They are
recommendations about how to lay out a given block of code. Use common sense and
provide comments to describe any deviation from the standard. Sometimes,
violating a guideline may actually improve readability.
If you are working with code that does not follow the standard, bring the code
up-to-date or follow the existing style. Don’t mix styles.
\section{Code Layout}
The aim here is to maximise readability on all platforms and editors.
\item Code width of 80 characters maximum - hard-wrap longer lines.
\item Use sensible wrapping for long statements in a way which maximises
\item Do not put multiple statements on the same line.
\item Do not declare multiple variables on the same line.
\item Provide a default value on all variable declarations.
\item Enclose every program block (if, else, for, while, etc) in braces, even if
empty or just a single line.
\item Opening braces (\{) should be on their own line.
\item Braces at same indentation as preceeding statement.
\item One class per .cpp and .h file only, unless nested.
\item Define member functions in the .cpp file in the same order as defined in
the .h file.
\item Templated classes defined and implemented in a single .hpp file.
\item Do not put inline functions in the header file unless the function is
trivial (e.g. accessor, empty destructor), or profiling explicitly suggests to.
\item Inline functions should be declared within the class declaration but
defined outside the class declaration at the bottom of the header file.
Virtual and inline are mutually exclusive. Virtual functions should therefore be
implemented in the .cpp file.
Adding an appropriate amount of white space enhances readability. Too much white
space, on the other hand, detracts from that readability.
\item Indent using a four-space tab. Consistent tab spacing is necessary to
maintain formatting. Note that this means when a tab is pressed, four physical spaces are
inserted into the source instead.
\item Put a blank line at the end of a public/protected/private block.
\item Put a blank line at the end of every file.
\item Put a space after every keyword (if, while, for, etc.).
\item Put a space after every comma, unless the comma is at the end of the line.
\item Do not put a space before the opening parenthesis of an argument list to a
\item Declare pointers and references with the * or \& symbol next to the
declarator, not the type; e.g., Object *object. Do not put multiple variables in the same
\item Place a space on both sides of a binary operator.
\item Do not use a space to separate a unary operator from its operand.
\item Place open and close braces on their own line. No executable statements
should appear on the line with the brace, but comments are allowed. Indent opening
braces at the same level as the statement above and indent the closing brace at
the same level as the corresponding opening brace.
\item Indent all statements following an open brace by one tab. Developer Studio
puts any specifier terminated with a colon at the same indentation level as the
enclosing brace. Examples of such specifiers include case statements, access
specifiers (public, private, protected), and goto labels. This is not acceptable
and should be manually corrected so that all statements appearing within a block
and delineated by braces are indented.
\item Break a line into multiple lines when it becomes too long to read. Use at
least two tabs to start the new line, so it does not look like the start of a
\item Follow C++ style comments with one space. It is also preferable to
consider any text that follows C++ style comments as a sentence and to begin this text
with a capital letter. This helps to distinguish the line from a continuation of
a previous line; i.e., \inlsh{// This is my comment.}
\item As a general rule, don’t keep commented out source code in the final
baselined product. Such code leads the reader to believe there was uncertainty
in the code as it currently exists.
\item Place the \# of a preprocessor directive at column one. An exception is
the use of nested ifdefs where the bodies only contain other preprocessor directives.
Add tabs to enhance readability:
void foo() {
for(int i = 0; i < 10; ++i)
#ifdef BAR
\item Use tabular white space if it enhances
\item Use only one return statement. Structure the code so that only one return
statement is necessary.
\section{Naming Conventions}
Keep variable and function names meaningful but concise.
\item Begin variable names with lower-case letter.
\item Begin function names and class names with upper-case letter.
\item All function, class and variable names should be written in CamelCase,
e.g. \inlsh{MyClass, DoFunction() or myVariableName}.
\item All preprocessor definitions written in UPPER\_CASE with words separated
by underscores, e.g. USE\_SPECIFIC\_FEATURE.
\item All member variables prefixed with m\_.
\item All constants prefixed with a k.
\item All function parameters prefixed with a p.
\item All enumerations prefixed with an e.
\item Do not use leading underscores.
The top-level namespace is "Nektar". All code should reside in this namespace or
a sub-space of this.
\item Namespaces correspond to code structure.
\item Namespaces should be kept to a minimum to simplify the interface to their
\item Briefs for classes, functions and types in header files using
\inlsh{///} notation.
\item Full documentation with implementation using \inlsh{/** ... *\/}
\item Use @ symbol for @class, @param, @returns, etc for ease of identification.
\item Any separate documentation pages not directly associated with a portion of
the code should be in a separate file in /docs/html/doxygen.
\chapter{Core Concepts}
This section describes some of the key concepts which are useful when developing
code within the Nektar++ framework.
\section{Factory method pattern}
The factory method pattern is used extensively throughout Nektar++ as a
mechanism to instantiate objects. It provides the following benefits:
\item Encourages modularisation of code such that conceptually related
algorithms are grouped together
\item Structuring of code such that different implementations of the same
concept are encapsulated and share a common interface
\item Users of a factory-instantiated modules need only be concerned with the
interface and not the details of underlying implementations
\item Simplifies debugging since code relating to a specific implementation
resides in a single class
\item The code is naturally decoupled to reduce header-file dependencies and
improves compile times
\item Enables implementations (e.g. relating to third-party libraries) to be
disabled through the build process (CMake) by not compiling a specific
implementation, rather than scattering preprocessing statements throughout the
For conceptual details see the Wikipedia page.
\subsection{Using NekFactory}
The templated NekFactory class implements the factory pattern in Nektar++.
There are two distinct aspects to creating a factory-instantiated collection of
classes: defining the public interface, and registering specific
implementations. Both of these involve adding standard boilerplate code. It is
assumed that we are writing a code which implements a particular concept or
functionality within the code, for which there are multiple implementations. The
reasons for multiple implementations may be very low level such as alternative
algorithms for solving a linear system, or high level, such as selecting from a
range of PDEs to solve.
\subsubsection{Creating an interface (base class)}
A base class must be defined which prescribes an implementation-independent
interface. In Nektar++, the template method pattern is used, requiring public
interface functions to be defined which call private virtual implementation
methods. The latter will be overridden in the specific implementation classes.
In the base class these virtual methods should be defined as pure virtual, since
there is no implementation and we will not be instantiating this base class
As an example we will create a factory for instantiating different
implementations of some concept \inlsh{MyConcept}, defined in
\inlsh{MyConcept.h} and \inlsh{MyConcept.cpp}. First in \inlsh{MyConcept.h},
we need to include the NekFactory header
#include <LibUtilities/BasicUtils/NekFactory.hpp>
The following code should then be included just before the base class
declaration (in the same namespace as the class):
class MyConcept
// Datatype for the MyConcept factory
typedef LibUtilities::NekFactory< std::string, MyConcept,
ParamType2 > MyConceptFactory;
MyConceptFactory& GetMyConceptFactory();
The template parameters define the datatype of the key used to retrieve a
particular implementation (usually a string, enum or custom class such as
\inlsh{MyConceptKey}, the base class (in our case \inlsh{MyConcept} and a list
of zero or more parameters which are taken by the constructors of all
implementations of the type \inlsh{MyConcept} (in our case we have two). Note
that all implementations must take the same parameter list in their constructors.
The normal definition of our base class then follows:
class MyConcept
MyConcept(ParamType1 p1, ParamType2 p2);
We must also define a shared pointer for our base class for use later
typedef boost::shared_ptr<MyConcept> MyConceptShPtr;
\subsubsection{Creating a specific implementation (derived class)}
A class is defined for each specific implementation of a concept. It is these
specific implementations which are instantiated by the factory.
In our example we will have an implementations called \inlsh{MyConceptImpl1}
defined in \inlsh{MyConceptImpl1.h} and \inlsh{MyConceptImpl1.cpp}. In the
header file we include the base class header file
#include <Subdir/MyConcept.h>
We then define the derived class as normal:
class MyConceptImpl1 : public MyConcept
In order for the factory to work, it must know
\item that {{{MyConceptImpl1}}} exists, and
\item how to create it.
To allow the factory to create instances of our class we define a function in
our class:
/// Creates an instance of this class
static MyConceptSharedPtr create(
ParamType1 p1,
ParamType2 p2)
return MemoryManager<MyConceptImpl1>::AllocateSharedPtr(p1, p2);
This function simply creates an instance of \inlsh{MyConceptImpl1} using the
supplied parameters. It must be \inlsh{static} because we are not operating on
an existing instance and it should return a base class shared pointer (rather
than a \inlsh{MyConceptImpl1} shared pointer), since the point of the factory
is that the calling code does not know about specific implementations.
The last task is to register our implementation with the factory. This is done
using the \inlsh{RegisterCreatorFunction} member function of the factory.
However, we wish this to happen as early on as possible (so we can use the
factory straight away) and without needing to explicitly call the function for
every implementation at the beginning of our program (since this would again
defeat the point of a factory)! The solution is to use the function to
initialise a static variable: it will be executed prior to the start of the
\inlsh{main()} routine, and can be located within the very class it is
registering, satisfying our code decoupling requirements.
In \inlsh{MyConceptImpl1.h} we define a static variable with the same datatype
as the key used in our factory (in our case \inlsh{std::string})
static std::string className;
The above variable can be \inlsh{private} since it is typically never actually
used within the code. We then initialise it in \inlsh{MyConceptImpl1.cpp}
string MyConceptImpl1::className
= GetMyConceptFactory().RegisterCreatorFunction(
"First implementation of my concept.");
The first parameter specifies the value of the key which should be used to
select this implementation. The second parameter is a function pointer to our
static function used to instantiate our class. The third parameter provides a
description which can be printed when listing the available MyConcept
\subsection{Instantiating classes}
To create instances of MyConcept implementations elsewhere in the code, we must
first include the ''base class'' header file
#include <Subdir/MyConcept.h>
Note we do not include the header files for the specific MyConcept
implementations anywhere in the code (apart from \inlsh{MyConceptImpl1.cpp}).
If we modify the implementation, only the implementation itself requires
recompiling and the executable relinking.
We create an instance by retrieving the \inlsh{MyConceptFactory} and call the
\inlsh{CreateInstance} member function of the factory:
ParamType p1 = ...;
ParamType p2 = ...;
MyConceptShPtr p = GetMyConceptFactory().CreateInstance( "Impl1", p1, p2 );
Note that the class is used through the pointer \inlsh{p}, which is of type
\inlsh{MyConceptShPtr}, allowing the use of any of the public interface
functions in the base class (and therefore the specific implementations behind them) to be
called, but not directly any functions declared solely in a specific
An Array is a thin wrapper around native arrays. Arrays provide all the
functionality of native arrays, with the additional benefits of automatic use of
the Nektar++ memory pool, automatic memory allocation and deallocation, bounds
checking in debug mode, and easier to use multi-dimensional arrays.
Arrays are templated to allow compile-time customization of its dimensionality
and data type.
\item \inltt{Dim} Must be a type with a static unsigned integer called
\inltt{Value} that specifies the array's dimensionality. For example
struct TenD {
static unsigned int Value = 10;
\item \inltt{DataType} The type of data to store in the array.
It is often useful to create a class member Array that is shared with users of
the object without letting the users modify the array. To allow this behavior,
Array<Dim, !DataType> inherits from Array<Dim, const !DataType>. The following
example shows what is possible using this approach:
class Sample {
Array<OneD, const double>& getData() const { return m_data; }
void getData(Array<OneD, const double>& out) const { out = m_data; }
Array<OneD, double> m_data;
In this example, each instance of Sample contains an array. The getData
method gives the user access to the array values, but does not allow
modification of those values.
\subsection{Efficiency Considerations}
Tracking memory so it is deallocated only when no more Arrays reference it does
introduce overhead when copying and assigning Arrays. In most cases this loss of
efficiency is not noticeable. There are some cases, however, where it can cause
a significant performance penalty (such as in tight inner loops). If needed,
Arrays allow access to the C-style array through the \texttt{Array::data} member
Threading is not currently included in the main code distribution. However, this
hybrid MPI/pthread functionality should be available within the next few months.
We investigated adding threaded parallelism to the already MPI parallel
Nektar++. MPI parallelism has multiple processes that exchange data using
network or network-like communications. Each process retains its own memory
space and cannot affect any other process’s memory space except through the MPI
API. A thread, on the other hand, is a separately scheduled set of instructions
that still resides within a single process’s memory space. Therefore threads
can communicate with one another simply by directly altering the process’s
memory space. The project's goal was to attempt to utilise this difference to
speed up communications in parallel code.
A design decision was made to add threading in an implementation independent
fashion. This was achieved by using the standard factory methods which
instantiate an abstract thread manager, which is then implemented by a concrete
class. For the reference implementation it was decided to use the Boost library
rather than native p-threads because Nektar++ already depends on the Boost
libraries, and Boost implements threading in terms of p-threads anyway.
It was decided that the best approach would be to use a thread pool. This
resulted in the abstract classes ThreadManager and ThreadJob. ThreadManager is
a singleton class and provides an interface for the Nektar++ programmer to
start, control, and interact with threads. ThreadJob has only one method, the
virtual method run(). Subclasses of ThreadJob must override run() and provide a
suitable constructor. Instances of these subclasses are then handed to the
ThreadManager which dispatches them to the running threads. Many thousands of
ThreadJobs may be queued up with the ThreadManager and strategies may be
selected by which the running threads take jobs from the queue. Synchronisation
methods are also provided within the ThreadManager such as wait(), which waits
for the thread queue to become empty, and hold(), which pauses a thread that
calls it until all the threads have called hold(). The API was thoroughly
documented in Nektar++’s existing Javadoc style.
Classes were then written for a concrete implementation of ThreadManager using
the Boost library. Boost has the advantage of being available on all Nektar++’s
supported platforms. It would not be difficult, however, to implement
ThreadManager using some other functionality, such as native p-threads.
Two approaches to utilising these thread classes were then investigated. The
bottom-up approach identifies likely regions of the code for parallelisation,
usually loops around a simple and independent operation. The top-down approach
seeks to run as much of the code as is possible within a threaded environment.
The former approach was investigated first due to its ease of implementation.
The operation chosen was the multiplication of a very large sparse block
diagonal matrix with a vector, where the matrix is stored as its many smaller
sub matrices. The original algorithm iterated over the sub matrices multiplying
each by the vector and accumulating the result. The new parallel algorithm
sends ThreadJobs consisting of batches of sub matrices to the thread pool. The
worker threads pick up the ThreadJobs and iterate over the sub matrices in the
job accumulating the result in a thread specific result vector. This latter
detail helps to avoid the problem of cache ping-pong which is where multiple
threads try to write to the same memory location, repeatedly invalidating one
another's caches.
Clearly this approach will work best when the sub matrices are large and there
are many of them . However, even for test cases that would be considered large
it became clear that the code was still spending too much time in its scalar
This led to the investigation of the top-down approach. Here the intent is to
run as much of the code as possible in multiple threads. This is a much more
complicated approach as it requires that the overall problem can be partitioned
suitably, that a mechanism be available to exchange data between the threads,
and that any code using shared resources be thread safe. As Nektar++ already
has MPI parallelism the first two requirements (data partitioning and exchange)
are already largely met. However since MPI parallelism is implemented by having
multiple independent processes that do not share memory space, global data in
the Nektar++ code, such as class static members or singleton instances, are now
vulnerable to change by all the threads running in a process.
To Nektar++’s communication class, Comm, was added a new class, ThreadedComm.
This class encapsulates a Comm object and provides extra functionality without
altering the API of Comm (this is the Decorator pattern). To the rest of the
Nektar++ library this Comm object behaves the same whether it is a purely MPI
Comm object or a hybrid threading plus MPI object. The existing data
partitioning code can be used with very little modification and the parts of the
Nektar++ library that exchange data are unchanged. When a call is made to
exchange data with other workers ThreadedComm first has the master thread on
each process (i.e. the first thread) use the encapsulated Comm object (typically
an MPI object) to exchange the necessary data between the other processes, and
then exchanges data with the local threads using direct memory to memory copies.
As an example: take the situation where there are two processes A and B,
possibly running on different computers, each with two threads 1 and 2. A
typical data exchange in Nektar++ uses the Comm method AllToAll(...) in which
each worker sends data to each of the other workers. Thread A1 will send data
from itself and thread A2 via the embedded MPI Comm to thread B1, receiving in
turn data from threads B1 and B2. Each thread will then pick up the data it
needs from the master thread on its process using direct memory to memory
copies. Compared to the situation where there are four MPI processes the number
of communications that actually pass over the network is reduced. Even MPI
implementations that are clever enough to recognise when processes are on the
same host must make a system call to transfer data between processes.
The code was then audited for situations where threads would be attempting to
modify global data. Where possible such situations were refactored so that each
thread has a copy of the global data. Where the original design of Nektar++ did
not permit this access to global data was mediated through locking and
synchronisation. This latter approach is not favoured except for global data
that is used infrequently because locking reduces concurrency.
The code has been tested and Imperial College cluster cx1 and has shown good
scaling. However it is not yet clear that the threading approach outperforms
the MPI approach; it is possible that the speedups gained through avoiding
network operations are lost due to locking and synchronisation issues. These
losses could be mitigated through more in-depth refactoring of Nektar++.
\ No newline at end of file
The typical elemental decomposition of the spectral/hp element method requires a
global assembly process when considering multi-elemental problems. This global