Draft: Execution model
Operator Checklist
| Operator | Std(Mat) | SumFac | MatFree | Cuda |
|---|---|---|---|---|
| BwdTrans | | | | |
| PhysDeriv | | | | |
| IProductWRTBase | | | | |
| IProductWRTDeriv | | | | |
| Helmholtz | | | | |
| Identity | | | | |
| Mass | | | | |
| AssmbScatr | | | | |
| ConjGrad | | | | |
| FwdTrans | | | | |
| HelmSolve | | | | |
| NullPrecon | | | | |
| DiagPrecon | | | | |
| RobBndCond | | | | |
| NeuBndCond | | | | |
| DirBndCond | | | | |
| Matrix | | | | |
Adds execution model
Operators are templated on an execution model, which determines the backend of the dependency graph that executes them.
- The Cuda model wraps the cudaGraph functionality (requires CUDA 10+).
- The Thread model wraps a Chase-Lev work-stealing queue using std::jthread (requires C++20).
- The Serial model wraps a std::queue.
- The Cuda execution model has a host sub-model, which can be either Thread or Serial, allowing operations to run on device and host simultaneously.
An Operator consists of a team of implementations, each of which acts on a block (chunk) of elements. Each implementation has a node in the relevant graph, and a node is executed as soon as its parents are complete. An Operator object can use different implementations on different blocks, or the same implementation on all of them. Applying an Operator applies all of its constituent implementations, and applying an implementation submits its kernel to the graph. Launching the graph then executes the nodes according to the dependency structure.
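The submit-then-launch flow described above can be illustrated with a small serial sketch. Everything here (`SerialGraph`, `Block`, `apply_blockwise`, `run_example`) is a hypothetical stand-in written for this description and not the actual graph or operator classes; it only shows kernels being queued per block and executed once their parents have finished.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical serial graph: a node becomes ready once all parents have run.
class SerialGraph
{
public:
    using NodeId = std::size_t;

    // Records a kernel and its dependencies; nothing is executed yet.
    NodeId submit(std::function<void()> kernel,
                  const std::vector<NodeId> &parents = {})
    {
        const NodeId id = m_nodes.size();
        m_nodes.push_back(Node{std::move(kernel), {}, parents.size()});
        for (NodeId p : parents)
            m_nodes[p].children.push_back(id);
        return id;
    }

    // Runs every node in dependency order; returns how many nodes ran.
    std::size_t launch()
    {
        std::vector<std::size_t> pending;
        std::queue<NodeId> ready;
        for (NodeId i = 0; i < m_nodes.size(); ++i)
        {
            pending.push_back(m_nodes[i].n_parents);
            if (pending[i] == 0)
                ready.push(i);
        }
        std::size_t executed = 0;
        while (!ready.empty())
        {
            const NodeId id = ready.front();
            ready.pop();
            m_nodes[id].kernel();
            ++executed;
            for (NodeId child : m_nodes[id].children)
                if (--pending[child] == 0)
                    ready.push(child);
        }
        return executed;
    }

private:
    struct Node
    {
        std::function<void()> kernel;
        std::vector<NodeId> children;
        std::size_t n_parents;
    };
    std::vector<Node> m_nodes;
};

struct Block
{
    std::size_t begin, end; // half-open element range [begin, end)
};

// "Applying an operator": submits one kernel per block. If parents are
// given, the node for block b waits for parents[b].
template <typename Kernel>
std::vector<SerialGraph::NodeId> apply_blockwise(
    SerialGraph &graph, const std::vector<Block> &blocks, Kernel kernel,
    const std::vector<SerialGraph::NodeId> &parents = {})
{
    std::vector<SerialGraph::NodeId> nodes;
    for (std::size_t b = 0; b < blocks.size(); ++b)
    {
        std::vector<SerialGraph::NodeId> deps;
        if (b < parents.size())
            deps.push_back(parents[b]);
        const Block block = blocks[b];
        nodes.push_back(graph.submit([kernel, block] { kernel(block); }, deps));
    }
    return nodes;
}

// Chains two blockwise operations (scale by 2, then add 1) and launches.
std::vector<double> run_example()
{
    std::vector<double> data{1.0, 2.0, 3.0, 4.0};
    const std::vector<Block> blocks{{0, 2}, {2, 4}};
    SerialGraph graph;
    const auto scaled = apply_blockwise(graph, blocks, [&data](Block b) {
        for (std::size_t i = b.begin; i < b.end; ++i)
            data[i] *= 2.0;
    });
    apply_blockwise(graph, blocks, [&data](Block b) {
        for (std::size_t i = b.begin; i < b.end; ++i)
            data[i] += 1.0;
    }, scaled);
    graph.launch();
    return data;
}
```

In this sketch the second operation on a block depends only on the first operation on the same block, so a threaded or device backend would be free to overlap work on different blocks.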
Operators and Implementations are registered by inheriting from a CRTP registrar class, which registers their create methods in a factory. This gives each implementation access to the relevant operator's scope (e.g. its argument types).

    template <ExecutionModel EM, typename TData>
    class ExampleOperator final
        : public Operator<EM, TData>::template Registrar<ExampleOperator<EM, TData>>
    {
    public:
        // Traits of the operator
        static constexpr OperatorType type = OperatorType::ExampleOperator;
        static constexpr std::string_view name = "ExampleOperator";
        static constexpr std::string_view desc = "Example of an Operator";

        // Argument types
        typedef Cochain<TData, FieldState::Coeff> ArgIn;
        typedef Cochain<TData, FieldState::Phys> ArgOut;

        // Base for implementations
        using BlockOp = BlockOperator<EM, TData, ArgIn, ArgOut>;

        ExampleOperator(GraphDetail<EM> &graph, Chain<TData> &chain)
            : Operator<EM, TData>::template Registrar<ExampleOperator<EM, TData>>(
                  graph, chain)
        {
        }

        // User functions
        static std::unique_ptr<ExampleOperator<EM, TData>> Create(
            GraphDetail<EM> &G, Chain<TData> &C)
        {
            return Operator<EM, TData>::template Create<ExampleOperator<EM, TData>>(G, C);
        }

        void Apply(ArgIn &I, ArgOut &O)
        {
            return Operator<EM, TData>::template Apply<ExampleOperator<EM, TData>>(I, O);
        }

        // Implementation list
        enum class ImplType
        {
            StdMat
        };
        static constexpr ImplType default_impl = ImplType::StdMat;

        using Memory_t = MemoryRegion<TData>;

        // Implementation definitions
        class StdMat final
            : public ExampleOperator<EM, TData>::template RegistrarImpl<StdMat>
        {
            using Base_t =
                typename ExampleOperator<EM, TData>::template RegistrarImpl<StdMat>;
            using EMH = typename EM::HostExecution;

        public:
            static constexpr ImplType key = ImplType::StdMat;
            static constexpr std::string_view name = "StdMat";
            static constexpr std::string_view desc =
                "Uses the standard matrix construction";

            StdMat(GraphDetail<EM> &graph, ChainBlock<TData> &block,
                   HostDevice<EMH> &device);

            static std::unique_ptr<BlockOp> create(GraphDetail<EM> &graph,
                                                   ChainBlock<TData> &block,
                                                   Device &device);

            void apply(ArgIn &in, ArgOut &out) override final;

        private:
            HostDevice<EMH> &m_device;
            HostMemory &m_resource = DefaultMemory;
            std::unique_ptr<Memory_t> m_basis;
        };
    };
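
The self-registration idea used by the registrar classes can be shown in isolation with a toy example. This is a sketch of the general CRTP registration pattern only, under the assumption that the real registrars work along similar lines; `BlockOpBase`, `factory`, `RegistrarImpl`, `Doubler` and `create_by_name` are hypothetical names that do not mirror the actual signatures in this change.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <string_view>

// Hypothetical abstract base for implementations.
class BlockOpBase
{
public:
    virtual ~BlockOpBase() = default;
    virtual int apply(int x) const = 0;
};

using Creator = std::function<std::unique_ptr<BlockOpBase>()>;

// A function-local static avoids static-initialisation-order problems.
inline std::map<std::string, Creator> &factory()
{
    static std::map<std::string, Creator> instance;
    return instance;
}

// CRTP registrar: deriving from RegistrarImpl<Impl> stores Impl::create in
// the factory under Impl::name during static initialisation.
template <typename Impl>
class RegistrarImpl : public BlockOpBase
{
protected:
    // Touching s_registered forces the compiler to instantiate it.
    RegistrarImpl() { (void)s_registered; }

private:
    static bool register_impl()
    {
        factory()[std::string(Impl::name)] = &Impl::create;
        return true;
    }
    static const bool s_registered;
};

template <typename Impl>
const bool RegistrarImpl<Impl>::s_registered =
    RegistrarImpl<Impl>::register_impl();

// An implementation only has to provide a name, a create method and apply.
class Doubler final : public RegistrarImpl<Doubler>
{
public:
    static constexpr std::string_view name = "Doubler";
    static std::unique_ptr<BlockOpBase> create()
    {
        return std::make_unique<Doubler>();
    }
    int apply(int x) const override { return 2 * x; }
};

// Looks an implementation up by name; returns nullptr if it is unknown.
inline std::unique_ptr<BlockOpBase> create_by_name(const std::string &key)
{
    const auto it = factory().find(key);
    if (it == factory().end())
        return nullptr;
    return it->second();
}
```

The benefit of the pattern is that adding an implementation requires no edit to a central list: defining the class is enough for it to appear in the factory.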