
Added a new execution time test metric for performance testing and support for multiple test runs in Tester.cpp

Tom Gordon requested to merge twg21/nektar:feature/performancemonitoring into master

Issue/feature addressed

The CI system currently has no way of checking whether performance has regressed in a new revision, which may lead to an unnoticed gradual slowing of execution times.

Proposed solution

  • A new execution time test metric which reads the execution time from an output file (e.g. from a solver) and compares it to an accepted time and tolerance for a given computer (a sketch of such a test file follows this list)
  • Support for multiple runs of the same test, in order to measure the (more robust) average execution time
  • New build option to toggle performance tests
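For illustration, a performance test file might look roughly like the sketch below. Only the runs attribute, the EXECUTIONTIME metric type, and the hostname attribute come from this MR; the remaining element names mirror the usual .tst layout, and the exact shape of the accepted-time entry is an assumption:

<test runs="5">
    <description>Channel flow performance test</description>
    <executable>IncNavierStokesSolver</executable>
    <parameters>ChanFlow_3DH1D_pRef.xml</parameters>
    <metrics>
        <metric type="EXECUTIONTIME" id="1">
            <!-- accepted time (seconds) and tolerance for one machine -->
            <value tolerance="0.1" hostname="my-ci-runner">10.5</value>
        </metric>
    </metrics>
</test>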

Implementation

Support for multiple runs of the same test

  • Tester.cpp now supports multiple runs of the same test. The number of runs is indicated by a new attribute in the test element of .tst files: <test runs="5">

  • Instead of creating one working directory, Tester.cpp now creates one "master" directory which contains a working directory for each run

  • The .out and .err files from each run are appended to new master.out and master.err files respectively, which are held in the master folder

  • master.out and master.err are now passed to the test metrics instead of a single output

  • Most metrics (such as MetricL2) don't support reading multiple values of the same variable to take an average, and will simply use the first value.

  • If a metric doesn't support averaging test data (MetricExecutionTime is currently the only one that does), Tester.cpp warns the user that the metric may not behave as intended (see the sketch after the code snippet below).

  • Whether or not a metric supports averaging is checked via a new metric member variable (default: false) and function in the Metric base class:

protected:
    /// True if this metric supports averaging data from multiple runs
    /// (defaults to false; derived metrics opt in).
    bool m_average = false;

public:
    /// Returns whether this metric supports averaging over runs.
    bool SupportsAverage() const
    {
        return m_average;
    }
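As a self-contained illustration of how these pieces interact, the sketch below shows a derived metric opting in to averaging and the Tester.cpp-style warning described above. Only m_average and SupportsAverage() come from this MR; the class internals, variable names, and warning text are illustrative:

#include <iostream>
#include <memory>
#include <vector>

// Minimal stand-in for the Metric base class shown above.
class Metric
{
public:
    virtual ~Metric() = default;
    bool SupportsAverage() const
    {
        return m_average;
    }

protected:
    bool m_average = false; // derived metrics opt in to averaging
};

// A metric that can average over runs sets the flag in its constructor.
class MetricExecutionTime : public Metric
{
public:
    MetricExecutionTime()
    {
        m_average = true;
    }
};

int main()
{
    std::vector<std::unique_ptr<Metric>> metrics;
    metrics.push_back(std::make_unique<MetricExecutionTime>());

    const int numRuns = 5; // from <test runs="5">
    for (const auto &metric : metrics)
    {
        // Mirrors the Tester.cpp behaviour described above: warn when a
        // metric cannot average data from multiple runs.
        if (numRuns > 1 && !metric->SupportsAverage())
        {
            std::cerr << "WARNING: metric may not behave as intended "
                         "when multiple runs are requested." << std::endl;
        }
    }
    return 0;
}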

Execution time metric (MetricExecutionTime)

  • MetricExecutionTime is derived from Metric, with type "EXECUTIONTIME"
  • It iterates through the accepted execution times until an entry's "hostname" attribute matches the runner's hostname, and records that entry as the match (a sketch of this logic follows the list)
  • If no hostname matches (e.g. none are provided), a skip flag is passed to v_Test. The test still passes, but outputs a warning reporting the measured execution time and the runner's hostname so the user can easily add them to the test file if desired.
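A minimal sketch of the matching logic described above, assuming the accepted times have already been parsed into a simple struct; the struct, the function name, and the use of POSIX gethostname() are illustrative rather than the actual implementation:

#include <optional>
#include <string>
#include <unistd.h> // gethostname (POSIX)
#include <vector>

struct AcceptedTime
{
    std::string hostname; // machine the accepted time was recorded on
    double value;         // accepted execution time in seconds
    double tolerance;     // permitted deviation from the accepted time
};

// Return the accepted time whose hostname matches this machine, if any.
std::optional<AcceptedTime> FindAcceptedTime(
    const std::vector<AcceptedTime> &accepted)
{
    char buf[256] = {0};
    if (gethostname(buf, sizeof(buf) - 1) != 0)
    {
        return std::nullopt;
    }
    const std::string host(buf);

    for (const auto &entry : accepted)
    {
        if (entry.hostname == host)
        {
            return entry; // first matching hostname wins
        }
    }

    // No match: the caller sets the skip flag for v_Test, which passes
    // the test but warns with the measured time and hostname.
    return std::nullopt;
}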

New build option NEKTAR_BUILD_PERFORMANCE_TESTS

  • Defaults to OFF, as it is of limited use to end users
  • Logic in the root CMakeLists.txt has been updated so the test directory, etc. is created if either NEKTAR_BUILD_TESTS or NEKTAR_BUILD_PERFORMANCE_TESTS is enabled (both use CTest); a rough sketch follows
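The updated root CMakeLists.txt logic might look roughly like this (a sketch only; the option description and directory name are assumptions):

OPTION(NEKTAR_BUILD_PERFORMANCE_TESTS
    "Build performance tests for detecting regressions." OFF)

# Both kinds of test are driven by CTest, so the test infrastructure
# is needed if either option is enabled.
IF (NEKTAR_BUILD_TESTS OR NEKTAR_BUILD_PERFORMANCE_TESTS)
    ENABLE_TESTING()
    ADD_SUBDIRECTORY(tests)
ENDIF()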

Tests

N/A

Suggested reviewers

@ccantwel (my UROP supervisor)

Notes

  • While the code works, it needs to be properly integrated into the CI system before it's useful
  • Needs a runner with low traffic and ideally a fixed hostname (but other ways of identifying a computer could be explored, e.g. a hardware ID)
  • A test file "Perf_ChanFlow_3DH1D_pRef" is included but deliberately hasn't been added as a test yet (runner hasn't been decided)

Checklist

  • Functions and classes, or changes to them, are documented.
  • User guide/documentation is updated.
  • Changelog is updated.
  • Contributed code is correctly formatted. (See the contributing guidelines).
  • License added to any new files.
  • No extraneous files have been added (e.g. compiler output or test data files).