Draft: Store History Points in HDF5 format

Issue/feature addressed

The history point filter always saves data in a custom CSV format. When large amounts of data are exported, this creates several problems:

  1. The I/O becomes very slow.
  2. The amount of data stored to disk becomes excessively large.
  3. The precision of the written data is fixed by the way the data is formatted, i.e., by the number of decimals written for each number in the CSV file.

In addition, the custom CSV format cannot be read by standard readers such as numpy.genfromtxt(). This makes it difficult to post-process the data obtained from the history point filter.

Proposed solution

A new class called PtsIOHdf5 has been implemented. This class writes history point data to an HDF5 file: given a set of coordinates with associated data, the class writes this data to file. Using the HDF5 format brings several benefits:

  1. A single file that contains all time steps can be created. If the simulation is restarted, this file can still be appended to.
  2. I/O becomes significantly faster.
  3. The size of the data written to disk is significantly smaller.
  4. Data can be easily opened and post-processed in a variety of software, including Python, Matlab, and Julia.

Implementation

The PtsIOHdf5 class is quite simple. It creates an HDF5 file with the following layout:

/NEKTAR
    COORDINATES/
        x: [...]
        y: [...]
        z: [...]
    TIME-DATA/
        TIME-00000001/
            time: t
            field1: [...]
            field2: [...]
            ...
            fieldN: [...]
        TIME-00000002/
            time: t
            field1: [...]
            field2: [...]
            ...
            fieldN: [...]
        ...

Here, NEKTAR, COORDINATES, TIME-DATA and TIME-0000000N are "HDF5 groups", i.e., folders inside the HDF5 file that can be used to structure data hierarchically. For each time step, a new HDF5 group called TIME-0000000N is added inside the TIME-DATA group. Inside this group, one HDF5 dataset is added for each field that should be exported. An HDF5 attribute called "time" is also added to the group; it holds the simulation time of the current time step.
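To make the layout concrete, here is a minimal Python sketch (using h5py) that produces a file with the same group/dataset structure. This is only an illustration of the layout; the actual file is produced by the C++ PtsIOHdf5 class, and the point count, field names and time values below are made up:

```python
import h5py
import numpy as np

# Made-up example data: 10 history points with two fields, "u" and "v".
coords = np.random.rand(3, 10)  # x, y, z coordinates of the points
fields = {"u": np.random.rand(10), "v": np.random.rand(10)}

with h5py.File("history.h5", "w") as f:
    root = f.create_group("NEKTAR")

    # COORDINATES group: one dataset per coordinate direction.
    cg = root.create_group("COORDINATES")
    for name, vals in zip("xyz", coords):
        cg.create_dataset(name, data=vals)

    # TIME-DATA group: one sub-group per time step, named TIME-0000000N.
    td = root.create_group("TIME-DATA")
    for step, t in enumerate([0.0, 0.1], start=1):
        g = td.create_group(f"TIME-{step:08d}")
        g.attrs["time"] = t  # current time stored as an HDF5 attribute
        for name, vals in fields.items():
            g.create_dataset(name, data=vals)
```

On a restart, the same file can simply be reopened in append mode ("a") and new TIME-0000000N groups added, which is what enables benefit 1 above.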

For now, only simple HDF5 datasets that don't support parallel writing are used. This means that all data must first be communicated to the root process, which then writes it to disk. We can look at extending this capability later, but for now I decided to keep it simple.

The new class for writing HDF5 data was added to the History Points filter to demonstrate its use.
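To illustrate benefit 4 above, the following Python sketch reads a file in the described layout back and assembles the time history of one field across all history points. The file name, field name and sizes are assumptions for the example, so the sketch first writes a small fixture file in the same layout before reading it:

```python
import h5py
import numpy as np

# Write a tiny fixture file in the layout described above (in practice
# this file would come from the history point filter).
with h5py.File("history_demo.h5", "w") as f:
    f.create_dataset("NEKTAR/COORDINATES/x", data=np.linspace(0.0, 1.0, 5))
    for step, t in enumerate([0.0, 0.1, 0.2], start=1):
        g = f.create_group(f"NEKTAR/TIME-DATA/TIME-{step:08d}")
        g.attrs["time"] = t
        g.create_dataset("u", data=np.full(5, t))

# Read it back: collect times from the "time" attributes and stack the
# per-step "u" datasets into a single (n_steps, n_points) array.
with h5py.File("history_demo.h5", "r") as f:
    td = f["NEKTAR/TIME-DATA"]
    steps = sorted(td.keys())  # TIME-00000001, TIME-00000002, ...
    times = np.array([td[s].attrs["time"] for s in steps])
    u = np.stack([td[s]["u"][:] for s in steps])
```

Because the TIME-0000000N group names are zero-padded, sorting the keys alphabetically also sorts the time steps chronologically.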

Tests

Notes

Please add any other information that could be useful for reviewers.

Checklist

  • Functions and classes, or changes to them, are documented.
  • User guide/documentation is updated.
  • Changelog is updated.
  • Suitable tests added for new functionality.
  • Contributed code is correctly formatted. (See the contributing guidelines).
  • License added to any new files.
  • No extraneous files have been added (e.g. compiler output or test data files).

Warning

On 19.07 the code formatting (code style) was standardised using clang-format over the whole Nektar++ code base. This means changes in your branch will conflict with formatting changes on the master branch. To resolve these conflicts, see #295 (closed)

Edited by Daniel Lindblad
