Lower memory footprint when reading field files in HDF5 format for massively parallel runs
Issue/feature addressed
The HDF5 format for field files (`.fld` files) stores the data in decompositions. Each decomposition represents the data written by one parallel process; if the field file was written by 10 processes, there will be 10 decompositions. When the field file is read using `FieldIOHdf5::v_Import()`, each process determines which decompositions it needs and then reads each of those decompositions in full. The data is only filtered afterwards, so that just the elements actually needed by the process are retained. If the number of processes that wrote the field file is of the same order as the number of processes that read it, this strategy works quite well. If, on the other hand, a few processes have written the file and a very large number of processes read it, we run into problems, because essentially every process will read all of the data. The memory consumption therefore scales proportionally to the number of parallel processes reading the field file. This causes memory problems in massively parallel runs, especially if the amount of memory per node is limited and the field file contains a very large number of degrees of freedom.
Proposed solution
To solve the above issue, we propose to read only the data that is needed by the process, rather than all the data in one decomposition. In the current file format, the entire solution is stored in a large flat array, which contains the polynomial coefficients of every element and every variable. As noted previously, this data is divided into decompositions (by providing offset metadata). Each decomposition in turn contains the modal data (polynomial coefficients) for each of its elements. If several variables are used (as in the compressible flow solver), the decomposition also contains data for each variable, giving a total of N_variables * N_coeffs_per_element * N_elements_per_decomposition floating point numbers per decomposition. To avoid reading the entire array, which can be quite large if the file was written by only a few processes, we use the HDF5 feature that allows reading only a selected set of indices. To determine which indices to read, we look at the array that the user provides to `v_Import()`, called `ElementIDs`. Previously, this array was only used to decide which decompositions to read, not to select the elements within each decomposition; this MR addresses that. Doing so requires knowing the offsets into the flat array at which each element's data is stored. We therefore first compute the number of modes in each element of each decomposition, via a small modification to the routine `FieldIO::CheckFieldDefinition`. Once we know how many modes (polynomial coefficients) are stored in each element, the offsets are easily computed. Using the offsets, we create an array of indices to read, which is then passed to the HDF5 routines.
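To make the approach concrete, below is a minimal sketch of the idea, assuming a flat one-dimensional dataset, a single variable, and the plain HDF5 C API; the dataset name `DATA`, the function name and its parameters are hypothetical and not the actual `FieldIOHdf5` code:

```cpp
// Minimal, self-contained sketch (not the actual Nektar++ implementation):
// read only the coefficients of selected elements from a flat 1D dataset
// using an HDF5 element selection. The dataset name "DATA", the function
// name and all parameters are hypothetical; a single variable is assumed.
#include <hdf5.h>

#include <cstddef>
#include <vector>

std::vector<double> ReadSelectedElements(
    hid_t file,                                     // open HDF5 field file
    const std::vector<hsize_t> &modesPerElement,    // coefficients per element
    const std::vector<std::size_t> &elementsToRead, // local element indices
    hsize_t decompositionOffset)                    // start of decomposition in DATA
{
    // Prefix sums give each element's offset inside the decomposition.
    std::vector<hsize_t> elemOffset(modesPerElement.size() + 1, 0);
    for (std::size_t i = 0; i < modesPerElement.size(); ++i)
    {
        elemOffset[i + 1] = elemOffset[i] + modesPerElement[i];
    }

    // Collect the global indices into the flat DATA array that we need.
    std::vector<hsize_t> coords;
    for (std::size_t e : elementsToRead)
    {
        for (hsize_t j = 0; j < modesPerElement[e]; ++j)
        {
            coords.push_back(decompositionOffset + elemOffset[e] + j);
        }
    }

    // Select exactly those indices in the file dataspace and read them into
    // a contiguous buffer; the rest of the dataset is never read.
    hid_t dset      = H5Dopen(file, "DATA", H5P_DEFAULT);
    hid_t filespace = H5Dget_space(dset);
    H5Sselect_elements(filespace, H5S_SELECT_SET, coords.size(), coords.data());

    hsize_t npoints = static_cast<hsize_t>(coords.size());
    hid_t memspace  = H5Screate_simple(1, &npoints, nullptr);

    std::vector<double> data(coords.size());
    H5Dread(dset, H5T_NATIVE_DOUBLE, memspace, filespace, H5P_DEFAULT,
            data.data());

    H5Sclose(memspace);
    H5Sclose(filespace);
    H5Dclose(dset);
    return data;
}
```

The key point is that the element selection restricts the read to exactly the requested indices, so the per-process memory footprint is bounded by the data the process actually needs rather than by the size of the decompositions it touches.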
Implementation
Most changes are in `FieldIOHdf5::v_Import()` and `FieldIO::CheckFieldDefinition`. In the first routine, the main change is an added if-statement that checks whether the user provided an array of element IDs to read. If so, the number of modes in each element is computed using `FieldIO::CheckFieldDefinition`. This requires the user to pass a non-const array to this routine (this was not supported before, so it was added), in which the number of modes per element is stored for each element in the decomposition. Once the number of modes per element is known, it is a matter of computing the offsets and reading the data; the code that does this should be self-explanatory.
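The per-element mode counting that `FieldIO::CheckFieldDefinition` is extended to provide could look roughly like the following standalone sketch; the struct, its members and the function name are hypothetical, and the real routine covers all shape types and additional metadata:

```cpp
// Standalone sketch with hypothetical types (the real FieldDefinitions class
// and FieldIO::CheckFieldDefinition handle all shape types and more metadata):
// count the coefficients per element and in total, so that the importer can
// compute offsets into the flat data array.
#include <cstddef>
#include <vector>

struct SimpleFieldDef
{
    bool uniOrder;                      // same polynomial order for every element?
    std::size_t dim;                    // number of modes entries per element
    std::size_t numElements;            // elements in this decomposition
    std::vector<unsigned int> numModes; // per-direction modes, flattened
};

std::size_t CountCoefficients(const SimpleFieldDef &def,
                              std::vector<std::size_t> &modesPerElement)
{
    modesPerElement.assign(def.numElements, 0);
    std::size_t total = 0;

    for (std::size_t e = 0; e < def.numElements; ++e)
    {
        // Uniform order: every element uses the first block of numModes.
        // Variable order: element e has its own block of `dim` entries.
        std::size_t base    = def.uniOrder ? 0 : e * def.dim;
        std::size_t nCoeffs = 1;
        for (std::size_t d = 0; d < def.dim; ++d)
        {
            // Tensor-product count; only correct for quads/hexes, which keeps
            // the sketch short. The real routine depends on the shape type.
            nCoeffs *= def.numModes[base + d];
        }
        modesPerElement[e] = nCoeffs;
        total += nCoeffs;
    }
    return total;
}
```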
A couple of final notes should be made:

- We currently assume that each variable is stored with the same polynomial order. This might have to be fixed.
- If the updated version of `FieldIO::CheckFieldDefinition` is deemed good, we can simplify this routine, since the new implementation no longer needs to differentiate between uniform and variable polynomial order.
- After the filtering, we update the local variable `fielddef->m_elementIDs`, but we do not update the remaining metadata stored in `fielddef`. It might be necessary to do this, especially if a variable polynomial order is used (see the first note).
- Before merging, we need to do some cleanup of the code, in particular removing redundant parameters from some functions.
Tests
TBD
Notes
Please add any other information that could be useful for reviewers.
Checklist
- [x] Functions and classes, or changes to them, are documented.
- [ ] User guide/documentation is updated.
- [x] Changelog is updated.
- [x] Suitable tests added for new functionality.
- [ ] Newly added files are correctly formatted.
- [ ] License added to any new files.
- [x] No extraneous files have been added (e.g. compiler output or test data files).