HDF5 File Schema
WESTPA stores all of its simulation data in the cross-platform, self-describing HDF5 file format. This file format can be read and written by a variety of languages and toolkits, including C/C++, Fortran, Python, Java, and Matlab so that analysis of weighted ensemble simulations is not tied to using the WESTPA framework. HDF5 files are organized like a filesystem, where arbitrarily-nested groups (i.e. directories) are used to organize datasets (i.e. files). The excellent HDFView program may be used to explore WEST data files.
The canonical file format reference for a given version of the WEST code is described in src/west/data_manager.py.
Overall structure
/
#ibstates/
index
naming
bstate_index
bstate_pcoord
istate_index
istate_pcoord
#tstates/
index
bin_topologies/
index
pickles
iterations/
iter_XXXXXXXX/\|iter_XXXXXXXX/
auxdata/
bin_target_counts
ibstates/
bstate_index
bstate_pcoord
istate_index
istate_pcoord
pcoord
seg_index
wtgraph
...
summary
The root group (/)
The root of the WEST HDF5 file contains the following entries (where a trailing “/” denotes a group):
Name |
Type |
Description |
---|---|---|
ibstates/ |
Group |
Initial and basis states for this simulation |
tstates/ |
Group |
Target (recycling) states for this simulation; may be empty |
bin_topologies/ |
Group |
Data pertaining to the binning scheme used in each iteration |
iterations/ |
Group |
Iteration data |
summary |
Dataset (1-dimensional, compound) |
Summary data by iteration |
The iteration summary table (/summary)
Field |
Description |
---|---|
n_particles |
the total number of walkers in this iteration |
norm |
total probability, for stability monitoring |
min_bin_prob |
smallest probability contained in a bin |
max_bin_prob |
largest probability contained in a bin |
min_seg_prob |
smallest probability carried by a walker |
max_seg_prob |
largest probability carried by a walker |
cputime |
total CPU time (in seconds) spent on propagation for this iteration |
walltime |
total wallclock time (in seconds) spent on this iteration |
binhash |
a hex string identifying the binning used in this iteration |
Per iteration data (/iterations/iter_XXXXXXXX)
Data for each iteration is stored in its own group, named according to the
iteration number and zero-padded out to 8 digits, as in
/iterations/iter_00000001
for iteration 1. This is done solely for
convenience in dealing with the data in external utilities that sort output by
group name lexicographically. The field width is in fact configurable via the
iter_prec
configuration entry under data
section of the WESTPA
configuration file.
The HDF5 group for each iteration contains the following elements:
Name |
Type |
Description |
---|---|---|
auxdata/ |
Group |
All user-defined auxiliary data0 sets |
bin_target_counts |
Dataset (1-dimensional) |
The per-bin target count for the iteration |
ibstates/ |
Group |
Initial and basis state data for the iteration |
pcoord |
Dataset (3-dimensional) |
Progress coordinate data for the iteration stored as a (num of segments, pcoord_len, pcoord_ndim) array |
seg_index |
Dataset (1-dimensional, compound) |
Summary data for each segment |
wtgraph |
Dataset (1-dimensional) |
The segment summary table (/iterations/iter_XXXXXXXX/seg_index)
Field |
Description |
---|---|
weight |
Segment weight |
parent_id |
Index of parent |
wtg_n_parents |
|
wtg_offset |
|
cputime |
Total cpu time required to run the segment |
walltime |
Total walltime required to run the segment |
endpoint_type |
|
status |
Bin Topologies group (/bin_topologies)
Bin topologies used during a WE simulation are stored as a unique hash
identifier and a serialized BinMapper
object in python pickle format. This group contains
two datasets:
index
: Compound array containing the bin hash and pickle lengthpickle
: The pickledBinMapper
objects for each unique mapper stored in a (num unique mappers, max pickled size) array