HDF5 File Schema

WESTPA stores all of its simulation data in the cross-platform, self-describing HDF5 file format. This format can be read and written from a variety of languages and toolkits, including C/C++, Fortran, Python, Java, and Matlab, so analysis of weighted ensemble simulations is not tied to the WESTPA framework. HDF5 files are organized like a filesystem: arbitrarily nested groups (analogous to directories) are used to organize datasets (analogous to files). The excellent HDFView program may be used to explore WEST data files.

The canonical reference for the file format used by a given version of the WEST code is src/west/data_manager.py.

Overall structure

/
    ibstates/
        index
        naming
            bstate_index
            bstate_pcoord
            istate_index
            istate_pcoord
    tstates/
        index
    bin_topologies/
        index
        pickles
    iterations/
        iter_XXXXXXXX/
            auxdata/
            bin_target_counts
            ibstates/
                bstate_index
                bstate_pcoord
                istate_index
                istate_pcoord
            pcoord
            seg_index
            wtgraph
        ...
    summary
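
As a rough illustration of the layout above, the following sketch walks a group hierarchy and reports each entry as a group or a dataset. It assumes the third-party h5py package for real files; the helper names walk and print_layout and the file path passed in are hypothetical. The walker is duck-typed (anything with an items() method is treated as a group), so it can also be exercised on nested dicts.

```python
def walk(group, prefix=""):
    """Yield (path, kind) pairs for every entry below ``group``.

    Anything exposing ``items()`` is treated as a group (h5py.File and
    h5py.Group do; h5py.Dataset does not), everything else as a dataset.
    """
    for name, obj in group.items():
        path = f"{prefix}/{name}"
        if hasattr(obj, "items"):
            yield path + "/", "group"
            yield from walk(obj, path)
        else:
            yield path, "dataset"


def print_layout(h5_path):
    """Print the layout of a WEST data file (path supplied by the caller)."""
    import h5py  # third-party; imported lazily so walk() works without it

    with h5py.File(h5_path, "r") as f:
        for path, kind in walk(f):
            print(f"{kind:8s} {path}")


if __name__ == "__main__":
    # Stand-in for a real file: nested dicts play the role of groups.
    fake = {"ibstates": {"index": None}, "summary": None}
    for path, kind in walk(fake):
        print(f"{kind:8s} {path}")
```

On a long simulation the iterations/ group holds one subgroup per iteration, so the full listing can be large.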

The root group (/)

The root of the WEST HDF5 file contains the following entries (where a trailing “/” denotes a group):

Name	Type	Description
ibstates/	Group	Initial and basis states for this simulation
tstates/	Group	Target (recycling) states for this simulation; may be empty
bin_topologies/	Group	Data pertaining to the binning scheme used in each iteration
iterations/	Group	Iteration data
summary	Dataset (1-dimensional, compound)	Summary data by iteration

The iteration summary table (/summary)

Field	Description
n_particles	the total number of walkers in this iteration
norm	total probability, for stability monitoring
min_bin_prob	smallest probability contained in a bin
max_bin_prob	largest probability contained in a bin
min_seg_prob	smallest probability carried by a walker
max_seg_prob	largest probability carried by a walker
cputime	total CPU time (in seconds) spent on propagation for this iteration
walltime	total wallclock time (in seconds) spent on this iteration
binhash	a hex string identifying the binning used in this iteration
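
Since norm is recorded for stability monitoring, one plausible use of the summary table is to flag iterations whose total probability drifts away from 1. The sketch below makes two assumptions not stated above: that row i of /summary describes iteration i + 1, and that a healthy norm is 1 to within a chosen tolerance. The function names and the default tolerance are hypothetical, and h5py is needed only by the loader.

```python
def norm_drift(summary, tol=1e-8):
    """Return (iteration, norm) for rows whose norm differs from 1 by more than tol.

    ``summary`` is any sequence of records indexable by field name, such as
    the structured array read from /summary or a list of dicts.
    Assumption: row i corresponds to iteration i + 1.
    """
    flagged = []
    for n_iter, row in enumerate(summary, start=1):
        norm = float(row["norm"])
        if abs(norm - 1.0) > tol:
            flagged.append((n_iter, norm))
    return flagged


def load_summary(h5_path):
    """Read the whole /summary table into memory (requires h5py)."""
    import h5py  # third-party; imported lazily

    with h5py.File(h5_path, "r") as f:
        return f["summary"][...]


if __name__ == "__main__":
    rows = [{"norm": 1.0}, {"norm": 0.5}]
    print(norm_drift(rows))
```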

Per iteration data (/iterations/iter_XXXXXXXX)

Data for each iteration is stored in its own group, named for the iteration number zero-padded to 8 digits, as in /iterations/iter_00000001 for iteration 1. The padding exists solely for convenience when working with external utilities that sort output lexicographically by group name. The field width is configurable via the iter_prec entry in the data section of the WESTPA configuration file.
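
The naming rule can be sketched in a few lines of Python. The helper name iter_group_name is hypothetical; its iter_prec argument mirrors the configuration entry described above and defaults to the 8-digit width.

```python
def iter_group_name(n_iter, iter_prec=8):
    """Return the HDF5 path of the group holding iteration ``n_iter``."""
    return f"/iterations/iter_{n_iter:0{iter_prec}d}"


if __name__ == "__main__":
    print(iter_group_name(1))
    # Zero padding makes lexicographic order agree with numeric order.
    names = [iter_group_name(n) for n in (10, 2, 1)]
    print(sorted(names))
```

Without the padding, a name such as iter_10 would sort before iter_2, which is the problem the fixed field width avoids.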

The HDF5 group for each iteration contains the following elements:

Name	Type	Description
auxdata/	Group	All user-defined auxiliary data sets
bin_target_counts	Dataset (1-dimensional)	The per-bin target count for the iteration
ibstates/	Group	Initial and basis state data for the iteration
pcoord	Dataset (3-dimensional)	Progress coordinate data for the iteration, stored as a (number of segments, pcoord_len, pcoord_ndim) array
seg_index	Dataset (1-dimensional, compound)	Summary data for each segment
wtgraph	Dataset (1-dimensional)
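
To make the axis order of pcoord concrete, the sketch below pulls out the first and last stored progress-coordinate point of every segment. It relies only on the (number of segments, pcoord_len, pcoord_ndim) layout given in the table; the function name pcoord_endpoints is hypothetical. It accepts anything indexable as pcoord[segment][time point][dimension], including nested lists and arrays read from the file.

```python
def pcoord_endpoints(pcoord):
    """Return (first, last) progress-coordinate points for each segment.

    ``pcoord`` is indexed as pcoord[segment][time point][dimension].
    Each returned list has one entry per segment, and each entry holds
    pcoord_ndim values.
    """
    first = [list(seg[0]) for seg in pcoord]
    last = [list(seg[-1]) for seg in pcoord]
    return first, last


if __name__ == "__main__":
    # Two segments, three time points, one progress-coordinate dimension.
    demo = [
        [[0.0], [0.5], [1.0]],
        [[2.0], [2.5], [3.0]],
    ]
    first, last = pcoord_endpoints(demo)
    print(first, last)
```

For a dataset already loaded as an array, slicing with pcoord[:, 0, :] and pcoord[:, -1, :] selects the same points in one step.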

The segment summary table (/iterations/iter_XXXXXXXX/seg_index)

Field	Description
weight	Segment weight
parent_id	Index of parent
wtg_n_parents
wtg_offset
cputime	Total CPU time required to run the segment
walltime	Total wallclock time required to run the segment
endpoint_type
status
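
The weight and parent_id fields are enough for two simple consistency checks, sketched below: summing the segment weights of an iteration and grouping segments by parent. The sketch assumes that a segment's position in seg_index serves as its identifier and treats parent_id as an opaque integer; the function names are hypothetical. Records may be rows of the compound dataset or plain dicts.

```python
import math
from collections import defaultdict


def total_weight(seg_index):
    """Sum the weights of all segments in one iteration."""
    return math.fsum(float(row["weight"]) for row in seg_index)


def group_by_parent(seg_index):
    """Map each parent_id to the list of segment positions that name it.

    Assumption: a segment is identified by its row position in seg_index.
    """
    children = defaultdict(list)
    for seg_id, row in enumerate(seg_index):
        children[int(row["parent_id"])].append(seg_id)
    return dict(children)


if __name__ == "__main__":
    rows = [
        {"weight": 0.25, "parent_id": 0},
        {"weight": 0.25, "parent_id": 0},
        {"weight": 0.5, "parent_id": 1},
    ]
    print(total_weight(rows))
    print(group_by_parent(rows))
```

If the weights of an iteration do sum to the total probability, the result of total_weight should agree with the norm field of the corresponding /summary row; that correspondence is an expectation to verify, not something stated above.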

Bin Topologies group (/bin_topologies)

Each bin topology used during a WE simulation is stored as a unique hash identifier together with a serialized BinMapper object in Python pickle format. This group contains two datasets:

index: Compound array containing the bin hash and pickle length
pickles: The pickled BinMapper objects for each unique mapper, stored in a (number of unique mappers, max pickled size) array
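
Because every row of the pickle array is padded out to the maximum pickled size, the pickle length recorded in index is needed to recover the original byte string before unpickling. The sketch below shows that trimming step under several assumptions: the pickles are stored as raw bytes, the length field of index is named pickle_len, and the dataset is named pickles as in the structure listing. The function names are hypothetical, and h5py is needed only by the loader.

```python
import pickle


def unpickle_padded(row, pickle_len):
    """Trim a padded row to ``pickle_len`` bytes and unpickle it."""
    payload = bytes(row[: int(pickle_len)])
    return pickle.loads(payload)


def load_mapper(h5_path, i):
    """Load the i-th stored mapper from a WEST data file (requires h5py).

    Assumptions: the index field is named 'pickle_len' and the byte array
    lives at /bin_topologies/pickles.
    """
    import h5py  # third-party; imported lazily

    with h5py.File(h5_path, "r") as f:
        entry = f["bin_topologies/index"][i]
        row = f["bin_topologies/pickles"][i]
    return unpickle_padded(row, entry["pickle_len"])


if __name__ == "__main__":
    # Stand-in payload: a small dict padded with zero bytes.
    data = pickle.dumps({"nbins": 4})
    padded = data + b"\x00" * 16
    print(unpickle_padded(padded, len(data)))
```

Unpickling executes code chosen by whoever wrote the pickle, so this should only be done with files from a trusted source. Restoring a real BinMapper presumably also requires the class that defines it to be importable.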