HDF5 File Schema

WESTPA stores all of its simulation data in the cross-platform, self-describing HDF5 file format. This format can be read and written by a variety of languages and toolkits, including C/C++, Fortran, Python, Java, and MATLAB, so analysis of weighted ensemble simulations is not tied to the WESTPA framework. HDF5 files are organized like a filesystem: arbitrarily nested groups (analogous to directories) organize datasets (analogous to files). The excellent HDFView program may be used to explore WEST data files.

The canonical reference for the file format used by a given version of the WEST code is src/west/data_manager.py.

Overall structure

/
    ibstates/
        index
        naming
            bstate_index
            bstate_pcoord
            istate_index
            istate_pcoord
    tstates/
        index
    bin_topologies/
        index
        pickles
    iterations/
        iter_XXXXXXXX/
            auxdata/
            bin_target_counts
            ibstates/
                bstate_index
                bstate_pcoord
                istate_index
                istate_pcoord
            pcoord
            seg_index
            wtgraph
        ...
    summary

The root group (/)

The root of the WEST HDF5 file contains the following entries (where a trailing “/” denotes a group):

Name             Type                               Description
---------------  ---------------------------------  ------------------------------------------------------------
ibstates/        Group                              Initial and basis states for this simulation
tstates/         Group                              Target (recycling) states for this simulation; may be empty
bin_topologies/  Group                              Data pertaining to the binning scheme used in each iteration
iterations/      Group                              Iteration data
summary          Dataset (1-dimensional, compound)  Summary data by iteration

The iteration summary table (/summary)

Field         Description
------------  ---------------------------------------------------------------------------------
n_particles   Total number of walkers in this iteration
norm          Total probability across all walkers; it should stay at 1, so it is used to monitor
              numerical stability
min_bin_prob  Smallest probability contained in a single bin
max_bin_prob  Largest probability contained in a single bin
min_seg_prob  Smallest probability carried by a single walker
max_seg_prob  Largest probability carried by a single walker
cputime       Total CPU time (in seconds) spent on propagation in this iteration
walltime      Total wallclock time (in seconds) spent on this iteration
binhash       Hex string identifying the bin topology (see /bin_topologies) used in this iteration
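When read with h5py, a compound dataset such as /summary comes back as a NumPy structured array whose field names match the table above. The sketch below uses a hand-built array in its place to show two quick health checks: flagging iterations whose norm drifts away from 1, and measuring how many orders of magnitude the walker weights span. The dtype (in particular the integer width and the binhash string length) is an assumption rather than the exact one WESTPA writes, and so is the convention that row i holds iteration i + 1.

```python
import numpy as np

# Assumed dtype mirroring the /summary fields; WESTPA's real dtype may differ
# in integer widths and in the length of the binhash string.
summary_dtype = np.dtype([
    ("n_particles", np.uint64),
    ("norm", np.float64),
    ("min_bin_prob", np.float64),
    ("max_bin_prob", np.float64),
    ("min_seg_prob", np.float64),
    ("max_seg_prob", np.float64),
    ("cputime", np.float64),
    ("walltime", np.float64),
    ("binhash", "S64"),
])


def check_norm(summary, tol=1e-8):
    """Return row indices whose total probability differs from 1 by more than tol.

    Assumption: row i corresponds to iteration i + 1.
    """
    drift = np.abs(summary["norm"] - 1.0)
    return np.nonzero(drift > tol)[0]


def seg_prob_span(summary):
    """Orders of magnitude spanned by walker weights in each row."""
    return np.log10(summary["max_seg_prob"] / summary["min_seg_prob"])


# Made-up rows standing in for f["summary"][...] read through h5py.
summary = np.array([
    (4, 1.0, 0.25, 0.25, 0.25, 0.25, 10.0, 3.0, b"ab12"),
    (6, 1.0 + 1e-6, 1e-3, 0.5, 1e-5, 0.3, 12.0, 3.5, b"ab12"),
], dtype=summary_dtype)

print(check_norm(summary))  # [1]
print(seg_prob_span(summary))
```

The tolerance passed to check_norm is an arbitrary illustrative value; a sensible threshold depends on the precision of the weights and the length of the run.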

Per-iteration data (/iterations/iter_XXXXXXXX)

Data for each iteration is stored in its own group, named according to the iteration number and zero-padded to 8 digits, as in /iterations/iter_00000001 for iteration 1. This is done solely for the convenience of external utilities that sort output lexicographically by group name. The field width is configurable via the iter_prec entry under the data section of the WESTPA configuration file.
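The naming rule can be sketched in a few lines of Python. The helpers iter_group_path and parse_iter_group are hypothetical names, not part of the WESTPA API; the iter_prec parameter stands in for the configuration entry of the same name, defaulting to the 8-digit width described above.

```python
def iter_group_path(n_iter: int, iter_prec: int = 8) -> str:
    """Build the HDF5 path of the group holding iteration n_iter.

    iter_prec mirrors the iter_prec setting in the data section of the
    configuration file (8 digits by default).
    """
    if n_iter < 1:
        raise ValueError("iterations are numbered from 1")
    return f"/iterations/iter_{n_iter:0{iter_prec}d}"


def parse_iter_group(name: str) -> int:
    """Recover the iteration number from a group name such as 'iter_00000042'."""
    prefix = "iter_"
    if not name.startswith(prefix):
        raise ValueError(f"not an iteration group: {name!r}")
    return int(name[len(prefix):])


print(iter_group_path(1))                # /iterations/iter_00000001
print(iter_group_path(42, iter_prec=4))  # /iterations/iter_0042

# Zero-padding makes lexicographic order agree with numeric order.
names = [iter_group_path(n).rsplit("/", 1)[1] for n in (10, 2, 1)]
print(sorted(names))  # ['iter_00000001', 'iter_00000002', 'iter_00000010']
```

Without the padding, a plain string sort would place iter_10 before iter_2, which is exactly the problem the fixed field width avoids.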

The HDF5 group for each iteration contains the following elements:

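Name               Type                               Description
-----------------  ---------------------------------  ----------------------------------------------------------
auxdata/           Group                              All user-defined auxiliary data sets
bin_target_counts  Dataset (1-dimensional)            Per-bin target count for the iteration
ibstates/          Group                              Initial and basis state data for the iteration
pcoord             Dataset (3-dimensional)            Progress coordinate data for the iteration, stored as an
                                                      array of shape (n_segments, pcoord_len, pcoord_ndim)
seg_index          Dataset (1-dimensional, compound)  Summary data for each segment
wtgraph            Dataset (1-dimensional)            Weight graph: IDs of the parent segments that contributed
                                                      weight to each segment, located via the wtg_offset and
                                                      wtg_n_parents fields of seg_index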
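Because pcoord is stored with shape (n_segments, pcoord_len, pcoord_ndim), common queries reduce to NumPy slicing once the dataset has been read into memory (for example with h5py). The array below is made-up data of that shape; it shows how to pull out each segment's starting and final progress coordinate and the time series of one segment.

```python
import numpy as np

# Made-up pcoord array for one iteration: 3 segments, pcoord_len = 4
# time points per segment, pcoord_ndim = 2 progress-coordinate dimensions.
n_segments, pcoord_len, pcoord_ndim = 3, 4, 2
pcoord = np.arange(n_segments * pcoord_len * pcoord_ndim, dtype=np.float32)
pcoord = pcoord.reshape(n_segments, pcoord_len, pcoord_ndim)

# First time point of every segment (where each walker started).
start_points = pcoord[:, 0, :]   # shape (n_segments, pcoord_ndim)

# Last time point of every segment (where each walker ended up).
end_points = pcoord[:, -1, :]    # shape (n_segments, pcoord_ndim)

# Time series of the first progress-coordinate dimension for segment 1.
seg1_dim0 = pcoord[1, :, 0]      # shape (pcoord_len,)

print(end_points.shape)  # (3, 2)
print(seg1_dim0)
```

Slicing along the first axis selects segments, so the same expressions work unchanged regardless of how many walkers an iteration contains.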

The segment summary table (/iterations/iter_XXXXXXXX/seg_index)

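Field          Description
-------------  -------------------------------------------------------------------------
weight         Segment weight
parent_id      Index of the parent segment in the previous iteration (negative values
               refer to initial states)
wtg_n_parents  Number of entries belonging to this segment in the wtgraph dataset
wtg_offset     Position of this segment's first entry in the wtgraph dataset
cputime        Total CPU time required to run the segment
walltime       Total wallclock time required to run the segment
endpoint_type  How the segment ended (e.g., continued, merged, or recycled)
status         Propagation status of the segment (e.g., prepared, complete, or failed)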
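seg_index and wtgraph are designed to be used together. The sketch below assumes that each segment's weight-graph parents occupy the contiguous slice wtgraph[wtg_offset : wtg_offset + wtg_n_parents], and that a negative parent_id marks a segment started from an initial state; check both assumptions against data_manager.py for your version. The dtype, the helper names wtg_parents and trace_lineage, and the three iterations of data are all hypothetical.

```python
import numpy as np

# Assumed dtype covering only the seg_index fields used here.
seg_dtype = np.dtype([
    ("weight", np.float64),
    ("parent_id", np.int64),
    ("wtg_n_parents", np.uint32),
    ("wtg_offset", np.uint32),
])


def wtg_parents(seg_index, wtgraph, seg_id):
    """IDs of the segments that contributed weight to segment seg_id.

    Assumes the entries form the slice wtgraph[offset : offset + n_parents].
    """
    offset = int(seg_index["wtg_offset"][seg_id])
    n = int(seg_index["wtg_n_parents"][seg_id])
    return wtgraph[offset:offset + n].tolist()


def trace_lineage(seg_indices, n_iter, seg_id):
    """Follow parent_id back from (n_iter, seg_id) towards iteration 1.

    seg_indices maps iteration number -> seg_index array. A negative
    parent_id is treated as "started from an initial state" and ends the trace.
    """
    path = [(n_iter, seg_id)]
    while n_iter > 1:
        parent = int(seg_indices[n_iter]["parent_id"][seg_id])
        if parent < 0:
            break
        n_iter, seg_id = n_iter - 1, parent
        path.append((n_iter, seg_id))
    return path


# Made-up data: iteration 2 splits segment 0 of iteration 1, and
# iteration 3 merges segments 0 and 1 of iteration 2 into one walker.
seg_indices = {
    1: np.array([(0.5, -1, 1, 0), (0.5, -2, 1, 1)], dtype=seg_dtype),
    2: np.array([(0.25, 0, 1, 0), (0.25, 0, 1, 1), (0.5, 1, 1, 2)], dtype=seg_dtype),
    3: np.array([(0.5, 0, 2, 0), (0.5, 2, 1, 2)], dtype=seg_dtype),
}
wtgraphs = {
    1: np.array([-1, -2]),
    2: np.array([0, 0, 1]),
    3: np.array([0, 1, 2]),
}

print(wtg_parents(seg_indices[3], wtgraphs[3], 0))  # [0, 1]
print(trace_lineage(seg_indices, 3, 1))             # [(3, 1), (2, 2), (1, 1)]
```

parent_id gives a single line of descent, while the weight graph records every segment whose probability flowed into a walker, which is why merged walkers have more than one wtgraph entry.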

Bin topologies group (/bin_topologies)

Bin topologies used during a WE simulation are stored as a unique hash identifier together with a serialized BinMapper object in Python pickle format. This group contains two datasets:

  • index: Compound array containing the bin hash and pickle length

  • pickles: The pickled BinMapper objects for each unique mapper, stored in a (num unique mappers, max pickled size) array
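The padded storage scheme can be illustrated with the standard library alone. In the sketch below, plain Python objects stand in for BinMapper instances, SHA-256 is an assumed choice of hash function (the one WESTPA actually uses is defined in its own source), and pack_mappers and load_mapper are hypothetical names. The point it shows is that the stored length, not the row width, marks where each pickle ends.

```python
import hashlib
import pickle


def pack_mappers(mappers):
    """Pickle objects and pad them into rows of equal width.

    Returns (index, rows): index holds (hash, length) pairs and rows holds the
    zero-padded pickles, mimicking a (num unique mappers, max pickled size) array.
    The SHA-256 hash is an assumption made for illustration.
    """
    blobs = [pickle.dumps(m, protocol=pickle.HIGHEST_PROTOCOL) for m in mappers]
    width = max(len(b) for b in blobs)
    index = [(hashlib.sha256(b).hexdigest(), len(b)) for b in blobs]
    rows = [b.ljust(width, b"\0") for b in blobs]
    return index, rows


def load_mapper(index, rows, i):
    """Unpickle entry i, trimming the padding with the stored length."""
    _, length = index[i]
    return pickle.loads(rows[i][:length])


# Stand-ins for BinMapper objects.
mappers = [{"kind": "rectilinear", "edges": [0.0, 1.0, 2.0]}, ("toy", 42)]
index, rows = pack_mappers(mappers)
print(len({len(r) for r in rows}))  # 1 (every row has the same width)
print(load_mapper(index, rows, 1))  # ('toy', 42)
```

If a row is read from HDF5 as a NumPy uint8 array, converting it with tobytes() before slicing yields a bytes object like the ones used here.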