HDF5 File Schema
WESTPA stores all of its simulation data in the crossplatform, selfdescribing HDF5 file format. This file format can be read and written by a variety of languages and toolkits, including C/C++, Fortran, Python, Java, and Matlab so that analysis of weighted ensemble simulations is not tied to using the WESTPA framework. HDF5 files are organized like a filesystem, where arbitrarilynested groups (i.e. directories) are used to organize datasets (i.e. files). The excellent HDFView program may be used to explore WEST data files.
The canonical file format reference for a given version of the WEST code is described in src/west/data_manager.py.
Overall structure
/
#ibstates/
index
naming
bstate_index
bstate_pcoord
istate_index
istate_pcoord
#tstates/
index
bin_topologies/
index
pickles
iterations/
iter_XXXXXXXX/\iter_XXXXXXXX/
auxdata/
bin_target_counts
ibstates/
bstate_index
bstate_pcoord
istate_index
istate_pcoord
pcoord
seg_index
wtgraph
...
summary
The root group (/)
The root of the WEST HDF5 file contains the following entries (where a trailing “/” denotes a group):
Name 
Type 
Description 

ibstates/ 
Group 
Initial and basis states for this simulation 
tstates/ 
Group 
Target (recycling) states for this simulation; may be empty 
bin_topologies/ 
Group 
Data pertaining to the binning scheme used in each iteration 
iterations/ 
Group 
Iteration data 
summary 
Dataset (1dimensional, compound) 
Summary data by iteration 
The iteration summary table (/summary)
Field 
Description 

n_particles 
the total number of walkers in this iteration 
norm 
total probability, for stability monitoring 
min_bin_prob 
smallest probability contained in a bin 
max_bin_prob 
largest probability contained in a bin 
min_seg_prob 
smallest probability carried by a walker 
max_seg_prob 
largest probability carried by a walker 
cputime 
total CPU time (in seconds) spent on propagation for this iteration 
walltime 
total wallclock time (in seconds) spent on this iteration 
binhash 
a hex string identifying the binning used in this iteration 
Per iteration data (/iterations/iter_XXXXXXXX)
Data for each iteration is stored in its own group, named according to the
iteration number and zeropadded out to 8 digits, as in
/iterations/iter_00000001
for iteration 1. This is done solely for
convenience in dealing with the data in external utilities that sort output by
group name lexicographically. The field width is in fact configurable via the
iter_prec
configuration entry under data
section of the WESTPA
configuration file.
The HDF5 group for each iteration contains the following elements:
Name 
Type 
Description 

auxdata/ 
Group 
All userdefined auxiliary data0 sets 
bin_target_counts 
Dataset (1dimensional) 
The perbin target count for the iteration 
ibstates/ 
Group 
Initial and basis state data for the iteration 
pcoord 
Dataset (3dimensional) 
Progress coordinate data for the iteration stored as a (num of segments, pcoord_len, pcoord_ndim) array 
seg_index 
Dataset (1dimensional, compound) 
Summary data for each segment 
wtgraph 
Dataset (1dimensional) 
The segment summary table (/iterations/iter_XXXXXXXX/seg_index)
Field 
Description 

weight 
Segment weight 
parent_id 
Index of parent 
wtg_n_parents 

wtg_offset 

cputime 
Total cpu time required to run the segment 
walltime 
Total walltime required to run the segment 
endpoint_type 

status 
Bin Topologies group (/bin_topologies)
Bin topologies used during a WE simulation are stored as a unique hash
identifier and a serialized BinMapper
object in python pickle format. This group contains
two datasets:
index
: Compound array containing the bin hash and pickle lengthpickle
: The pickledBinMapper
objects for each unique mapper stored in a (num unique mappers, max pickled size) array