Setup

Defining and Calculating Progress Coordinates

Binning

The Weighted Ensemble method enhances sampling by partitioning the space defined by the progress coordinates into non-overlapping bins. WESTPA provides a number of pre-defined types of bins that the user must parameterize within the system.py file, which are detailed below.

Users are also free to implement their own mappers. A bin mapper must implement, at least, an assign(coords, mask=None, output=None) method, which is responsible for mapping each of the vector of coordinate tuples coords to an integer (numpy.uint16) indicating what bin that coordinate tuple falls into. The optional mask (a numpy bool array) specifies that some coordinates are to be skipped; this is used, for instance, by the recursive (nested) bin mapper to minimize the number of calculations required to definitively assign a coordinate tuple to a bin. Similarly, the optional output must be an integer (uint16) array of the same length as coords, into which assignments are written. The assign() function must return a reference to output. (This is used to avoid allocating many temporary output arrays in complex binning scenarios.)

A user-defined bin mapper must also make an nbins property available, containing the total number of bins within the mapper.

RectilinearBinMapper

Creates an N-dimensional grid of bins. The Rectilinear bin mapper is initialized by defining a set of bin boundaries:

self.bin_mapper = RectilinearBinMapper(boundaries)

where boundaries is a list or other iterable containing the bin boundaries along each dimension. The bin boundaries must be monotonically increasing along each dimension. It is important to note that a one-dimensional bin space must still be represented as a list of lists as in the following example::

bounds = [-float('inf'), 0.0, 1.0, 2.0, 3.0, float('inf')]
self.bin_mapper = RectilinearBinMapper([bounds])

A two-dimensional system might look like::

boundaries = [(-1,-0.5,0,0.5,1), (-1,-0.5,0,0.5,1)]
self.bin_mapper = RectilinearBinMapper(boundaries)

where the first tuple in the list defines the boundaries along the first progress coordinate, and the second tuple defines the boundaries along the second. Of course a list of arbitrary dimensions can be defined to create an N-dimensional grid discretizing the progress coordinate space.

VoronoiBinMapper

A one-dimensional mapper which assigns a multidimensional progress coordinate to the closest center based on a distance metric. The Voronoi bin mapper is initialized with the following signature within the WESTSystem.initialize::

self.bin_mapper = VoronoiBinMapper(dfunc, centers, dfargs=None, dfkwargs=None)

centers is a (n_centers, pcoord_ndim) shaped numpy array defining the generators of the Voronoi cells
dfunc is a method written in Python that returns an (n_centers, ) shaped array containing the distance between a single set of progress coordinates for a segment and all of the centers defining the Voronoi tessellation. It takes the general form::
```
def dfunc(p, centers, *dfargs, **dfkwargs):
    ...
    return d
```

where p is the progress coordinates of a single segment at one time slice of shape (pcoord_ndim,), centers is the full set of centers, dfargs is a tuple or list of positional arguments and dfwargs is a dictionary of keyword arguments. The bin mapper’s assign method then assigns the progress coordinates to the closest bin (minimum distance). It is the responsibility of the user to ensure that the distance is calculated using the appropriate metric.

dfargs is an optional list or tuple of positional arguments to pass into dfunc.
dfkwargs is an optional dict of keyword arguments to pass into dfunc.

FuncBinMapper

A bin mapper that employs a set of user-defined function, which directly calculate bin assignments for a number of coordinate values. The function is responsible for iterating over the entire coordinate set. This is best used with C/Cython/Numba methods, or intellegently-tuned numpy-based Python functions.

The FuncBinMapper is initialized as::

self.bin_mapper = FuncBinMapper(func, nbins, args=None, kwargs=None)

where func is the user-defined method to assign coordinates to bins, nbins is the number of bins in the partitioning space, and args and kwargs are optional positional and keyword arguments, respectively, that are passed into func when it is called.

The user-defined function should have the following form::

def func(coords, mask, output, *args, **kwargs)
    ....

where the assignments returned in the output array, which is modified in-place.

As a contrived example, the following function would assign all segments to bin 0 if the sum of the first two progress coordinates was less than s*0.5, and to bin 1 otherwise, where s=1.5::

def func(coords, mask, output, s):
    output[coords[:,0] + coords[:,1] < s*0.5] = 0
    output[coords[:,0] + coords[:,1] >= s*0.5] = 1

....

self.bin_mapper = FuncBinMapper(func, 2, args=(1.5,))

VectorizingFuncBinMapper

Like the FuncBinMapper, the VectorizingFuncBinMapper uses a user-defined method to calculate bin assignments. They differ, however, in that while the user-defined method passed to an instance of the FuncBinMapper is responsible for iterating over all coordinate sets passed to it, the function associated with the VectorizingFuncBinMapper is evaluated once for each unmasked coordinate tuple provided. It is not responsible explicitly for iterating over multiple progress coordinate sets.

The VectorizingFuncBinMapper is initialized as::

self.bin_mapper = VectorizingFuncBinMapper(func, nbins, args=None, kwargs=None)

where func is the user-defined method to assign coordinates to bins, nbins is the number of bins in the partitioning space, and args and kwargs are optional positional and keyword arguments, respectively, that are passed into func when it is called.

The user-defined function should have the following form::

def func(coords, *args, **kwargs)
    ....

Mirroring the simple example shown for the FuncBinMapper, the following should result in the same result for a given set of coordinates. Here segments would be assigned to bin 0 if the sum of the first two progress coordinates was less than s*0.5, and to bin 1 otherwise, where s=1.5::

def func(coords, s):
    if coords[0] + coords[1] < s*0.5:
        return 0
    else:
        return 1
....

self.bin_mapper = VectorizingFuncBinMapper(func, 2, args=(1.5,))

PiecewiseBinMapper

RecursiveBinMapper

The RecursiveBinMapper is used for assembling more complex bin spaces from simpler components and nesting one set of bins within another. It is initialized as::

self.bin_mapper = RecursiveBinMapper(base_mapper, start_index=0)

The base_mapper is an instance of one of the other bin mappers, and start_index is an (optional) offset for indexing the bins. Starting with the base_mapper, additional bins can be nested into it using the add_mapper(mapper, replaces_bin_at). This method will replace the bin containing the coordinate tuple replaces_bin_at with the mapper specified by mapper.

As a simple example consider a bin space in which the base_mapper assigns a segment with progress coordinate with values <1 into one bin and >= 1 into another. Within the former bin, we will nest a second mapper which partitions progress coordinate space into one bin for progress coordinate values <0.5 and another for progress coordinates with values >=0.5. The bin space would look like the following with corresponding code::

'''
             0                            1                      2
             +----------------------------+----------------------+
             |            0.5             |                      |
             | +-----------+------------+ |                      |
             | |           |            | |                      |
             | |     1     |     2      | |          0           |
             | |           |            | |                      |
             | |           |            | |                      |
             | +-----------+------------+ |                      |prettyprint
             +---------------------------------------------------+
'''

def fn1(coords, mask, output):
    test = coords[:,0] < 1
    output[mask & test] = 0
    output[mask & ~test] = 1

def fn2(coords, mask, output):
    test = coords[:,0] < 0.5
    output[mask & test] = 0
    output[mask & ~test] = 1

outer_mapper = FuncBinMapper(fn1,2)
inner_mapper = FuncBinMapper(fn2,2)
rmapper = RecursiveBinMapper(outer_mapper)
rmapper.add_mapper(inner_mapper, [0.5])

Examples of more complicated nesting schemes can be found in the tests for the WESTPA binning apparatus.

Initial/Basis States

A WESTPA simulation is initialized using w_init with an initial distribution of replicas generated from a set of basis states. These basis states are used to generate initial states for new trajectories, either at the beginning of the simulation or due to recycling. Basis states are specified when running w_init either in a file specified with --bstates-from, or by one or more --bstate arguments. If neither --bstates-from nor at least one --bstate argument is provided, then a default basis state of probability one identified by the state ID zero and label “basis” will be created (a warning will be printed in this case, to remind you of this behavior, in case it is not what you wanted).

When using a file passed to w_init using --bstates-from, each line in that file defines a state, and contains a label, the probability, and optionally a data reference, separated by whitespace, as in::

unbound    1.0

or:

unbound_0    0.6        state0.pdb
unbound_1    0.4        state1.pdb

Basis states can also be supplied at the command line using one or more --bstate flags, where the argument matches the format used in the state file above. The total probability summed over all basis states should equal unity, however WESTPA will renormalize the distribution if this condition is not met.

Initial states are the generated from the basis states by optionally applying some perturbation or modification to the basis state. For example if WESTPA was being used to simulate ligand binding, one might want to have a basis state where the ligand was some set distance from the binding partner, and initial states are generated by randomly orienting the ligand at that distance. When using the executable propagator, this is done using the script specified under the gen_istate section of the executable configuration. Otherwise, if defining a custom propagator, the user must override the gen_istate method of WESTPropagator.

When using the executable propagator, the the script specified by gen_istate should take the data supplied by the environmental variable $WEST_BSTATE_DATA_REF and return the generated initial state to $WEST_ISTATE_DATA_REF. If no transform need be performed, the user may simply copy the data directly without modification. This data will then be available via $WEST_PARENT_DATA_REF if $WEST_CURRENT_SEG_INITPOINT_TYPE is SEG_INITPOINT_NEWTRAJ.

Target States

WESTPA can be run in a recycling mode in which replicas reaching a target state are removed from the simulation and their weights are assigned to new replicas created from one of the initial states. This mode creates a non-equilibrium steady-state that isolates members of the trajectory ensemble originating in the set of initial states and transitioning to the target states. The flux of probability into the target state is then inversely proportional to the mean first passage time (MFPT) of the transition.

Target states are defined when initializing a WESTPA simulation when calling w_init. Target states are specified either in a file specified with --tstates-from, or by one or more --tstate arguments. If neither --tstates-from nor at least one --tstate argument is provided, then an equilibrium simulation (without any sinks) will be performed.

Target states can be defined using a text file, where each line defines a state, and contains a label followed by a representative progress coordinate value, separated by whitespace, as in::

bound     0.02

for a single target and one-dimensional progress coordinates or::

bound    2.7    0.0
drift    100    50.0

for two targets and a two-dimensional progress coordinate.

The argument associated with --tstate is a string of the form 'label, pcoord0 [,pcoord1[,...]]', similar to a line in the example target state definition file above. This argument may be specified more than once, in which case the given states are appended to the list of target states for the simulation in the order they appear on the command line, after those that are specified by --tstates-from, if any.

WESTPA uses the representative progress coordinate of a target-state and converts the entire bin containing that progress coordinate into a recycling sink.

Propagators

The Executable Propagator

Writing custom propagators

While most users will use the Executable propagator to run dynamics by calling out to an external piece of software, it is possible to write custom propagators that can be used to generate sampling directly through the python interface. This is particularly useful when simulating simple systems, where the overhead of starting up an external program is large compared to the actual cost of computing the trajectory segment. Other use cases might include running sampling with software that has a Python API (e.g. OpenMM).

In order to create a custom propagator, users must define a class that inherits from WESTPropagator and implement three methods:

get_pcoord(self, state): Get the progress coordinate of the given basis or initial state.
gen_istate(self, basis_state, initial_state): Generate a new initial state from the given basis state. This method is optional if gen_istates is set to False in the propagation section of the configuration file, which is the default setting.
propagate(self, segments): Propagate one or more segments, including any necessary per-iteration setup and teardown for this propagator.

There are also two stubs that that, if overridden, provide a mechanism for modifying the simulation before or after the iteration:

prepare_iteration(self, n_iter, segments): Perform any necessary per-iteration preparation. This is run by the work manager.
finalize_iteration(self, n_iter, segments): Perform any necessary post-iteration cleanup. This is run by the work manager.

Several examples of custom propagators are available:

Configuration File

The configuration of a WESTPA simulation is specified using a plain text file written in YAML. This file specifies, among many other things, the length of the simulation, which modules should be loaded for specifying the system, how external data should be organized on the file system, and which plugins should used. YAML is a hierarchical format and WESTPA organizes the configuration settings into blocks for each component. While below, the configuration file will be referred to as west.cfg, the user is free to name the configuration file something else. Most of the scripts and tools that WESTPA provides, however, require that the name of the configuration file be specified if the default name is not used.

The top most heading in west.cfg should be specified as::

---
west:
    ...

with all sub-section specified below it. A complete example can be found for the NaCl example: https://github.com/westpa/westpa/blob/master/lib/examples/nacl_gmx/west.cfg

In the following section, the specifications for each section of the file can be found, along with default parameters and descriptions. Required parameters are indicated as REQUIRED.:

---
west:
    ...
    system:
        driver: REQUIRED
        module_path: []

The driver parameter must be set to a subclass of WESTSystem, and given in the form module.class. The module_path parameter is appended to the system path and indicates where the class is defined.:

---
west:
    ...
    we:
        adjust_counts: True
        weight_split_threshold: 2.0
        weight_merge_cutoff: 1.0

The we section section specifies parameters related to the Huber and Kim resampling algorithm. WESTPA implements a variation of the method, in which setting adust_counts to True strictly enforces that the number of replicas per bin is exactly system.bin_target_counts. Otherwise, the number of replicas per is allowed to fluctuate as in the original implementation of the algorithm. Adjusting the counts can improve load balancing for parallel simulations. Replicas with weights greater than weight_split_threshold times the ideal weight per bin are tagged as candidates for splitting. Replicas with weights less than weight_merge_cutoff times the ideal weight per bin are candidates for merging.:

---
west:
    ...
    propagation:
        gen_istates: False
        block_size: 1
        save_transition_matrices: False
        max_run_wallclock: None
        max_total_iterations: None

gen_istates: Boolean specifying whether to generate initial states from the basis states. The executable propagator defines a specific configuration block (add internal link to other section), and custom propagators should override the WESTPropagator.gen_istate() method.
block_size: An integer defining how many segments should be passed to a worker at a time. When using the serial work manager, this value should be set to the maximum number of segments per iteration to avoid significant overhead incurred by the locking mechanism in the WMFutures framework. Parallel work managers might benefit from setting this value greater than one in some instances to decrease network communication load.
save_transition_matrices:
max_run_wallclock: A time in dd:hh:mm:ss or hh:mm:ss specifying the maximum wallclock time of a particular WESTPA run. If running on a batch queuing system, this time should be set to less than the job allocation time to ensure that WESTPA shuts down cleanly.

max_total_iterations: An integer value specifying the number of iterations to run. This parameter is checked against the last completed iteration stored in the HDF5 file, not the number of iterations completed for a specific run. The default value of None only stops upon external termination of the code.:

---
west:
    ...
    data:
        west_data_file: REQUIRED
        aux_compression_threshold: 1048576
        iter_prec: 8
        datasets:
            -name: REQUIRED
             h5path:
             store: True
             load: False
             dtype:
             scaleoffset: None
             compression: None
             chunks: None
        data_refs:
            segment:
            basis_state:
            initial_state:

west_data_file: The name of the main HDF5 data storage file for the WESTPA simulation.
aux_compression_threshold: The threshold in bytes for compressing the auxiliary data in a dataset on an iteration-by-iteration basis.
iter_prec: The length of the iteration index with zero-padding. For the default value, iteration 1 would be specified as iter_00000001.
datasets:
data_refs:
plugins
executable

Environmental Variables

There are a number of environmental variables that can be set by the user in order to configure a WESTPA simulation:

WEST_ROOT: path to the base directory containing the WESTPA install
WEST_SIM_ROOT: path to the base directory of the WESTPA simulation
WEST_PYTHON: path to python executable to run the WESTPA simulation
WEST_PYTHONPATH: path to any additional modules that WESTPA will require to run the simulation
WEST_KERNPROF: path to kernprof.py script to perform line-by-line profiling of a WESTPA simulation (see python line_profiler). This is only required for users who need to profile specific methods in a running WESTPA simulation.

Work manager related environmental variables:

WM_WORK_MANAGER
WM_N_WORKERS

WESTPA makes available to any script executed by it (e.g. runseg.sh), a number of environmental variables that are set dynamically by the executable propagator from the running simulation.

Programs executed for an iteration

The following environment variables are passed to programs executed on a per-iteration basis, notably pre-iteration and post-iteration scripts.

Variable	Possible values	Function
WEST_CURRENT_ITER	Integer >=1	Current iteration number

Programs executed for a segment

The following environment variables are passed to programs executed on a per-segment basis, notably dynamics propagation.

Variable	Possible values	Function
WEST_CURRENT_ITER	Integer >=1	Current iteration number
WEST_CURRENT_SEG_ID	Integer >=0	Current segment ID
WEST_CURRENT_SEG_DATA_REF	String	General-purpose reference, based on current segment information, configured in west.cfg. Usually used for storage paths
WEST_CURRENT_SEG_INITPOINT_TYPE	Enumeration: SEG_INITPOINT_CONTINUES, SEG_INITPOINT_NEWTRAJ	Whether this segment continues a previous trajectory or initiates a new one.
WEST_PARENT_ID	Integer	Segment ID of parent segment. Negative for initial points.
WEST_PARENT_DATA_REF	String	General purpose reference, based on parent segment information, configured in west.cfg. Usually used for storage paths
WEST_PCOORD_RETURN	Filename	Where progress coordinate data must be stored
WEST_RAND16	Integer	16-bit random integer
WEST_RAND32	Integer	32-bit random integer
WEST_RAND64	Integer	64-bit random integer
WEST_RAND128	Integer	128-bit random integer
WEST_RANDFLOAT	Floating-point	Random number in [0,1).

Additionally for any additional datasets specified in the configuration file, WESTPA automatically provides WEST_X_RETURN, where X is the uppercase name of the dataset. For example if the configuration file contains the following:

data:
    ...
    datasets: # dataset storage options
      - name: energy

WESTPA would make WEST_ENERGY_RETURN available.

Programs executed for a single point

Programs used for creating initial states from basis states (gen_istate.sh) or extracting progress coordinates from structures (e.g. get_pcoord.sh) are provided the following environment variables:

Variable	Available for	Possible values	Function
WEST_STRUCT_DATA_REF	All single-point calculations	String	General-purpose reference, usually a pathname, associated with the basis/initial state.
WEST_BSTATE_ID	get_pcoord for basis state, gen_istate	Integer >= 0	Basis state ID
WEST_BSTATE_DATA_REF	get_pcoord for basis state, gen_istate	String	Basis state data reference
WEST_ISTATE_ID	get_pcoord for initial state, gen_istate	Integer >= 0	Inital state ID
WEST_ISTATE_DATA_REF	get_pcoord for initial state, gen_istate	String	Initial state data references, usually a pathname
WEST_PCOORD_RETURN	get_pcoord for basis or initial state	Pathname	Where progress coordinate data is expected to be found after execution

Plugins

WESTPA has a extensible plugin architecture that allows the user to manipulate the simulation at specified points during an iteration.

Activating plugins in the config file
Plugin execution order/priority