westpa.westext.hamsm_restarting package

Description

This plugin leverages haMSM analysis [1] to provide simulation post-analysis. This post-analysis can be used on its own, or can be used to initialize and run new WESTPA simulations using structures in the haMSM’s best estimate of steady-state as described in [2], which may accelerate convergence to steady-state.

haMSM analysis is performed using the msm_we library.

Sample files necessary to run the restarting plugin (as described below) can be found in the WESTPA GitHub Repo.

Usage

Configuration

`west.cfg`

This plugin requires the following section in west.cfg (or whatever your WE configuration file is named):

west:
  plugins:
  - plugin: westpa.westext.hamsm_restarting.restart_driver.RestartDriver
    n_restarts: 0             # Number of restarts to perform
    n_runs: 5                 # Number of runs within each restart
    n_restarts_to_use: 0.5    # Amount of prior restarts' data to use. -1, a decimal in (0,1), or an integer. Details below.
    extension_iters: 5        # Number of iterations to continue runs for, if target is not reached by first restart period
    coord_len: 2                                      # Length of pcoords returned
    initialization_file: restart_initialization.json  # JSON describing w_run parameters for new runs
    ref_pdb_file: common_files/bstate.pdb             # File containing reference structure/topology
    model_name: NaClFlux                              # Name for msm_we model
    n_clusters: 2                                     # Number of clusters in haMSM building
    we_folder: .                                      # Should point to the same directory as WEST_SIM_ROOT
    target_pcoord_bounds: [[-inf, 2.60]]              # Progress coordinate boundaries for the target state
    basis_pcoord_bounds: [[12.0, inf]]               # Progress coordinate boundaries for the basis state
    tau: 5e-13                                        # Resampling time, i.e. length of a WE iteration in physical units
    pcoord_ndim0: 1                                   # Dimensionality of progress coordinate
    dim_reduce_method: pca                            # Dimensionality reduction scheme, either "pca", "vamp", or "none"
    parent_traj_filename: parent.xml                  # Name of parent file in each segment
    child_traj_filename: seg.xml                      # Name of child file in each segment
    user_functions: westpa_scripts/restart_overrides.py       # Python file defining coordinate processing
    struct_filetype: mdtraj.formats.PDBTrajectoryFile         # Filetype for output start-structures
    debug: False              # Optional, defaults to False. If true, enables debug-mode logging.
    streaming: True           # Does clustering in a streaming fashion, versus trying to load all coords in memory
    n_cpus: 1                 # Number of CPUs to use for parallel calculations

The values shown above are samples only and should be adjusted for your specific system.
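To make the meaning of target_pcoord_bounds and basis_pcoord_bounds concrete, here is a minimal sketch of a state-membership check. The helper name in_state is hypothetical and not part of the plugin, and treating the bounds as inclusive with one [low, high] pair per progress-coordinate dimension is an assumption.

```python
import math


def in_state(pcoord, bounds):
    """Return True if every pcoord dimension lies within its [low, high] pair.

    Assumption: bounds are inclusive, one pair per pcoord dimension.
    """
    if len(pcoord) != len(bounds):
        raise ValueError("pcoord and bounds must have the same dimensionality")
    return all(low <= value <= high for value, (low, high) in zip(pcoord, bounds))


# Sample values from the configuration above
target_pcoord_bounds = [[-math.inf, 2.60]]
basis_pcoord_bounds = [[12.0, math.inf]]

is_target = in_state([2.0], target_pcoord_bounds)
is_basis = in_state([15.0], basis_pcoord_bounds)
```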

Note about n_restarts_to_use: n_restarts_to_use can be specified in a few different ways. A value of -1 means to use all available data. A decimal 0 < n_restarts_to_use < 1 will use the last n_restarts_to_use * current_restart iterations of data. For example, set it to 0.5 to use the last half of the data, or 0.75 to use the last 3/4. Finally, an integer value will just use the last n_restarts_to_use iterations.
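The three cases can be sketched as a small function. This is an illustration of the rules described here, not the plugin's actual code: the function name is hypothetical, and rounding a fractional result down while always keeping at least one unit of data is an assumption.

```python
import math


def count_recent_to_use(n_restarts_to_use, current_restart):
    """Return how many of the most recent units of data to use.

    Assumptions: fractional results are rounded down with a minimum of 1,
    and an integer larger than the available data is capped.
    """
    if n_restarts_to_use == -1:
        # Use all available data
        return current_restart
    if 0 < n_restarts_to_use < 1:
        # Use the trailing fraction of the data
        return max(1, math.floor(n_restarts_to_use * current_restart))
    if isinstance(n_restarts_to_use, int) and n_restarts_to_use >= 1:
        # Use a fixed number of the most recent units
        return min(n_restarts_to_use, current_restart)
    raise ValueError("expected -1, a decimal in (0, 1), or a positive integer")
```

With four units of data available, 0.5 selects the last two and 0.75 selects the last three.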

Note that ref_pdb_file can be any filetype supported by msm_we.initialize()’s structure loading. At the time of writing, this is limited to PDB, though support for other formats is planned. Also at the time of writing, the file is only used to set model.nAtoms, so if you’re using a topology format that is unsupported, you should be able to skip it and manually set nAtoms on the object.

Also in this file, west.data.data_refs.basis_state MUST point to $WEST_SIM_ROOT/{basis_state.auxref} and not a subdirectory if restarts are being used. This is because when the plugin initiates a restart, start_state references in $WEST_SIM_ROOT/restartXX/start_states.txt are set relative to $WEST_SIM_ROOT. All basis/start state references are defined relative to west.data.data_refs.basis_state, so if that points to a subdirectory of $WEST_SIM_ROOT, those paths will not be accurate.

Running

Once configured, just run your WESTPA simulation normally with w_run, and the plugin will automatically handle performing restarts, and extensions if necessary.

Extensions

To be clear: these are extensions in the sense of extending a simulation to be longer – not in the sense of “an extension to the WESTPA software package”!

Running with extension_iters greater than 0 will enable extensions before the first restart if the target state is not reached. This is useful to avoid restarting when you don’t yet have structures spanning all the way from your basis to target. At the time of writing, it’s not yet clear whether restarting from “incomplete” WE runs like this will help or hinder the total number of iterations it takes to reach the target.

Extensions are simple and work as follows: before doing the first restart, after all runs are complete, the output WESTPA h5 files are scanned to see if any recycling has occurred. If it hasn’t, then each run is extended by extension_iters iterations.
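The decision described above can be sketched as follows. This is a simplified illustration, not the plugin's implementation: the function name and arguments are hypothetical, and the per-run recycling flags are assumed to have already been extracted from the h5 files.

```python
def plan_max_iterations(recycled_per_run, extension_iters, current_max_iters):
    """Return the iteration limit each run should use before the first restart.

    recycled_per_run: one bool per run, True if any recycling was observed
    in that run's output (assumed to be computed elsewhere).
    """
    if extension_iters <= 0:
        # Extensions are disabled
        return current_max_iters
    if any(recycled_per_run):
        # The target has been reached, so the restart can proceed
        return current_max_iters
    # No recycling in any run: extend every run
    return current_max_iters + extension_iters
```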

`restart_initialization.json`

{
    "bstates":["start,1,bstates/bstate.pdb"],
    "tstates":["bound,2.6"],
    "bstate-file":"bstates/bstates.txt",
    "tstate-file" :"tstate.file",
    "segs-per-state": 1
}

It is not necessary to specify both in-line states and a state-file for each, but that is shown in the sample for completeness.

It is important that bstates and tstates are lists of strings, and not just strings, even if only one bstate/tstate is being used!

With n_runs > 1, before doing any restart, multiple independent runs are performed. However, before the first restart (this applies if no restarts are performed as well), the plugin has no way of accessing the parameters that were initially passed to w_init and w_run.

Therefore, it is necessary to store those parameters in a file, so the plugin can read them and initiate subsequent runs.

After the first restart is performed, the plugin writes this file itself, so it is only necessary to manually configure for that first set of runs.
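If you prefer to generate this file from a script, a minimal sketch like the following can guard against passing a bare string for bstates or tstates. The helper name build_restart_init is hypothetical; only the JSON keys are taken from the sample above.

```python
import json


def build_restart_init(bstates, tstates, segs_per_state=1,
                       bstate_file=None, tstate_file=None):
    """Build the dictionary for restart_initialization.json."""
    for name, value in (("bstates", bstates), ("tstates", tstates)):
        # A bare string would be silently accepted by JSON, so reject it here
        if isinstance(value, str) or not all(isinstance(v, str) for v in value):
            raise TypeError(f"{name} must be a list of strings")
    config = {
        "bstates": list(bstates),
        "tstates": list(tstates),
        "segs-per-state": segs_per_state,
    }
    if bstate_file is not None:
        config["bstate-file"] = bstate_file
    if tstate_file is not None:
        config["tstate-file"] = tstate_file
    return config


text = json.dumps(
    build_restart_init(["start,1,bstates/bstate.pdb"], ["bound,2.6"]),
    indent=4,
)
```

The resulting text can then be written to restart_initialization.json before the first set of runs.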

Featurization overrides

import logging

import numpy as np
import mdtraj as md

# The function runs from this module's namespace, so it needs its own logger
log = logging.getLogger(__name__)


def processCoordinates(self, coords):
    """Featurize coordinates of shape (n_frames, n_atoms, 3)."""
    log.debug("Processing coordinates")

    if self.dimReduceMethod == "none":
        # Flatten each frame into a single row of 3 * nAtoms values
        n_frames = np.shape(coords)[0]
        data = coords.reshape(n_frames, 3 * self.nAtoms)
        return data

    if self.dimReduceMethod == "pca" or self.dimReduceMethod == "vamp":

        ### NaCl RMSD dimensionality reduction
        log.warning("Hardcoded selection: Doing dim reduction for Na, Cl. This is only for testing!")
        indNA = self.reference_structure.topology.select("element Na")
        indCL = self.reference_structure.topology.select("element Cl")

        diff = np.subtract(coords[:, indNA], coords[:, indCL])

        # Root-mean-square of the Na-Cl coordinate differences, per frame
        dist = np.array(np.sqrt(np.mean(np.power(diff, 2), axis=-1)))

        return dist

This is the file whose path is provided in the configuration file in plugin.user_functions, and must be a Python file defining a function named processCoordinates(self, coords) which takes a numpy array of coordinates, featurizes it, and returns the numpy array of feature-coordinates.

This is left to be user-provided because whatever featurization you do will be system-specific. The provided function is monkey-patched into the msm_we.modelWE class.

An example is provided above, which does a simple RMSD coordinate reduction for the NaCl association tutorial system.
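For readers unfamiliar with monkey-patching, the following sketch shows the general mechanism of loading a function from a user-supplied file and attaching it to a class. ModelStandIn is a hypothetical stand-in and not the msm_we.modelWE class, load_user_function is an illustrative helper, and the toy processCoordinates only counts frames; how the plugin itself performs the patch may differ.

```python
import importlib.util
import os
import tempfile


class ModelStandIn:
    """Hypothetical stand-in for the model class; not the msm_we API."""
    dimReduceMethod = "none"


def load_user_function(path, name="processCoordinates"):
    """Import the Python file at `path` and return the function called `name`."""
    spec = importlib.util.spec_from_file_location("user_overrides", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return getattr(module, name)


USER_SOURCE = '''
def processCoordinates(self, coords):
    # Toy featurization: report the method in use and the number of frames
    return [self.dimReduceMethod, len(coords)]
'''

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "restart_overrides.py")
    with open(path, "w") as handle:
        handle.write(USER_SOURCE)
    # Attach the user function to the class so instances see it as a method
    ModelStandIn.processCoordinates = load_user_function(path)

result = ModelStandIn().processCoordinates([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
```

Because the function is attached to the class, `self` inside it refers to the model instance, which is why the real override can read attributes such as self.nAtoms.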

Doing only post-analysis

If you want to ONLY use this for haMSM post-analysis, and not restarting, just set n_restarts: 0 in the configuration.

Work manager for restarting

If you’re using some parallelism (which you should), and you’re using the plugin to do restarts or multiple runs, then your choice of work manager can be important. This plugin handles starting new WESTPA runs using the Python API. The process work manager, by default, uses fork to start new workers, which seems to eventually cause memory issues, since fork passes the entire contents of the parent to each child. Switching the start method to forkserver or spawn may introduce other issues.

Using the ZMQ work manager works well. The MPI work manager should also work well, though is untested. Both of these handle starting new workers in a more efficient way, without copying the full state of the parent.

Continuing a failed run

The restarting plugin has a few different things it expects to find when it runs. Crashes during the WE run should not affect this. However, if the plugin itself crashes while running, these may be left in a weird state.

If the plugin crashes while running, make sure:

restart.dat contains the correct entries. restarts_completed is the number of restarts successfully completed, and runs_completed is the number of runs successfully completed within that restart.
restart_initialization.json points to the correct restart.

It may help to w_truncate the very last iteration and allow WESTPA to re-do it.

Potential Pitfalls/Troubleshooting

Basis state calculation may take a LONG time with a large number of start-states. A simple RMSD calculation using cpptraj and 500,000 start-states took over 6 hours. Reducing the number of runs used through n_restarts_to_use will ameliorate this.
If restart_driver.prepare_coordinates() has written a coordinate for an iteration, subsequent runs will NOT overwrite it, and will skip it.
In general, verify that msm_we is installed.
Verify that restart_initialization.json has been correctly set.
This plugin does not yet attempt to resolve environment variables in the config, so variables such as $WEST_SIM_ROOT will be interpreted literally in paths.
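One possible workaround for the environment-variable limitation is to expand the variables yourself before writing the values into the configuration. The sketch below uses the standard library's os.path.expandvars; the helper name and the idea of preprocessing the plugin options as a dictionary are assumptions, not features of the plugin.

```python
import os


def expand_config_paths(plugin_config, path_keys):
    """Return a copy of plugin_config with environment variables expanded
    in the values of the given keys."""
    expanded = dict(plugin_config)
    for key in path_keys:
        if key in expanded:
            expanded[key] = os.path.expandvars(expanded[key])
    return expanded


# For illustration only: fall back to the current directory if unset
os.environ.setdefault("WEST_SIM_ROOT", os.getcwd())

plugin_config = {
    "we_folder": "$WEST_SIM_ROOT",
    "ref_pdb_file": "$WEST_SIM_ROOT/common_files/bstate.pdb",
    "n_clusters": 2,
}
resolved = expand_config_paths(plugin_config, ["we_folder", "ref_pdb_file"])
```

The resolved values can then be copied into west.cfg as literal paths.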

References

[1] Suárez, E., Adelman, J. L. & Zuckerman, D. M. Accurate Estimation of Protein Folding and Unfolding Times: Beyond Markov State Models. J Chem Theory Comput 12, 3473–3481 (2016).

[2] Copperman, J. & Zuckerman, D. M. Accelerated Estimation of Long-Timescale Kinetics from Weighted Ensemble Simulation via Non-Markovian “Microbin” Analysis. J Chem Theory Comput 16, 6763–6775 (2020).