WESTPA 2.0
Overview
WESTPA is a package for constructing and running stochastic simulations using the “weighted ensemble” approach of Huber and Kim (1996). If you use WESTPA, please cite the following:
Zwier, M.C., Adelman, J.L., Kaus, J.W., Pratt, A.J., Wong, K.F., Rego, N.B., Suarez, E., Lettieri, S., Wang, D.W., Grabe, M., Zuckerman, D.M., and Chong, L.T. “WESTPA: An Interoperable, Highly Scalable Software Package For Weighted Ensemble Simulation and Analysis,” J. Chem. Theory Comput., 11: 800−809 (2015).
Russo, J. D., Zhang, S., Leung, J.M.G., Bogetti, A.T., Thompson, J.P., DeGrave, A.J., Torrillo, P.A., Pratt, A.J., Wong, K.F., Xia, J., Copperman, J., Adelman, J.L., Zwier, M.C., LeBard, D.N., Zuckerman, D.M., Chong, L.T. WESTPA 2.0: High-Performance Upgrades for Weighted Ensemble Simulations and Analysis of Longer-Timescale Applications. J. Chem. Theory Comput., 18 (2): 638–649 (2022).
See this page and this powerpoint for an overview of weighted ensemble simulation.
To help us fund development and improve WESTPA, please fill out a one-minute survey and consider contributing documentation or code to the WESTPA community.
WESTPA is free software, licensed under the terms of the MIT License. See the file LICENSE for more information.
Requirements
WESTPA is written in Python and requires version 3.7 or later. WESTPA also requires a number of Python scientific software packages. The simplest way to meet these requirements is to download the Anaconda Python distribution from www.anaconda.com (free for all users).
WESTPA currently runs on Unix-like operating systems, including Linux and Mac OS X. It is developed and tested on x86_64 machines running Linux.
Obtaining and Installing WESTPA
Regardless of the chosen installation method, we recommend first installing the Python 3 version provided by the latest free Anaconda Python distribution. After installing Anaconda, create a new Python environment for the WESTPA install with the following:
conda create -n westpa-2.0 python=3.9
conda activate westpa-2.0
Then, we recommend installing WESTPA through conda or pip. Execute either of the following:
conda install -c conda-forge westpa
or:
python -m pip install westpa
See the install instructions on our wiki for more detailed information.
To install from source (not recommended), start by downloading the corresponding tar.gz file from the releases page. After downloading, unpack the file and install WESTPA by executing the following:
tar xvzf westpa-main.tar.gz
cd westpa
python -m pip install -e .
Getting started
High-level tutorials of how to use the WESTPA software can be found here. Further, all WESTPA command-line tools provide detailed help when given the -h/--help option.
Finally, while WESTPA is a powerful tool that enables expert simulators to access much longer timescales than is practical with standard simulations, there can be a steep learning curve to figuring out how to effectively run the simulations on your computing resource of choice. For serious users who have completed the online tutorials and are ready for production simulations of their system, we invite you to contact Lillian Chong (ltchong AT pitt DOT edu) about spending a few days with her lab and/or setting up video conferencing sessions to help you get your simulations off the ground.
Getting help
WESTPA FAQ
A mailing list for WESTPA is available, at which one can ask questions (or see if a question one has was previously addressed). This is the preferred means for obtaining help and support. See http://groups.google.com/group/westpa-users to sign up or search archived messages.
Developers
Search archived messages or post to the westpa-devel Google group: https://groups.google.com/group/westpa-devel.
westpa.cli package
w_init
w_init initializes the weighted ensemble simulation, creates the main HDF5 file, and prepares the first iteration.
Overview
Usage:
w_init [-h] [-r RCFILE] [--quiet | --verbose | --debug] [--version]
[--force] [--bstate-file BSTATE_FILE] [--bstate BSTATES]
[--tstate-file TSTATE_FILE] [--tstate TSTATES]
[--segs-per-state N] [--no-we] [--wm-work-manager WORK_MANAGER]
[--wm-n-workers N_WORKERS] [--wm-zmq-mode MODE]
[--wm-zmq-info INFO_FILE] [--wm-zmq-task-endpoint TASK_ENDPOINT]
[--wm-zmq-result-endpoint RESULT_ENDPOINT]
[--wm-zmq-announce-endpoint ANNOUNCE_ENDPOINT]
[--wm-zmq-heartbeat-interval INTERVAL]
[--wm-zmq-task-timeout TIMEOUT] [--wm-zmq-client-comm-mode MODE]
Initialize a new WEST simulation, creating the WEST HDF5 file and preparing the first iteration’s segments. Initial states are generated from one or more “basis states” which are specified either in a file given with --bstates-from, or by one or more --bstate arguments. If neither --bstates-from nor at least one --bstate argument is provided, then a default basis state of probability one, identified by the state ID zero and label “basis”, will be created (a warning will be printed in this case, to remind you of this behavior, in case it is not what you wanted). Target states for (non-equilibrium) steady-state simulations are specified either in a file given with --tstates-from, or by one or more --tstate arguments. If neither --tstates-from nor at least one --tstate argument is provided, then an equilibrium simulation (without any sinks) will be performed.
Command-Line Options
See the general command-line tool reference for more information on the general options.
State Options
--force
Overwrites any existing simulation data
--bstate BSTATES
Add the given basis state (specified as a string
'label,probability[,auxref]') to the list of basis states (after
those specified in --bstates-from, if any). This argument may be
specified more than once, in which case the given states are
appended in the order they are given on the command line.
--bstate-file BSTATE_FILE, --bstates-from BSTATE_FILE
Read basis state names, probabilities, and (optionally) data
references from BSTATE_FILE.
--tstate TSTATES
Add the given target state (specified as a string
'label,pcoord0[,pcoord1[,...]]') to the list of target states (after
those specified in the file given by --tstates-from, if any). This
argument may be specified more than once, in which case the given
states are appended in the order they appear on the command line.
--tstate-file TSTATE_FILE, --tstates-from TSTATE_FILE
Read target state names and representative progress coordinates from
TSTATE_FILE. WESTPA uses the representative progress coordinate of a target state and
converts the **entire** bin containing that progress coordinate into a
recycling sink.
--segs-per-state N
Initialize N segments from each basis state (default: 1).
--no-we, --shotgun
Do not run the weighted ensemble bin/split/merge algorithm on
newly-created segments.
Examples
The examples below are illustrative sketches; file names and state definitions are hypothetical. They show how to set up basis states (from which initial states, or istates, are generated), an equilibrium simulation without recycling targets, and a steady-state simulation with a recycling target.
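# contents of a basis state file (bstates.txt), one 'label probability [auxref]' per line;
# each basis state is used to generate initial states (istates) for new trajectories
unbound 1.0 unbound.pdb

# equilibrium simulation: basis states only, no target states, so no recycling sinks
w_init --bstate-file bstates.txt --segs-per-state 4

# steady-state simulation: recycle walkers at a target state located at pcoord value 2.7
w_init --bstate-file bstates.txt --tstate 'bound,2.7' --segs-per-state 4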
westpa.cli.core.w_init module
- class westpa.cli.core.w_init.BasisState(label, probability, pcoord=None, auxref=None, state_id=None)
Bases:
object
Describes a basis (micro)state. These basis states are used to generate initial states for new trajectories, either at the beginning of the simulation (i.e. at w_init) or due to recycling.
- Variables:
state_id – Integer identifier of this state, usually set by the data manager.
label – A descriptive label for this microstate (may be empty)
probability – Probability that this state will be selected when creating a new trajectory.
pcoord – The representative progress coordinate of this state.
auxref – A user-provided (string) reference for locating data associated with this state (usually a filesystem path).
- classmethod states_to_file(states, fileobj)
Write a file defining basis states, which may then be read by states_from_file().
- classmethod states_from_file(statefile)
Read a file defining basis states. Each line defines a state, and contains a label, the probability, and optionally a data reference, separated by whitespace, as in:
unbound 1.0
or:
unbound_0 0.6 state0.pdb
unbound_1 0.4 state1.pdb
- as_numpy_record()
Return the data for this state as a numpy record array.
- class westpa.cli.core.w_init.TargetState(label, pcoord, state_id=None)
Bases:
object
Describes a target state.
- Variables:
state_id – Integer identifier of this state, usually set by the data manager.
label – A descriptive label for this microstate (may be empty)
pcoord – The representative progress coordinate of this state.
- classmethod states_to_file(states, fileobj)
Write a file defining target states, which may then be read by states_from_file().
- classmethod states_from_file(statefile, dtype)
Read a file defining target states. Each line defines a state, and contains a label followed by a representative progress coordinate value, separated by whitespace, as in:
bound 0.02
for a single target and one-dimensional progress coordinates or:
bound 2.7 0.0
drift 100 50.0
for two targets and a two-dimensional progress coordinate.
- westpa.cli.core.w_init.make_work_manager()
Using cues from the environment, instantiate a pre-configured work manager.
- westpa.cli.core.w_init.entry_point()
- westpa.cli.core.w_init.initialize(tstates, tstate_file, bstates, bstate_file, sstates=None, sstate_file=None, segs_per_state=1, shotgun=False)
Initialize a WESTPA simulation.
tstates : list of str
tstate_file : str
bstates : list of str
bstate_file : str
sstates : list of str
sstate_file : str
segs_per_state : int
shotgun : bool
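A minimal sketch of programmatic initialization, under the assumption that a valid west.cfg (describing the system and propagator) exists in the working directory; the state definitions here are hypothetical:

import westpa
from westpa.cli.core import w_init

# load the run-time configuration; assumes a valid west.cfg in the working directory
westpa.rc.read_config('west.cfg')

w_init.initialize(
    tstates=['bound,2.7'],   # 'label,pcoord0[,pcoord1[,...]]' target state strings
    tstate_file=None,
    bstates=['basis,1.0'],   # 'label,probability[,auxref]' basis state strings
    bstate_file=None,
    segs_per_state=4,
    shotgun=False,           # False: run the WE split/merge step on the new segments
)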
w_bins
w_bins deals with binning modification and statistics.
Overview
Usage:
w_bins [-h] [-r RCFILE] [--quiet | --verbose | --debug] [--version]
[-W WEST_H5FILE]
{info,rebin} ...
Display information and statistics about binning in a WEST simulation, or modify the binning for the current iteration of a WEST simulation.
Command-Line Options
See the general command-line tool reference for more information on the general options.
Options Under ‘info’
Usage:
w_bins info [-h] [-n N_ITER] [--detail]
[--bins-from-system | --bins-from-expr BINS_FROM_EXPR | --bins-from-function BINS_FROM_FUNCTION | --bins-from-file]
Positional options:
info
Display information about binning.
Options for ‘info’:
-n N_ITER, --n-iter N_ITER
Consider initial points of segment N_ITER (default: current
iteration).
--detail
Display detailed per-bin information in addition to summary
information.
Binning options for ‘info’:
--bins-from-system
Bins are constructed by the system driver specified in the WEST
configuration file (default when stored bin definitions are not
available).
--bins-from-expr BINS_FROM_EXPR, --binbounds BINS_FROM_EXPR
Construct bins on a rectilinear grid according to the given BINEXPR.
This must be a list of lists of bin boundaries (one list of bin
boundaries for each dimension of the progress coordinate), formatted
as a Python expression. E.g. "[[0,1,2,4,inf],[-inf,0,inf]]". The
numpy module and the special symbol "inf" (for floating-point
infinity) are available for use within BINEXPR.
--bins-from-function BINS_FROM_FUNCTION, --binfunc BINS_FROM_FUNCTION
Supply an external function which, when called, returns a properly
constructed bin mapper which will then be used for bin assignments.
This should be formatted as "[PATH:]MODULE.FUNC", where the function
FUNC in module MODULE will be used; the optional PATH will be
prepended to the module search path when loading MODULE.
--bins-from-file
Load bin specification from the data file being examined (default
when stored bin definitions are available).
Options Under ‘rebin’
Usage:
w_bins rebin [-h] [--confirm] [--detail]
[--bins-from-system | --bins-from-expr BINS_FROM_EXPR | --bins-from-function BINS_FROM_FUNCTION]
[--target-counts TARGET_COUNTS | --target-counts-from FILENAME]
Positional option:
rebin
Rebuild current iteration with new binning.
Options for ‘rebin’:
--confirm
Commit the revised iteration to HDF5; without this option, the
effects of the new binning are only calculated and printed.
--detail
Display detailed per-bin information in addition to summary
information.
Binning options for ‘rebin’:
Same as the binning options for ‘info’.
Bin target count options for ‘rebin’:
--target-counts TARGET_COUNTS
Use TARGET_COUNTS instead of stored or system driver target counts.
TARGET_COUNTS is a comma-separated list of integers. As a special
case, a single integer is acceptable, in which case the same target
count is used for all bins.
--target-counts-from FILENAME
Read target counts from the text file FILENAME instead of using
stored or system driver target counts. FILENAME must contain a list
of integers, separated by arbitrary whitespace (including newlines).
Input Options
-W WEST_H5FILE, --west-data WEST_H5FILE
Take WEST data from WEST_H5FILE (default: read from the HDF5 file
specified in west.cfg).
Examples
The following is an illustrative sketch; the iteration number and target counts are hypothetical.
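# summarize the binning of the current iteration
w_bins info

# show detailed per-bin statistics for iteration 50
w_bins info -n 50 --detail

# preview a rebinning of the current iteration with 8 walkers per bin,
# then commit it to the HDF5 file
w_bins rebin --target-counts 8
w_bins rebin --target-counts 8 --confirm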
westpa.cli.tools.w_bins module
- class westpa.cli.tools.w_bins.WESTTool
Bases:
WESTToolComponent
Base class for WEST command line tools
- prog = None
- usage = None
- description = None
- epilog = None
- add_args(parser)
Add arguments specific to this tool to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this tool and deal with them appropriately (setting instance variables, etc)
- make_parser(prog=None, usage=None, description=None, epilog=None, args=None)
- make_parser_and_process(prog=None, usage=None, description=None, epilog=None, args=None)
A convenience function to create a parser, call add_all_args(), and then call process_all_args(). The argument namespace is returned.
- go()
Perform the analysis associated with this tool.
- main()
A convenience function to make a parser, parse and process arguments, then call self.go()
- class westpa.cli.tools.w_bins.WESTDataReader
Bases:
WESTToolComponent
Tool for reading data from WEST-related HDF5 files. Coordinates finding the main HDF5 file from west.cfg or command line arguments, caching of certain kinds of data (eventually), and retrieving auxiliary data sets from various places.
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- open(mode='r')
- close()
- property weight_dsspec
- property parent_id_dsspec
- class westpa.cli.tools.w_bins.BinMappingComponent
Bases:
WESTToolComponent
Component for obtaining a bin mapper from one of several places based on command-line arguments. Such locations include an HDF5 file that contains pickled mappers (including the primary WEST HDF5 file), the system object, an external function, or (in the common case of rectilinear bins) a list of lists of bin boundaries.
Some configuration is necessary prior to calling process_args() if loading a mapper from HDF5. Specifically, either set_we_h5file_info() or set_other_h5file_info() must be called to describe where to find the appropriate mapper. In the case of set_we_h5file_info(), the mapper used for WE at the end of a given iteration will be loaded. In the case of set_other_h5file_info(), an arbitrary group and hash value are specified; the mapper corresponding to that hash in the given group will be returned.
In the absence of arguments, the mapper contained in an existing HDF5 file is preferred; if that is not available, the mapper from the system driver is used.
This component adds the following arguments to argument parsers:
- --bins-from-system
Obtain bins from the system driver
- --bins-from-expr=EXPR
Construct rectilinear bins by parsing EXPR and calling RectilinearBinMapper() with the result. EXPR must therefore be a list of lists.
- --bins-from-function=[PATH:]MODULE.FUNC
Call an external function FUNC in module MODULE (optionally adding PATH to the search path when loading MODULE) which, when called, returns a fully-constructed bin mapper.
- --bins-from-file
Load bin definitions from a YAML configuration file.
- --bins-from-h5file
Load bins from the file being considered; this is intended to mean the master WEST HDF5 file or results of other binning calculations, as appropriate.
- add_args(parser, description='binning options', suppress=[])
Add arguments specific to this component to the given argparse parser.
- add_target_count_args(parser, description='bin target count options')
Add options to the given parser corresponding to target counts.
- process_args(args)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- set_we_h5file_info(n_iter=None, data_manager=None, required=False)
Set up to load a bin mapper from the master WEST HDF5 file. The mapper is actually loaded from the file when self.load_bin_mapper() is called, if and only if command line arguments direct this. If required is true, then a mapper must be available at iteration n_iter, or else an exception will be raised.
- set_other_h5file_info(topology_group, hashval)
Set up to load a bin mapper from (any) open HDF5 file, where bin topologies are stored in topology_group (an h5py Group object) and the desired mapper has hash value hashval. The mapper itself is loaded when self.load_bin_mapper() is called.
- westpa.cli.tools.w_bins.write_bin_info(mapper, assignments, weights, n_target_states, outfile=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, detailed=False)
Write information about binning to outfile, given a mapper (mapper) and the weights (weights) and bin assignments (assignments) of a set of segments, along with a target state count (n_target_states). If detailed is true, then per-bin information is written as well as summary information about all bins.
- class westpa.cli.tools.w_bins.WBinTool
Bases:
WESTTool
- prog = 'w_bins'
- description = 'Display information and statistics about binning in a WEST simulation, or\nmodify the binning for the current iteration of a WEST simulation.\n-------------------------------------------------------------------------------\n'
- add_args(parser)
Add arguments specific to this tool to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this tool and deal with them appropriately (setting instance variables, etc)
- go()
Perform the analysis associated with this tool.
- cmd_info()
- cmd_rebin()
- westpa.cli.tools.w_bins.entry_point()
w_run
w_run starts or continues a weighted ensemble simulation.
Overview
Usage:
w_run [-h] [-r RCFILE] [--quiet | --verbose | --debug] [--version]
[--oneseg ] [--wm-work-manager WORK_MANAGER]
[--wm-n-workers N_WORKERS] [--wm-zmq-mode MODE]
[--wm-zmq-info INFO_FILE] [--wm-zmq-task-endpoint TASK_ENDPOINT]
[--wm-zmq-result-endpoint RESULT_ENDPOINT]
[--wm-zmq-announce-endpoint ANNOUNCE_ENDPOINT]
[--wm-zmq-heartbeat-interval INTERVAL]
[--wm-zmq-task-timeout TIMEOUT] [--wm-zmq-client-comm-mode MODE]
Command-Line Options
See the command-line tool index for more information on the general options.
Segment Options
--oneseg
Only propagate one segment (useful for debugging propagators)
Example
A simple example for using w_run (mostly taken from the odld example that is available in the main WESTPA distribution):
w_run &> west.log
This command starts up a serial weighted ensemble run and redirects all output to the west.log file. As a side note, the --debug option is very useful for debugging if something goes wrong.
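To use more than one core, select a work manager with the options shown in the usage above; the following sketch (worker count hypothetical) uses the processes work manager:

w_run --wm-work-manager processes --wm-n-workers 8 &> west.log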
westpa.cli.core.w_run module
- westpa.cli.core.w_run.make_work_manager()
Using cues from the environment, instantiate a pre-configured work manager.
- westpa.cli.core.w_run.entry_point()
- westpa.cli.core.w_run.run_simulation()
w_truncate
w_truncate removes all iterations after a certain point.
Overview
Usage:
w_truncate [-h] [-r RCFILE] [--quiet | --verbose | --debug] [--version]
[-n N_ITER] [-W WEST_H5FILE]
Remove all iterations after a certain point in a WEST HDF5 file.
Command-Line Options
See the command-line tool index for more information on the general options.
Iteration Options
-n N_ITER, --iter N_ITER
Truncate this iteration and those following.
-W WEST_H5FILE, --west-data WEST_H5FILE
PATH of the HDF5 file to truncate. By default, it will read from the RCFILE (e.g., west.cfg).
This option overrides whatever is provided in the RCFILE.
Examples
Running the following will remove iteration 50 and all iterations after 50 from multi.h5.
w_truncate -n 50 -W multi.h5
westpa.cli.core.w_truncate module
- westpa.cli.core.w_truncate.entry_point()
w_fork
usage:
w_fork [-h] [-r RCFILE] [--quiet | --verbose | --debug] [--version] [-i INPUT_H5FILE]
[-I N_ITER] [-o OUTPUT_H5FILE] [--istate-map ISTATE_MAP] [--no-headers]
Prepare a new weighted ensemble simulation from an existing one at a particular point. A new HDF5 file is generated. In the case of executable propagation, it is the user’s responsibility to prepare the new simulation directory appropriately, particularly making the old simulation’s restart data from the appropriate iteration available as the new simulation’s initial state data; a mapping of old simulation segments to new simulation initial states is created, both in the new HDF5 file and as a flat text file, to aid in this. Target states and basis states for the new simulation are taken from those in the original simulation.
optional arguments:
-h, --help show this help message and exit
-i INPUT_H5FILE, --input INPUT_H5FILE
Create simulation from the given INPUT_H5FILE (default: read from configuration
file).
-I N_ITER, --iteration N_ITER
Take initial distribution for new simulation from iteration N_ITER (default:
last complete iteration).
-o OUTPUT_H5FILE, --output OUTPUT_H5FILE
Save new simulation HDF5 file as OUTPUT (default: forked.h5).
--istate-map ISTATE_MAP
Write text file describing mapping of existing segments to new initial states
in ISTATE_MAP (default: istate_map.txt).
--no-headers Do not write header to ISTATE_MAP
general options:
-r RCFILE, --rcfile RCFILE
use RCFILE as the WEST run-time configuration file (default: west.cfg)
--quiet emit only essential information
--verbose emit extra information
--debug enable extra checks and emit copious information
--version show program's version number and exit
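An illustrative sketch (file names and iteration number hypothetical):

# fork a new simulation from iteration 100 of an existing run
w_fork -i west.h5 -I 100 -o forked.h5 --istate-map istate_map.txt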
westpa.cli.core.w_fork module
- class westpa.cli.core.w_fork.Segment(n_iter=None, seg_id=None, weight=None, endpoint_type=None, parent_id=None, wtg_parent_ids=None, pcoord=None, status=None, walltime=None, cputime=None, data=None)
Bases:
object
A class wrapping segment data that must be passed through the work manager or data manager. Most fields are self-explanatory. One item worth noting is that a negative parent ID means that the segment starts from the initial state with ID -(segment.parent_id+1)
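A minimal sketch of decoding this convention (the value here is hypothetical):

# a negative parent ID encodes an initial state rather than a parent segment
parent_id = -3
if parent_id < 0:
    initial_state_id = -(parent_id + 1)  # -> 2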
- SEG_STATUS_UNSET = 0
- SEG_STATUS_PREPARED = 1
- SEG_STATUS_COMPLETE = 2
- SEG_STATUS_FAILED = 3
- SEG_INITPOINT_UNSET = 0
- SEG_INITPOINT_CONTINUES = 1
- SEG_INITPOINT_NEWTRAJ = 2
- SEG_ENDPOINT_UNSET = 0
- SEG_ENDPOINT_CONTINUES = 1
- SEG_ENDPOINT_MERGED = 2
- SEG_ENDPOINT_RECYCLED = 3
- statuses = {'SEG_STATUS_COMPLETE': 2, 'SEG_STATUS_FAILED': 3, 'SEG_STATUS_PREPARED': 1, 'SEG_STATUS_UNSET': 0}
- initpoint_types = {'SEG_INITPOINT_CONTINUES': 1, 'SEG_INITPOINT_NEWTRAJ': 2, 'SEG_INITPOINT_UNSET': 0}
- endpoint_types = {'SEG_ENDPOINT_CONTINUES': 1, 'SEG_ENDPOINT_MERGED': 2, 'SEG_ENDPOINT_RECYCLED': 3, 'SEG_ENDPOINT_UNSET': 0}
- status_names = {0: 'SEG_STATUS_UNSET', 1: 'SEG_STATUS_PREPARED', 2: 'SEG_STATUS_COMPLETE', 3: 'SEG_STATUS_FAILED'}
- initpoint_type_names = {0: 'SEG_INITPOINT_UNSET', 1: 'SEG_INITPOINT_CONTINUES', 2: 'SEG_INITPOINT_NEWTRAJ'}
- endpoint_type_names = {0: 'SEG_ENDPOINT_UNSET', 1: 'SEG_ENDPOINT_CONTINUES', 2: 'SEG_ENDPOINT_MERGED', 3: 'SEG_ENDPOINT_RECYCLED'}
- static initial_pcoord(segment)
Return the initial progress coordinate point of this segment.
- static final_pcoord(segment)
Return the final progress coordinate point of this segment.
- property initpoint_type
- property initial_state_id
- property status_text
- property endpoint_type_text
- class westpa.cli.core.w_fork.InitialState(state_id, basis_state_id, iter_created, iter_used=None, istate_type=None, istate_status=None, pcoord=None, basis_state=None, basis_auxref=None)
Bases:
object
Describes an initial state for a new trajectory. These are generally constructed by appropriate modification of a basis state.
- Variables:
state_id – Integer identifier of this state, usually set by the data manager.
basis_state_id – Identifier of the basis state from which this state was generated, or None.
basis_state – The BasisState from which this state was generated, or None.
iter_created – Iteration in which this state was generated (0 for simulation initialization).
iter_used – Iteration in which this state was used to initiate a trajectory (None for unused).
istate_type – Integer describing the type of this initial state (ISTATE_TYPE_BASIS for direct use of a basis state, ISTATE_TYPE_GENERATED for a state generated from a basis state, ISTATE_TYPE_RESTART for a state corresponding to the endpoint of a segment in another simulation, or ISTATE_TYPE_START for a state generated from a start state).
istate_status – Integer describing whether this initial state has been properly prepared.
pcoord – The representative progress coordinate of this state.
- ISTATE_TYPE_UNSET = 0
- ISTATE_TYPE_BASIS = 1
- ISTATE_TYPE_GENERATED = 2
- ISTATE_TYPE_RESTART = 3
- ISTATE_TYPE_START = 4
- ISTATE_UNUSED = 0
- ISTATE_STATUS_PENDING = 0
- ISTATE_STATUS_PREPARED = 1
- ISTATE_STATUS_FAILED = 2
- istate_types = {'ISTATE_TYPE_BASIS': 1, 'ISTATE_TYPE_GENERATED': 2, 'ISTATE_TYPE_RESTART': 3, 'ISTATE_TYPE_START': 4, 'ISTATE_TYPE_UNSET': 0}
- istate_type_names = {0: 'ISTATE_TYPE_UNSET', 1: 'ISTATE_TYPE_BASIS', 2: 'ISTATE_TYPE_GENERATED', 3: 'ISTATE_TYPE_RESTART', 4: 'ISTATE_TYPE_START'}
- istate_statuses = {'ISTATE_STATUS_FAILED': 2, 'ISTATE_STATUS_PENDING': 0, 'ISTATE_STATUS_PREPARED': 1}
- istate_status_names = {0: 'ISTATE_STATUS_PENDING', 1: 'ISTATE_STATUS_PREPARED', 2: 'ISTATE_STATUS_FAILED'}
- as_numpy_record()
- westpa.cli.core.w_fork.n_iter_dtype
alias of
uint32
- westpa.cli.core.w_fork.seg_id_dtype
alias of
int64
- westpa.cli.core.w_fork.entry_point()
w_assign
w_assign uses simulation output to assign walkers to user-specified bins and macrostates. These assignments are required for some other simulation tools, namely w_kinetics and w_kinavg. w_assign supports parallelization (see the general work manager options for more on command-line options to specify a work manager).
Overview
Usage:
w_assign [-h] [-r RCFILE] [--quiet | --verbose | --debug] [--version]
[-W WEST_H5FILE] [-o OUTPUT]
[--bins-from-system | --bins-from-expr BINS_FROM_EXPR | --bins-from-function BINS_FROM_FUNCTION]
[-p MODULE.FUNCTION]
[--states STATEDEF [STATEDEF ...] | --states-from-file STATEFILE | --states-from-function STATEFUNC]
[--wm-work-manager WORK_MANAGER] [--wm-n-workers N_WORKERS]
[--wm-zmq-mode MODE] [--wm-zmq-info INFO_FILE]
[--wm-zmq-task-endpoint TASK_ENDPOINT]
[--wm-zmq-result-endpoint RESULT_ENDPOINT]
[--wm-zmq-announce-endpoint ANNOUNCE_ENDPOINT]
[--wm-zmq-listen-endpoint ANNOUNCE_ENDPOINT]
[--wm-zmq-heartbeat-interval INTERVAL]
[--wm-zmq-task-timeout TIMEOUT]
[--wm-zmq-client-comm-mode MODE]
Command-Line Options
See the general command-line tool reference for more information on the general options.
Input/output Options
-W, --west-data /path/to/file
Read simulation result data from file *file*. (**Default:** The
*hdf5* file specified in the configuration file, by default
**west.h5**)
-o, --output /path/to/file
Write assignment results to file *outfile*. (**Default:** *hdf5*
file **assign.h5**)
Binning Options
Specify how binning is to be assigned to the dataset:
--bins-from-system
Use binning scheme specified by the system driver; system driver can be
found in the west configuration file, by default named **west.cfg**
(**Default binning**)
--bins-from-expr bin_expr
  Use binning scheme specified in *``bin_expr``*, which takes the form of a
  Python list of lists, where each inner list corresponds to the binning of a
  given dimension (for example, "[[0,1,2,4,inf],[-inf,0,inf]]" specifies bin
  boundaries for a two-dimensional progress coordinate). Note that this option
  accepts the special symbol 'inf' for floating-point infinity.
--bins-from-function bin_func
  Bins specified by calling an external function *``bin_func``*.
  *``bin_func``* should be formatted as '[PATH:]module.function', where the
  function 'function' in module 'module' will be used.
Macrostate Options
You can optionally specify how to assign user-defined macrostates. Note that macrostates must be assigned for subsequent analysis tools, namely w_kinetics and w_kinavg:
--states statedef [statedef ...]
Specify a macrostate for a single bin as *``statedef``*, formatted
as a coordinate tuple where each coordinate specifies the bin to
which it belongs, for instance:
'[1.0, 2.0]' assigns a macrostate corresponding to the bin that
contains the (two-dimensional) progress coordinates 1.0 and 2.0.
Note that a macrostate label can optionally be specified, for
instance: 'bound:[1.0, 2.0]' assigns the corresponding bin
containing the given coordinates the macrostate named 'bound'. Note
that multiple assignments can be specified with this command, but
only one macrostate per bin is possible - if you wish to specify
multiple bins in a single macrostate, use the
*``--states-from-file``* option.
--states-from-file statefile
Read macrostate assignments from *yaml* file *``statefile``*. This
option allows you to assign multiple bins to a single macrostate.
The following example shows the contents of *``statefile``* that
specify two macrostates, bound and unbound, over multiple bins with
a two-dimensional progress coordinate:
---
states:
  - label: unbound
    coords:
      - [9.0, 1.0]
      - [9.0, 2.0]
  - label: bound
    coords:
      - [0.1, 0.0]
Specifying Progress Coordinate
By default, progress coordinate information for each iteration is taken from the pcoord dataset in the specified input file (which, by default, is west.h5). Optionally, you can specify a function to construct the progress coordinate for each iteration - this may be useful to consolidate data from several sources or otherwise preprocess the progress coordinate data:
--construct-pcoord module.function, -p module.function
Use the function *module.function* to construct the progress
coordinate for each iteration. This will be called once per
iteration as *function(n_iter, iter_group)* and should return an
array indexable as [seg_id][timepoint][dimension]. The
**default** function returns the 'pcoord' dataset for that iteration
(i.e. the function executes return iter_group['pcoord'][...])
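A minimal sketch of such a function (module and function names hypothetical), mirroring the stated default behavior:

# my_pcoords.py -- used as: w_assign -p my_pcoords.load_pcoord
def load_pcoord(n_iter, iter_group):
    # return data indexable as [seg_id][timepoint][dimension]
    return iter_group['pcoord'][...]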
Examples
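An illustrative sketch (file names, bin boundaries, and state definitions hypothetical):

# assign walkers to rectilinear bins and to the macrostates defined in states.yaml
w_assign -W west.h5 --bins-from-expr "[[0.0,2.7,inf]]" --states-from-file states.yaml -o assign.h5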
westpa.cli.tools.w_assign module
- westpa.cli.tools.w_assign.seg_id_dtype
alias of
int64
- westpa.cli.tools.w_assign.weight_dtype
alias of
float64
- westpa.cli.tools.w_assign.index_dtype
alias of
uint16
- westpa.cli.tools.w_assign.assign_and_label(nsegs_lb, nsegs_ub, parent_ids, assign, nstates, state_map, last_labels, pcoords, subsample)
Assign trajectories to bins and last-visited macrostates for each timepoint.
- westpa.cli.tools.w_assign.accumulate_labeled_populations(weights, bin_assignments, label_assignments, labeled_bin_pops)
For a set of segments in one iteration, calculate the average population in each bin, with separation by last-visited macrostate.
- class westpa.cli.tools.w_assign.WESTParallelTool(wm_env=None)
Bases:
WESTTool
Base class for command-line tools parallelized with wwmgr. This automatically adds and processes wwmgr command-line arguments and creates a work manager at self.work_manager.
- make_parser_and_process(prog=None, usage=None, description=None, epilog=None, args=None)
A convenience function to create a parser, call add_all_args(), and then call process_all_args(). The argument namespace is returned.
- add_args(parser)
Add arguments specific to this tool to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this tool and deal with them appropriately (setting instance variables, etc)
- go()
Perform the analysis associated with this tool.
- main()
A convenience function to make a parser, parse and process arguments, then run self.go() in the master process.
- class westpa.cli.tools.w_assign.WESTDataReader
Bases:
WESTToolComponent
Tool for reading data from WEST-related HDF5 files. Coordinates finding the main HDF5 file from west.cfg or command line arguments, caching of certain kinds of data (eventually), and retrieving auxiliary data sets from various places.
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- open(mode='r')
- close()
- property weight_dsspec
- property parent_id_dsspec
- class westpa.cli.tools.w_assign.WESTDSSynthesizer(default_dsname=None, h5filename=None)
Bases:
WESTToolComponent
Tool for synthesizing a dataset for analysis from other datasets. This may be done using a custom function, or a list of “data set specifications”. It is anticipated that if several source datasets are required, then a tool will have multiple instances of this class.
- group_name = 'input dataset options'
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- class westpa.cli.tools.w_assign.BinMappingComponent
Bases:
WESTToolComponent
Component for obtaining a bin mapper from one of several places based on command-line arguments. Such locations include an HDF5 file that contains pickled mappers (including the primary WEST HDF5 file), the system object, an external function, or (in the common case of rectilinear bins) a list of lists of bin boundaries.
Some configuration is necessary prior to calling process_args() if loading a mapper from HDF5. Specifically, either set_we_h5file_info() or set_other_h5file_info() must be called to describe where to find the appropriate mapper. In the case of set_we_h5file_info(), the mapper used for WE at the end of a given iteration will be loaded. In the case of set_other_h5file_info(), an arbitrary group and hash value are specified; the mapper corresponding to that hash in the given group will be returned.
In the absence of arguments, the mapper contained in an existing HDF5 file is preferred; if that is not available, the mapper from the system driver is used.
This component adds the following arguments to argument parsers:
- --bins-from-system
Obtain bins from the system driver
- --bins-from-expr=EXPR
Construct rectilinear bins by parsing EXPR and calling RectilinearBinMapper() with the result. EXPR must therefore be a list of lists.
- --bins-from-function=[PATH:]MODULE.FUNC
Call an external function FUNC in module MODULE (optionally adding PATH to the search path when loading MODULE) which, when called, returns a fully-constructed bin mapper.
- --bins-from-file
Load bin definitions from a YAML configuration file.
- --bins-from-h5file
Load bins from the file being considered; this is intended to mean the master WEST HDF5 file or results of other binning calculations, as appropriate.
- add_args(parser, description='binning options', suppress=[])
Add arguments specific to this component to the given argparse parser.
- add_target_count_args(parser, description='bin target count options')
Add options to the given parser corresponding to target counts.
- process_args(args)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- set_we_h5file_info(n_iter=None, data_manager=None, required=False)
Set up to load a bin mapper from the master WEST HDF5 file. The mapper is actually loaded from the file when self.load_bin_mapper() is called, if and only if command line arguments direct this. If required is true, then a mapper must be available at iteration n_iter, or else an exception will be raised.
- set_other_h5file_info(topology_group, hashval)
Set up to load a bin mapper from (any) open HDF5 file, where bin topologies are stored in topology_group (an h5py Group object) and the desired mapper has hash value hashval. The mapper itself is loaded when self.load_bin_mapper() is called.
- class westpa.cli.tools.w_assign.ProgressIndicatorComponent
Bases:
WESTToolComponent
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- class westpa.cli.tools.w_assign.WESTPAH5File(*args, **kwargs)
Bases:
File
Generalized input/output for WESTPA simulation (or analysis) data.
Create a new file object.
See the h5py user guide for a detailed explanation of the options.
- name
Name of the file on disk, or file-like object. Note: for files created with the ‘core’ driver, HDF5 still requires this be non-empty.
- mode
r        Readonly, file must exist (default)
r+       Read/write, file must exist
w        Create file, truncate if exists
w- or x  Create file, fail if exists
a        Read/write if exists, create otherwise
- driver
Name of the driver to use. Legal values are None (default, recommended), ‘core’, ‘sec2’, ‘direct’, ‘stdio’, ‘mpio’, ‘ros3’.
- libver
Library version bounds. Supported values: ‘earliest’, ‘v108’, ‘v110’, ‘v112’ and ‘latest’. The ‘v108’, ‘v110’ and ‘v112’ options can only be specified with the HDF5 1.10.2 library or later.
- userblock_size
Desired size of user block. Only allowed when creating a new file (mode w, w- or x).
- swmr
Open the file in SWMR read mode. Only used when mode = ‘r’.
- rdcc_nbytes
Total size of the dataset chunk cache in bytes. The default size is 1024**2 (1 MiB) per dataset. Applies to all datasets unless individually changed.
- rdcc_w0
The chunk preemption policy for all datasets. This must be between 0 and 1 inclusive and indicates the weighting according to which chunks which have been fully read or written are penalized when determining which chunks to flush from cache. A value of 0 means fully read or written chunks are treated no differently than other chunks (the preemption is strictly LRU) while a value of 1 means fully read or written chunks are always preempted before other chunks. If your application only reads or writes data once, this can be safely set to 1. Otherwise, this should be set lower depending on how often you re-read or re-write the same data. The default value is 0.75. Applies to all datasets unless individually changed.
- rdcc_nslots
The number of chunk slots in the raw data chunk cache for this file. Increasing this value reduces the number of cache collisions, but slightly increases the memory used. Due to the hashing strategy, this value should ideally be a prime number. As a rule of thumb, this value should be at least 10 times the number of chunks that can fit in rdcc_nbytes bytes. For maximum performance, this value should be set approximately 100 times that number of chunks. The default value is 521. Applies to all datasets unless individually changed.
- track_order
Track dataset/group/attribute creation order under root group if True. If None use global default h5.get_config().track_order.
- fs_strategy
The file space handling strategy to be used. Only allowed when creating a new file (mode w, w- or x). Defined as:
"fsm"        FSM, Aggregators, VFD
"page"       Paged FSM, VFD
"aggregate"  Aggregators, VFD
"none"       VFD
If None use HDF5 defaults.
- fs_page_size
File space page size in bytes. Only used when fs_strategy=”page”. If None use the HDF5 default (4096 bytes).
- fs_persist
A boolean value to indicate whether free space should be persistent or not. Only allowed when creating a new file. The default value is False.
- fs_threshold
The smallest free-space section size that the free space manager will track. Only allowed when creating a new file. The default value is 1.
- page_buf_size
Page buffer size in bytes. Only allowed for HDF5 files created with fs_strategy=”page”. Must be a power of two and greater than or equal to the file space page size when creating the file. It is not used by default.
- min_meta_keep
Minimum percentage of metadata to keep in the page buffer before allowing pages containing metadata to be evicted. Applicable only if page_buf_size is set. Default value is zero.
- min_raw_keep
Minimum percentage of raw data to keep in the page buffer before allowing pages containing raw data to be evicted. Applicable only if page_buf_size is set. Default value is zero.
- locking
The file locking behavior. Defined as:
False (or “false”) – Disable file locking
True (or “true”) – Enable file locking
“best-effort” – Enable file locking but ignore some errors
None – Use HDF5 defaults
Warning
The HDF5_USE_FILE_LOCKING environment variable can override this parameter.
Only available with HDF5 >= 1.12.1 or 1.10.x >= 1.10.7.
- alignment_threshold
Together with alignment_interval, this property ensures that any file object greater than or equal in size to the alignment threshold (in bytes) will be aligned on an address which is a multiple of the alignment interval.
- alignment_interval
This property should be used in conjunction with alignment_threshold. See the description above. For more details, see https://portal.hdfgroup.org/display/HDF5/H5P_SET_ALIGNMENT
- meta_block_size
Set the current minimum size, in bytes, of new metadata block allocations. See https://portal.hdfgroup.org/display/HDF5/H5P_SET_META_BLOCK_SIZE
- Additional keywords
Passed on to the selected file driver.
- default_iter_prec = 8
- replace_dataset(*args, **kwargs)
- iter_object_name(n_iter, prefix='', suffix='')
Return a properly-formatted per-iteration name for iteration n_iter. (This is used in create/require/get_iter_group, but may also be useful for naming datasets on a per-iteration basis.)
- create_iter_group(n_iter, group=None)
Create a per-iteration data storage group for iteration number n_iter in the group group (which is ‘/iterations’ by default).
- require_iter_group(n_iter, group=None)
Ensure that a per-iteration data storage group for iteration number n_iter is available in the group group (which is ‘/iterations’ by default).
- get_iter_group(n_iter, group=None)
Get the per-iteration data group for iteration number n_iter from within the group group (‘/iterations’ by default).
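A minimal usage sketch (file name and iteration number hypothetical):

# open an existing WESTPA HDF5 file and read one iteration's progress coordinates
from westpa.cli.tools.w_assign import WESTPAH5File

f = WESTPAH5File('west.h5', 'r')
iter_group = f.get_iter_group(10)   # typically '/iterations/iter_00000010' (default_iter_prec = 8)
pcoord = iter_group['pcoord'][...]  # indexable as [segment][timepoint][dimension]
f.close()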
- westpa.cli.tools.w_assign.get_object(object_name, path=None)
Attempt to load the given object, using additional path information if given.
- westpa.cli.tools.w_assign.parse_pcoord_value(pc_str)
- class westpa.cli.tools.w_assign.WAssign
Bases:
WESTParallelTool
- prog = 'w_assign'
- description = 'Assign walkers to bins, producing a file (by default named "assign.h5")\nwhich can be used in subsequent analysis.\n\nFor consistency in subsequent analysis operations, the entire dataset\nmust be assigned, even if only a subset of the data will be used. This\nensures that analyses that rely on tracing trajectories always know the\noriginating bin of each trajectory.\n\n\n-----------------------------------------------------------------------------\nSource data\n-----------------------------------------------------------------------------\n\nSource data is provided either by a user-specified function\n(--construct-dataset) or a list of "data set specifications" (--dsspecs).\nIf neither is provided, the progress coordinate dataset \'\'pcoord\'\' is used.\n\nTo use a custom function to extract or calculate data whose probability\ndistribution will be calculated, specify the function in standard Python\nMODULE.FUNCTION syntax as the argument to --construct-dataset. This function\nwill be called as function(n_iter,iter_group), where n_iter is the iteration\nwhose data are being considered and iter_group is the corresponding group\nin the main WEST HDF5 file (west.h5). The function must return data which can\nbe indexed as [segment][timepoint][dimension].\n\nTo use a list of data set specifications, specify --dsspecs and then list the\ndesired datasets one-by-one (space-separated in most shells). These data set\nspecifications are formatted as NAME[,file=FILENAME,slice=SLICE], which will\nuse the dataset called NAME in the HDF5 file FILENAME (defaulting to the main\nWEST HDF5 file west.h5), and slice it with the Python slice expression SLICE\n(as in [0:2] to select the first two elements of the first axis of the\ndataset). The ``slice`` option is most useful for selecting one column (or\nmore) from a multi-column dataset, such as arises when using a progress\ncoordinate of multiple dimensions.\n\n\n-----------------------------------------------------------------------------\nSpecifying macrostates\n-----------------------------------------------------------------------------\n\nOptionally, kinetic macrostates may be defined in terms of sets of bins.\nEach trajectory will be labeled with the kinetic macrostate it was most\nrecently in at each timepoint, for use in subsequent kinetic analysis.\nThis is required for all kinetics analysis (w_kintrace and w_kinmat).\n\nThere are three ways to specify macrostates:\n\n 1. States corresponding to single bins may be identified on the command\n line using the --states option, which takes multiple arguments, one for\n each state (separated by spaces in most shells). Each state is specified\n as a coordinate tuple, with an optional label prepended, as in\n ``bound:1.0`` or ``unbound:(2.5,2.5)``. Unlabeled states are named\n ``stateN``, where N is the (zero-based) position in the list of states\n supplied to --states.\n\n 2. States corresponding to multiple bins may use a YAML input file specified\n with --states-from-file. This file defines a list of states, each with a\n name and a list of coordinate tuples; bins containing these coordinates\n will be mapped to the containing state. 
For instance, the following\n file::\n\n ---\n states:\n - label: unbound\n coords:\n - [9.0, 1.0]\n - [9.0, 2.0]\n - label: bound\n coords:\n - [0.1, 0.0]\n\n produces two macrostates: the first state is called "unbound" and\n consists of bins containing the (2-dimensional) progress coordinate\n values (9.0, 1.0) and (9.0, 2.0); the second state is called "bound"\n and consists of the single bin containing the point (0.1, 0.0).\n\n 3. Arbitrary state definitions may be supplied by a user-defined function,\n specified as --states-from-function=MODULE.FUNCTION. This function is\n called with the bin mapper as an argument (``function(mapper)``) and must\n return a list of dictionaries, one per state. Each dictionary must contain\n a vector of coordinate tuples with key "coords"; the bins into which each\n of these tuples falls define the state. An optional name for the state\n (with key "label") may also be provided.\n\n\n-----------------------------------------------------------------------------\nOutput format\n-----------------------------------------------------------------------------\n\nThe output file (-o/--output, by default "assign.h5") contains the following\nattributes datasets:\n\n ``nbins`` attribute\n *(Integer)* Number of valid bins. Bin assignments range from 0 to\n *nbins*-1, inclusive.\n\n ``nstates`` attribute\n *(Integer)* Number of valid macrostates (may be zero if no such states are\n specified). Trajectory ensemble assignments range from 0 to *nstates*-1,\n inclusive, when states are defined.\n\n ``/assignments`` [iteration][segment][timepoint]\n *(Integer)* Per-segment and -timepoint assignments (bin indices).\n\n ``/npts`` [iteration]\n *(Integer)* Number of timepoints in each iteration.\n\n ``/nsegs`` [iteration]\n *(Integer)* Number of segments in each iteration.\n\n ``/labeled_populations`` [iterations][state][bin]\n *(Floating-point)* Per-iteration and -timepoint bin populations, labeled\n by most recently visited macrostate. The last state entry (*nstates-1*)\n corresponds to trajectories initiated outside of a defined macrostate.\n\n ``/bin_labels`` [bin]\n *(String)* Text labels of bins.\n\nWhen macrostate assignments are given, the following additional datasets are\npresent:\n\n ``/trajlabels`` [iteration][segment][timepoint]\n *(Integer)* Per-segment and -timepoint trajectory labels, indicating the\n macrostate which each trajectory last visited.\n\n ``/state_labels`` [state]\n *(String)* Labels of states.\n\n ``/state_map`` [bin]\n *(Integer)* Mapping of bin index to the macrostate containing that bin.\n An entry will contain *nbins+1* if that bin does not fall into a\n macrostate.\n\nDatasets indexed by state and bin contain one more entry than the number of\nvalid states or bins. For *N* bins, axes indexed by bin are of size *N+1*, and\nentry *N* (0-based indexing) corresponds to a walker outside of the defined bin\nspace (which will cause most mappers to raise an error). 
More importantly, for\n*M* states (including the case *M=0* where no states are specified), axes\nindexed by state are of size *M+1* and entry *M* refers to trajectories\ninitiated in a region not corresponding to a defined macrostate.\n\nThus, ``labeled_populations[:,:,:].sum(axis=1)[:,:-1]`` gives overall per-bin\npopulations, for all defined bins and\n``labeled_populations[:,:,:].sum(axis=2)[:,:-1]`` gives overall\nper-trajectory-ensemble populations for all defined states.\n\n\n-----------------------------------------------------------------------------\nParallelization\n-----------------------------------------------------------------------------\n\nThis tool supports parallelized binning, including reading/calculating input\ndata.\n\n\n-----------------------------------------------------------------------------\nCommand-line options\n-----------------------------------------------------------------------------\n'
- add_args(parser)
Add arguments specific to this tool to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this tool and deal with them appropriately (setting instance variables, etc)
- parse_cmdline_states(state_strings)
- load_config_from_west(scheme)
- load_state_file(state_filename)
- states_from_dict(ystates)
- load_states_from_function(statefunc)
- assign_iteration(n_iter, nstates, nbins, state_map, last_labels)
Method to encapsulate the segment slicing (into n_worker slices) and parallel job submission Submits job(s), waits on completion, splices them back together Returns: assignments, trajlabels, pops for this iteration
- go()
Perform the analysis associated with this tool.
- westpa.cli.tools.w_assign.entry_point()
w_trace
usage:
w_trace [-h] [-r RCFILE] [--quiet | --verbose | --debug] [--version] [-W WEST_H5FILE]
[-d DSNAME] [--output-pattern OUTPUT_PATTERN] [-o OUTPUT]
N_ITER:SEG_ID [N_ITER:SEG_ID ...]
Trace individual WEST trajectories and emit (or calculate) quantities along the trajectory.
Trajectories are specified as N_ITER:SEG_ID pairs. Each segment is traced back to its initial point, and then various quantities (notably n_iter and seg_id) are printed in order from initial point up until the given segment in the given iteration.
Output is stored in several files, all named according to the pattern given by the --output-pattern parameter. The default output pattern is “traj_%d_%d”, where the printf-style format codes are replaced by the iteration number and segment ID of the terminal segment of the trajectory being traced.
Individual datasets can be selected for writing using the -d/--dataset option (which may be specified more than once). The simplest form is -d dsname, which causes data from dataset dsname along the trace to be stored to HDF5. The dataset is assumed to be stored on a per-iteration basis, with the first dimension corresponding to seg_id and the second dimension corresponding to time within the segment. Further options are specified as comma-separated key=value pairs after the data set name, as in:
-d dsname,alias=newname,index=idsname,file=otherfile.h5,slice=[100,...]
The following options for datasets are supported:
alias=newname
When writing this data to HDF5 or text files, use ``newname``
instead of ``dsname`` to identify the dataset. This is mostly of
use in conjunction with the ``slice`` option in order, e.g., to
retrieve two different slices of a dataset and store them with
different names for future use.
index=idsname
The dataset is not stored on a per-iteration basis for all
segments, but instead is stored as a single dataset whose
first dimension indexes n_iter/seg_id pairs. The index to
these n_iter/seg_id pairs is ``idsname``.
file=otherfile.h5
Instead of reading data from the main WEST HDF5 file (usually
``west.h5``), read data from ``otherfile.h5``.
slice=[100,...]
Retrieve only the given slice from the dataset. This can be
used to pick a subset of interest to minimize I/O.
positional arguments
N_ITER:SEG_ID Trace trajectory ending (or at least alive at) N_ITER:SEG_ID.
optional arguments
-h, --help show this help message and exit
-d DSNAME, --dataset DSNAME
Include the dataset named DSNAME in trace output. An extended form like
DSNAME[,alias=ALIAS][,index=INDEX][,file=FILE][,slice=SLICE] will obtain the
dataset from the given FILE instead of the main WEST HDF5 file, slice it by
SLICE, call it ALIAS in output, and/or access per-segment data by a
n_iter,seg_id INDEX instead of a seg_id indexed dataset in the group for
n_iter.
general options
-r RCFILE, --rcfile RCFILE
use RCFILE as the WEST run-time configuration file (default: west.cfg)
--quiet emit only essential information
--verbose emit extra information
--debug enable extra checks and emit copious information
--version show program's version number and exit
WEST input data options
-W WEST_H5FILE, --west-data WEST_H5FILE
Take WEST data from WEST_H5FILE (default: read from the HDF5 file specified in
west.cfg).
output options
--output-pattern OUTPUT_PATTERN
Write per-trajectory data to output files/HDF5 groups whose names begin with
OUTPUT_PATTERN, which must contain two printf-style format flags which will be
replaced with the iteration number and segment ID of the terminal segment of
the trajectory being traced. (Default: traj_%d_%d.)
-o OUTPUT, --output OUTPUT
Store intermediate data and analysis results to OUTPUT (default: trajs.h5).
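An illustrative sketch (iteration/segment IDs and the non-default file and dataset names hypothetical):

# trace segment 15 of iteration 100 back to its origin, storing the pcoord dataset
w_trace 100:15 -d pcoord -o trajs.h5

# same trace, also pulling a dataset from another file under an alias
w_trace 100:15 -d dsname,alias=newname,file=otherfile.h5 -o trajs.h5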
westpa.cli.tools.w_trace module
- class westpa.cli.tools.w_trace.WESTTool
Bases:
WESTToolComponent
Base class for WEST command line tools
- prog = None
- usage = None
- description = None
- epilog = None
- add_args(parser)
Add arguments specific to this tool to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this tool and deal with them appropriately (setting instance variables, etc)
- make_parser(prog=None, usage=None, description=None, epilog=None, args=None)
- make_parser_and_process(prog=None, usage=None, description=None, epilog=None, args=None)
A convenience function to create a parser, call add_all_args(), and then call process_all_args(). The argument namespace is returned.
- go()
Perform the analysis associated with this tool.
- main()
A convenience function to make a parser, parse and process arguments, then call self.go()
- class westpa.cli.tools.w_trace.WESTDataReader
Bases:
WESTToolComponent
Tool for reading data from WEST-related HDF5 files. Coordinates finding the main HDF5 file from west.cfg or command line arguments, caching of certain kinds of data (eventually), and retrieving auxiliary data sets from various places.
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- open(mode='r')
- close()
- property weight_dsspec
- property parent_id_dsspec
- class westpa.cli.tools.w_trace.Segment(n_iter=None, seg_id=None, weight=None, endpoint_type=None, parent_id=None, wtg_parent_ids=None, pcoord=None, status=None, walltime=None, cputime=None, data=None)
Bases:
object
A class wrapping segment data that must be passed through the work manager or data manager. Most fields are self-explanatory. One item worth noting is that a negative parent ID means that the segment starts from the initial state with ID -(segment.parent_id+1)
- SEG_STATUS_UNSET = 0
- SEG_STATUS_PREPARED = 1
- SEG_STATUS_COMPLETE = 2
- SEG_STATUS_FAILED = 3
- SEG_INITPOINT_UNSET = 0
- SEG_INITPOINT_CONTINUES = 1
- SEG_INITPOINT_NEWTRAJ = 2
- SEG_ENDPOINT_UNSET = 0
- SEG_ENDPOINT_CONTINUES = 1
- SEG_ENDPOINT_MERGED = 2
- SEG_ENDPOINT_RECYCLED = 3
- statuses = {'SEG_STATUS_COMPLETE': 2, 'SEG_STATUS_FAILED': 3, 'SEG_STATUS_PREPARED': 1, 'SEG_STATUS_UNSET': 0}
- initpoint_types = {'SEG_INITPOINT_CONTINUES': 1, 'SEG_INITPOINT_NEWTRAJ': 2, 'SEG_INITPOINT_UNSET': 0}
- endpoint_types = {'SEG_ENDPOINT_CONTINUES': 1, 'SEG_ENDPOINT_MERGED': 2, 'SEG_ENDPOINT_RECYCLED': 3, 'SEG_ENDPOINT_UNSET': 0}
- status_names = {0: 'SEG_STATUS_UNSET', 1: 'SEG_STATUS_PREPARED', 2: 'SEG_STATUS_COMPLETE', 3: 'SEG_STATUS_FAILED'}
- initpoint_type_names = {0: 'SEG_INITPOINT_UNSET', 1: 'SEG_INITPOINT_CONTINUES', 2: 'SEG_INITPOINT_NEWTRAJ'}
- endpoint_type_names = {0: 'SEG_ENDPOINT_UNSET', 1: 'SEG_ENDPOINT_CONTINUES', 2: 'SEG_ENDPOINT_MERGED', 3: 'SEG_ENDPOINT_RECYCLED'}
- static initial_pcoord(segment)
Return the initial progress coordinate point of this segment.
- static final_pcoord(segment)
Return the final progress coordinate point of this segment.
- property initpoint_type
- property initial_state_id
- property status_text
- property endpoint_type_text
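A minimal sketch of the negative-parent-ID convention noted in the class description above (the helper function is hypothetical, for illustration only):
    # Hypothetical helper: a non-negative parent_id refers to the parent
    # segment in the previous iteration; a negative parent_id encodes the
    # initial state the segment started from, with state ID -(parent_id+1).
    def describe_parent(parent_id):
        if parent_id >= 0:
            return f'continues segment {parent_id}'
        return f'starts from initial state {-(parent_id + 1)}'

    print(describe_parent(7))    # continues segment 7
    print(describe_parent(-1))   # starts from initial state 0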
- class westpa.cli.tools.w_trace.InitialState(state_id, basis_state_id, iter_created, iter_used=None, istate_type=None, istate_status=None, pcoord=None, basis_state=None, basis_auxref=None)
Bases:
object
Describes an initial state for a new trajectory. These are generally constructed by appropriate modification of a basis state.
- Variables:
state_id – Integer identifier of this state, usually set by the data manager.
basis_state_id – Identifier of the basis state from which this state was generated, or None.
basis_state – The BasisState from which this state was generated, or None.
iter_created – Iteration in which this state was generated (0 for simulation initialization).
iter_used – Iteration in which this state was used to initiate a trajectory (None for unused).
istate_type – Integer describing the type of this initial state (ISTATE_TYPE_BASIS for direct use of a basis state, ISTATE_TYPE_GENERATED for a state generated from a basis state, ISTATE_TYPE_RESTART for a state corresponding to the endpoint of a segment in another simulation, or ISTATE_TYPE_START for a state generated from a start state).
istate_status – Integer describing whether this initial state has been properly prepared.
pcoord – The representative progress coordinate of this state.
- ISTATE_TYPE_UNSET = 0
- ISTATE_TYPE_BASIS = 1
- ISTATE_TYPE_GENERATED = 2
- ISTATE_TYPE_RESTART = 3
- ISTATE_TYPE_START = 4
- ISTATE_UNUSED = 0
- ISTATE_STATUS_PENDING = 0
- ISTATE_STATUS_PREPARED = 1
- ISTATE_STATUS_FAILED = 2
- istate_types = {'ISTATE_TYPE_BASIS': 1, 'ISTATE_TYPE_GENERATED': 2, 'ISTATE_TYPE_RESTART': 3, 'ISTATE_TYPE_START': 4, 'ISTATE_TYPE_UNSET': 0}
- istate_type_names = {0: 'ISTATE_TYPE_UNSET', 1: 'ISTATE_TYPE_BASIS', 2: 'ISTATE_TYPE_GENERATED', 3: 'ISTATE_TYPE_RESTART', 4: 'ISTATE_TYPE_START'}
- istate_statuses = {'ISTATE_STATUS_FAILED': 2, 'ISTATE_STATUS_PENDING': 0, 'ISTATE_STATUS_PREPARED': 1}
- istate_status_names = {0: 'ISTATE_STATUS_PENDING', 1: 'ISTATE_STATUS_PREPARED', 2: 'ISTATE_STATUS_FAILED'}
- as_numpy_record()
- westpa.cli.tools.w_trace.weight_dtype: alias of float64
- westpa.cli.tools.w_trace.n_iter_dtype: alias of uint32
- westpa.cli.tools.w_trace.seg_id_dtype: alias of int64
- westpa.cli.tools.w_trace.utime_dtype: alias of float64
- class westpa.cli.tools.w_trace.Trace(summary, endpoint_type, basis_state, initial_state, data_manager=None)
Bases:
object
A class representing a trace of a certain trajectory segment back to its origin.
- classmethod from_data_manager(n_iter, seg_id, data_manager=None)
Construct and return a trajectory trace whose last segment is identified by seg_id in iteration n_iter.
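A minimal usage sketch, assuming a completed simulation whose west.cfg points at the run's HDF5 file; the summary attribute and its fields are inferred from the constructor signature above:
    # Sketch: trace segment 12 of iteration 100 back to its origin.
    import westpa
    from westpa.cli.tools.w_trace import Trace

    westpa.rc.read_config('west.cfg')        # load run-time configuration
    data_manager = westpa.rc.get_data_manager()
    data_manager.open_backing(mode='r')      # open the main HDF5 file read-only

    trace = Trace.from_data_manager(100, 12, data_manager=data_manager)
    for record in trace.summary:             # one record per segment, origin first
        print(record['n_iter'], record['seg_id'], record['weight'])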
- get_segment_data_slice(datafile, dsname, n_iter, seg_id, slice_=None, index_data=None, iter_prec=None)
Return the data from the dataset named dsname within the given datafile (an open h5py.File object) for the given iteration and segment. By default, it is assumed that the dataset is stored in the iteration group for iteration n_iter, but if index_data is provided, it must be an iterable (preferably a simple array) of (n_iter,seg_id) pairs, and the index in the index_data iterable of the matching n_iter/seg_id pair is used as the index of the data to retrieve.
If an optional slice_ is provided, then the given slicing tuple is appended to that used to retrieve the segment-specific data (i.e. it can be used to pluck a subset of the data that would otherwise be returned).
- trace_timepoint_dataset(dsname, slice_=None, auxfile=None, index_ds=None)
Return a trace along this trajectory over a dataset which is laid out as [seg_id][timepoint][…]. Overlapping values at segment boundaries are accounted for. Returns (data_trace, weight), where data_trace is a time series of the dataset along this trajectory, and weight is the corresponding trajectory weight at each time point.
If auxfile is given, then load the dataset from the given HDF5 file, which must be laid out the same way as the main HDF5 file (e.g. iterations arranged as iterations/iter_*).
If index_ds is given, then instead of reading data per iteration from iter_* groups, the given index_ds is used as an index of n_iter,seg_id pairs into dsname. In this case, the target dataset need not exist on a per-iteration basis inside iter_* groups.
If slice_ is given, then further slice the data returned from the HDF5 dataset. This can minimize I/O if it is known (and specified) that only a subset of the data along the trajectory is needed.
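Continuing the sketch above, the standard pcoord dataset can be pulled along the traced trajectory:
    # Sketch: time series of the progress coordinate and weights along the trace.
    pcoord_trace, weights = trace.trace_timepoint_dataset('pcoord')
    print(pcoord_trace.shape, weights.shape)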
- class westpa.cli.tools.w_trace.WTraceTool
Bases:
WESTTool
- prog = 'w_trace'
- description = 'Trace individual WEST trajectories and emit (or calculate) quantities along the\ntrajectory.\n\nTrajectories are specified as N_ITER:SEG_ID pairs. Each segment is traced back\nto its initial point, and then various quantities (notably n_iter and seg_id)\nare printed in order from initial point up until the given segment in the given\niteration.\n\nOutput is stored in several files, all named according to the pattern given by\nthe -o/--output-pattern parameter. The default output pattern is "traj_%d_%d",\nwhere the printf-style format codes are replaced by the iteration number and\nsegment ID of the terminal segment of the trajectory being traced.\n\nIndividual datasets can be selected for writing using the -d/--dataset option\n(which may be specified more than once). The simplest form is ``-d dsname``,\nwhich causes data from dataset ``dsname`` along the trace to be stored to\nHDF5. The dataset is assumed to be stored on a per-iteration basis, with\nthe first dimension corresponding to seg_id and the second dimension\ncorresponding to time within the segment. Further options are specified\nas comma-separated key=value pairs after the data set name, as in\n\n -d dsname,alias=newname,index=idsname,file=otherfile.h5,slice=[100,...]\n\nThe following options for datasets are supported:\n\n alias=newname\n When writing this data to HDF5 or text files, use ``newname``\n instead of ``dsname`` to identify the dataset. This is mostly of\n use in conjunction with the ``slice`` option in order, e.g., to\n retrieve two different slices of a dataset and store then with\n different names for future use.\n\n index=idsname\n The dataset is not stored on a per-iteration basis for all\n segments, but instead is stored as a single dataset whose\n first dimension indexes n_iter/seg_id pairs. The index to\n these n_iter/seg_id pairs is ``idsname``.\n\n file=otherfile.h5\n Instead of reading data from the main WEST HDF5 file (usually\n ``west.h5``), read data from ``otherfile.h5``.\n\n slice=[100,...]\n Retrieve only the given slice from the dataset. This can be\n used to pick a subset of interest to minimize I/O.\n\n-------------------------------------------------------------------------------\n'
- pcoord_formats = {'f4': '%14.7g', 'f8': '%023.15g', 'i2': '%6d', 'i4': '%11d', 'i8': '%20d', 'u2': '%5d', 'u4': '%10d', 'u8': '%20d'}
- add_args(parser)
Add arguments specific to this tool to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this tool and deal with them appropriately (setting instance variables, etc)
- parse_dataset_string(dsstr)
- go()
Perform the analysis associated with this tool.
- emit_trace_h5(trace, output_group)
- emit_trace_text(trace, output_file)
Dump summary information about each segment in the given trace to the given output_file, which must be opened for writing in text mode. Output columns are separated by at least one space.
- westpa.cli.tools.w_trace.entry_point()
w_ipa
w_ipa is a (beta) WESTPA tool that automates analysis using analysis schemes and enables interactive analysis of WESTPA simulation data. The tool can perform a variety of analyses, including the following:
* Calculate fluxes and rate constants
* Adjust and use alternate state definitions
* Trace trajectory segments, including statistical weights, position along the progress coordinate, and other auxiliary data
* Plot all of the above in the terminal!
If you are using w_ipa for automated kinetics analysis, keep in mind that w_ipa runs w_assign and w_direct using the scheme designated in your west.cfg file. For more diverse kinetics analysis options, consider running w_assign and w_direct manually. This can be useful if you'd like to analyze auxiliary coordinates that aren't your progress coordinate, in one or two dimensions.
usage:
w_ipa [-h] [-r RCFILE] [--quiet] [--verbose] [--version] [--max-queue-length MAX_QUEUE_LENGTH]
[-W WEST_H5FILE] [--analysis-only] [--reanalyze] [--ignore-hash] [--debug] [--terminal]
[--serial | --parallel | --work-manager WORK_MANAGER] [--n-workers N_WORKERS]
[--zmq-mode MODE] [--zmq-comm-mode COMM_MODE] [--zmq-write-host-info INFO_FILE]
[--zmq-read-host-info INFO_FILE] [--zmq-upstream-rr-endpoint ENDPOINT]
[--zmq-upstream-ann-endpoint ENDPOINT] [--zmq-downstream-rr-endpoint ENDPOINT]
[--zmq-downstream-ann-endpoint ENDPOINT] [--zmq-master-heartbeat MASTER_HEARTBEAT]
[--zmq-worker-heartbeat WORKER_HEARTBEAT] [--zmq-timeout-factor FACTOR]
[--zmq-startup-timeout STARTUP_TIMEOUT] [--zmq-shutdown-timeout SHUTDOWN_TIMEOUT]
optional arguments:
-h, --help show this help message and exit
- general options:
- -r RCFILE, --rcfile RCFILE
use RCFILE as the WEST run-time configuration file (default: west.cfg)
- --quiet
emit only essential information
- --verbose
emit extra information
- --version
show program’s version number and exit
parallelization options:
--max-queue-length MAX_QUEUE_LENGTH
Maximum number of tasks that can be queued. Useful to limit RAM use for tasks that
have very large requests/response. Default: no limit.
- WEST input data options:
- -W WEST_H5FILE, --west-data WEST_H5FILE
Take WEST data from WEST_H5FILE (default: read from the HDF5 file specified in west.cfg).
runtime options:
--analysis-only, -ao Use this flag to run the analysis and return to the terminal.
--reanalyze, -ra Use this flag to delete the existing files and reanalyze.
--ignore-hash, -ih Ignore hash and don't regenerate files.
--debug, -d Debug output largely intended for development.
--terminal, -t Plot output in terminal.
parallelization options:
--serial run in serial mode
--parallel run in parallel mode (using processes)
--work-manager WORK_MANAGER
use the given work manager for parallel task distribution. Available work managers
are ('serial', 'threads', 'processes', 'zmq'); default is 'processes'
--n-workers N_WORKERS
Use up to N_WORKERS on this host, for work managers which support this option. Use
0 for a dedicated server. (Ignored by work managers which do not support this
option.)
options for ZeroMQ (“zmq”) work manager (master or node):
--zmq-mode MODE Operate as a master (server) or a node (workers/client). "server" is a deprecated
synonym for "master" and "client" is a deprecated synonym for "node".
--zmq-comm-mode COMM_MODE
Use the given communication mode -- TCP or IPC (Unix-domain) -- sockets for
communication within a node. IPC (the default) may be more efficient but is not
available on (exceptionally rare) systems without node-local storage (e.g. /tmp);
on such systems, TCP may be used instead.
--zmq-write-host-info INFO_FILE
Store hostname and port information needed to connect to this instance in
INFO_FILE. This allows the master and nodes assisting in coordinating the
communication of other nodes to choose ports randomly. Downstream nodes read this
file with --zmq-read-host-info and know where and how to connect.
--zmq-read-host-info INFO_FILE
Read hostname and port information needed to connect to the master (or other
coordinating node) from INFO_FILE. This allows the master and nodes assisting in
coordinating the communication of other nodes to choose ports randomly, writing
that information with --zmq-write-host-info for this instance to read.
--zmq-upstream-rr-endpoint ENDPOINT
ZeroMQ endpoint to which to send request/response (task and result) traffic toward
the master.
--zmq-upstream-ann-endpoint ENDPOINT
ZeroMQ endpoint on which to receive announcement (heartbeat and shutdown
notification) traffic from the master.
--zmq-downstream-rr-endpoint ENDPOINT
ZeroMQ endpoint on which to listen for request/response (task and result) traffic
from subsidiary workers.
--zmq-downstream-ann-endpoint ENDPOINT
ZeroMQ endpoint on which to send announcement (heartbeat and shutdown
notification) traffic toward workers.
--zmq-master-heartbeat MASTER_HEARTBEAT
Every MASTER_HEARTBEAT seconds, the master announces its presence to workers.
--zmq-worker-heartbeat WORKER_HEARTBEAT
Every WORKER_HEARTBEAT seconds, workers announce their presence to the master.
--zmq-timeout-factor FACTOR
Scaling factor for heartbeat timeouts. If the master doesn't hear from a worker in
WORKER_HEARTBEAT*FACTOR, the worker is assumed to have crashed. If a worker
doesn't hear from the master in MASTER_HEARTBEAT*FACTOR seconds, the master is
assumed to have crashed. Both cases result in shutdown.
--zmq-startup-timeout STARTUP_TIMEOUT
Amount of time (in seconds) to wait for communication between the master and at
least one worker. This may need to be changed on very large, heavily-loaded
computer systems that start all processes simultaneously.
--zmq-shutdown-timeout SHUTDOWN_TIMEOUT
Amount of time (in seconds) to wait for workers to shut down.
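For example, to run (or reuse) the analysis configured in west.cfg and then explore the results interactively with in-terminal plotting:
    w_ipa --terminal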
westpa.cli.tools.w_ipa module
- class westpa.cli.tools.w_ipa.WESTParallelTool(wm_env=None)
Bases:
WESTTool
Base class for command-line tools parallelized with wwmgr. This automatically adds and processes wwmgr command-line arguments and creates a work manager at self.work_manager.
- make_parser_and_process(prog=None, usage=None, description=None, epilog=None, args=None)
A convenience function to create a parser, call add_all_args(), and then call process_all_args(). The argument namespace is returned.
- add_args(parser)
Add arguments specific to this tool to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this tool and deal with them appropriately (setting instance variables, etc)
- go()
Perform the analysis associated with this tool.
- main()
A convenience function to make a parser, parse and process arguments, then run self.go() in the master process.
- class westpa.cli.tools.w_ipa.WESTDataReader
Bases:
WESTToolComponent
Tool for reading data from WEST-related HDF5 files. Coordinates finding the main HDF5 file from west.cfg or command line arguments, caching of certain kinds of data (eventually), and retrieving auxiliary data sets from various places.
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- open(mode='r')
- close()
- property weight_dsspec
- property parent_id_dsspec
- class westpa.cli.tools.w_ipa.ProgressIndicatorComponent
Bases:
WESTToolComponent
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- class westpa.cli.tools.w_ipa.Plotter(h5file, h5key, iteration=-1, interface='matplotlib')
Bases:
object
This is a semi-generic plotting interface that has a built in curses based terminal plotter. It’s fairly specific to what we’re using it for here, but we could (and maybe should) build it out into a little library that we can use via the command line to plot things. Might be useful for looking at data later. That would also cut the size of this tool down by a good bit.
- plot(i=0, j=1, tau=1, iteration=None, dim=0, interface=None)
- class westpa.cli.tools.w_ipa.WIPIScheme(scheme, name, parent, settings)
Bases:
object
- property scheme
- property list_schemes
Lists what schemes are configured in west.cfg file. Schemes should be structured as follows, in west.cfg:
    west:
      system:
        analysis:
          directory: analysis
          analysis_schemes:
            scheme.1:
              enabled: True
              states:
                - label: unbound
                  coords: [[7.0]]
                - label: bound
                  coords: [[2.7]]
              bins:
                - type: RectilinearBinMapper
                  boundaries: [[0.0, 2.80, 7, 10000]]
- property iteration
- property assign
- property direct
The output from w_direct.py from the current scheme.
- property state_labels
- property bin_labels
- property west
- property reweight
- property current
The current iteration. See help for __get_data_for_iteration__
- property past
The previous iteration. See help for __get_data_for_iteration__
- class westpa.cli.tools.w_ipa.WIPI
Bases:
WESTParallelTool
Welcome to w_ipa (WESTPA Interactive Python Analysis)! From here, you can run traces, look at weights, progress coordinates, etc. This is considered a ‘stateful’ tool; that is, the data you are pulling is always pulled from the current analysis scheme and iteration. By default, the first analysis scheme in west.cfg is used, and you are set at iteration 1.
ALL PROPERTIES ARE ACCESSED VIA w or west. To see the current iteration, try:
    w.iteration OR west.iteration
To set it, simply plug in a new value:
    w.iteration = 100
To change/list the current analysis schemes:
    w.list_schemes
    w.scheme = OUTPUT FROM w.list_schemes
To see the states and bins defined in the current analysis scheme:
    w.states
    w.bin_labels
All information about the current iteration is available in an object called ‘current’, which contains walkers, summary, states, seg_id, weights, parents, kinavg, pcoord, bins, populations, and auxdata, if it exists:
    w.current
In addition, the function w.trace(seg_id) will run a trace over a seg_id in the current iteration and return a dictionary containing all pertinent information about that seg_id’s history. It’s best to store this, as the trace can be expensive.
Run help on any function or property for more information!
Happy analyzing!
- add_args(parser)
Add arguments specific to this tool to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this tool and deal with them appropriately (setting instance variables, etc)
- hash_args(args, extra=None, path=None)
Create a unique hash stamp to determine whether the arguments/file differ from before.
- stamp_hash(h5file_name, new_hash)
Loads a file, stamps it, and returns the opened file in read-only mode.
- analysis_structure()
Run automatically on startup. Parses through the configuration file, and loads up all the data files from the different analysis schematics. If they don’t exist, it creates them automatically by hooking in to existing analysis routines and going from there.
It does this by calling in the make_parser_and_process function for w_{assign,reweight,direct} using a custom built list of args. The user can specify everything in the configuration file that would have been specified on the command line.
For instance, were one to call w_direct as follows:
    w_direct --evolution cumulative --step-iter 1 --disable-correl
the west.cfg would look as follows:
    west:
      analysis:
        w_direct:
          evolution: cumulative
          step_iter: 1
          extra: ['disable-correl']
Alternatively, if one wishes to use the same options for both w_direct and w_reweight, the key ‘w_direct’ can be replaced with ‘kinetics’.
- property assign
- property direct
The output from w_kinavg.py from the current scheme.
- property state_labels
- property bin_labels
- property west
- property reweight
- property scheme
Returns and sets what scheme is currently in use. To see what schemes are available, run:
w.list_schemes
- property list_schemes
Lists what schemes are configured in west.cfg file. Schemes should be structured as follows, in west.cfg:
    west:
      system:
        analysis:
          directory: analysis
          analysis_schemes:
            scheme.1:
              enabled: True
              states:
                - label: unbound
                  coords: [[7.0]]
                - label: bound
                  coords: [[2.7]]
              bins:
                - type: RectilinearBinMapper
                  boundaries: [[0.0, 2.80, 7, 10000]]
- property iteration
Returns/sets the current iteration.
- property current
The current iteration. See help for __get_data_for_iteration__
- property past
The previous iteration. See help for __get_data_for_iteration__
- trace(seg_id)
Runs a trace on a seg_id within the current iteration, all the way back to the beginning, returning a dictionary containing all interesting information:
seg_id, pcoord, states, bins, weights, iteration, auxdata (optional)
sorted in chronological order.
Call with a seg_id.
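A minimal interactive sketch (dictionary keys taken from the description above; the seg_id is arbitrary):
    # Inside a w_ipa session: trace walker 0 of the current iteration.
    # Store the result, since the trace can be expensive to recompute.
    history = w.trace(0)
    print(history['iteration'])      # iteration numbers along the trace
    print(history['weights'][-1])    # weight of the final segment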
- property future
Similar to current/past, but keyed differently and returns different datasets. See help for Future.
- class Future(raw, key)
Bases:
WIPIDataset
- go()
Function automatically called by main() when launched via the command line interface. Generally, call main, not this function.
- property introduction
Just spits out an introduction, in case someone doesn’t call help.
- property help
Just a minor function to call help on itself. Only in here to really help someone get help.
- westpa.cli.tools.w_ipa.entry_point()
w_pdist
w_pdist constructs and calculates the progress coordinate probability distribution's evolution over a user-specified number of simulation iterations. w_pdist supports progress coordinates with dimensionality ≥ 1.
The resulting distribution can be viewed with the plothist tool.
Overview
Usage:
w_pdist [-h] [-r RCFILE] [--quiet | --verbose | --debug] [--version]
[-W WEST_H5FILE] [--first-iter N_ITER] [--last-iter N_ITER]
[-b BINEXPR] [-o OUTPUT]
[--construct-dataset CONSTRUCT_DATASET | --dsspecs DSSPEC [DSSPEC ...]]
[--serial | --parallel | --work-manager WORK_MANAGER]
[--n-workers N_WORKERS] [--zmq-mode MODE]
[--zmq-info INFO_FILE] [--zmq-task-endpoint TASK_ENDPOINT]
[--zmq-result-endpoint RESULT_ENDPOINT]
[--zmq-announce-endpoint ANNOUNCE_ENDPOINT]
[--zmq-listen-endpoint ANNOUNCE_ENDPOINT]
[--zmq-heartbeat-interval INTERVAL]
[--zmq-task-timeout TIMEOUT] [--zmq-client-comm-mode MODE]
Note: This tool supports parallelization, which may be more efficient for especially large datasets.
Command-Line Options
See the general command-line tool reference for more information on the general options.
Input/output options
These arguments allow the user to specify where to read input simulation result data and where to output calculated progress coordinate probability distribution data.
Both input and output files are hdf5 format:
-W, --WEST_H5FILE file
Read simulation result data from file *file*. (**Default:** The
*hdf5* file specified in the configuration file (default config file
is *west.cfg*))
-o, --output file
Store this tool's output in *file*. (**Default:** The *hdf5* file
**pdist.h5**)
Iteration range options
Specify the range of iterations over which to construct the progress coordinate probability distribution:
--first-iter n_iter
Construct probability distribution starting with iteration *n_iter*
(**Default:** 1)
--last-iter n_iter
Construct probability distribution's time evolution up to (and
including) iteration *n_iter* (**Default:** Last completed
iteration)
Probability distribution binning options
Specify the number of bins to use when constructing the progress coordinate probability distribution. If using a multidimensional progress coordinate, a different binning scheme can be used for each dimension of the probability distribution:
-b binexpr
*binexpr* specifies the number and formatting of the bins. Its
format can be as follows:
1. an integer, in which case all distributions have that many
equal sized bins
2. a python-style list of integers, of length corresponding to
the number of dimensions of the progress coordinate, in which
case each progress coordinate's probability distribution has the
corresponding number of bins
3. a python-style list of lists of scalars, where the list at
each index corresponds to each dimension of the progress
coordinate and specifies specific bin boundaries for that
progress coordinate's probability distribution.
(**Default:** 100 bins for all progress coordinates)
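For instance (bin counts and boundaries below are arbitrary illustrations of the three forms, here for a 2-dimensional progress coordinate):
    w_pdist -b 50
    w_pdist -b "[50,20]"
    w_pdist -b "[[0.0,0.5,1.0,2.0,10.0],[0.0,1.0,2.0,5.0]]"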
Examples
Assuming simulation results are stored in west.h5 (which is specified in the configuration file named west.cfg), for a simulation with a 1-dimensional progress coordinate:
Calculate a probability distribution histogram using all default options (output file: pdist.h5; histogram binning: 100 equal-sized bins; probability distribution spanning the lowest to the highest progress coordinate values reached; work is parallelized over all available local cores using the ‘processes’ work manager):
w_pdist
Same as above, except using the serial work manager (which may be more efficient for smaller datasets):
w_pdist --serial
westpa.cli.tools.w_pdist module
- class westpa.cli.tools.w_pdist.WESTParallelTool(wm_env=None)
Bases:
WESTTool
Base class for command-line tools parallelized with wwmgr. This automatically adds and processes wwmgr command-line arguments and creates a work manager at self.work_manager.
- make_parser_and_process(prog=None, usage=None, description=None, epilog=None, args=None)
A convenience function to create a parser, call add_all_args(), and then call process_all_args(). The argument namespace is returned.
- add_args(parser)
Add arguments specific to this tool to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this tool and deal with them appropriately (setting instance variables, etc)
- go()
Perform the analysis associated with this tool.
- main()
A convenience function to make a parser, parse and process arguments, then run self.go() in the master process.
- class westpa.cli.tools.w_pdist.WESTDataReader
Bases:
WESTToolComponent
Tool for reading data from WEST-related HDF5 files. Coordinates finding the main HDF5 file from west.cfg or command line arguments, caching of certain kinds of data (eventually), and retrieving auxiliary data sets from various places.
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- open(mode='r')
- close()
- property weight_dsspec
- property parent_id_dsspec
- class westpa.cli.tools.w_pdist.WESTDSSynthesizer(default_dsname=None, h5filename=None)
Bases:
WESTToolComponent
Tool for synthesizing a dataset for analysis from other datasets. This may be done using a custom function, or a list of “data set specifications”. It is anticipated that if several source datasets are required, then a tool will have multiple instances of this class.
- group_name = 'input dataset options'
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
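A minimal sketch of a custom dataset-construction function (the module and function names are hypothetical); per the w_pdist description below, such a function is called as function(n_iter, iter_group) and must return data indexable as [segment][timepoint][dimension]:
    # my_dsmodule.py -- hypothetical module; used as, e.g.,
    #   w_pdist --construct-dataset my_dsmodule.reduce_pcoord
    def reduce_pcoord(n_iter, iter_group):
        # Keep only the first progress-coordinate dimension, preserving the
        # required [segment][timepoint][dimension] layout.
        return iter_group['pcoord'][:, :, :1]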
- class westpa.cli.tools.w_pdist.WESTWDSSynthesizer(default_dsname=None, h5filename=None)
Bases:
WESTToolComponent
- group_name = 'weight dataset options'
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- class westpa.cli.tools.w_pdist.IterRangeSelection(data_manager=None)
Bases:
WESTToolComponent
Select and record limits on iterations used in analysis and/or reporting. This class provides both the user-facing command-line options and parsing, and the application-side API for recording limits in HDF5.
HDF5 datasets calculated based on a restricted set of iterations should be tagged with the following attributes:
first_iter
The first iteration included in the calculation.
last_iter
One past the last iteration included in the calculation.
iter_step
Blocking or sampling period for iterations included in the calculation.
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- process_args(args, override_iter_start=None, override_iter_stop=None, default_iter_step=1)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- iter_block_iter()
Return an iterable of (block_start,block_end) over the blocks of iterations selected by --first-iter/--last-iter/--step-iter.
- n_iter_blocks()
Return the number of blocks of iterations (as returned by iter_block_iter) selected by --first-iter/--last-iter/--step-iter.
- record_data_iter_range(h5object, iter_start=None, iter_stop=None)
Store attributes iter_start and iter_stop on the given HDF5 object (group/dataset).
- record_data_iter_step(h5object, iter_step=None)
Store attribute iter_step on the given HDF5 object (group/dataset).
- check_data_iter_range_least(h5object, iter_start=None, iter_stop=None)
Check that the given HDF5 object contains (as denoted by its iter_start/iter_stop attributes) data at least for the iteration range specified.
- check_data_iter_range_equal(h5object, iter_start=None, iter_stop=None)
Check that the given HDF5 object contains (as denoted by its iter_start/iter_stop attributes) data exactly for the iteration range specified.
- check_data_iter_step_conformant(h5object, iter_step=None)
Check that the given HDF5 object contains per-iteration data at an iteration stride suitable for extracting data with the given stride (in other words, the given iter_step is a multiple of the stride with which data was recorded).
- check_data_iter_step_equal(h5object, iter_step=None)
Check that the given HDF5 object contains per-iteration data at an iteration stride the same as that specified.
- slice_per_iter_data(dataset, iter_start=None, iter_stop=None, iter_step=None, axis=0)
Return the subset of the given dataset corresponding to the given iteration range and stride. Unless otherwise specified, the first dimension of the dataset is the one sliced.
- iter_range(iter_start=None, iter_stop=None, iter_step=None, dtype=None)
Return a sequence for the given iteration numbers and stride, filling in missing values from those stored on self. The smallest data type capable of holding iter_stop is returned unless otherwise specified using the dtype argument.
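A minimal sketch of the tagging API (file and dataset names are illustrative; iter_start/iter_stop are set directly here, whereas they are normally filled in by process_args):
    # Sketch: record and check the iteration range covered by an HDF5 dataset.
    import h5py
    import numpy as np
    from westpa.cli.tools.w_pdist import IterRangeSelection

    selection = IterRangeSelection()
    selection.iter_start, selection.iter_stop = 1, 101   # iterations 1..100

    with h5py.File('example_analysis.h5', 'w') as h5:
        ds = h5.create_dataset('per_iter_quantity', data=np.zeros(100))
        selection.record_data_iter_range(ds)   # stores iter_start/iter_stop attrs
        # verify the stored range covers at least iterations 1..50
        selection.check_data_iter_range_least(ds, iter_start=1, iter_stop=51)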
- class westpa.cli.tools.w_pdist.ProgressIndicatorComponent
Bases:
WESTToolComponent
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- westpa.cli.tools.w_pdist.histnd(values, binbounds, weights=1.0, out=None, binbound_check=True, ignore_out_of_range=False)
Generate an N-dimensional PDF (or contribution to a PDF) from the given values. binbounds is a list of arrays of boundary values, with one entry for each dimension (values must have as many columns as there are entries in binbounds). weights, if provided, specifies the weight each value contributes to the histogram; this may be a scalar (for equal weights for all values) or a vector of the same length as values (for unequal weights). If binbound_check is True, then the boundaries are checked for strict positive monotonicity; set to False to shave a few microseconds if you know your bin boundaries to be monotonically increasing.
- westpa.cli.tools.w_pdist.normhistnd(hist, binbounds)
Normalize the N-dimensional histogram hist with corresponding bin boundaries binbounds. Modifies hist in place and returns the normalization factor used.
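A minimal sketch, assuming histnd returns the filled histogram when out is not supplied (sample data and bin edges are arbitrary):
    # Sketch: weighted 2-D histogram, then in-place normalization.
    import numpy as np
    from westpa.cli.tools.w_pdist import histnd, normhistnd

    values = np.random.rand(1000, 2)            # 1000 samples, 2 dimensions
    weights = np.full(1000, 1.0 / 1000)         # equal weights summing to 1
    binbounds = [np.linspace(0, 1, 11), np.linspace(0, 1, 6)]

    hist = histnd(values, binbounds, weights=weights)
    normhistnd(hist, binbounds)                 # modifies hist in place
    print(hist.shape)                           # (10, 5)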
- westpa.cli.tools.w_pdist.isiterable(x)
- class westpa.cli.tools.w_pdist.WPDist
Bases:
WESTParallelTool
- prog = 'w_pdist'
- description = 'Calculate time-resolved, multi-dimensional probability distributions of WE\ndatasets.\n\n\n-----------------------------------------------------------------------------\nSource data\n-----------------------------------------------------------------------------\n\nSource data is provided either by a user-specified function\n(--construct-dataset) or a list of "data set specifications" (--dsspecs).\nIf neither is provided, the progress coordinate dataset \'\'pcoord\'\' is used.\n\nTo use a custom function to extract or calculate data whose probability\ndistribution will be calculated, specify the function in standard Python\nMODULE.FUNCTION syntax as the argument to --construct-dataset. This function\nwill be called as function(n_iter,iter_group), where n_iter is the iteration\nwhose data are being considered and iter_group is the corresponding group\nin the main WEST HDF5 file (west.h5). The function must return data which can\nbe indexed as [segment][timepoint][dimension].\n\nTo use a list of data set specifications, specify --dsspecs and then list the\ndesired datasets one-by-one (space-separated in most shells). These data set\nspecifications are formatted as NAME[,file=FILENAME,slice=SLICE], which will\nuse the dataset called NAME in the HDF5 file FILENAME (defaulting to the main\nWEST HDF5 file west.h5), and slice it with the Python slice expression SLICE\n(as in [0:2] to select the first two elements of the first axis of the\ndataset). The ``slice`` option is most useful for selecting one column (or\nmore) from a multi-column dataset, such as arises when using a progress\ncoordinate of multiple dimensions.\n\n\n-----------------------------------------------------------------------------\nHistogram binning\n-----------------------------------------------------------------------------\n\nBy default, histograms are constructed with 100 bins in each dimension. This\ncan be overridden by specifying -b/--bins, which accepts a number of different\nkinds of arguments:\n\n a single integer N\n N uniformly spaced bins will be used in each dimension.\n\n a sequence of integers N1,N2,... (comma-separated)\n N1 uniformly spaced bins will be used for the first dimension, N2 for the\n second, and so on.\n\n a list of lists [[B11, B12, B13, ...], [B21, B22, B23, ...], ...]\n The bin boundaries B11, B12, B13, ... will be used for the first dimension,\n B21, B22, B23, ... for the second dimension, and so on. These bin\n boundaries need not be uniformly spaced. These expressions will be\n evaluated with Python\'s ``eval`` construct, with ``np`` available for\n use [e.g. to specify bins using np.arange()].\n\nThe first two forms (integer, list of integers) will trigger a scan of all\ndata in each dimension in order to determine the minimum and maximum values,\nwhich may be very expensive for large datasets. This can be avoided by\nexplicitly providing bin boundaries using the list-of-lists form.\n\nNote that these bins are *NOT* at all related to the bins used to drive WE\nsampling.\n\n\n-----------------------------------------------------------------------------\nOutput format\n-----------------------------------------------------------------------------\n\nThe output file produced (specified by -o/--output, defaulting to "pdist.h5")\nmay be fed to plothist to generate plots (or appropriately processed text or\nHDF5 files) from this data. In short, the following datasets are created:\n\n ``histograms``\n Normalized histograms. 
The first axis corresponds to iteration, and\n remaining axes correspond to dimensions of the input dataset.\n\n ``/binbounds_0``\n Vector of bin boundaries for the first (index 0) dimension. Additional\n datasets similarly named (/binbounds_1, /binbounds_2, ...) are created\n for additional dimensions.\n\n ``/midpoints_0``\n Vector of bin midpoints for the first (index 0) dimension. Additional\n datasets similarly named are created for additional dimensions.\n\n ``n_iter``\n Vector of iteration numbers corresponding to the stored histograms (i.e.\n the first axis of the ``histograms`` dataset).\n\n\n-----------------------------------------------------------------------------\nSubsequent processing\n-----------------------------------------------------------------------------\n\nThe output generated by this program (-o/--output, default "pdist.h5") may be\nplotted by the ``plothist`` program. See ``plothist --help`` for more\ninformation.\n\n\n-----------------------------------------------------------------------------\nParallelization\n-----------------------------------------------------------------------------\n\nThis tool supports parallelized binning, including reading of input data.\nParallel processing is the default. For simple cases (reading pre-computed\ninput data, modest numbers of segments), serial processing (--serial) may be\nmore efficient.\n\n\n-----------------------------------------------------------------------------\nCommand-line options\n-----------------------------------------------------------------------------\n\n'
- add_args(parser)
Add arguments specific to this tool to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this tool and deal with them appropriately (setting instance variables, etc)
- go()
Perform the analysis associated with this tool.
- static parse_binspec(binspec)
- construct_bins(bins)
Construct bins according to bins, which may be:
- A scalar integer (for that number of bins in each dimension)
- A sequence of integers (specifying number of bins for each dimension)
- A sequence of sequences of bin boundaries (specifying boundaries for each dimension)
Sets self.binbounds to a list of arrays of bin boundaries appropriate for passing to fasthist.histnd, along with self.midpoints to the midpoints of the bins.
- scan_data_shape()
- scan_data_range()
Scan input data for range in each dimension. The number of dimensions is determined from the shape of the progress coordinate as of self.iter_start.
- construct_histogram()
Construct a histogram using bins previously constructed with construct_bins(). The time series of histogram values is stored in histograms. Each histogram in the time series is normalized.
- westpa.cli.tools.w_pdist.entry_point()
w_succ
usage:
w_succ [-h] [-r RCFILE] [--quiet | --verbose | --debug] [--version] [-A H5FILE] [-W WEST_H5FILE]
[-o OUTPUT_FILE]
List segments which successfully reach a target state.
optional arguments:
-h, --help show this help message and exit
-o OUTPUT_FILE, --output OUTPUT_FILE
Store output in OUTPUT_FILE (default: write to standard output).
general options:
-r RCFILE, --rcfile RCFILE
use RCFILE as the WEST run-time configuration file (default: west.cfg)
--quiet emit only essential information
--verbose emit extra information
--debug enable extra checks and emit copious information
--version show program's version number and exit
general analysis options:
-A H5FILE, --analysis-file H5FILE
Store intermediate and final results in H5FILE (default: analysis.h5).
WEST input data options:
-W WEST_H5FILE, --west-data WEST_H5FILE
Take WEST data from WEST_H5FILE (default: read from the HDF5 file specified in
west.cfg).
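For example, to list all segments of a finished run that reached the target state, writing the results to a text file (file names illustrative):
    w_succ -W west.h5 -o successful_segments.txt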
westpa.cli.core.w_succ module
- class westpa.cli.core.w_succ.Segment(n_iter=None, seg_id=None, weight=None, endpoint_type=None, parent_id=None, wtg_parent_ids=None, pcoord=None, status=None, walltime=None, cputime=None, data=None)
Bases:
object
A class wrapping segment data that must be passed through the work manager or data manager. Most fields are self-explanatory. One item worth noting is that a negative parent ID means that the segment starts from the initial state with ID -(segment.parent_id+1)
- SEG_STATUS_UNSET = 0
- SEG_STATUS_PREPARED = 1
- SEG_STATUS_COMPLETE = 2
- SEG_STATUS_FAILED = 3
- SEG_INITPOINT_UNSET = 0
- SEG_INITPOINT_CONTINUES = 1
- SEG_INITPOINT_NEWTRAJ = 2
- SEG_ENDPOINT_UNSET = 0
- SEG_ENDPOINT_CONTINUES = 1
- SEG_ENDPOINT_MERGED = 2
- SEG_ENDPOINT_RECYCLED = 3
- statuses = {'SEG_STATUS_COMPLETE': 2, 'SEG_STATUS_FAILED': 3, 'SEG_STATUS_PREPARED': 1, 'SEG_STATUS_UNSET': 0}
- initpoint_types = {'SEG_INITPOINT_CONTINUES': 1, 'SEG_INITPOINT_NEWTRAJ': 2, 'SEG_INITPOINT_UNSET': 0}
- endpoint_types = {'SEG_ENDPOINT_CONTINUES': 1, 'SEG_ENDPOINT_MERGED': 2, 'SEG_ENDPOINT_RECYCLED': 3, 'SEG_ENDPOINT_UNSET': 0}
- status_names = {0: 'SEG_STATUS_UNSET', 1: 'SEG_STATUS_PREPARED', 2: 'SEG_STATUS_COMPLETE', 3: 'SEG_STATUS_FAILED'}
- initpoint_type_names = {0: 'SEG_INITPOINT_UNSET', 1: 'SEG_INITPOINT_CONTINUES', 2: 'SEG_INITPOINT_NEWTRAJ'}
- endpoint_type_names = {0: 'SEG_ENDPOINT_UNSET', 1: 'SEG_ENDPOINT_CONTINUES', 2: 'SEG_ENDPOINT_MERGED', 3: 'SEG_ENDPOINT_RECYCLED'}
- static initial_pcoord(segment)
Return the initial progress coordinate point of this segment.
- static final_pcoord(segment)
Return the final progress coordinate point of this segment.
- property initpoint_type
- property initial_state_id
- property status_text
- property endpoint_type_text
- class westpa.cli.core.w_succ.WESTAnalysisTool
Bases:
object
- add_args(parser, upcall=True)
Add arguments to a parser common to all analyses of this type.
- process_args(args, upcall=True)
- open_analysis_backing()
- close_analysis_backing()
- require_analysis_group(groupname, replace=False)
- class westpa.cli.core.w_succ.WESTDataReaderMixin
Bases:
AnalysisMixin
A mixin for analysis requiring access to the HDF5 files generated during a WEST run.
- add_args(parser, upcall=True)
- process_args(args, upcall=True)
- clear_run_cache()
- property cache_pcoords
Whether or not to cache progress coordinate data. While caching this data can significantly speed up some analysis operations, this requires copious RAM.
Setting this to False when it was formerly True will release any cached data.
- get_summary_table()
- get_iter_group(n_iter)
Return the HDF5 group corresponding to n_iter.
- get_segments(n_iter, include_pcoords=True)
Return all segments present in iteration n_iter
- get_segments_by_id(n_iter, seg_ids, include_pcoords=True)
Get segments from the data manager, employing caching where possible
- get_children(segment, include_pcoords=True)
- get_seg_index(n_iter)
- get_wtg_parent_array(n_iter)
- get_parent_array(n_iter)
- get_pcoord_array(n_iter)
- get_pcoord_dataset(n_iter)
- get_pcoords(n_iter, seg_ids)
- get_seg_ids(n_iter, bool_array=None)
- get_created_seg_ids(n_iter)
Return a list of seg_ids corresponding to segments which were created for the given iteration (are not continuations).
- max_iter_segs_in_range(first_iter, last_iter)
Return the maximum number of segments present in any iteration in the range selected
- total_segs_in_range(first_iter, last_iter)
Return the total number of segments present in all iterations in the range selected
- get_pcoord_len(n_iter)
Get the length of the progress coordinate array for the given iteration.
- get_total_time(first_iter=None, last_iter=None, dt=None)
Return the total amount of simulation time spanned between first_iter and last_iter (inclusive).
- class westpa.cli.core.w_succ.CommonOutputMixin
Bases:
AnalysisMixin
- add_common_output_args(parser_or_group)
- process_common_output_args(args)
- class westpa.cli.core.w_succ.WSucc
Bases:
CommonOutputMixin
,WESTDataReaderMixin
,WESTAnalysisTool
- find_successful_trajs()
- westpa.cli.core.w_succ.entry_point()
w_crawl
usage:
w_crawl [-h] [-r RCFILE] [--quiet | --verbose | --debug] [--version]
[--max-queue-length MAX_QUEUE_LENGTH] [-W WEST_H5FILE] [--first-iter N_ITER]
[--last-iter N_ITER] [-c CRAWLER_INSTANCE]
[--serial | --parallel | --work-manager WORK_MANAGER] [--n-workers N_WORKERS]
[--zmq-mode MODE] [--zmq-comm-mode COMM_MODE] [--zmq-write-host-info INFO_FILE]
[--zmq-read-host-info INFO_FILE] [--zmq-upstream-rr-endpoint ENDPOINT]
[--zmq-upstream-ann-endpoint ENDPOINT] [--zmq-downstream-rr-endpoint ENDPOINT]
[--zmq-downstream-ann-endpoint ENDPOINT] [--zmq-master-heartbeat MASTER_HEARTBEAT]
[--zmq-worker-heartbeat WORKER_HEARTBEAT] [--zmq-timeout-factor FACTOR]
[--zmq-startup-timeout STARTUP_TIMEOUT] [--zmq-shutdown-timeout SHUTDOWN_TIMEOUT]
task_callable
Crawl a weighted ensemble dataset, executing a function for each iteration. This can be used for postprocessing of trajectories, cleanup of datasets, or anything else that can be expressed as “do X for iteration N, then do something with the result”. Tasks are parallelized by iteration, and no guarantees are made about evaluation order.
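A minimal sketch of a task callable and coordinating crawler (the module and all names are hypothetical, and the callable is assumed to be invoked as function(n_iter, iter_group), with iter_group being the iteration's HDF5 group):
    # my_crawl.py -- hypothetical module; used as, e.g.,
    #   w_crawl my_crawl.calculate -c my_crawl.crawler
    import numpy as np
    from westpa.cli.tools.w_crawl import WESTPACrawler

    def calculate(n_iter, iter_group):
        # Per-iteration task: mean final value of the first pcoord dimension.
        pcoord = iter_group['pcoord'][...]
        return float(np.mean(pcoord[:, -1, 0]))

    class MeanPcoordCrawler(WESTPACrawler):
        def initialize(self, iter_start, iter_stop):
            self.results = {}

        def process_iter_result(self, n_iter, result):
            self.results[n_iter] = result      # collected on the master

        def finalize(self):
            for n_iter in sorted(self.results):
                print(n_iter, self.results[n_iter])

    crawler = MeanPcoordCrawler()  # instance referenced via -c my_crawl.crawler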
Command-line options
optional arguments:
-h, --help show this help message and exit
general options:
-r RCFILE, --rcfile RCFILE
use RCFILE as the WEST run-time configuration file (default: west.cfg)
--quiet emit only essential information
--verbose emit extra information
--debug enable extra checks and emit copious information
--version show program's version number and exit
parallelization options:
--max-queue-length MAX_QUEUE_LENGTH
Maximum number of tasks that can be queued. Useful to limit RAM use for tasks
that have very large requests/response. Default: no limit.
WEST input data options:
-W WEST_H5FILE, --west-data WEST_H5FILE
Take WEST data from WEST_H5FILE (default: read from the HDF5 file specified in
west.cfg).
iteration range:
--first-iter N_ITER Begin analysis at iteration N_ITER (default: 1).
--last-iter N_ITER Conclude analysis with N_ITER, inclusive (default: last completed iteration).
task options:
-c CRAWLER_INSTANCE, --crawler-instance CRAWLER_INSTANCE
Use CRAWLER_INSTANCE (specified as module.instance) as an instance of
WESTPACrawler to coordinate the calculation. Required only if initialization,
finalization, or task result processing is required.
task_callable Run TASK_CALLABLE (specified as module.function) on each iteration. Required.
parallelization options:
--serial run in serial mode
--parallel run in parallel mode (using processes)
--work-manager WORK_MANAGER
use the given work manager for parallel task distribution. Available work
managers are ('serial', 'threads', 'processes', 'zmq'); default is 'serial'
--n-workers N_WORKERS
Use up to N_WORKERS on this host, for work managers which support this option.
Use 0 for a dedicated server. (Ignored by work managers which do not support
this option.)
options for ZeroMQ (“zmq”) work manager (master or node):
--zmq-mode MODE Operate as a master (server) or a node (workers/client). "server" is a
deprecated synonym for "master" and "client" is a deprecated synonym for
"node".
--zmq-comm-mode COMM_MODE
Use the given communication mode -- TCP or IPC (Unix-domain) -- sockets for
communication within a node. IPC (the default) may be more efficient but is not
available on (exceptionally rare) systems without node-local storage (e.g.
/tmp); on such systems, TCP may be used instead.
--zmq-write-host-info INFO_FILE
Store hostname and port information needed to connect to this instance in
INFO_FILE. This allows the master and nodes assisting in coordinating the
communication of other nodes to choose ports randomly. Downstream nodes read
this file with --zmq-read-host-info and know where and how to connect.
--zmq-read-host-info INFO_FILE
Read hostname and port information needed to connect to the master (or other
coordinating node) from INFO_FILE. This allows the master and nodes assisting
in coordinating the communication of other nodes to choose ports randomly,
writing that information with --zmq-write-host-info for this instance to read.
--zmq-upstream-rr-endpoint ENDPOINT
ZeroMQ endpoint to which to send request/response (task and result) traffic
toward the master.
--zmq-upstream-ann-endpoint ENDPOINT
ZeroMQ endpoint on which to receive announcement (heartbeat and shutdown
notification) traffic from the master.
--zmq-downstream-rr-endpoint ENDPOINT
ZeroMQ endpoint on which to listen for request/response (task and result)
traffic from subsidiary workers.
--zmq-downstream-ann-endpoint ENDPOINT
ZeroMQ endpoint on which to send announcement (heartbeat and shutdown
notification) traffic toward workers.
--zmq-master-heartbeat MASTER_HEARTBEAT
Every MASTER_HEARTBEAT seconds, the master announces its presence to workers.
--zmq-worker-heartbeat WORKER_HEARTBEAT
Every WORKER_HEARTBEAT seconds, workers announce their presence to the master.
--zmq-timeout-factor FACTOR
Scaling factor for heartbeat timeouts. If the master doesn't hear from a worker
in WORKER_HEARTBEAT*FACTOR, the worker is assumed to have crashed. If a worker
doesn't hear from the master in MASTER_HEARTBEAT*FACTOR seconds, the master is
assumed to have crashed. Both cases result in shutdown.
--zmq-startup-timeout STARTUP_TIMEOUT
Amount of time (in seconds) to wait for communication between the master and at
least one worker. This may need to be changed on very large, heavily-loaded
computer systems that start all processes simultaneously.
--zmq-shutdown-timeout SHUTDOWN_TIMEOUT
Amount of time (in seconds) to wait for workers to shut down.
westpa.cli.tools.w_crawl module
- class westpa.cli.tools.w_crawl.WESTParallelTool(wm_env=None)
Bases:
WESTTool
Base class for command-line tools parallelized with wwmgr. This automatically adds and processes wwmgr command-line arguments and creates a work manager at self.work_manager.
- make_parser_and_process(prog=None, usage=None, description=None, epilog=None, args=None)
A convenience function to create a parser, call add_all_args(), and then call process_all_args(). The argument namespace is returned.
- add_args(parser)
Add arguments specific to this tool to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this tool and deal with them appropriately (setting instance variables, etc)
- go()
Perform the analysis associated with this tool.
- main()
A convenience function to make a parser, parse and process arguments, then run self.go() in the master process.
- class westpa.cli.tools.w_crawl.WESTDataReader
Bases:
WESTToolComponent
Tool for reading data from WEST-related HDF5 files. Coordinates finding the main HDF5 file from west.cfg or command line arguments, caching of certain kinds of data (eventually), and retrieving auxiliary data sets from various places.
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- open(mode='r')
- close()
- property weight_dsspec
- property parent_id_dsspec
- class westpa.cli.tools.w_crawl.IterRangeSelection(data_manager=None)
Bases:
WESTToolComponent
Select and record limits on iterations used in analysis and/or reporting. This class provides both the user-facing command-line options and parsing, and the application-side API for recording limits in HDF5.
HDF5 datasets calculated based on a restricted set of iterations should be tagged with the following attributes:
first_iter
The first iteration included in the calculation.
last_iter
One past the last iteration included in the calculation.
iter_step
Blocking or sampling period for iterations included in the calculation.
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- process_args(args, override_iter_start=None, override_iter_stop=None, default_iter_step=1)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- iter_block_iter()
Return an iterable of (block_start,block_end) over the blocks of iterations selected by --first-iter/--last-iter/--step-iter.
- n_iter_blocks()
Return the number of blocks of iterations (as returned by iter_block_iter) selected by --first-iter/--last-iter/--step-iter.
- record_data_iter_range(h5object, iter_start=None, iter_stop=None)
Store attributes iter_start and iter_stop on the given HDF5 object (group/dataset).
- record_data_iter_step(h5object, iter_step=None)
Store attribute iter_step on the given HDF5 object (group/dataset).
- check_data_iter_range_least(h5object, iter_start=None, iter_stop=None)
Check that the given HDF5 object contains (as denoted by its iter_start/iter_stop attributes) data at least for the iteration range specified.
- check_data_iter_range_equal(h5object, iter_start=None, iter_stop=None)
Check that the given HDF5 object contains (as denoted by its iter_start/iter_stop attributes) data exactly for the iteration range specified.
- check_data_iter_step_conformant(h5object, iter_step=None)
Check that the given HDF5 object contains per-iteration data at an iteration stride suitable for extracting data with the given stride (in other words, the given iter_step is a multiple of the stride with which data was recorded).
- check_data_iter_step_equal(h5object, iter_step=None)
Check that the given HDF5 object contains per-iteration data at an iteration stride the same as that specified.
- slice_per_iter_data(dataset, iter_start=None, iter_stop=None, iter_step=None, axis=0)
Return the subset of the given dataset corresponding to the given iteration range and stride. Unless otherwise specified, the first dimension of the dataset is the one sliced.
- iter_range(iter_start=None, iter_stop=None, iter_step=None, dtype=None)
Return a sequence for the given iteration numbers and stride, filling in missing values from those stored on self. The smallest data type capable of holding iter_stop is returned unless otherwise specified using the dtype argument.
- class westpa.cli.tools.w_crawl.ProgressIndicatorComponent
Bases:
WESTToolComponent
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- westpa.cli.tools.w_crawl.get_object(object_name, path=None)
Attempt to load the given object, using additional path information if given.
- class westpa.cli.tools.w_crawl.WESTPACrawler
Bases:
object
Base class for general crawling execution. This class only exists on the master.
- initialize(iter_start, iter_stop)
Initialize this crawling process.
- finalize()
Finalize this crawling process.
- process_iter_result(n_iter, result)
Process the result of a per-iteration task.
- class westpa.cli.tools.w_crawl.WCrawl
Bases:
WESTParallelTool
- prog = 'w_crawl'
- description = 'Crawl a weighted ensemble dataset, executing a function for each iteration.\nThis can be used for postprocessing of trajectories, cleanup of datasets,\nor anything else that can be expressed as "do X for iteration N, then do\nsomething with the result". Tasks are parallelized by iteration, and\nno guarantees are made about evaluation order.\n\n\n-----------------------------------------------------------------------------\nCommand-line options\n-----------------------------------------------------------------------------\n\n'
- add_args(parser)
Add arguments specific to this tool to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this tool and deal with them appropriately (setting instance variables, etc)
- go()
Perform the analysis associated with this tool.
- westpa.cli.tools.w_crawl.entry_point()
w_direct
usage:
w_direct [-h] [-r RCFILE] [--quiet | --verbose | --debug] [--version]
[--max-queue-length MAX_QUEUE_LENGTH]
[--serial | --parallel | --work-manager WORK_MANAGER] [--n-workers N_WORKERS]
[--zmq-mode MODE] [--zmq-comm-mode COMM_MODE] [--zmq-write-host-info INFO_FILE]
[--zmq-read-host-info INFO_FILE] [--zmq-upstream-rr-endpoint ENDPOINT]
[--zmq-upstream-ann-endpoint ENDPOINT] [--zmq-downstream-rr-endpoint ENDPOINT]
[--zmq-downstream-ann-endpoint ENDPOINT] [--zmq-master-heartbeat MASTER_HEARTBEAT]
[--zmq-worker-heartbeat WORKER_HEARTBEAT] [--zmq-timeout-factor FACTOR]
[--zmq-startup-timeout STARTUP_TIMEOUT] [--zmq-shutdown-timeout SHUTDOWN_TIMEOUT]
{help,init,average,kinetics,probs,all} ...
optional arguments:
-h, --help show this help message and exit
general options:
-r RCFILE, --rcfile RCFILE
use RCFILE as the WEST run-time configuration file (default: west.cfg)
--quiet emit only essential information
--verbose emit extra information
--debug enable extra checks and emit copious information
--version show program's version number and exit
parallelization options:
--max-queue-length MAX_QUEUE_LENGTH
Maximum number of tasks that can be queued. Useful to limit RAM use for tasks that
have very large requests/response. Default: no limit.
direct kinetics analysis schemes:
{help,init,average,kinetics,probs,all}
help print help for this command or individual subcommands
init calculate state-to-state kinetics by tracing trajectories
average Averages and returns fluxes, rates, and color/state populations.
kinetics Generates rate and flux values from a WESTPA simulation via tracing.
probs Calculates color and state probabilities via tracing.
all Runs the full suite, including the tracing of events.
parallelization options:
--serial run in serial mode
--parallel run in parallel mode (using processes)
--work-manager WORK_MANAGER
use the given work manager for parallel task distribution. Available work managers
are ('serial', 'threads', 'processes', 'zmq'); default is 'serial'
--n-workers N_WORKERS
Use up to N_WORKERS on this host, for work managers which support this option. Use
0 for a dedicated server. (Ignored by work managers which do not support this
option.)
options for ZeroMQ (“zmq”) work manager (master or node):
--zmq-mode MODE Operate as a master (server) or a node (workers/client). "server" is a deprecated
synonym for "master" and "client" is a deprecated synonym for "node".
--zmq-comm-mode COMM_MODE
Use the given communication mode -- TCP or IPC (Unix-domain) -- sockets for
communication within a node. IPC (the default) may be more efficient but is not
available on (exceptionally rare) systems without node-local storage (e.g. /tmp);
on such systems, TCP may be used instead.
--zmq-write-host-info INFO_FILE
Store hostname and port information needed to connect to this instance in
INFO_FILE. This allows the master and nodes assisting in coordinating the
communication of other nodes to choose ports randomly. Downstream nodes read this
file with --zmq-read-host-info and know where to connect.
--zmq-read-host-info INFO_FILE
Read hostname and port information needed to connect to the master (or other
coordinating node) from INFO_FILE. This allows the master and nodes assisting in
coordinating the communication of other nodes to choose ports randomly, writing
that information with --zmq-write-host-info for this instance to read.
--zmq-upstream-rr-endpoint ENDPOINT
ZeroMQ endpoint to which to send request/response (task and result) traffic toward
the master.
--zmq-upstream-ann-endpoint ENDPOINT
ZeroMQ endpoint on which to receive announcement (heartbeat and shutdown
notification) traffic from the master.
--zmq-downstream-rr-endpoint ENDPOINT
ZeroMQ endpoint on which to listen for request/response (task and result) traffic
from subsidiary workers.
--zmq-downstream-ann-endpoint ENDPOINT
ZeroMQ endpoint on which to send announcement (heartbeat and shutdown
notification) traffic toward workers.
--zmq-master-heartbeat MASTER_HEARTBEAT
Every MASTER_HEARTBEAT seconds, the master announces its presence to workers.
--zmq-worker-heartbeat WORKER_HEARTBEAT
Every WORKER_HEARTBEAT seconds, workers announce their presence to the master.
--zmq-timeout-factor FACTOR
Scaling factor for heartbeat timeouts. If the master doesn't hear from a worker in
WORKER_HEARTBEAT*FACTOR, the worker is assumed to have crashed. If a worker
doesn't hear from the master in MASTER_HEARTBEAT*FACTOR seconds, the master is
assumed to have crashed. Both cases result in shutdown.
--zmq-startup-timeout STARTUP_TIMEOUT
Amount of time (in seconds) to wait for communication between the master and at
least one worker. This may need to be changed on very large, heavily-loaded
computer systems that start all processes simultaneously.
--zmq-shutdown-timeout SHUTDOWN_TIMEOUT
Amount of time (in seconds) to wait for workers to shut down.
westpa.cli.tools.w_direct module
- westpa.cli.tools.w_direct.weight_dtype
alias of
float64
- class westpa.cli.tools.w_direct.WESTMasterCommand
Bases:
WESTTool
Base class for command-line tools that employ subcommands
- subparsers_title = None
- subcommands = None
- include_help_command = True
- add_args(parser)
Add arguments specific to this tool to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this tool and deal with them appropriately (setting instance variables, etc)
- go()
Perform the analysis associated with this tool.
- class westpa.cli.tools.w_direct.WESTParallelTool(wm_env=None)
Bases:
WESTTool
Base class for command-line tools parallelized with wwmgr. This automatically adds and processes wwmgr command-line arguments and creates a work manager at self.work_manager.
- make_parser_and_process(prog=None, usage=None, description=None, epilog=None, args=None)
A convenience function to create a parser, call add_all_args(), and then call process_all_args(). The argument namespace is returned.
- add_args(parser)
Add arguments specific to this tool to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this tool and deal with them appropriately (setting instance variables, etc)
- go()
Perform the analysis associated with this tool.
- main()
A convenience function to make a parser, parse and process arguments, then run self.go() in the master process.
- westpa.cli.tools.w_direct.sequence_macro_flux_to_rate(dataset, pops, istate, jstate, pairwise=True, stride=None)
Convert a sequence of macrostate fluxes and corresponding list of trajectory ensemble populations to a sequence of rate matrices.
If the optional ``pairwise`` is true (the default), then rates are normalized according to the relative probability of the initial state among the pair of states (initial, final); this is probably what you want, as these rates will then depend only on the definitions of the states involved (and never on the remaining states). Otherwise (``pairwise`` is false), the rates are normalized according to the probability of the initial state among all other states.
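As a minimal numpy sketch of the two normalizations (the flux and population values here are hypothetical, not WESTPA's implementation):

import numpy as np

flux_ij = 2.0e-4                   # hypothetical flux from state i to state j
pops = np.array([0.6, 0.1, 0.3])   # hypothetical trajectory-ensemble populations
i, j = 0, 1

# pairwise=True: normalize by the probability of i relative to the (i, j) pair
rate_pairwise = flux_ij / (pops[i] / (pops[i] + pops[j]))

# pairwise=False: normalize by the probability of i among all states
rate_plain = flux_ij / pops[i]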
- class westpa.cli.tools.w_direct.WESTKineticsBase(parent)
Bases:
WESTSubcommand
Common argument processing for w_direct/w_reweight subcommands. Mostly limited to handling input and output from w_assign.
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- class westpa.cli.tools.w_direct.AverageCommands(parent)
Bases:
WESTKineticsBase
- default_output_file = 'direct.h5'
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- stamp_mcbs_info(dataset)
- open_files()
- open_assignments()
- print_averages(dataset, header, dim=1)
- run_calculation(pi, nstates, start_iter, stop_iter, step_iter, dataset, eval_block, name, dim, do_averages=False, **extra)
- westpa.cli.tools.w_direct.mcbs_ci_correl(estimator_datasets, estimator, alpha, n_sets=None, args=None, autocorrel_alpha=None, autocorrel_n_sets=None, subsample=None, do_correl=True, mcbs_enable=None, estimator_kwargs={})
Perform a Monte Carlo bootstrap estimate for the (1-``alpha``) confidence interval on the given ``dataset`` with the given ``estimator``. This routine is appropriate for time-correlated data, using the method described in Huber & Kim, “Weighted-ensemble Brownian dynamics simulations for protein association reactions” (1996), doi:10.1016/S0006-3495(96)79552-8, to determine a statistically significant correlation time and then reducing the dataset by a factor of that correlation time before running a “classic” Monte Carlo bootstrap.
Returns ``(estimate, ci_lb, ci_ub, correl_time)``, where ``estimate`` is the application of the given ``estimator`` to the input ``dataset``, ``ci_lb`` and ``ci_ub`` are the lower and upper limits, respectively, of the (1-``alpha``) confidence interval on ``estimate``, and ``correl_time`` is the correlation time of the dataset, significant to (1-``autocorrel_alpha``).
``estimator`` is called as ``estimator(dataset, *args, **kwargs)``. Common estimators include:
np.mean -- calculate the confidence interval on the mean of ``dataset``
np.median -- calculate a confidence interval on the median of ``dataset``
np.std -- calculate a confidence interval on the standard deviation of ``dataset``
``n_sets`` is the number of synthetic data sets to generate using the given ``estimator``; it will be chosen using ``get_bssize()`` if not given. ``autocorrel_alpha`` (which defaults to ``alpha``) can be used to adjust the significance level of the autocorrelation calculation. Note that too high a significance level (too low an alpha) for evaluating the significance of autocorrelation values can result in a failure to detect correlation if the autocorrelation function is noisy.
The given ``subsample`` function is used, if provided, to subsample the dataset prior to running the full Monte Carlo bootstrap. If none is provided, then a random entry from each correlated block is used as the value for that block. Other reasonable choices include ``np.mean``, ``np.median``, ``(lambda x: x[0])``, or ``(lambda x: x[-1])``. In particular, using ``subsample=np.mean`` will converge to the block-averaged mean and standard error, while accounting for any non-normality in the distribution of the mean.
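A hedged usage sketch, following the calling convention in this docstring (the input data here are synthetic, and passing the dataset positionally is an assumption based on the description above):

import numpy as np
from westpa.cli.tools.w_direct import mcbs_ci_correl

data = np.random.normal(1.0, 0.1, size=500)   # synthetic time-correlated observable
estimate, ci_lb, ci_ub, correl_time = mcbs_ci_correl(
    data, np.mean, alpha=0.05, n_sets=1000, subsample=np.mean
)
print(estimate, ci_lb, ci_ub, correl_time)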
- westpa.cli.tools.w_direct.accumulate_state_populations_from_labeled(labeled_bin_pops, state_map, state_pops, check_state_map=True)
- class westpa.cli.tools.w_direct.DKinetics(parent)
Bases: WESTKineticsBase, WKinetics
- subcommand = 'init'
- default_kinetics_file = 'direct.h5'
- default_output_file = 'direct.h5'
- help_text = 'calculate state-to-state kinetics by tracing trajectories'
- description = 'Calculate state-to-state rates and transition event durations by tracing\ntrajectories.\n\nA bin assignment file (usually "assign.h5") including trajectory labeling\nis required (see "w_assign --help" for information on generating this file).\n\nThis subcommand for w_direct is used as input for all other w_direct\nsubcommands, which will convert the flux data in the output file into\naverage rates/fluxes/populations with confidence intervals.\n\n-----------------------------------------------------------------------------\nOutput format\n-----------------------------------------------------------------------------\n\nThe output file (-o/--output, by default "direct.h5") contains the\nfollowing datasets:\n\n ``/conditional_fluxes`` [iteration][state][state]\n *(Floating-point)* Macrostate-to-macrostate fluxes. These are **not**\n normalized by the population of the initial macrostate.\n\n ``/conditional_arrivals`` [iteration][stateA][stateB]\n *(Integer)* Number of trajectories arriving at state *stateB* in a given\n iteration, given that they departed from *stateA*.\n\n ``/total_fluxes`` [iteration][state]\n *(Floating-point)* Total flux into a given macrostate.\n\n ``/arrivals`` [iteration][state]\n *(Integer)* Number of trajectories arriving at a given state in a given\n iteration, regardless of where they originated.\n\n ``/duration_count`` [iteration]\n *(Integer)* The number of event durations recorded in each iteration.\n\n ``/durations`` [iteration][event duration]\n *(Structured -- see below)* Event durations for transition events ending\n during a given iteration. These are stored as follows:\n\n istate\n *(Integer)* Initial state of transition event.\n fstate\n *(Integer)* Final state of transition event.\n duration\n *(Floating-point)* Duration of transition, in units of tau.\n weight\n *(Floating-point)* Weight of trajectory at end of transition, **not**\n normalized by initial state population.\n\nBecause state-to-state fluxes stored in this file are not normalized by\ninitial macrostate population, they cannot be used as rates without further\nprocessing. The ``w_direct kinetics`` command is used to perform this normalization\nwhile taking statistical fluctuation and correlation into account. See\n``w_direct kinetics --help`` for more information. Target fluxes (total flux\ninto a given state) require no such normalization.\n\n-----------------------------------------------------------------------------\nCommand-line options\n-----------------------------------------------------------------------------\n'
- open_files()
- go()
- class westpa.cli.tools.w_direct.DKinAvg(parent)
Bases:
AverageCommands
- subcommand = 'kinetics'
- help_text = 'Generates rate and flux values from a WESTPA simulation via tracing.'
- default_kinetics_file = 'direct.h5'
- description = 'Calculate average rates/fluxes and associated errors from weighted ensemble\ndata. Bin assignments (usually "assign.h5") and kinetics data (usually\n"direct.h5") data files must have been previously generated (see\n"w_assign --help" and "w_direct init --help" for information on\ngenerating these files).\n\nThe evolution of all datasets may be calculated, with or without confidence\nintervals.\n\n-----------------------------------------------------------------------------\nOutput format\n-----------------------------------------------------------------------------\n\nThe output file (-o/--output, usually "direct.h5") contains the following\ndataset:\n\n /avg_rates [state,state]\n (Structured -- see below) State-to-state rates based on entire window of\n iterations selected.\n\n /avg_total_fluxes [state]\n (Structured -- see below) Total fluxes into each state based on entire\n window of iterations selected.\n\n /avg_conditional_fluxes [state,state]\n (Structured -- see below) State-to-state fluxes based on entire window of\n iterations selected.\n\nIf --evolution-mode is specified, then the following additional datasets are\navailable:\n\n /rate_evolution [window][state][state]\n (Structured -- see below). State-to-state rates based on windows of\n iterations of varying width. If --evolution-mode=cumulative, then\n these windows all begin at the iteration specified with\n --start-iter and grow in length by --step-iter for each successive\n element. If --evolution-mode=blocked, then these windows are all of\n width --step-iter (excluding the last, which may be shorter), the first\n of which begins at iteration --start-iter.\n\n /target_flux_evolution [window,state]\n (Structured -- see below). Total flux into a given macro state based on\n windows of iterations of varying width, as in /rate_evolution.\n\n /conditional_flux_evolution [window,state,state]\n (Structured -- see below). State-to-state fluxes based on windows of\n varying width, as in /rate_evolution.\n\nThe structure of these datasets is as follows:\n\n iter_start\n (Integer) Iteration at which the averaging window begins (inclusive).\n\n iter_stop\n (Integer) Iteration at which the averaging window ends (exclusive).\n\n expected\n (Floating-point) Expected (mean) value of the observable as evaluated within\n this window, in units of inverse tau.\n\n ci_lbound\n (Floating-point) Lower bound of the confidence interval of the observable\n within this window, in units of inverse tau.\n\n ci_ubound\n (Floating-point) Upper bound of the confidence interval of the observable\n within this window, in units of inverse tau.\n\n stderr\n (Floating-point) The standard error of the mean of the observable\n within this window, in units of inverse tau.\n\n corr_len\n (Integer) Correlation length of the observable within this window, in units\n of tau.\n\nEach of these datasets is also stamped with a number of attributes:\n\n mcbs_alpha\n (Floating-point) Alpha value of confidence intervals. (For example,\n *alpha=0.05* corresponds to a 95% confidence interval.)\n\n mcbs_nsets\n (Integer) Number of bootstrap data sets used in generating confidence\n intervals.\n\n mcbs_acalpha\n (Floating-point) Alpha value for determining correlation lengths.\n\n\n-----------------------------------------------------------------------------\nCommand-line options\n-----------------------------------------------------------------------------\n'
- go()
- class westpa.cli.tools.w_direct.DStateProbs(parent)
Bases:
AverageCommands
- subcommand = 'probs'
- help_text = 'Calculates color and state probabilities via tracing.'
- default_kinetics_file = 'direct.h5'
- description = 'Calculate average populations and associated errors in state populations from\nweighted ensemble data. Bin assignments, including macrostate definitions,\nare required. (See "w_assign --help" for more information).\n\n-----------------------------------------------------------------------------\nOutput format\n-----------------------------------------------------------------------------\n\nThe output file (-o/--output, usually "direct.h5") contains the following\ndataset:\n\n /avg_state_probs [state]\n (Structured -- see below) Population of each state across entire\n range specified.\n\n /avg_color_probs [state]\n (Structured -- see below) Population of each ensemble across entire\n range specified.\n\nIf --evolution-mode is specified, then the following additional datasets are\navailable:\n\n /state_pop_evolution [window][state]\n (Structured -- see below). State populations based on windows of\n iterations of varying width. If --evolution-mode=cumulative, then\n these windows all begin at the iteration specified with\n --start-iter and grow in length by --step-iter for each successive\n element. If --evolution-mode=blocked, then these windows are all of\n width --step-iter (excluding the last, which may be shorter), the first\n of which begins at iteration --start-iter.\n\n /color_prob_evolution [window][state]\n (Structured -- see below). Ensemble populations based on windows of\n iterations of varying width. If --evolution-mode=cumulative, then\n these windows all begin at the iteration specified with\n --start-iter and grow in length by --step-iter for each successive\n element. If --evolution-mode=blocked, then these windows are all of\n width --step-iter (excluding the last, which may be shorter), the first\n of which begins at iteration --start-iter.\n\nThe structure of these datasets is as follows:\n\n iter_start\n (Integer) Iteration at which the averaging window begins (inclusive).\n\n iter_stop\n (Integer) Iteration at which the averaging window ends (exclusive).\n\n expected\n (Floating-point) Expected (mean) value of the observable as evaluated within\n this window, in units of inverse tau.\n\n ci_lbound\n (Floating-point) Lower bound of the confidence interval of the observable\n within this window, in units of inverse tau.\n\n ci_ubound\n (Floating-point) Upper bound of the confidence interval of the observable\n within this window, in units of inverse tau.\n\n stderr\n (Floating-point) The standard error of the mean of the observable\n within this window, in units of inverse tau.\n\n corr_len\n (Integer) Correlation length of the observable within this window, in units\n of tau.\n\nEach of these datasets is also stamped with a number of attributes:\n\n mcbs_alpha\n (Floating-point) Alpha value of confidence intervals. (For example,\n *alpha=0.05* corresponds to a 95% confidence interval.)\n\n mcbs_nsets\n (Integer) Number of bootstrap data sets used in generating confidence\n intervals.\n\n mcbs_acalpha\n (Floating-point) Alpha value for determining correlation lengths.\n\n\n-----------------------------------------------------------------------------\nCommand-line options\n-----------------------------------------------------------------------------\n'
- calculate_state_populations(pops)
- w_stateprobs()
- go()
- class westpa.cli.tools.w_direct.DAll(parent)
Bases: DStateProbs, DKinAvg, DKinetics
- subcommand = 'all'
- help_text = 'Runs the full suite, including the tracing of events.'
- default_kinetics_file = 'direct.h5'
- description = 'A convenience function to run init/kinetics/probs. Bin assignments,\nincluding macrostate definitions, are required. (See\n"w_assign --help" for more information).\n\nFor more information on the individual subcommands this subs in for, run\nw_direct {init/kinetics/probs} --help.\n\n-----------------------------------------------------------------------------\nCommand-line options\n-----------------------------------------------------------------------------\n'
- go()
- class westpa.cli.tools.w_direct.DAverage(parent)
Bases: DStateProbs, DKinAvg
- subcommand = 'average'
- help_text = 'Averages and returns fluxes, rates, and color/state populations.'
- default_kinetics_file = 'direct.h5'
- description = 'A convenience function to run kinetics/probs. Bin assignments,\nincluding macrostate definitions, are required. (See\n"w_assign --help" for more information).\n\nFor more information on the individual subcommands this subs in for, run\nw_direct {kinetics/probs} --help.\n\n-----------------------------------------------------------------------------\nCommand-line options\n-----------------------------------------------------------------------------\n'
- go()
- class westpa.cli.tools.w_direct.WDirect
Bases: WESTMasterCommand, WESTParallelTool
- prog = 'w_direct'
- subcommands = [<class 'westpa.cli.tools.w_direct.DKinetics'>, <class 'westpa.cli.tools.w_direct.DAverage'>, <class 'westpa.cli.tools.w_direct.DKinAvg'>, <class 'westpa.cli.tools.w_direct.DStateProbs'>, <class 'westpa.cli.tools.w_direct.DAll'>]
- subparsers_title = 'direct kinetics analysis schemes'
- westpa.cli.tools.w_direct.entry_point()
w_select
usage:
w_select [-h] [-r RCFILE] [--quiet | --verbose | --debug] [--version]
[--max-queue-length MAX_QUEUE_LENGTH] [-W WEST_H5FILE] [--first-iter N_ITER]
[--last-iter N_ITER] [-p MODULE.FUNCTION] [-v] [-a] [-o OUTPUT]
[--serial | --parallel | --work-manager WORK_MANAGER] [--n-workers N_WORKERS]
[--zmq-mode MODE] [--zmq-comm-mode COMM_MODE] [--zmq-write-host-info INFO_FILE]
[--zmq-read-host-info INFO_FILE] [--zmq-upstream-rr-endpoint ENDPOINT]
[--zmq-upstream-ann-endpoint ENDPOINT] [--zmq-downstream-rr-endpoint ENDPOINT]
[--zmq-downstream-ann-endpoint ENDPOINT] [--zmq-master-heartbeat MASTER_HEARTBEAT]
[--zmq-worker-heartbeat WORKER_HEARTBEAT] [--zmq-timeout-factor FACTOR]
[--zmq-startup-timeout STARTUP_TIMEOUT] [--zmq-shutdown-timeout SHUTDOWN_TIMEOUT]
Select dynamics segments matching various criteria. This requires a user-provided predicate function. By default, only matching segments are stored. If the -a/--include-ancestors option is given, then matching segments and their ancestors will be stored.
Predicate function
Segments are selected based on a predicate function, which must be callable
as predicate(n_iter, iter_group)
and return a collection of segment IDs
matching the predicate in that iteration.
The predicate may be inverted by specifying the -v/--invert command-line argument.
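For example, a predicate module (the file name ``my_predicates.py`` and the seg_index weight lookup are illustrative assumptions, not part of w_select itself) selecting all high-weight segments might look like:

# my_predicates.py -- a sketch of a w_select predicate module
import numpy as np

def heavy_walkers(n_iter, iter_group):
    # Called once per iteration with that iteration's HDF5 group; returns
    # the seg_ids whose weight exceeds a (hypothetical) threshold.
    weights = iter_group['seg_index']['weight']
    return np.nonzero(weights > 1e-3)[0]

This would then be selected with something like ``w_select -p my_predicates.heavy_walkers``.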
Output format
The output file (-o/--output, by default “select.h5”) contains the following datasets:
``/n_iter`` [iteration]
*(Integer)* Iteration numbers for each entry in other datasets.
``/n_segs`` [iteration]
*(Integer)* Number of segment IDs matching the predicate (or inverted
predicate, if -v/--invert is specified) in the given iteration.
``/seg_ids`` [iteration][segment]
*(Integer)* Matching segments in each iteration. For an iteration
``n_iter``, only the first ``n_segs[n_iter]`` entries are valid. For example,
the full list of matching seg_ids in the first stored iteration is
``seg_ids[0][:n_segs[0]]``.
``/weights`` [iteration][segment]
*(Floating-point)* Weights for each matching segment in ``/seg_ids``.
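A short sketch (assuming a completed run produced “select.h5”) of how these datasets fit together when read back with h5py:

import h5py

with h5py.File('select.h5', 'r') as f:
    n_iters = f['n_iter'][:]              # iteration number for each row
    n_segs = f['n_segs'][:]               # valid entries per row
    for row, n_iter in enumerate(n_iters):
        k = n_segs[row]
        seg_ids = f['seg_ids'][row, :k]   # matching segments this iteration
        weights = f['weights'][row, :k]   # their weights
        print(n_iter, seg_ids, weights)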
Command-line arguments
optional arguments:
-h, --help show this help message and exit
general options:
-r RCFILE, --rcfile RCFILE
use RCFILE as the WEST run-time configuration file (default: west.cfg)
--quiet emit only essential information
--verbose emit extra information
--debug enable extra checks and emit copious information
--version show program's version number and exit
parallelization options:
--max-queue-length MAX_QUEUE_LENGTH
Maximum number of tasks that can be queued. Useful to limit RAM use for tasks that
have very large requests/response. Default: no limit.
WEST input data options:
-W WEST_H5FILE, --west-data WEST_H5FILE
Take WEST data from WEST_H5FILE (default: read from the HDF5 file specified in
west.cfg).
iteration range:
--first-iter N_ITER Begin analysis at iteration N_ITER (default: 1).
--last-iter N_ITER Conclude analysis with N_ITER, inclusive (default: last completed iteration).
selection options:
-p MODULE.FUNCTION, --predicate-function MODULE.FUNCTION
Use the given predicate function to match segments. This function should take an
iteration number and the HDF5 group corresponding to that iteration and return a
sequence of seg_ids matching the predicate, as in ``match_predicate(n_iter,
iter_group)``.
-v, --invert Invert the match predicate.
-a, --include-ancestors
Include ancestors of matched segments in output.
output options:
-o OUTPUT, --output OUTPUT
Write output to OUTPUT (default: select.h5).
parallelization options:
--serial run in serial mode
--parallel run in parallel mode (using processes)
--work-manager WORK_MANAGER
use the given work manager for parallel task distribution. Available work managers
are ('serial', 'threads', 'processes', 'zmq'); default is 'serial'
--n-workers N_WORKERS
Use up to N_WORKERS on this host, for work managers which support this option. Use
0 for a dedicated server. (Ignored by work managers which do not support this
option.)
options for ZeroMQ (“zmq”) work manager (master or node):
--zmq-mode MODE Operate as a master (server) or a node (workers/client). "server" is a deprecated
synonym for "master" and "client" is a deprecated synonym for "node".
--zmq-comm-mode COMM_MODE
Use the given communication mode -- TCP or IPC (Unix-domain) -- sockets for
communication within a node. IPC (the default) may be more efficient but is not
available on (exceptionally rare) systems without node-local storage (e.g. /tmp);
on such systems, TCP may be used instead.
--zmq-write-host-info INFO_FILE
Store hostname and port information needed to connect to this instance in
INFO_FILE. This allows the master and nodes assisting in coordinating the
communication of other nodes to choose ports randomly. Downstream nodes read this
file with --zmq-read-host-info and know where to connect.
--zmq-read-host-info INFO_FILE
Read hostname and port information needed to connect to the master (or other
coordinating node) from INFO_FILE. This allows the master and nodes assisting in
coordinating the communication of other nodes to choose ports randomly, writing
that information with --zmq-write-host-info for this instance to read.
--zmq-upstream-rr-endpoint ENDPOINT
ZeroMQ endpoint to which to send request/response (task and result) traffic toward
the master.
--zmq-upstream-ann-endpoint ENDPOINT
ZeroMQ endpoint on which to receive announcement (heartbeat and shutdown
notification) traffic from the master.
--zmq-downstream-rr-endpoint ENDPOINT
ZeroMQ endpoint on which to listen for request/response (task and result) traffic
from subsidiary workers.
--zmq-downstream-ann-endpoint ENDPOINT
ZeroMQ endpoint on which to send announcement (heartbeat and shutdown
notification) traffic toward workers.
--zmq-master-heartbeat MASTER_HEARTBEAT
Every MASTER_HEARTBEAT seconds, the master announces its presence to workers.
--zmq-worker-heartbeat WORKER_HEARTBEAT
Every WORKER_HEARTBEAT seconds, workers announce their presence to the master.
--zmq-timeout-factor FACTOR
Scaling factor for heartbeat timeouts. If the master doesn't hear from a worker in
WORKER_HEARTBEAT*FACTOR, the worker is assumed to have crashed. If a worker
doesn't hear from the master in MASTER_HEARTBEAT*FACTOR seconds, the master is
assumed to have crashed. Both cases result in shutdown.
--zmq-startup-timeout STARTUP_TIMEOUT
Amount of time (in seconds) to wait for communication between the master and at
least one worker. This may need to be changed on very large, heavily-loaded
computer systems that start all processes simultaneously.
--zmq-shutdown-timeout SHUTDOWN_TIMEOUT
Amount of time (in seconds) to wait for workers to shut down.
westpa.cli.tools.w_select module
- class westpa.cli.tools.w_select.WESTParallelTool(wm_env=None)
Bases:
WESTTool
Base class for command-line tools parallelized with wwmgr. This automatically adds and processes wwmgr command-line arguments and creates a work manager at self.work_manager.
- make_parser_and_process(prog=None, usage=None, description=None, epilog=None, args=None)
A convenience function to create a parser, call add_all_args(), and then call process_all_args(). The argument namespace is returned.
- add_args(parser)
Add arguments specific to this tool to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this tool and deal with them appropriately (setting instance variables, etc)
- go()
Perform the analysis associated with this tool.
- main()
A convenience function to make a parser, parse and process arguments, then run self.go() in the master process.
- class westpa.cli.tools.w_select.WESTDataReader
Bases:
WESTToolComponent
Tool for reading data from WEST-related HDF5 files. Coordinates finding the main HDF5 file from west.cfg or command line arguments, caching of certain kinds of data (eventually), and retrieving auxiliary data sets from various places.
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- open(mode='r')
- close()
- property weight_dsspec
- property parent_id_dsspec
- class westpa.cli.tools.w_select.IterRangeSelection(data_manager=None)
Bases:
WESTToolComponent
Select and record limits on iterations used in analysis and/or reporting. This class provides both the user-facing command-line options and parsing, and the application-side API for recording limits in HDF5.
HDF5 datasets calculated based on a restricted set of iterations should be tagged with the following attributes:
first_iter
The first iteration included in the calculation.
last_iter
One past the last iteration included in the calculation.
iter_step
Blocking or sampling period for iterations included in the calculation.
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- process_args(args, override_iter_start=None, override_iter_stop=None, default_iter_step=1)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- iter_block_iter()
Return an iterable of (block_start, block_end) over the blocks of iterations selected by --first-iter/--last-iter/--step-iter.
- n_iter_blocks()
Return the number of blocks of iterations (as returned by ``iter_block_iter``) selected by --first-iter/--last-iter/--step-iter.
- record_data_iter_range(h5object, iter_start=None, iter_stop=None)
Store attributes ``iter_start`` and ``iter_stop`` on the given HDF5 object (group/dataset).
- record_data_iter_step(h5object, iter_step=None)
Store attribute ``iter_step`` on the given HDF5 object (group/dataset).
- check_data_iter_range_least(h5object, iter_start=None, iter_stop=None)
Check that the given HDF5 object contains (as denoted by its ``iter_start``/``iter_stop`` attributes) data at least for the iteration range specified.
- check_data_iter_range_equal(h5object, iter_start=None, iter_stop=None)
Check that the given HDF5 object contains (as denoted by its ``iter_start``/``iter_stop`` attributes) data exactly for the iteration range specified.
- check_data_iter_step_conformant(h5object, iter_step=None)
Check that the given HDF5 object contains per-iteration data at an iteration stride suitable for extracting data with the given stride (in other words, the given ``iter_step`` is a multiple of the stride with which data was recorded).
- check_data_iter_step_equal(h5object, iter_step=None)
Check that the given HDF5 object contains per-iteration data at an iteration stride the same as that specified.
- slice_per_iter_data(dataset, iter_start=None, iter_stop=None, iter_step=None, axis=0)
Return the subset of the given dataset corresponding to the given iteration range and stride. Unless otherwise specified, the first dimension of the dataset is the one sliced.
- iter_range(iter_start=None, iter_stop=None, iter_step=None, dtype=None)
Return a sequence for the given iteration numbers and stride, filling in missing values from those stored on ``self``. The smallest data type capable of holding ``iter_stop`` is returned unless otherwise specified using the ``dtype`` argument.
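As a sketch of the attribute convention these methods record and check (using plain h5py rather than WESTPA's own helpers; file and dataset names are hypothetical):

import h5py

with h5py.File('analysis.h5', 'a') as f:
    dset = f.require_dataset('per_iter_data', shape=(100,), dtype='f8')
    dset.attrs['iter_start'] = 1     # first iteration included
    dset.attrs['iter_stop'] = 101    # one past the last iteration included
    dset.attrs['iter_step'] = 1      # sampling period

    # Mirrors what check_data_iter_range_least() is described to verify
    # before slicing iterations 10..50 out of the stored range.
    assert dset.attrs['iter_start'] <= 10 and dset.attrs['iter_stop'] >= 50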
- class westpa.cli.tools.w_select.ProgressIndicatorComponent
Bases:
WESTToolComponent
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- westpa.cli.tools.w_select.seg_id_dtype
alias of
int64
- westpa.cli.tools.w_select.n_iter_dtype
alias of
uint32
- westpa.cli.tools.w_select.weight_dtype
alias of
float64
- westpa.cli.tools.w_select.get_object(object_name, path=None)
Attempt to load the given object, using additional path information if given.
- class westpa.cli.tools.w_select.WSelectTool
Bases:
WESTParallelTool
- prog = 'w_select'
- description = 'Select dynamics segments matching various criteria. This requires a\nuser-provided prediate function. By default, only matching segments are\nstored. If the -a/--include-ancestors option is given, then matching segments\nand their ancestors will be stored.\n\n\n-----------------------------------------------------------------------------\nPredicate function\n-----------------------------------------------------------------------------\n\nSegments are selected based on a predicate function, which must be callable\nas ``predicate(n_iter, iter_group)`` and return a collection of segment IDs\nmatching the predicate in that iteration.\n\nThe predicate may be inverted by specifying the -v/--invert command-line\nargument.\n\n\n-----------------------------------------------------------------------------\nOutput format\n-----------------------------------------------------------------------------\n\nThe output file (-o/--output, by default "select.h5") contains the following\ndatasets:\n\n ``/n_iter`` [iteration]\n *(Integer)* Iteration numbers for each entry in other datasets.\n\n ``/n_segs`` [iteration]\n *(Integer)* Number of segment IDs matching the predicate (or inverted\n predicate, if -v/--invert is specified) in the given iteration.\n\n ``/seg_ids`` [iteration][segment]\n *(Integer)* Matching segments in each iteration. For an iteration\n ``n_iter``, only the first ``n_iter`` entries are valid. For example,\n the full list of matching seg_ids in the first stored iteration is\n ``seg_ids[0][:n_segs[0]]``.\n\n ``/weights`` [iteration][segment]\n *(Floating-point)* Weights for each matching segment in ``/seg_ids``.\n\n\n-----------------------------------------------------------------------------\nCommand-line arguments\n-----------------------------------------------------------------------------\n'
- add_args(parser)
Add arguments specific to this tool to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this tool and deal with them appropriately (setting instance variables, etc)
- go()
Perform the analysis associated with this tool.
- westpa.cli.tools.w_select.entry_point()
w_states
usage:
w_states [-h] [-r RCFILE] [--quiet | --verbose | --debug] [--version]
[--show | --append | --replace] [--bstate-file BSTATE_FILE] [--bstate BSTATES]
[--tstate-file TSTATE_FILE] [--tstate TSTATES]
[--serial | --parallel | --work-manager WORK_MANAGER] [--n-workers N_WORKERS]
[--zmq-mode MODE] [--zmq-comm-mode COMM_MODE] [--zmq-write-host-info INFO_FILE]
[--zmq-read-host-info INFO_FILE] [--zmq-upstream-rr-endpoint ENDPOINT]
[--zmq-upstream-ann-endpoint ENDPOINT] [--zmq-downstream-rr-endpoint ENDPOINT]
[--zmq-downstream-ann-endpoint ENDPOINT] [--zmq-master-heartbeat MASTER_HEARTBEAT]
[--zmq-worker-heartbeat WORKER_HEARTBEAT] [--zmq-timeout-factor FACTOR]
[--zmq-startup-timeout STARTUP_TIMEOUT] [--zmq-shutdown-timeout SHUTDOWN_TIMEOUT]
Display or manipulate basis (initial) or target (recycling) states for a WEST simulation. By default, states are displayed (or dumped to files). If --replace is specified, all basis/target states are replaced for the next iteration. If --append is specified, the given target state(s) are appended to the list for the next iteration. Appending basis states is not permitted, as this would require renormalizing basis state probabilities in ways that may be error-prone. Instead, use ``w_states --show --bstate-file=bstates.txt``, edit the resulting bstates.txt file to include the new desired basis states, and then use ``w_states --replace --bstate-file=bstates.txt`` to update the WEST HDF5 file appropriately.
optional arguments:
-h, --help show this help message and exit
--bstate-file BSTATE_FILE
Read (--append/--replace) or write (--show) basis state names, probabilities, and
data references from/to BSTATE_FILE.
--bstate BSTATES Add the given basis state (specified as a string 'label,probability[,auxref]') to
the list of basis states (after those specified in --bstate-file, if any). This
argument may be specified more than once, in which case the given states are
appended in the order they are given on the command line.
--tstate-file TSTATE_FILE
Read (--append/--replace) or write (--show) target state names and representative
progress coordinates from/to TSTATE_FILE
--tstate TSTATES Add the given target state (specified as a string 'label,pcoord0[,pcoord1[,...]]')
to the list of target states (after those specified in the file given by
--tstates-from, if any). This argument may be specified more than once, in which
case the given states are appended in the order they appear on the command line.
general options:
-r RCFILE, --rcfile RCFILE
use RCFILE as the WEST run-time configuration file (default: west.cfg)
--quiet emit only essential information
--verbose emit extra information
--debug enable extra checks and emit copious information
--version show program's version number and exit
modes of operation:
--show Display current basis/target states (or dump to files).
--append Append the given basis/target states to those currently in use.
--replace Replace current basis/target states with those specified.
parallelization options:
--serial run in serial mode
--parallel run in parallel mode (using processes)
--work-manager WORK_MANAGER
use the given work manager for parallel task distribution. Available work managers
are ('serial', 'threads', 'processes', 'zmq'); default is 'serial'
--n-workers N_WORKERS
Use up to N_WORKERS on this host, for work managers which support this option. Use
0 for a dedicated server. (Ignored by work managers which do not support this
option.)
options for ZeroMQ (“zmq”) work manager (master or node):
--zmq-mode MODE Operate as a master (server) or a node (workers/client). "server" is a deprecated
synonym for "master" and "client" is a deprecated synonym for "node".
--zmq-comm-mode COMM_MODE
Use the given communication mode -- TCP or IPC (Unix-domain) -- sockets for
communication within a node. IPC (the default) may be more efficient but is not
available on (exceptionally rare) systems without node-local storage (e.g. /tmp);
on such systems, TCP may be used instead.
--zmq-write-host-info INFO_FILE
Store hostname and port information needed to connect to this instance in
INFO_FILE. This allows the master and nodes assisting in coordinating the
communication of other nodes to choose ports randomly. Downstream nodes read this
file with --zmq-read-host-info and know where to connect.
--zmq-read-host-info INFO_FILE
Read hostname and port information needed to connect to the master (or other
coordinating node) from INFO_FILE. This allows the master and nodes assisting in
coordinating the communication of other nodes to choose ports randomly, writing
that information with --zmq-write-host-info for this instance to read.
--zmq-upstream-rr-endpoint ENDPOINT
ZeroMQ endpoint to which to send request/response (task and result) traffic toward
the master.
--zmq-upstream-ann-endpoint ENDPOINT
ZeroMQ endpoint on which to receive announcement (heartbeat and shutdown
notification) traffic from the master.
--zmq-downstream-rr-endpoint ENDPOINT
ZeroMQ endpoint on which to listen for request/response (task and result) traffic
from subsidiary workers.
--zmq-downstream-ann-endpoint ENDPOINT
ZeroMQ endpoint on which to send announcement (heartbeat and shutdown
notification) traffic toward workers.
--zmq-master-heartbeat MASTER_HEARTBEAT
Every MASTER_HEARTBEAT seconds, the master announces its presence to workers.
--zmq-worker-heartbeat WORKER_HEARTBEAT
Every WORKER_HEARTBEAT seconds, workers announce their presence to the master.
--zmq-timeout-factor FACTOR
Scaling factor for heartbeat timeouts. If the master doesn't hear from a worker in
WORKER_HEARTBEAT*FACTOR, the worker is assumed to have crashed. If a worker
doesn't hear from the master in MASTER_HEARTBEAT*FACTOR seconds, the master is
assumed to have crashed. Both cases result in shutdown.
--zmq-startup-timeout STARTUP_TIMEOUT
Amount of time (in seconds) to wait for communication between the master and at
least one worker. This may need to be changed on very large, heavily-loaded
computer systems that start all processes simultaneously.
--zmq-shutdown-timeout SHUTDOWN_TIMEOUT
Amount of time (in seconds) to wait for workers to shut down.
westpa.cli.core.w_states module
- westpa.cli.core.w_states.make_work_manager()
Using cues from the environment, instantiate a pre-configured work manager.
- class westpa.cli.core.w_states.Segment(n_iter=None, seg_id=None, weight=None, endpoint_type=None, parent_id=None, wtg_parent_ids=None, pcoord=None, status=None, walltime=None, cputime=None, data=None)
Bases:
object
A class wrapping segment data that must be passed through the work manager or data manager. Most fields are self-explanatory. One item worth noting is that a negative parent ID means that the segment starts from the initial state with ID -(segment.parent_id+1)
- SEG_STATUS_UNSET = 0
- SEG_STATUS_PREPARED = 1
- SEG_STATUS_COMPLETE = 2
- SEG_STATUS_FAILED = 3
- SEG_INITPOINT_UNSET = 0
- SEG_INITPOINT_CONTINUES = 1
- SEG_INITPOINT_NEWTRAJ = 2
- SEG_ENDPOINT_UNSET = 0
- SEG_ENDPOINT_CONTINUES = 1
- SEG_ENDPOINT_MERGED = 2
- SEG_ENDPOINT_RECYCLED = 3
- statuses = {'SEG_STATUS_COMPLETE': 2, 'SEG_STATUS_FAILED': 3, 'SEG_STATUS_PREPARED': 1, 'SEG_STATUS_UNSET': 0}
- initpoint_types = {'SEG_INITPOINT_CONTINUES': 1, 'SEG_INITPOINT_NEWTRAJ': 2, 'SEG_INITPOINT_UNSET': 0}
- endpoint_types = {'SEG_ENDPOINT_CONTINUES': 1, 'SEG_ENDPOINT_MERGED': 2, 'SEG_ENDPOINT_RECYCLED': 3, 'SEG_ENDPOINT_UNSET': 0}
- status_names = {0: 'SEG_STATUS_UNSET', 1: 'SEG_STATUS_PREPARED', 2: 'SEG_STATUS_COMPLETE', 3: 'SEG_STATUS_FAILED'}
- initpoint_type_names = {0: 'SEG_INITPOINT_UNSET', 1: 'SEG_INITPOINT_CONTINUES', 2: 'SEG_INITPOINT_NEWTRAJ'}
- endpoint_type_names = {0: 'SEG_ENDPOINT_UNSET', 1: 'SEG_ENDPOINT_CONTINUES', 2: 'SEG_ENDPOINT_MERGED', 3: 'SEG_ENDPOINT_RECYCLED'}
- static initial_pcoord(segment)
Return the initial progress coordinate point of this segment.
- static final_pcoord(segment)
Return the final progress coordinate point of this segment.
- property initpoint_type
- property initial_state_id
- property status_text
- property endpoint_type_text
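A minimal sketch of the parent-ID convention noted above (the helper name is hypothetical):

def initial_state_id_from_parent(parent_id):
    # Non-negative IDs continue an existing segment; negative IDs mark a
    # segment started from the initial state with ID -(parent_id + 1).
    if parent_id >= 0:
        return None
    return -(parent_id + 1)

assert initial_state_id_from_parent(-1) == 0
assert initial_state_id_from_parent(-4) == 3
assert initial_state_id_from_parent(7) is None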
- class westpa.cli.core.w_states.BasisState(label, probability, pcoord=None, auxref=None, state_id=None)
Bases:
object
Describes a basis (micro)state. These basis states are used to generate initial states for new trajectories, either at the beginning of the simulation (i.e. at w_init) or due to recycling.
- Variables:
state_id – Integer identifier of this state, usually set by the data manager.
label – A descriptive label for this microstate (may be empty)
probability – Probability that this state will be selected when creating a new trajectory.
pcoord – The representative progress coordinate of this state.
auxref – A user-provided (string) reference for locating data associated with this state (usually a filesystem path).
- classmethod states_to_file(states, fileobj)
Write a file defining basis states, which may then be read by states_from_file().
- classmethod states_from_file(statefile)
Read a file defining basis states. Each line defines a state, and contains a label, the probability, and optionally a data reference, separated by whitespace, as in:
unbound 1.0
or:
unbound_0 0.6 state0.pdb
unbound_1 0.4 state1.pdb
- as_numpy_record()
Return the data for this state as a numpy record array.
- class westpa.cli.core.w_states.TargetState(label, pcoord, state_id=None)
Bases:
object
Describes a target state.
- Variables:
state_id – Integer identifier of this state, usually set by the data manager.
label – A descriptive label for this microstate (may be empty)
pcoord – The representative progress coordinate of this state.
- classmethod states_to_file(states, fileobj)
Write a file defining target states, which may then be read by states_from_file().
- classmethod states_from_file(statefile, dtype)
Read a file defining target states. Each line defines a state, and contains a label followed by a representative progress coordinate value, separated by whitespace, as in:
bound 0.02
for a single target and one-dimensional progress coordinates or:
bound 2.7 0.0
drift 100 50.0
for two targets and a two-dimensional progress coordinate.
- westpa.cli.core.w_states.entry_point()
- westpa.cli.core.w_states.initialize(mode, bstates, _bstate_file, tstates, _tstate_file)
w_eddist
usage:
w_eddist [-h] [-r RCFILE] [--quiet | --verbose | --debug] [--version]
[--max-queue-length MAX_QUEUE_LENGTH] [-b BINEXPR] [-C] [--loose] --istate ISTATE
--fstate FSTATE [--first-iter ITER_START] [--last-iter ITER_STOP] [-k KINETICS]
[-o OUTPUT] [--serial | --parallel | --work-manager WORK_MANAGER]
[--n-workers N_WORKERS] [--zmq-mode MODE] [--zmq-comm-mode COMM_MODE]
[--zmq-write-host-info INFO_FILE] [--zmq-read-host-info INFO_FILE]
[--zmq-upstream-rr-endpoint ENDPOINT] [--zmq-upstream-ann-endpoint ENDPOINT]
[--zmq-downstream-rr-endpoint ENDPOINT] [--zmq-downstream-ann-endpoint ENDPOINT]
[--zmq-master-heartbeat MASTER_HEARTBEAT] [--zmq-worker-heartbeat WORKER_HEARTBEAT]
[--zmq-timeout-factor FACTOR] [--zmq-startup-timeout STARTUP_TIMEOUT]
[--zmq-shutdown-timeout SHUTDOWN_TIMEOUT]
Calculate time-resolved transition-event duration distribution from kinetics results
Source data
Source data is collected from the results of ‘w_kinetics trace’ (see ``w_kinetics trace --help`` for more information on generating this dataset).
Histogram binning
By default, histograms are constructed with 100 bins in each dimension. This can be overridden by specifying -b/--bins, which accepts a number of different kinds of arguments:
a single integer N
N uniformly spaced bins will be used in each dimension.
a sequence of integers N1,N2,... (comma-separated)
N1 uniformly spaced bins will be used for the first dimension, N2 for the
second, and so on.
a list of lists [[B11, B12, B13, ...], [B21, B22, B23, ...], ...]
The bin boundaries B11, B12, B13, ... will be used for the first dimension,
B21, B22, B23, ... for the second dimension, and so on. These bin
boundaries need not be uniformly spaced. These expressions will be
evaluated with Python's ``eval`` construct, with ``np`` available for
use [e.g. to specify bins using np.arange()].
The first two forms (integer, list of integers) will trigger a scan of all data in each dimension in order to determine the minimum and maximum values, which may be very expensive for large datasets. This can be avoided by explicitly providing bin boundaries using the list-of-lists form.
Note that these bins are NOT at all related to the bins used to drive WE sampling.
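Since the list-of-lists form is evaluated with Python's ``eval`` with ``np`` in scope, a bin expression (this particular one is hypothetical) behaves like:

import numpy as np

binexpr = '[list(np.arange(0.0, 10.0, 0.5)), [0.0, 1.0, 2.0, 5.0]]'
binbounds = eval(binexpr, {'np': np})
# -> 20 uniform boundaries for the first dimension, and four explicit,
#    non-uniform boundaries for the second.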
Output format
The output file produced (specified by -o/--output, defaulting to “pdist.h5”) may be fed to plothist to generate plots (or appropriately processed text or HDF5 files) from this data. In short, the following datasets are created:
``histograms``
Normalized histograms. The first axis corresponds to iteration, and
remaining axes correspond to dimensions of the input dataset.
``/binbounds_0``
Vector of bin boundaries for the first (index 0) dimension. Additional
datasets similarly named (/binbounds_1, /binbounds_2, ...) are created
for additional dimensions.
``/midpoints_0``
Vector of bin midpoints for the first (index 0) dimension. Additional
datasets similarly named are created for additional dimensions.
``n_iter``
Vector of iteration numbers corresponding to the stored histograms (i.e.
the first axis of the ``histograms`` dataset).
Subsequent processing
The output generated by this program (-o/--output, default “pdist.h5”) may be plotted by the ``plothist`` program. See ``plothist --help`` for more information.
Parallelization
This tool supports parallelized binning, including reading of input data. Parallel processing is the default. For simple cases (reading pre-computed input data, modest numbers of segments), serial processing (--serial) may be more efficient.
Command-line options
optional arguments:
-h, --help show this help message and exit
-b BINEXPR, --bins BINEXPR
Use BINEXPR for bins. This may be an integer, which will be used for each
dimension of the progress coordinate; a list of integers (formatted as
[n1,n2,...]) which will use n1 bins for the first dimension, n2 for the second
dimension, and so on; or a list of lists of boundaries (formatted as [[a1, a2,
...], [b1, b2, ...], ... ]), which will use [a1, a2, ...] as bin boundaries for
the first dimension, [b1, b2, ...] as bin boundaries for the second dimension,
and so on. (Default: 100 bins in each dimension.)
-C, --compress Compress histograms. May make storage of higher-dimensional histograms more
tractable, at the (possible extreme) expense of increased analysis time.
(Default: no compression.)
--loose Ignore values that do not fall within bins. (Risky, as this can make buggy bin
boundaries appear as reasonable data. Only use if you are sure of your bin
boundary specification.)
--istate ISTATE Initial state defining transition event
--fstate FSTATE Final state defining transition event
general options:
-r RCFILE, --rcfile RCFILE
use RCFILE as the WEST run-time configuration file (default: west.cfg)
--quiet emit only essential information
--verbose emit extra information
--debug enable extra checks and emit copious information
--version show program's version number and exit
parallelization options:
--max-queue-length MAX_QUEUE_LENGTH
Maximum number of tasks that can be queued. Useful to limit RAM use for tasks that
have very large requests/response. Default: no limit.
iteration range options:
--first-iter ITER_START
Iteration to begin analysis (default: 1)
--last-iter ITER_STOP
Iteration to end analysis
input/output options:
-k KINETICS, --kinetics KINETICS
Populations and transition rates (including evolution) are stored in KINETICS
(default: kintrace.h5).
-o OUTPUT, --output OUTPUT
Store results in OUTPUT (default: eddist.h5).
parallelization options:
--serial run in serial mode
--parallel run in parallel mode (using processes)
--work-manager WORK_MANAGER
use the given work manager for parallel task distribution. Available work managers
are ('serial', 'threads', 'processes', 'zmq'); default is 'processes'
--n-workers N_WORKERS
Use up to N_WORKERS on this host, for work managers which support this option. Use
0 for a dedicated server. (Ignored by work managers which do not support this
option.)
options for ZeroMQ (“zmq”) work manager (master or node):
--zmq-mode MODE Operate as a master (server) or a node (workers/client). "server" is a
deprecated synonym for "master" and "client" is a deprecated synonym for
"node".
--zmq-comm-mode COMM_MODE
Use the given communication mode -- TCP or IPC (Unix-domain) -- sockets for
communication within a node. IPC (the default) may be more efficient but is not
available on (exceptionally rare) systems without node-local storage (e.g.
/tmp); on such systems, TCP may be used instead.
--zmq-write-host-info INFO_FILE
Store hostname and port information needed to connect to this instance in
INFO_FILE. This allows the master and nodes assisting in coordinating the
communication of other nodes to choose ports randomly. Downstream nodes read
this file with --zmq-read-host-info and know where to connect.
--zmq-read-host-info INFO_FILE
Read hostname and port information needed to connect to the master (or other
coordinating node) from INFO_FILE. This allows the master and nodes assisting
in coordinating the communication of other nodes to choose ports randomly,
writing that information with --zmq-write-host-info for this instance to read.
--zmq-upstream-rr-endpoint ENDPOINT
ZeroMQ endpoint to which to send request/response (task and result) traffic
toward the master.
--zmq-upstream-ann-endpoint ENDPOINT
ZeroMQ endpoint on which to receive announcement (heartbeat and shutdown
notification) traffic from the master.
--zmq-downstream-rr-endpoint ENDPOINT
ZeroMQ endpoint on which to listen for request/response (task and result)
traffic from subsidiary workers.
--zmq-downstream-ann-endpoint ENDPOINT
ZeroMQ endpoint on which to send announcement (heartbeat and shutdown
notification) traffic toward workers.
--zmq-master-heartbeat MASTER_HEARTBEAT
Every MASTER_HEARTBEAT seconds, the master announces its presence to workers.
--zmq-worker-heartbeat WORKER_HEARTBEAT
Every WORKER_HEARTBEAT seconds, workers announce their presence to the master.
--zmq-timeout-factor FACTOR
Scaling factor for heartbeat timeouts. If the master doesn't hear from a worker
in WORKER_HEARTBEAT*FACTOR, the worker is assumed to have crashed. If a worker
doesn't hear from the master in MASTER_HEARTBEAT*FACTOR seconds, the master is
assumed to have crashed. Both cases result in shutdown.
--zmq-startup-timeout STARTUP_TIMEOUT
Amount of time (in seconds) to wait for communication between the master and at
least one worker. This may need to be changed on very large, heavily-loaded
computer systems that start all processes simultaneously.
--zmq-shutdown-timeout SHUTDOWN_TIMEOUT
Amount of time (in seconds) to wait for workers to shut down.
westpa.cli.tools.w_eddist module
- class westpa.cli.tools.w_eddist.WESTParallelTool(wm_env=None)
Bases:
WESTTool
Base class for command-line tools parallelized with wwmgr. This automatically adds and processes wwmgr command-line arguments and creates a work manager at self.work_manager.
- make_parser_and_process(prog=None, usage=None, description=None, epilog=None, args=None)
A convenience function to create a parser, call add_all_args(), and then call process_all_args(). The argument namespace is returned.
- add_args(parser)
Add arguments specific to this tool to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this tool and deal with them appropriately (setting instance variables, etc)
- go()
Perform the analysis associated with this tool.
- main()
A convenience function to make a parser, parse and process arguments, then run self.go() in the master process.
- class westpa.cli.tools.w_eddist.ProgressIndicatorComponent
Bases:
WESTToolComponent
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- westpa.cli.tools.w_eddist.histnd(values, binbounds, weights=1.0, out=None, binbound_check=True, ignore_out_of_range=False)
Generate an N-dimensional PDF (or contribution to a PDF) from the given values. ``binbounds`` is a list of arrays of boundary values, with one entry for each dimension (``values`` must have as many columns as there are entries in ``binbounds``). ``weights``, if provided, specifies the weight each value contributes to the histogram; this may be a scalar (for equal weights for all values) or a vector of the same length as ``values`` (for unequal weights). If ``binbound_check`` is True, then the boundaries are checked for strict positive monotonicity; set to False to shave a few microseconds if you know your bin boundaries to be monotonically increasing.
- westpa.cli.tools.w_eddist.normhistnd(hist, binbounds)
Normalize the N-dimensional histogram ``hist`` with corresponding bin boundaries ``binbounds``. Modifies ``hist`` in place and returns the normalization factor used.
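A minimal sketch using both helpers on synthetic two-dimensional data:

import numpy as np
from westpa.cli.tools.w_eddist import histnd, normhistnd

values = np.random.random(size=(1000, 2))                  # 1000 samples, 2 dims
binbounds = [np.linspace(0, 1, 11), np.linspace(0, 1, 21)]
hist = histnd(values, binbounds)                           # weighted counts
norm = normhistnd(hist, binbounds)                         # normalizes hist in place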
- class westpa.cli.tools.w_eddist.DurationDataset(dataset, mask, iter_start=1)
Bases:
object
A facade for the ‘dsspec’ dataclass that incorporates the mask into the get_iter_data method.
- get_iter_data(n_iter)
- westpa.cli.tools.w_eddist.isiterable(x)
- class westpa.cli.tools.w_eddist.WEDDist
Bases:
WESTParallelTool
- prog = 'w_eddist'
- description = 'Calculate time-resolved transition-event duration distribution from kinetics results\n\n\n-----------------------------------------------------------------------------\nSource data\n-----------------------------------------------------------------------------\n\nSource data is collected from the results of \'w_kinetics trace\' (see w_kinetics trace --help for\nmore information on generating this dataset).\n\n\n-----------------------------------------------------------------------------\nHistogram binning\n-----------------------------------------------------------------------------\n\nBy default, histograms are constructed with 100 bins in each dimension. This\ncan be overridden by specifying -b/--bins, which accepts a number of different\nkinds of arguments:\n\n a single integer N\n N uniformly spaced bins will be used in each dimension.\n\n a sequence of integers N1,N2,... (comma-separated)\n N1 uniformly spaced bins will be used for the first dimension, N2 for the\n second, and so on.\n\n a list of lists [[B11, B12, B13, ...], [B21, B22, B23, ...], ...]\n The bin boundaries B11, B12, B13, ... will be used for the first dimension,\n B21, B22, B23, ... for the second dimension, and so on. These bin\n boundaries need not be uniformly spaced. These expressions will be\n evaluated with Python\'s ``eval`` construct, with ``np`` available for\n use [e.g. to specify bins using np.arange()].\n\nThe first two forms (integer, list of integers) will trigger a scan of all\ndata in each dimension in order to determine the minimum and maximum values,\nwhich may be very expensive for large datasets. This can be avoided by\nexplicitly providing bin boundaries using the list-of-lists form.\n\nNote that these bins are *NOT* at all related to the bins used to drive WE\nsampling.\n\n\n-----------------------------------------------------------------------------\nOutput format\n-----------------------------------------------------------------------------\n\nThe output file produced (specified by -o/--output, defaulting to "pdist.h5")\nmay be fed to plothist to generate plots (or appropriately processed text or\nHDF5 files) from this data. In short, the following datasets are created:\n\n ``histograms``\n Normalized histograms. The first axis corresponds to iteration, and\n remaining axes correspond to dimensions of the input dataset.\n\n ``/binbounds_0``\n Vector of bin boundaries for the first (index 0) dimension. Additional\n datasets similarly named (/binbounds_1, /binbounds_2, ...) are created\n for additional dimensions.\n\n ``/midpoints_0``\n Vector of bin midpoints for the first (index 0) dimension. Additional\n datasets similarly named are created for additional dimensions.\n\n ``n_iter``\n Vector of iteration numbers corresponding to the stored histograms (i.e.\n the first axis of the ``histograms`` dataset).\n\n\n-----------------------------------------------------------------------------\nSubsequent processing\n-----------------------------------------------------------------------------\n\nThe output generated by this program (-o/--output, default "pdist.h5") may be\nplotted by the ``plothist`` program. See ``plothist --help`` for more\ninformation.\n\n\n-----------------------------------------------------------------------------\nParallelization\n-----------------------------------------------------------------------------\n\nThis tool supports parallelized binning, including reading of input data.\nParallel processing is the default. For simple cases (reading pre-computed\ninput data, modest numbers of segments), serial processing (--serial) may be\nmore efficient.\n\n\n-----------------------------------------------------------------------------\nCommand-line options\n-----------------------------------------------------------------------------\n\n'
- add_args(parser)
Add arguments specific to this tool to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this tool and deal with them appropriately (setting instance variables, etc)
- go()
Perform the analysis associated with this tool.
- static parse_binspec(binspec)
- construct_bins(bins)
Construct bins according to ``bins``, which may be:
- A scalar integer (for that number of bins in each dimension)
- A sequence of integers (specifying the number of bins for each dimension)
- A sequence of sequences of bin boundaries (specifying boundaries for each dimension)
Sets ``self.binbounds`` to a list of arrays of bin boundaries appropriate for passing to fasthist.histnd, and ``self.midpoints`` to the midpoints of the bins.
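As a rough sketch of how these specifications translate into boundaries and midpoints (the ``ranges`` values below are hypothetical stand-ins for the min/max scan described above; this is an illustration, not the tool's actual implementation):

import numpy as np

# Hypothetical per-dimension data ranges (in practice these come from
# scanning the input data, as described above).
ranges = [(0.0, 10.0), (0.0, 5.0)]

def bounds_from_spec(bins, ranges):
    if np.isscalar(bins):
        # A scalar integer: that many uniform bins in every dimension.
        return [np.linspace(lo, hi, bins + 1) for lo, hi in ranges]
    if all(np.isscalar(b) for b in bins):
        # A sequence of integers: per-dimension uniform bin counts.
        return [np.linspace(lo, hi, n + 1) for (lo, hi), n in zip(ranges, bins)]
    # A sequence of sequences: explicit (possibly non-uniform) boundaries.
    return [np.asarray(b) for b in bins]

binbounds = bounds_from_spec(100, ranges)
midpoints = [0.5 * (b[:-1] + b[1:]) for b in binbounds]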
- scan_data_shape()
- scan_data_range()
Scan input data for range in each dimension. The number of dimensions is determined from the shape of the progress coordinate as of self.iter_start.
- construct_histogram()
Construct a histogram using bins previously constructed with ``construct_bins()``. The time series of histogram values is stored in ``histograms``. Each histogram in the time series is normalized.
- westpa.cli.tools.w_eddist.entry_point()
w_ntop
usage:
w_ntop [-h] [-r RCFILE] [--quiet | --verbose | --debug] [--version] [-W WEST_H5FILE]
[--first-iter N_ITER] [--last-iter N_ITER] [-a ASSIGNMENTS] [-n COUNT] [-t TIMEPOINT]
[--highweight | --lowweight | --random] [-o OUTPUT]
Select walkers from bins. An assignment file mapping walkers to
bins at each timepoint is required (see ``w_assign --help`` for further
information on generating this file). By default, high-weight walkers are
selected (hence the name ``w_ntop``: select the N top-weighted walkers from
each bin); however, minimum-weight walkers and randomly selected walkers
may be selected instead.
Output format
The output file (-o/--output, by default "ntop.h5") contains the following datasets:
``/n_iter`` [iteration]
*(Integer)* Iteration numbers for each entry in other datasets.
``/n_segs`` [iteration][bin]
*(Integer)* Number of segments in each bin/state in the given iteration.
This will generally be the same as the number requested with
``-n/--count`` but may be smaller if the requested number of walkers
does not exist.
``/seg_ids`` [iteration][bin][segment]
*(Integer)* Matching segments in each iteration for each bin.
For an iteration ``n_iter``, only the first ``n_segs`` entries are
valid. For example, the full list of matching seg_ids in bin 0 in the
first stored iteration is ``seg_ids[0][0][:n_segs[0][0]]``.
``/weights`` [iteration][bin][segment]
*(Floating-point)* Weights for each matching segment in ``/seg_ids``.
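For orientation, a minimal sketch of reading these datasets with ``h5py`` is shown below; it assumes the default output filename, and the bin/iteration indices are illustrative only:

import h5py

with h5py.File('ntop.h5', 'r') as f:
    n_iter = f['n_iter'][:]      # [iteration]
    n_segs = f['n_segs'][:]      # [iteration][bin]
    seg_ids = f['seg_ids'][:]    # [iteration][bin][segment]
    weights = f['weights'][:]    # [iteration][bin][segment]

# Valid matching segments for bin 0 of the first stored iteration:
nvalid = n_segs[0][0]
print(n_iter[0], seg_ids[0][0][:nvalid], weights[0][0][:nvalid])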
Command-line arguments
optional arguments:
-h, --help show this help message and exit
--highweight Select COUNT highest-weight walkers from each bin.
--lowweight Select COUNT lowest-weight walkers from each bin.
--random Select COUNT walkers randomly from each bin.
general options:
-r RCFILE, --rcfile RCFILE
use RCFILE as the WEST run-time configuration file (default: west.cfg)
--quiet emit only essential information
--verbose emit extra information
--debug enable extra checks and emit copious information
--version show program's version number and exit
WEST input data options:
-W WEST_H5FILE, --west-data WEST_H5FILE
Take WEST data from WEST_H5FILE (default: read from the HDF5 file specified in
west.cfg).
iteration range:
--first-iter N_ITER Begin analysis at iteration N_ITER (default: 1).
--last-iter N_ITER Conclude analysis with N_ITER, inclusive (default: last completed iteration).
input options:
-a ASSIGNMENTS, --assignments ASSIGNMENTS
Use assignments from the given ASSIGNMENTS file (default: assign.h5).
selection options:
-n COUNT, --count COUNT
Select COUNT walkers from each iteration for each bin (default: 1).
-t TIMEPOINT, --timepoint TIMEPOINT
Base selection on the given TIMEPOINT within each iteration. Default (-1)
corresponds to the last timepoint.
output options:
-o OUTPUT, --output OUTPUT
Write output to OUTPUT (default: ntop.h5).
westpa.cli.tools.w_ntop module
- class westpa.cli.tools.w_ntop.WESTTool
Bases:
WESTToolComponent
Base class for WEST command line tools
- prog = None
- usage = None
- description = None
- epilog = None
- add_args(parser)
Add arguments specific to this tool to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this tool and deal with them appropriately (setting instance variables, etc)
- make_parser(prog=None, usage=None, description=None, epilog=None, args=None)
- make_parser_and_process(prog=None, usage=None, description=None, epilog=None, args=None)
A convenience function to create a parser, call add_all_args(), and then call process_all_args(). The argument namespace is returned.
- go()
Perform the analysis associated with this tool.
- main()
A convenience function to make a parser, parse and process arguments, then call self.go()
- class westpa.cli.tools.w_ntop.WESTDataReader
Bases:
WESTToolComponent
Tool for reading data from WEST-related HDF5 files. Coordinates finding the main HDF5 file from west.cfg or command line arguments, caching of certain kinds of data (eventually), and retrieving auxiliary data sets from various places.
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- open(mode='r')
- close()
- property weight_dsspec
- property parent_id_dsspec
- class westpa.cli.tools.w_ntop.IterRangeSelection(data_manager=None)
Bases:
WESTToolComponent
Select and record limits on iterations used in analysis and/or reporting. This class provides both the user-facing command-line options and parsing, and the application-side API for recording limits in HDF5.
HDF5 datasets calculated based on a restricted set of iterations should be tagged with the following attributes:
first_iter
The first iteration included in the calculation.
last_iter
One past the last iteration included in the calculation.
iter_step
Blocking or sampling period for iterations included in the calculation.
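In raw HDF5 terms, this convention amounts to something like the following sketch (the file and dataset names here are hypothetical):

import h5py

with h5py.File('analysis.h5', 'a') as f:
    ds = f.require_dataset('per_iter_data', shape=(100,), dtype='f8')
    ds.attrs['iter_start'] = 1    # first iteration included
    ds.attrs['iter_stop'] = 101   # one past the last iteration included
    ds.attrs['iter_step'] = 1     # blocking/sampling period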
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- process_args(args, override_iter_start=None, override_iter_stop=None, default_iter_step=1)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- iter_block_iter()
Return an iterable of (block_start, block_end) over the blocks of iterations selected by --first-iter/--last-iter/--step-iter.
- n_iter_blocks()
Return the number of blocks of iterations (as returned by ``iter_block_iter``) selected by --first-iter/--last-iter/--step-iter.
- record_data_iter_range(h5object, iter_start=None, iter_stop=None)
Store attributes ``iter_start`` and ``iter_stop`` on the given HDF5 object (group/dataset).
- record_data_iter_step(h5object, iter_step=None)
Store attribute ``iter_step`` on the given HDF5 object (group/dataset).
- check_data_iter_range_least(h5object, iter_start=None, iter_stop=None)
Check that the given HDF5 object contains (as denoted by its ``iter_start``/``iter_stop`` attributes) data at least for the iteration range specified.
- check_data_iter_range_equal(h5object, iter_start=None, iter_stop=None)
Check that the given HDF5 object contains (as denoted by its ``iter_start``/``iter_stop`` attributes) data exactly for the iteration range specified.
- check_data_iter_step_conformant(h5object, iter_step=None)
Check that the given HDF5 object contains per-iteration data at an iteration stride suitable for extracting data with the given stride (in other words, the given ``iter_step`` is a multiple of the stride with which data was recorded).
- check_data_iter_step_equal(h5object, iter_step=None)
Check that the given HDF5 object contains per-iteration data at an iteration stride the same as that specified.
- slice_per_iter_data(dataset, iter_start=None, iter_stop=None, iter_step=None, axis=0)
Return the subset of the given dataset corresponding to the given iteration range and stride. Unless otherwise specified, the first dimension of the dataset is the one sliced.
- iter_range(iter_start=None, iter_stop=None, iter_step=None, dtype=None)
Return a sequence for the given iteration numbers and stride, filling in missing values from those stored on ``self``. The smallest data type capable of holding ``iter_stop`` is returned unless otherwise specified using the ``dtype`` argument.
- class westpa.cli.tools.w_ntop.ProgressIndicatorComponent
Bases:
WESTToolComponent
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- westpa.cli.tools.w_ntop.seg_id_dtype
alias of
int64
- westpa.cli.tools.w_ntop.n_iter_dtype
alias of
uint32
- westpa.cli.tools.w_ntop.weight_dtype
alias of
float64
- westpa.cli.tools.w_ntop.assignments_list_to_table(nsegs, nbins, assignments)
Convert a list of bin assignments (integers) to a boolean table indicating whether a given segment is in a given bin
- class westpa.cli.tools.w_ntop.WNTopTool
Bases:
WESTTool
- prog = 'w_ntop'
- description = 'Select walkers from bins . An assignment file mapping walkers to\nbins at each timepoint is required (see``w_assign --help`` for further\ninformation on generating this file). By default, high-weight walkers are\nselected (hence the name ``w_ntop``: select the N top-weighted walkers from\neach bin); however, minimum weight walkers and randomly-selected walkers\nmay be selected instead.\n\n\n-----------------------------------------------------------------------------\nOutput format\n-----------------------------------------------------------------------------\n\nThe output file (-o/--output, by default "ntop.h5") contains the following\ndatasets:\n\n ``/n_iter`` [iteration]\n *(Integer)* Iteration numbers for each entry in other datasets.\n\n ``/n_segs`` [iteration][bin]\n *(Integer)* Number of segments in each bin/state in the given iteration.\n This will generally be the same as the number requested with\n ``--n/--count`` but may be smaller if the requested number of walkers\n does not exist.\n\n ``/seg_ids`` [iteration][bin][segment]\n *(Integer)* Matching segments in each iteration for each bin.\n For an iteration ``n_iter``, only the first ``n_iter`` entries are\n valid. For example, the full list of matching seg_ids in bin 0 in the\n first stored iteration is ``seg_ids[0][0][:n_segs[0]]``.\n\n ``/weights`` [iteration][bin][segment]\n *(Floating-point)* Weights for each matching segment in ``/seg_ids``.\n\n\n-----------------------------------------------------------------------------\nCommand-line arguments\n-----------------------------------------------------------------------------\n'
- add_args(parser)
Add arguments specific to this tool to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this tool and deal with them appropriately (setting instance variables, etc)
- go()
Perform the analysis associated with this tool.
- westpa.cli.tools.w_ntop.entry_point()
w_multi_west
The ``w_multi_west`` tool combines multiple WESTPA simulations into a single aggregate simulation to facilitate the analysis of the set of simulations. In particular, the tool creates a single ``west.h5`` file that contains all of the data from the ``west.h5`` files of the individual simulations. Each iteration x in the new file contains all of the segments from iteration x of each simulation in the set, with weights normalized to the total weight.
Overview
usage:
w_multi_west [-h] [-m master] [-n sims] [--quiet | --verbose | --debug] [--version]
[-W WEST_H5FILE] [-a aux] [--auxall] [--ibstates]
[--serial | --parallel | --work-manager WORK_MANAGER] [--n-workers N_WORKERS]
[--zmq-mode MODE] [--zmq-comm-mode COMM_MODE] [--zmq-write-host-info INFO_FILE]
[--zmq-read-host-info INFO_FILE] [--zmq-upstream-rr-endpoint ENDPOINT]
[--zmq-upstream-ann-endpoint ENDPOINT] [--zmq-downstream-rr-endpoint ENDPOINT]
[--zmq-downstream-ann-endpoint ENDPOINT] [--zmq-master-heartbeat MASTER_HEARTBEAT]
[--zmq-worker-heartbeat WORKER_HEARTBEAT] [--zmq-timeout-factor FACTOR]
[--zmq-startup-timeout STARTUP_TIMEOUT] [--zmq-shutdown-timeout SHUTDOWN_TIMEOUT]
optional arguments:
-h, --help show this help message and exit
- General options:
- -m, --master directory
Master path of simulations where all the smaller simulations are stored (default: Current Directory)
- -n, --sims n
Number of simulation directories. Assumes leading zeros. (default: 0)
- --quiet
emit only essential information
- --verbose
emit extra information
- --version
show program’s version number and exit
Command-Line Options
See the general command-line tool reference for more information on the general options.
Input/output options
These arguments allow the user to specify where to read input simulation result data and where to output the combined simulation data.
Both input and output files are in HDF5 format:
-W, --west, --WEST_H5FILE file
The name of the main .h5 file inside each simulation directory. (Default: west.h5)
-o, --output file
Store this tool's output in file. (Default: multi.h5)
-a, --aux auxdata
Name of additional auxiliary dataset to be combined. Can be called multiple times.
(Default: None)
-aa, --auxall
Combine all auxiliary datasets as labeled in ``west.h5`` in folder 01. (Default: False)
-nr, --no-reweight
Do not perform reweighting. (Default: False)
-ib, --ibstates
Attempt to combine ``ibstates`` dataset if the basis states are identical across
all simulations. Needed when tracing with ``westpa.analysis``. (Default: False)
Examples
If you have five simulations, set up your directory such that you have five directories named numerically with leading zeroes, and each directory contains a ``west.h5`` file. For this example, each ``west.h5`` also contains an auxiliary dataset called ``RMSD``. If you run ``ls``, you will see the following output:
01 02 03 04 05
To run the w_multi_west tool, do the following:
w_multi_west.py -m . -n 5 --aux=RMSD
If you used any custom WESTSystem, include that in the directory where you run the code.
To proceed in analyzing the aggregated simulation data as a single simulation, rename the output file ``multi.h5`` to ``west.h5``.
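Because each iteration in the combined file is renormalized to the total weight, a quick sanity check is to confirm that the segment weights in each iteration of ``multi.h5`` sum to one. A minimal sketch, assuming the standard ``west.h5`` layout (an ``iterations`` group of per-iteration groups, each holding a ``seg_index`` table with a ``weight`` field):

import h5py

with h5py.File('multi.h5', 'r') as f:
    for name, grp in f['iterations'].items():
        # Total weight of all segments in this iteration; should be ~1.0.
        print(name, grp['seg_index']['weight'].sum())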
westpa.cli.tools.w_multi_west module
- class westpa.cli.tools.w_multi_west.WESTTool
Bases:
WESTToolComponent
Base class for WEST command line tools
- prog = None
- usage = None
- description = None
- epilog = None
- add_args(parser)
Add arguments specific to this tool to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this tool and deal with them appropriately (setting instance variables, etc)
- make_parser(prog=None, usage=None, description=None, epilog=None, args=None)
- make_parser_and_process(prog=None, usage=None, description=None, epilog=None, args=None)
A convenience function to create a parser, call add_all_args(), and then call process_all_args(). The argument namespace is returned.
- go()
Perform the analysis associated with this tool.
- main()
A convenience function to make a parser, parse and process arguments, then call self.go()
- westpa.cli.tools.w_multi_west.n_iter_dtype
alias of
uint32
- class westpa.cli.tools.w_multi_west.ProgressIndicatorComponent
Bases:
WESTToolComponent
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- class westpa.cli.tools.w_multi_west.WESTMultiTool(wm_env=None)
Bases:
WESTParallelTool
Base class for command-line tools which work with multiple simulations. Automatically parses for and gives commands to load multiple files.
- make_parser_and_process(prog=None, usage=None, description=None, epilog=None, args=None)
A convenience function to create a parser, call add_all_args(), and then call process_all_args(). The argument namespace is returned.
- parse_from_yaml(yamlfilepath)
Parse options from YAML input file. Command line arguments take precedence over options specified in the YAML hierarchy. TODO: add description on how YAML files should be constructed.
- add_args(parser)
Add arguments specific to this tool to the given argparse parser.
- exception NoSimulationsException
Bases:
Exception
- generate_file_list(key_list)
A convenience function which takes in a list of keys that are filenames, and returns a dictionary which contains all the individual files loaded inside of a dictionary keyed to the filename.
- process_args(args)
Take argparse-processed arguments associated with this tool and deal with them appropriately (setting instance variables, etc)
- go()
Perform the analysis associated with this tool.
- main()
A convenience function to make a parser, parse and process arguments, then run self.go() in the master process.
- westpa.cli.tools.w_multi_west.get_bin_mapper(we_h5file, hashval)
Look up the given hash value in the binning table, unpickling and returning the corresponding bin mapper if available, or raising KeyError if not.
- westpa.cli.tools.w_multi_west.create_idtype_array(input_array)
Return a new array with the new istate_dtype while preserving old data.
- class westpa.cli.tools.w_multi_west.WMultiWest
Bases:
WESTMultiTool
- prog = 'w_multi_west'
- description = 'Tool designed to combine multiple WESTPA simulations while accounting for\nreweighting.\n-----------------------------------------------------------------------------\nCommand-line options\n-----------------------------------------------------------------------------\n'
- add_args(parser)
Add arguments specific to this tool to the given argparse parser.
- open_files()
- process_args(args)
Take argparse-processed arguments associated with this tool and deal with them appropriately (setting instance variables, etc)
- total_number_of_walkers()
- go()
Perform the analysis associated with this tool.
- westpa.cli.tools.w_multi_west.entry_point()
w_red
usage:
w_red [-h] [-r RCFILE] [--quiet] [--verbose] [--version] [--max-queue-length MAX_QUEUE_LENGTH]
[--debug] [--terminal]
[--serial | --parallel | --work-manager WORK_MANAGER] [--n-workers N_WORKERS]
[--zmq-mode MODE] [--zmq-comm-mode COMM_MODE] [--zmq-write-host-info INFO_FILE]
[--zmq-read-host-info INFO_FILE] [--zmq-upstream-rr-endpoint ENDPOINT]
[--zmq-upstream-ann-endpoint ENDPOINT] [--zmq-downstream-rr-endpoint ENDPOINT]
[--zmq-downstream-ann-endpoint ENDPOINT] [--zmq-master-heartbeat MASTER_HEARTBEAT]
[--zmq-worker-heartbeat WORKER_HEARTBEAT] [--zmq-timeout-factor FACTOR]
[--zmq-startup-timeout STARTUP_TIMEOUT] [--zmq-shutdown-timeout SHUTDOWN_TIMEOUT]
optional arguments:
-h, --help show this help message and exit
- general options:
- -r RCFILE, --rcfile RCFILE
use RCFILE as the WEST run-time configuration file (default: west.cfg)
- --quiet
emit only essential information
- --verbose
emit extra information
- --version
show program’s version number and exit
parallelization options:
--max-queue-length MAX_QUEUE_LENGTH
Maximum number of tasks that can be queued. Useful to limit RAM use for tasks that
have very large requests/responses. Default: no limit.
parallelization options:
--serial run in serial mode
--parallel run in parallel mode (using processes)
--work-manager WORK_MANAGER
westpa.cli.tools.w_red module
- westpa.cli.tools.w_red.H5File
alias of
File
- class westpa.cli.tools.w_red.WESTParallelTool(wm_env=None)
Bases:
WESTTool
Base class for command-line tools parallelized with wwmgr. This automatically adds and processes wwmgr command-line arguments and creates a work manager at self.work_manager.
- make_parser_and_process(prog=None, usage=None, description=None, epilog=None, args=None)
A convenience function to create a parser, call add_all_args(), and then call process_all_args(). The argument namespace is returned.
- add_args(parser)
Add arguments specific to this tool to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this tool and deal with them appropriately (setting instance variables, etc)
- go()
Perform the analysis associated with this tool.
- main()
A convenience function to make a parser, parse and process arguments, then run self.go() in the master process.
- class westpa.cli.tools.w_red.DurationCorrector(durations, weights, dtau, maxduration=None)
Bases:
object
- static from_kinetics_file(directh5, istate, fstate, dtau, n_iters=None)
- property event_duration_histogram
- property cumulative_event_duration_histogram
- westpa.cli.tools.w_red.get_raw_rates(directh5, istate, fstate, n_iters=None)
- westpa.cli.tools.w_red.calc_avg_rate(directh5_path, istate, fstate, **kwargs)
Return the raw or RED-corrected rate constant with the confidence interval.
- Parameters:
nstiter (duration of each iteration (number of steps))
ntpr (report interval (number of steps))
- westpa.cli.tools.w_red.calc_rates(directh5_path, istate, fstate, **kwargs)
Return the raw and RED-corrected rate constants vs. iterations. This is faster than calling calc_rate() iteratively.
- Parameters:
nstiter (duration of each iteration (number of steps))
ntpr (report interval (number of steps))
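As a usage sketch for these helpers (the state indices and step counts below are placeholders for system-specific values, not defaults):

from westpa.cli.tools.w_red import calc_avg_rate, calc_rates

# Hypothetical values: transition from state 0 to state 1, 1000 steps
# per iteration, reported every 100 steps.
avg_rate = calc_avg_rate('direct.h5', 0, 1, nstiter=1000, ntpr=100)
rate_series = calc_rates('direct.h5', 0, 1, nstiter=1000, ntpr=100)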
- class westpa.cli.tools.w_red.RateCalculator(directh5, istate, fstate, assignh5=None, **kwargs)
Bases:
object
- property conditional_fluxes
- property populations
- property tau
- property dtau
- property istate
- property fstate
- property n_iters
- calc_rate(i_iter=None, red=False, **kwargs)
- calc_rates(n_iters=None, **kwargs)
- class westpa.cli.tools.w_red.WRed
Bases:
WESTParallelTool
- prog = 'w_red'
- description = 'Apply the RED scheme to estimate steady-state WE fluxes from\nshorter trajectories.\n\n-----------------------------------------------------------------------------\nSource data\n-----------------------------------------------------------------------------\n\nSource data is provided as a w_ipa "scheme" which is typically defined\nin the west.cfg file. For instance, if a user wishes to estimate RED\nfluxes for a scheme named "DEFAULT" that argument would be provided\nto w_red and WRed would estimate RED fluxes based off of the data\ncontained in the assign.h5 and direct.h5 files in ANALYSIS/DEFAULT.\n\n'
- go()
Perform the analysis associated with this tool.
- westpa.cli.tools.w_red.entry_point()
plothist
Use the ``plothist`` tool to plot the results of w_pdist. This tool uses an HDF5 file as its input (i.e. the output of another analysis tool), and outputs a PDF image.
The ``plothist`` tool operates in one of three (mutually exclusive) plotting modes:
- ``evolution``: Plots the relevant data as a time evolution over a specified number of simulation iterations
- ``average``: Plots the relevant data as a time average over a specified number of iterations
- ``instant``: Plots the relevant data for a single specified iteration
Overview
The basic usage, independent of plotting mode, is as follows:
usage:
plothist [-h] [-r RCFILE] [--quiet | --verbose | --debug] [--version]
         {instant,average,evolution} input ...
Note that the user must specify a plotting mode (i.e. ``instant``, ``average``, or ``evolution``) and an input file, ``input``. Therefore, this tool is always called as:
plothist mode input_file [other options]
``instant`` mode
usage:
plothist instant [-h] input [-o PLOT_OUTPUT]
         [--hdf5-output HDF5_OUTPUT] [--text-output TEXT_OUTPUT]
         [--title TITLE] [--range RANGE] [--linear | --energy | --log10]
         [--iter N_ITER]
         [DIMENSION] [ADDTLDIM]
``average`` mode
usage:
plothist average [-h] input [-o PLOT_OUTPUT]
         [--hdf5-output HDF5_OUTPUT] [--text-output TEXT_OUTPUT]
         [--title TITLE] [--range RANGE] [--linear | --energy | --log10]
         [--first-iter N_ITER] [--last-iter N_ITER]
         [DIMENSION] [ADDTLDIM]
``evolution`` mode
usage:
plothist evolution [-h] input [-o PLOT_OUTPUT]
         [--hdf5-output HDF5_OUTPUT]
         [--title TITLE] [--range RANGE] [--linear | --energy | --log10]
         [--first-iter N_ITER] [--last-iter N_ITER]
         [--step-iter STEP]
         [DIMENSION]
Command-Line Options
See the command-line tool index for more information on the general options.
Unless specified (as a Note in the command-line option description), the command-line options below are shared by all three plotting modes.
Input/output options
No matter the mode, an input hdf5 file must be specified. There are three possible outputs that are mode or user-specified: A text file, an hdf5 file, and a pdf image.
Specifying input file
- ``input``
Specify the input HDF5 file ``input``. This is the output file from a previous analysis tool (e.g. 'pdist.h5').
Output plot PDF file
- ``-o plot_output, --plot_output plot_output``
Specify the name of the PDF plot image output (Default: 'hist.pdf'). Note: you can suppress plotting entirely by specifying an empty string as plot_output (i.e. ``-o ''`` or ``--plot_output ''``).
Additional output options
Note: ``plothist`` provides additional, optional arguments to output the data points used to construct the plot:
- ``--hdf5-output hdf5_output``
Output plot data to the HDF5 file ``hdf5_output`` (Default: No HDF5 output file).
- ``--text-output text_output``
Output plot data as a text file named ``text_output`` (Default: No text output file). Note: this option is only available for 1-dimensional histogram plots (that is, ``average`` and ``instant`` modes only).
Plotting options
The following options allow the user to specify a plot title, the type of plot (i.e. energy or probability distribution), whether to apply a log transformation to the data, and the range of data values to include.
- ``--title title``
Optionally specify a title, ``title``, for the plot (Default: No title).
- ``--range 'LB, UB'``
Optionally specify the data range to be plotted as "LB, UB" (e.g. ``--range "-1, 10"``; note that the quotation marks are necessary if specifying a negative bound). For 1-dimensional histograms, the range affects the y axis. For 2-dimensional plots (e.g. an evolution plot with a 1-dimensional progress coordinate), it corresponds to the range of the color bar.
Mutually exclusive plotting options
The following three options determine how the plotted data is represented (Default: ``--energy``):
- ``--energy``
Plots the probability distribution on an inverted natural log scale (i.e. -ln[P(x)]), corresponding to the free energy (Default).
- ``--linear``
Plots the probability distribution function on a linear scale.
- ``--log10``
Plots the (base-10) logarithm of the probability distribution.
Iteration selection options
Depending on plotting mode, you can select either a range or a single iteration to plot.
``instant`` mode only:
- ``--iter n_iter``
Plot the distribution for iteration ``n_iter`` (Default: last completed iteration).
``average`` and ``evolution`` modes only:
- ``--first-iter first_iter``
Begin averaging or plotting at iteration ``first_iter`` (Default: 1).
- ``--last-iter last_iter``
Average or plot up to and including ``last_iter`` (Default: last completed iteration).
``evolution`` mode only:
- ``--step-iter n_step``
Average every ``n_step`` iterations together when plotting in ``evolution`` mode (Default: 1, i.e. plot each iteration).
Specifying progress coordinate dimension
For progress coordinates with dimensions greater than 1, you can specify the dimension of the progress coordinate to use, the range of progress coordinate values to include, and the progress coordinate axis label with a single positional argument:
- ``dimension``
Specify ``dimension`` as ``int[:[LB,UB]:label]``, where ``int`` specifies the dimension (starting at 0), and, optionally, ``LB,UB`` specifies the lower and upper range bounds, and/or ``label`` specifies the axis label (Default: ``int`` = 0, full range, default label is 'dimension ``int``'; e.g. 'dimension 0').
For ``average`` and ``instant`` modes, you can plot two dimensions at once using a color map if this positional argument is specified:
- ``addtl_dimension``
Specify the other dimension to include as ``addtl_dimension``.
Examples
These examples assume the input file was created using w_pdist and is named 'pdist.h5'.
Basic plotting
Plot the energy ( -ln(P(x)) ) for the last iteration
plothist instant pdist.h5
Plot the evolution of the log10 of the probability distribution over all iterations
plothist evolution pdist.h5 --log10
Plot the average linear probability distribution over all iterations
plothist average pdist.h5 --linear
Specifying progress coordinate
Plot the average probability distribution as the energy, label the x-axis ‘pcoord’, over the entire range of the progress coordinate
plothist average pdist.h5 0::pcoord
Same as above, but only plot the energies for progress coordinate values between 0 and 10
plothist average pdist.h5 '0:0,10:pcoord'
(Note: the quotes are needed if specifying a range that includes a negative bound)
(For a simulation that uses at least 2 progress coordinates) plot the probability distribution for the 5th iteration, representing the first two progress coordinates as a heatmap
plothist instant pdist.h5 0 1 --iter 5 --linear
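For reference, the same kind of figure can be reproduced by hand from the w_pdist output. The sketch below assumes a 1-dimensional ``pdist.h5`` with the ``histograms``/``midpoints_0`` layout described earlier, and mimics an ``instant`` plot with the default ``--energy`` transform for the last stored iteration:

import h5py
import numpy as np
import matplotlib.pyplot as plt

with h5py.File('pdist.h5', 'r') as f:
    hist = f['histograms'][-1]    # last stored iteration (1-D assumed)
    mids = f['midpoints_0'][:]

# Inverted natural log of the probability, as with --energy;
# empty bins are left as gaps (NaN).
energy = np.full(hist.shape, np.nan)
energy[hist > 0] = -np.log(hist[hist > 0])

plt.plot(mids, energy)
plt.xlabel('dimension 0')
plt.ylabel('-ln P(x)')
plt.savefig('hist.pdf')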
westpa.cli.tools.plothist module
- class westpa.cli.tools.plothist.NonUniformImage(ax, *, interpolation='nearest', **kwargs)
Bases:
AxesImage
- Parameters:
ax (~matplotlib.axes.Axes) – The axes the image will belong to.
interpolation ({'nearest', 'bilinear'}, default: 'nearest') – The interpolation scheme used in the resampling.
**kwargs – All other keyword arguments are identical to those of .AxesImage.
- mouseover = False
- make_image(renderer, magnification=1.0, unsampled=False)
Normalize, rescale, and colormap this image’s data for rendering using renderer, with the given magnification.
If unsampled is True, the image will not be scaled, but an appropriate affine transformation will be returned instead.
- Returns:
image ((M, N, 4) numpy.uint8 array) – The RGBA image, resampled unless unsampled is True.
x, y (float) – The upper left corner where the image should be drawn, in pixel space.
trans (~matplotlib.transforms.Affine2D) – The affine transformation from image to pixel space.
- set_data(x, y, A)
Set the grid for the pixel centers, and the pixel values.
- Parameters:
x (1D array-like) – Monotonic arrays of shapes (N,) and (M,), respectively, specifying pixel centers.
y (1D array-like) – Monotonic arrays of shapes (N,) and (M,), respectively, specifying pixel centers.
A (array-like) – (M, N) ~numpy.ndarray or masked array of values to be colormapped, or (M, N, 3) RGB array, or (M, N, 4) RGBA array.
- set_array(*args)
Retained for backwards compatibility - use set_data instead.
- Parameters:
A (array-like)
- set_interpolation(s)
- Parameters:
s ({'nearest', 'bilinear'} or None) – If None, use :rc:`image.interpolation`.
- get_extent()
Return the image extent as tuple (left, right, bottom, top).
- set_filternorm(filternorm)
Set whether the resize filter normalizes the weights.
See help for ~.Axes.imshow.
- Parameters:
filternorm (bool)
- set_filterrad(filterrad)
Set the resize filter radius only applicable to some interpolation schemes – see help for imshow
- Parameters:
filterrad (positive float)
- set_norm(norm)
Set the normalization instance.
- Parameters:
norm (.Normalize or str or None)
Notes
If there are any colorbars using the mappable for this norm, setting the norm of the mappable will reset the norm, locator, and formatters on the colorbar to default.
- set_cmap(cmap)
Set the colormap for luminance data.
- Parameters:
cmap (.Colormap or str or None)
- set(*, agg_filter=<UNSET>, alpha=<UNSET>, animated=<UNSET>, array=<UNSET>, clim=<UNSET>, clip_box=<UNSET>, clip_on=<UNSET>, clip_path=<UNSET>, cmap=<UNSET>, data=<UNSET>, extent=<UNSET>, filternorm=<UNSET>, filterrad=<UNSET>, gid=<UNSET>, in_layout=<UNSET>, interpolation=<UNSET>, interpolation_stage=<UNSET>, label=<UNSET>, mouseover=<UNSET>, norm=<UNSET>, path_effects=<UNSET>, picker=<UNSET>, rasterized=<UNSET>, resample=<UNSET>, sketch_params=<UNSET>, snap=<UNSET>, transform=<UNSET>, url=<UNSET>, visible=<UNSET>, zorder=<UNSET>)
Set multiple properties at once.
Supported properties are
- Properties:
agg_filter: a filter function, which takes a (m, n, 3) float array and a dpi value, and returns a (m, n, 3) array and two offsets from the bottom left corner of the image alpha: float or 2D array-like or None animated: bool array: unknown clim: (vmin: float, vmax: float) clip_box: ~matplotlib.transforms.BboxBase or None clip_on: bool clip_path: Patch or (Path, Transform) or None cmap: unknown data: unknown extent: 4-tuple of float figure: ~matplotlib.figure.Figure filternorm: unknown filterrad: unknown gid: str in_layout: bool interpolation: {‘nearest’, ‘bilinear’} or None interpolation_stage: {‘data’, ‘rgba’} or None label: object mouseover: bool norm: unknown path_effects: list of .AbstractPathEffect picker: None or bool or float or callable rasterized: bool resample: bool or None sketch_params: (scale: float, length: float, randomness: float) snap: bool or None transform: ~matplotlib.transforms.Transform url: str visible: bool zorder: float
- class westpa.cli.tools.plothist.WESTMasterCommand
Bases:
WESTTool
Base class for command-line tools that employ subcommands
- subparsers_title = None
- subcommands = None
- include_help_command = True
- add_args(parser)
Add arguments specific to this tool to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this tool and deal with them appropriately (setting instance variables, etc)
- go()
Perform the analysis associated with this tool.
- class westpa.cli.tools.plothist.WESTSubcommand(parent)
Bases:
WESTToolComponent
Base class for command-line tool subcommands. A little sugar for making this more uniform.
- subcommand = None
- help_text = None
- description = None
- add_to_subparsers(subparsers)
- go()
- property work_manager
The work manager for this tool. Raises AttributeError if this is not a parallel tool.
- westpa.cli.tools.plothist.normhistnd(hist, binbounds)
Normalize the N-dimensional histogram ``hist`` with corresponding bin boundaries ``binbounds``. Modifies ``hist`` in place and returns the normalization factor used.
- westpa.cli.tools.plothist.get_object(object_name, path=None)
Attempt to load the given object, using additional path information if given.
- westpa.cli.tools.plothist.sum_except_along(array, axes)
Reduce the given array by addition over all axes except those listed in the scalar or iterable ``axes``.
- class westpa.cli.tools.plothist.PlotHistBase(parent)
Bases:
WESTSubcommand
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- parse_dimspec(dimspec)
- parse_range(rangespec)
- class westpa.cli.tools.plothist.PlotSupports2D(parent)
Bases:
PlotHistBase
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- class westpa.cli.tools.plothist.InstantPlotHist(parent)
Bases:
PlotSupports2D
- subcommand = 'instant'
- help_text = 'plot probability distribution for a single WE iteration'
- description = 'Plot a probability distribution for a single WE iteration. The probability\ndistribution must have been previously extracted with ``w_pdist`` (or, at\nleast, must be compatible with the output format of ``w_pdist``; see\n``w_pdist --help`` for more information).\n'
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- do_instant_plot_1d()
Plot the histogram for iteration self.n_iter
- do_instant_plot_2d()
Plot the histogram for iteration self.n_iter
- go()
- class westpa.cli.tools.plothist.AveragePlotHist(parent)
Bases:
PlotSupports2D
- subcommand = 'average'
- help_text = 'plot average of a probability distribution over a WE simulation'
- description = 'Plot a probability distribution averaged over multiple iterations. The\nprobability distribution must have been previously extracted with ``w_pdist``\n(or, at least, must be compatible with the output format of ``w_pdist``; see\n``w_pdist --help`` for more information).\n'
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- do_average_plot_1d()
Plot the average histogram for iterations self.iter_start to self.iter_stop
- do_average_plot_2d()
Plot the average histogram for iterations self.iter_start to self.iter_stop
- go()
- class westpa.cli.tools.plothist.EvolutionPlotHist(parent)
Bases:
PlotHistBase
- subcommand = 'evolution'
- help_text = 'plot evolution of a probability distribution over the course of a WE simulation'
- description = 'Plot a probability distribution as it evolves over iterations. The\nprobability distribution must have been previously extracted with ``w_pdist``\n(or, at least, must be compatible with the output format of ``w_pdist``; see\n``w_pdist --help`` for more information).\n'
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- go()
Plot the evolution of the histogram for iterations self.iter_start to self.iter_stop
- class westpa.cli.tools.plothist.PlotHistTool
Bases:
WESTMasterCommand
- prog = 'plothist'
- subparsers_title = 'plotting modes'
- subcommands = [<class 'westpa.cli.tools.plothist.InstantPlotHist'>, <class 'westpa.cli.tools.plothist.AveragePlotHist'>, <class 'westpa.cli.tools.plothist.EvolutionPlotHist'>]
- description = 'Plot probability density functions (histograms) generated by w_pdist or other\nprograms conforming to the same output format. This program operates in one of\nthree modes:\n\n instant\n Plot 1-D and 2-D histograms for an individual iteration. See\n ``plothist instant --help`` for more information.\n\n average\n Plot 1-D and 2-D histograms, averaged over several iterations. See\n ``plothist average --help`` for more information.\n\n evolution\n Plot the time evolution 1-D histograms as waterfall (heat map) plots.\n See ``plothist evolution --help`` for more information.\n\nThis program takes the output of ``w_pdist`` as input (see ``w_pdist --help``\nfor more information), and can generate any kind of graphical output that\nmatplotlib supports.\n\n\n------------------------------------------------------------------------------\nCommand-line options\n------------------------------------------------------------------------------\n'
- westpa.cli.tools.plothist.entry_point()
ploterr
usage:
ploterr [-h] [-r RCFILE] [--quiet | --verbose | --debug] [--version]
{help,d.kinetics,d.probs,rw.probs,rw.kinetics,generic} ...
Plots error ranges for weighted ensemble datasets.
Command-line options
optional arguments:
-h, --help show this help message and exit
general options:
-r RCFILE, --rcfile RCFILE
use RCFILE as the WEST run-time configuration file (default: west.cfg)
--quiet emit only essential information
--verbose emit extra information
--debug enable extra checks and emit copious information
--version show program's version number and exit
supported input formats:
{help,d.kinetics,d.probs,rw.probs,rw.kinetics,generic}
help print help for this command or individual subcommands
d.kinetics output of w_direct kinetics
d.probs output of w_direct probs
rw.probs output of w_reweight probs
rw.kinetics output of w_reweight kinetics
generic arbitrary HDF5 file and dataset
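For the ``generic`` subcommand, the dataset is addressed as FILENAME/PATH[SLICE] (see the GenericIntervalSubcommand description below); for example, to plot the state 0 to state 1 rate evolution calculated by ``w_kinavg``, one would run:
ploterr generic kinavg.h5/rate_evolution[:,0,1]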
westpa.cli.tools.ploterr module
- class westpa.cli.tools.ploterr.WESTMasterCommand
Bases:
WESTTool
Base class for command-line tools that employ subcommands
- subparsers_title = None
- subcommands = None
- include_help_command = True
- add_args(parser)
Add arguments specific to this tool to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this tool and deal with them appropriately (setting instance variables, etc)
- go()
Perform the analysis associated with this tool.
- class westpa.cli.tools.ploterr.WESTSubcommand(parent)
Bases:
WESTToolComponent
Base class for command-line tool subcommands. A little sugar for making this more uniform.
- subcommand = None
- help_text = None
- description = None
- add_to_subparsers(subparsers)
- go()
- property work_manager
The work manager for this tool. Raises AttributeError if this is not a parallel tool.
- class westpa.cli.tools.ploterr.ProgressIndicatorComponent
Bases:
WESTToolComponent
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- class westpa.cli.tools.ploterr.Plotter(h5file, h5key, iteration=-1, interface='matplotlib')
Bases:
object
This is a semi-generic plotting interface that has a built in curses based terminal plotter. It’s fairly specific to what we’re using it for here, but we could (and maybe should) build it out into a little library that we can use via the command line to plot things. Might be useful for looking at data later. That would also cut the size of this tool down by a good bit.
- plot(i=0, j=1, tau=1, iteration=None, dim=0, interface=None)
- class westpa.cli.tools.ploterr.CommonPloterrs(parent)
Bases:
WESTSubcommand
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- parse_range(rangespec)
- do_plot(data, output_filename, title=None, x_range=None, y_range=None, x_label=None, y_label=None)
- class westpa.cli.tools.ploterr.GenericIntervalSubcommand(parent)
Bases:
CommonPloterrs
- description = 'Plots generic expectation/CI data. A path to the HDF5 file and the dataset\nwithin it must be provided. This path takes the form **FILENAME/PATH[SLICE]**.\nIf the dataset is not a vector (one dimensional) then a slice must be provided.\nFor example, to access the state 0 to state 1 rate evolution calculated by\n``w_kinavg``, one would use ``kinavg.h5/rate_evolution[:,0,1]``.\n\n\n-----------------------------------------------------------------------------\nCommand-line arguments\n-----------------------------------------------------------------------------\n'
- subcommand = 'generic'
- help_text = 'arbitrary HDF5 file and dataset'
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- load_and_validate_data()
- go()
- class westpa.cli.tools.ploterr.DirectKinetics(parent)
Bases:
CommonPloterrs
- subcommand = 'd.kinetics'
- help_text = 'output of w_direct kinetics'
- input_filename = 'direct.h5'
- flux_output_filename = 'flux_evolution_d_{state_label}.pdf'
- rate_output_filename = 'rate_evolution_d_{istate_label}_{fstate_label}.pdf'
- description = 'Plot evolution of state-to-state rates and total flux into states as generated\nby ``w_{direct/reweight} kinetics`` (when used with the ``--evolution-mode``\noption). Plots are generated for all rates/fluxes calculated. Output filenames\nrequire (and plot titles and axis labels support) substitution based on which\nflux/rate is being plotted:\n\n istate_label, fstate_label\n *(String, for rates)* Names of the initial and final states, as originally\n given to ``w_assign``.\n\n istate_index, fstate_index\n *(Integer, for rates)* Indices of initial and final states.\n\n state_label\n *(String, for fluxes)* Name of state\n\n state_index\n *(Integer, for fluxes)* Index of state\n'
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- plot_flux(istate)
- plot_rate(istate, jstate)
- go()
- class westpa.cli.tools.ploterr.DirectStateprobs(parent)
Bases:
CommonPloterrs
- subcommand = 'd.probs'
- help_text = 'output of w_direct probs'
- input_filename = 'direct.h5'
- pop_output_filename = 'pop_evolution_d_{state_label}.pdf'
- color_output_filename = 'color_evolution_d_{state_label}.pdf'
- description = 'Plot evolution of macrostate populations and associated uncertainties. Plots\nare generated for all states calculated. Output filenames require (and plot\ntitles and axis labels support) substitution based on which state is being\nplotted:\n\n state_label\n *(String, for fluxes)* Name of state\n\n state_index\n *(Integer, for fluxes)* Index of state\n'
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- plot_pop(istate)
- plot_color(istate)
- go()
- class westpa.cli.tools.ploterr.ReweightStateprobs(parent)
Bases:
DirectStateprobs
- subcommand = 'rw.probs'
- help_text = 'output of w_reweight probs'
- input_filename = 'reweight.h5'
- pop_output_filename = 'pop_evolution_rw_{state_label}.pdf'
- color_output_filename = 'color_evolution_rw_{state_label}.pdf'
- class westpa.cli.tools.ploterr.ReweightKinetics(parent)
Bases:
DirectKinetics
- subcommand = 'rw.kinetics'
- help_text = 'output of w_reweight kinetics'
- input_filename = 'reweight.h5'
- flux_output_filename = 'flux_evolution_rw_{state_label}.pdf'
- rate_output_filename = 'rate_evolution_rw_{istate_label}_{fstate_label}.pdf'
- class westpa.cli.tools.ploterr.PloterrsTool
Bases:
WESTMasterCommand
- prog = 'ploterrs'
- subcommands = [<class 'westpa.cli.tools.ploterr.DirectKinetics'>, <class 'westpa.cli.tools.ploterr.DirectStateprobs'>, <class 'westpa.cli.tools.ploterr.ReweightStateprobs'>, <class 'westpa.cli.tools.ploterr.ReweightKinetics'>, <class 'westpa.cli.tools.ploterr.GenericIntervalSubcommand'>]
- subparsers_title = 'supported input formats'
- description = 'Plots error ranges for weighted ensemble datasets.\n\n\n-----------------------------------------------------------------------------\nCommand-line options\n-----------------------------------------------------------------------------\n'
- westpa.cli.tools.ploterr.entry_point()
westpa.cli package
w_kinetics
WARNING: w_kinetics is being deprecated. Please use w_direct instead.
usage:
w_kinetics trace [-h] [-W WEST_H5FILE] [--first-iter N_ITER] [--last-iter N_ITER]
[--step-iter STEP] [-a ASSIGNMENTS] [-o OUTPUT]
Calculate state-to-state rates and transition event durations by tracing trajectories.
A bin assignment file (usually "assign.h5") including trajectory labeling is required (see "w_assign --help" for information on generating this file).
The output of this subcommand is used as input for all other w_direct subcommands, which will convert the flux data in the output file into average rates/fluxes/populations with confidence intervals.
Output format
The output file (-o/--output, by default "kintrace.h5") contains the following datasets:
``/conditional_fluxes`` [iteration][state][state]
*(Floating-point)* Macrostate-to-macrostate fluxes. These are **not**
normalized by the population of the initial macrostate.
``/conditional_arrivals`` [iteration][stateA][stateB]
*(Integer)* Number of trajectories arriving at state *stateB* in a given
iteration, given that they departed from *stateA*.
``/total_fluxes`` [iteration][state]
*(Floating-point)* Total flux into a given macrostate.
``/arrivals`` [iteration][state]
*(Integer)* Number of trajectories arriving at a given state in a given
iteration, regardless of where they originated.
``/duration_count`` [iteration]
*(Integer)* The number of event durations recorded in each iteration.
``/durations`` [iteration][event duration]
*(Structured -- see below)* Event durations for transition events ending
during a given iteration. These are stored as follows:
istate
*(Integer)* Initial state of transition event.
fstate
*(Integer)* Final state of transition event.
duration
*(Floating-point)* Duration of transition, in units of tau.
weight
*(Floating-point)* Weight of trajectory at end of transition, **not**
normalized by initial state population.
Because state-to-state fluxes stored in this file are not normalized by initial macrostate population, they cannot be used as rates without further processing. The ``w_direct kinetics`` command is used to perform this normalization while taking statistical fluctuation and correlation into account. See ``w_direct kinetics --help`` for more information. Target fluxes (total flux into a given state) require no such normalization.
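Since ``durations`` is a structured dataset, collecting the durations of one kind of transition takes a little bookkeeping. A minimal ``h5py`` sketch (the initial/final state indices are placeholders, and the filename assumes the default output):

import h5py
import numpy as np

with h5py.File('kintrace.h5', 'r') as f:
    counts = f['duration_count'][:]
    events = []
    for i, n in enumerate(counts):
        block = f['durations'][i][:n]   # only the first n entries are valid
        # Hypothetical transition of interest: istate 0 to fstate 1.
        mask = (block['istate'] == 0) & (block['fstate'] == 1)
        events.append(block[mask])

events = np.concatenate(events)
print('mean duration (tau):',
      np.average(events['duration'], weights=events['weight']))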
Command-line options
optional arguments:
-h, --help show this help message and exit
WEST input data options:
-W WEST_H5FILE, --west-data WEST_H5FILE
Take WEST data from WEST_H5FILE (default: read from the HDF5 file specified in
west.cfg).
iteration range:
--first-iter N_ITER Begin analysis at iteration N_ITER (default: 1).
--last-iter N_ITER Conclude analysis with N_ITER, inclusive (default: last completed iteration).
--step-iter STEP Analyze/report in blocks of STEP iterations.
input/output options:
-a ASSIGNMENTS, --assignments ASSIGNMENTS
Bin assignments and macrostate definitions are in ASSIGNMENTS (default:
assign.h5).
-o OUTPUT, --output OUTPUT
Store results in OUTPUT (default: kintrace.h5).
westpa.cli.tools.w_kinetics module
- class westpa.cli.tools.w_kinetics.WESTMasterCommand
Bases:
WESTTool
Base class for command-line tools that employ subcommands
- subparsers_title = None
- subcommands = None
- include_help_command = True
- add_args(parser)
Add arguments specific to this tool to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this tool and deal with them appropriately (setting instance variables, etc)
- go()
Perform the analysis associated with this tool.
- class westpa.cli.tools.w_kinetics.WESTParallelTool(wm_env=None)
Bases:
WESTTool
Base class for command-line tools parallelized with wwmgr. This automatically adds and processes wwmgr command-line arguments and creates a work manager at self.work_manager.
- make_parser_and_process(prog=None, usage=None, description=None, epilog=None, args=None)
A convenience function to create a parser, call add_all_args(), and then call process_all_args(). The argument namespace is returned.
- add_args(parser)
Add arguments specific to this tool to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this tool and deal with them appropriately (setting instance variables, etc)
- go()
Perform the analysis associated with this tool.
- main()
A convenience function to make a parser, parse and process arguments, then run self.go() in the master process.
- westpa.cli.tools.w_kinetics.warn()
Issue a warning, or maybe ignore it or raise an exception.
- message
Text of the warning message.
- category
The Warning category subclass. Defaults to UserWarning.
- stacklevel
How far up the call stack to make this warning appear. A value of 2 for example attributes the warning to the caller of the code calling warn().
- source
If supplied, the destroyed object which emitted a ResourceWarning
- skip_file_prefixes
An optional tuple of module filename prefixes indicating frames to skip during stacklevel computations for stack frame attribution.
- class westpa.cli.tools.w_kinetics.DKinetics(parent)
Bases:
WESTKineticsBase
,WKinetics
- subcommand = 'init'
- default_kinetics_file = 'direct.h5'
- default_output_file = 'direct.h5'
- help_text = 'calculate state-to-state kinetics by tracing trajectories'
- description = 'Calculate state-to-state rates and transition event durations by tracing\ntrajectories.\n\nA bin assignment file (usually "assign.h5") including trajectory labeling\nis required (see "w_assign --help" for information on generating this file).\n\nThis subcommand for w_direct is used as input for all other w_direct\nsubcommands, which will convert the flux data in the output file into\naverage rates/fluxes/populations with confidence intervals.\n\n-----------------------------------------------------------------------------\nOutput format\n-----------------------------------------------------------------------------\n\nThe output file (-o/--output, by default "direct.h5") contains the\nfollowing datasets:\n\n ``/conditional_fluxes`` [iteration][state][state]\n *(Floating-point)* Macrostate-to-macrostate fluxes. These are **not**\n normalized by the population of the initial macrostate.\n\n ``/conditional_arrivals`` [iteration][stateA][stateB]\n *(Integer)* Number of trajectories arriving at state *stateB* in a given\n iteration, given that they departed from *stateA*.\n\n ``/total_fluxes`` [iteration][state]\n *(Floating-point)* Total flux into a given macrostate.\n\n ``/arrivals`` [iteration][state]\n *(Integer)* Number of trajectories arriving at a given state in a given\n iteration, regardless of where they originated.\n\n ``/duration_count`` [iteration]\n *(Integer)* The number of event durations recorded in each iteration.\n\n ``/durations`` [iteration][event duration]\n *(Structured -- see below)* Event durations for transition events ending\n during a given iteration. These are stored as follows:\n\n istate\n *(Integer)* Initial state of transition event.\n fstate\n *(Integer)* Final state of transition event.\n duration\n *(Floating-point)* Duration of transition, in units of tau.\n weight\n *(Floating-point)* Weight of trajectory at end of transition, **not**\n normalized by initial state population.\n\nBecause state-to-state fluxes stored in this file are not normalized by\ninitial macrostate population, they cannot be used as rates without further\nprocessing. The ``w_direct kinetics`` command is used to perform this normalization\nwhile taking statistical fluctuation and correlation into account. See\n``w_direct kinetics --help`` for more information. Target fluxes (total flux\ninto a given state) require no such normalization.\n\n-----------------------------------------------------------------------------\nCommand-line options\n-----------------------------------------------------------------------------\n'
- open_files()
- go()
- class westpa.cli.tools.w_kinetics.WKinetics(parent)
Bases:
DKinetics
- subcommand = 'trace'
- help_text = 'averages and CIs for path-tracing kinetics analysis'
- default_output_file = 'kintrace.h5'
- class westpa.cli.tools.w_kinetics.WDirect
Bases:
WESTMasterCommand
,WESTParallelTool
- prog = 'w_kinetics'
- subcommands = [<class 'westpa.cli.tools.w_kinetics.WKinetics'>]
- subparsers_title = 'calculate state-to-state kinetics by tracing trajectories'
- description = 'Calculate state-to-state rates and transition event durations by tracing\ntrajectories.\n\nA bin assignment file (usually "assign.h5") including trajectory labeling\nis required (see "w_assign --help" for information on generating this file).\n\nThe output generated by this program is used as input for the ``w_kinavg``\ntool, which converts the flux data in the output file into average rates\nwith confidence intervals. See ``w_kinavg trace --help`` for more\ninformation.\n\n-----------------------------------------------------------------------------\nOutput format\n-----------------------------------------------------------------------------\n\nThe output file (-o/--output, by default "kintrace.h5") contains the\nfollowing datasets:\n\n ``/conditional_fluxes`` [iteration][state][state]\n *(Floating-point)* Macrostate-to-macrostate fluxes. These are **not**\n normalized by the population of the initial macrostate.\n\n ``/conditional_arrivals`` [iteration][stateA][stateB]\n *(Integer)* Number of trajectories arriving at state *stateB* in a given\n iteration, given that they departed from *stateA*.\n\n ``/total_fluxes`` [iteration][state]\n *(Floating-point)* Total flux into a given macrostate.\n\n ``/arrivals`` [iteration][state]\n *(Integer)* Number of trajectories arriving at a given state in a given\n iteration, regardless of where they originated.\n\n ``/duration_count`` [iteration]\n *(Integer)* The number of event durations recorded in each iteration.\n\n ``/durations`` [iteration][event duration]\n *(Structured -- see below)* Event durations for transition events ending\n during a given iteration. These are stored as follows:\n\n istate\n *(Integer)* Initial state of transition event.\n fstate\n *(Integer)* Final state of transition event.\n duration\n *(Floating-point)* Duration of transition, in units of tau.\n weight\n *(Floating-point)* Weight of trajectory at end of transition, **not**\n normalized by initial state population.\n\nBecause state-to-state fluxes stored in this file are not normalized by\ninitial macrostate population, they cannot be used as rates without further\nprocessing. The ``w_kinavg`` command is used to perform this normalization\nwhile taking statistical fluctuation and correlation into account. See\n``w_kinavg trace --help`` for more information. Target fluxes (total flux\ninto a given state) require no such normalization.\n\n-----------------------------------------------------------------------------\nCommand-line options\n-----------------------------------------------------------------------------\n'
- westpa.cli.tools.w_kinetics.entry_point()
w_stateprobs
WARNING: w_stateprobs is being deprecated. Please use w_direct instead.
usage:
w_stateprobs trace [-h] [-W WEST_H5FILE] [--first-iter N_ITER] [--last-iter N_ITER]
[--step-iter STEP] [-a ASSIGNMENTS] [-o OUTPUT] [-k KINETICS]
[--disable-bootstrap] [--disable-correl] [--alpha ALPHA]
[--autocorrel-alpha ACALPHA] [--nsets NSETS] [-e {cumulative,blocked,none}]
[--window-frac WINDOW_FRAC] [--disable-averages]
Calculate average populations and associated errors in state populations from weighted ensemble data. Bin assignments, including macrostate definitions, are required. (See “w_assign --help” for more information).
Output format
The output file (-o/--output, usually “stateprobs.h5”) contains the following dataset:
/avg_state_probs [state]
(Structured -- see below) Population of each state across entire
range specified.
/avg_color_probs [state]
(Structured -- see below) Population of each ensemble across entire
range specified.
If --evolution-mode is specified, then the following additional datasets are available:
/state_pop_evolution [window][state]
(Structured -- see below). State populations based on windows of
iterations of varying width. If --evolution-mode=cumulative, then
these windows all begin at the iteration specified with
--start-iter and grow in length by --step-iter for each successive
element. If --evolution-mode=blocked, then these windows are all of
width --step-iter (excluding the last, which may be shorter), the first
of which begins at iteration --start-iter.
/color_prob_evolution [window][state]
(Structured -- see below). Ensemble populations based on windows of
iterations of varying width. If --evolution-mode=cumulative, then
these windows all begin at the iteration specified with
--start-iter and grow in length by --step-iter for each successive
element. If --evolution-mode=blocked, then these windows are all of
width --step-iter (excluding the last, which may be shorter), the first
of which begins at iteration --start-iter.
The structure of these datasets is as follows:
iter_start
(Integer) Iteration at which the averaging window begins (inclusive).
iter_stop
(Integer) Iteration at which the averaging window ends (exclusive).
expected
(Floating-point) Expected (mean) value of the observable as evaluated within
this window, in units of inverse tau.
ci_lbound
(Floating-point) Lower bound of the confidence interval of the observable
within this window, in units of inverse tau.
ci_ubound
(Floating-point) Upper bound of the confidence interval of the observable
within this window, in units of inverse tau.
stderr
(Floating-point) The standard error of the mean of the observable
within this window, in units of inverse tau.
corr_len
(Integer) Correlation length of the observable within this window, in units
of tau.
Each of these datasets is also stamped with a number of attributes:
mcbs_alpha
(Floating-point) Alpha value of confidence intervals. (For example,
*alpha=0.05* corresponds to a 95% confidence interval.)
mcbs_nsets
(Integer) Number of bootstrap data sets used in generating confidence
intervals.
mcbs_acalpha
(Floating-point) Alpha value for determining correlation lengths.
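These structured datasets can be read back with h5py. A minimal sketch, assuming an output file named “stateprobs.h5” laid out as described above (field and attribute names taken from the listing):

import h5py

# Minimal sketch: print average state populations with their confidence
# intervals, plus the MCBS alpha stamped on the dataset as an attribute.
with h5py.File('stateprobs.h5', 'r') as f:
    avg = f['avg_state_probs'][...]
    for i, row in enumerate(avg):
        print(f"state {i}: {row['expected']:.4g} "
              f"[{row['ci_lbound']:.4g}, {row['ci_ubound']:.4g}]")
    print('alpha =', f['avg_state_probs'].attrs['mcbs_alpha'])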
Command-line options
optional arguments:
-h, --help show this help message and exit
WEST input data options:
-W WEST_H5FILE, --west-data WEST_H5FILE
Take WEST data from WEST_H5FILE (default: read from the HDF5 file specified in
west.cfg).
iteration range:
--first-iter N_ITER Begin analysis at iteration N_ITER (default: 1).
--last-iter N_ITER Conclude analysis with N_ITER, inclusive (default: last completed iteration).
--step-iter STEP Analyze/report in blocks of STEP iterations.
input/output options:
-a ASSIGNMENTS, --assignments ASSIGNMENTS
Bin assignments and macrostate definitions are in ASSIGNMENTS (default:
assign.h5).
-o OUTPUT, --output OUTPUT
Store results in OUTPUT (default: stateprobs.h5).
-k KINETICS, --kinetics KINETICS
Populations and transition rates are stored in KINETICS (default: assign.h5).
confidence interval calculation options:
--disable-bootstrap, -db
Disable the use of Monte Carlo Block Bootstrapping.
--disable-correl, -dc
Disable the correlation analysis.
--alpha ALPHA Calculate a (1-ALPHA) confidence interval (default: 0.05)
--autocorrel-alpha ACALPHA
Evaluate autocorrelation to (1-ACALPHA) significance. Note that too small an
ACALPHA will result in failure to detect autocorrelation in a noisy flux signal.
(Default: same as ALPHA.)
--nsets NSETS Use NSETS samples for bootstrapping (default: chosen based on ALPHA)
calculation options:
-e {cumulative,blocked,none}, --evolution-mode {cumulative,blocked,none}
How to calculate time evolution of rate estimates. ``cumulative`` evaluates rates
over windows starting with --start-iter and getting progressively wider to --stop-
iter by steps of --step-iter. ``blocked`` evaluates rates over windows of width
--step-iter, the first of which begins at --start-iter. ``none`` (the default)
disables calculation of the time evolution of rate estimates. (The window scheme is sketched after these options.)
--window-frac WINDOW_FRAC
Fraction of iterations to use in each window when running in ``cumulative`` mode.
The (1 - frac) fraction of iterations will be discarded from the start of each
window.
misc options:
--disable-averages, -da
Do not print the averages to the console.
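The averaging-window scheme behind -e/--evolution-mode and --window-frac can be summarized in a few lines of Python. This is an illustrative sketch of the behavior described above, not WESTPA's own implementation:

def evolution_windows(start_iter, stop_iter, step_iter,
                      mode='cumulative', window_frac=1.0):
    # Yield (iter_start, iter_stop) pairs, iter_stop exclusive.
    # Sketch of the --evolution-mode scheme described above; the
    # shorter final block of 'blocked' mode is not handled here.
    for stop in range(start_iter + step_iter, stop_iter + 1, step_iter):
        if mode == 'cumulative':
            # Windows share a left edge and grow by step_iter; with
            # --window-frac, only the trailing fraction of each window
            # is kept (the rest is discarded from the start).
            width = max(1, int(window_frac * (stop - start_iter)))
            yield (stop - width, stop)
        elif mode == 'blocked':
            # Fixed-width, non-overlapping blocks of step_iter iterations.
            yield (stop - step_iter, stop)

print(list(evolution_windows(1, 13, 4, mode='blocked')))
# [(1, 5), (5, 9), (9, 13)]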
westpa.cli.tools.w_stateprobs module
- class westpa.cli.tools.w_stateprobs.WESTMasterCommand
Bases:
WESTTool
Base class for command-line tools that employ subcommands
- subparsers_title = None
- subcommands = None
- include_help_command = True
- add_args(parser)
Add arguments specific to this tool to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this tool and deal with them appropriately (setting instance variables, etc)
- go()
Perform the analysis associated with this tool.
- class westpa.cli.tools.w_stateprobs.WESTParallelTool(wm_env=None)
Bases:
WESTTool
Base class for command-line tools parallelized with wwmgr. This automatically adds and processes wwmgr command-line arguments and creates a work manager at self.work_manager.
- make_parser_and_process(prog=None, usage=None, description=None, epilog=None, args=None)
A convenience function to create a parser, call add_all_args(), and then call process_all_args(). The argument namespace is returned.
- add_args(parser)
Add arguments specific to this tool to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this tool and deal with them appropriately (setting instance variables, etc)
- go()
Perform the analysis associated with this tool.
- main()
A convenience function to make a parser, parse and process arguments, then run self.go() in the master process.
- westpa.cli.tools.w_stateprobs.warn()
Issue a warning, or maybe ignore it or raise an exception.
- message
Text of the warning message.
- category
The Warning category subclass. Defaults to UserWarning.
- stacklevel
How far up the call stack to make this warning appear. A value of 2 for example attributes the warning to the caller of the code calling warn().
- source
If supplied, the destroyed object which emitted a ResourceWarning
- skip_file_prefixes
An optional tuple of module filename prefixes indicating frames to skip during stacklevel computations for stack frame attribution.
- class westpa.cli.tools.w_stateprobs.DStateProbs(parent)
Bases:
AverageCommands
- subcommand = 'probs'
- help_text = 'Calculates color and state probabilities via tracing.'
- default_kinetics_file = 'direct.h5'
- description = 'Calculate average populations and associated errors in state populations from\nweighted ensemble data. Bin assignments, including macrostate definitions,\nare required. (See "w_assign --help" for more information).\n\n-----------------------------------------------------------------------------\nOutput format\n-----------------------------------------------------------------------------\n\nThe output file (-o/--output, usually "direct.h5") contains the following\ndataset:\n\n /avg_state_probs [state]\n (Structured -- see below) Population of each state across entire\n range specified.\n\n /avg_color_probs [state]\n (Structured -- see below) Population of each ensemble across entire\n range specified.\n\nIf --evolution-mode is specified, then the following additional datasets are\navailable:\n\n /state_pop_evolution [window][state]\n (Structured -- see below). State populations based on windows of\n iterations of varying width. If --evolution-mode=cumulative, then\n these windows all begin at the iteration specified with\n --start-iter and grow in length by --step-iter for each successive\n element. If --evolution-mode=blocked, then these windows are all of\n width --step-iter (excluding the last, which may be shorter), the first\n of which begins at iteration --start-iter.\n\n /color_prob_evolution [window][state]\n (Structured -- see below). Ensemble populations based on windows of\n iterations of varying width. If --evolution-mode=cumulative, then\n these windows all begin at the iteration specified with\n --start-iter and grow in length by --step-iter for each successive\n element. If --evolution-mode=blocked, then these windows are all of\n width --step-iter (excluding the last, which may be shorter), the first\n of which begins at iteration --start-iter.\n\nThe structure of these datasets is as follows:\n\n iter_start\n (Integer) Iteration at which the averaging window begins (inclusive).\n\n iter_stop\n (Integer) Iteration at which the averaging window ends (exclusive).\n\n expected\n (Floating-point) Expected (mean) value of the observable as evaluated within\n this window, in units of inverse tau.\n\n ci_lbound\n (Floating-point) Lower bound of the confidence interval of the observable\n within this window, in units of inverse tau.\n\n ci_ubound\n (Floating-point) Upper bound of the confidence interval of the observable\n within this window, in units of inverse tau.\n\n stderr\n (Floating-point) The standard error of the mean of the observable\n within this window, in units of inverse tau.\n\n corr_len\n (Integer) Correlation length of the observable within this window, in units\n of tau.\n\nEach of these datasets is also stamped with a number of attributes:\n\n mcbs_alpha\n (Floating-point) Alpha value of confidence intervals. (For example,\n *alpha=0.05* corresponds to a 95% confidence interval.)\n\n mcbs_nsets\n (Integer) Number of bootstrap data sets used in generating confidence\n intervals.\n\n mcbs_acalpha\n (Floating-point) Alpha value for determining correlation lengths.\n\n\n-----------------------------------------------------------------------------\nCommand-line options\n-----------------------------------------------------------------------------\n'
- calculate_state_populations(pops)
- w_stateprobs()
- go()
- class westpa.cli.tools.w_stateprobs.WStateProbs(parent)
Bases:
DStateProbs
- subcommand = 'trace'
- help_text = 'averages and CIs for path-tracing kinetics analysis'
- default_output_file = 'stateprobs.h5'
- default_kinetics_file = 'assign.h5'
- class westpa.cli.tools.w_stateprobs.WDirect
Bases:
WESTMasterCommand
,WESTParallelTool
- prog = 'w_stateprobs'
- subcommands = [<class 'westpa.cli.tools.w_stateprobs.WStateProbs'>]
- subparsers_title = 'calculate state-to-state kinetics by tracing trajectories'
- description = 'Calculate average populations and associated errors in state populations from\nweighted ensemble data. Bin assignments, including macrostate definitions,\nare required. (See "w_assign --help" for more information).\n\n-----------------------------------------------------------------------------\nOutput format\n-----------------------------------------------------------------------------\n\nThe output file (-o/--output, usually "stateprobs.h5") contains the following\ndataset:\n\n /avg_state_pops [state]\n (Structured -- see below) Population of each state across entire\n range specified.\n\nIf --evolution-mode is specified, then the following additional dataset is\navailable:\n\n /state_pop_evolution [window][state]\n (Structured -- see below). State populations based on windows of\n iterations of varying width. If --evolution-mode=cumulative, then\n these windows all begin at the iteration specified with\n --start-iter and grow in length by --step-iter for each successive\n element. If --evolution-mode=blocked, then these windows are all of\n width --step-iter (excluding the last, which may be shorter), the first\n of which begins at iteration --start-iter.\n\nThe structure of these datasets is as follows:\n\n iter_start\n (Integer) Iteration at which the averaging window begins (inclusive).\n\n iter_stop\n (Integer) Iteration at which the averaging window ends (exclusive).\n\n expected\n (Floating-point) Expected (mean) value of the rate as evaluated within\n this window, in units of inverse tau.\n\n ci_lbound\n (Floating-point) Lower bound of the confidence interval on the rate\n within this window, in units of inverse tau.\n\n ci_ubound\n (Floating-point) Upper bound of the confidence interval on the rate\n within this window, in units of inverse tau.\n\n corr_len\n (Integer) Correlation length of the rate within this window, in units\n of tau.\n\nEach of these datasets is also stamped with a number of attributes:\n\n mcbs_alpha\n (Floating-point) Alpha value of confidence intervals. (For example,\n *alpha=0.05* corresponds to a 95% confidence interval.)\n\n mcbs_nsets\n (Integer) Number of bootstrap data sets used in generating confidence\n intervals.\n\n mcbs_acalpha\n (Floating-point) Alpha value for determining correlation lengths.\n\n\n-----------------------------------------------------------------------------\nCommand-line options\n-----------------------------------------------------------------------------\n'
- westpa.cli.tools.w_stateprobs.entry_point()
w_dumpsegs
westpa.cli.tools.w_dumpsegs module
- westpa.cli.tools.w_dumpsegs.warn()
Issue a warning, or maybe ignore it or raise an exception.
- message
Text of the warning message.
- category
The Warning category subclass. Defaults to UserWarning.
- stacklevel
How far up the call stack to make this warning appear. A value of 2 for example attributes the warning to the caller of the code calling warn().
- source
If supplied, the destroyed object which emitted a ResourceWarning
- skip_file_prefixes
An optional tuple of module filename prefixes indicating frames to skip during stacklevel computations for stack frame attribution.
- class westpa.cli.tools.w_dumpsegs.WESTTool
Bases:
WESTToolComponent
Base class for WEST command line tools
- prog = None
- usage = None
- description = None
- epilog = None
- add_args(parser)
Add arguments specific to this tool to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this tool and deal with them appropriately (setting instance variables, etc)
- make_parser(prog=None, usage=None, description=None, epilog=None, args=None)
- make_parser_and_process(prog=None, usage=None, description=None, epilog=None, args=None)
A convenience function to create a parser, call add_all_args(), and then call process_all_args(). The argument namespace is returned.
- go()
Perform the analysis associated with this tool.
- main()
A convenience function to make a parser, parse and process arguments, then call self.go()
- class westpa.cli.tools.w_dumpsegs.WESTDataReader
Bases:
WESTToolComponent
Tool for reading data from WEST-related HDF5 files. Coordinates finding the main HDF5 file from west.cfg or command line arguments, caching of certain kinds of data (eventually), and retrieving auxiliary data sets from various places.
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- open(mode='r')
- close()
- property weight_dsspec
- property parent_id_dsspec
- class westpa.cli.tools.w_dumpsegs.Segment(n_iter=None, seg_id=None, weight=None, endpoint_type=None, parent_id=None, wtg_parent_ids=None, pcoord=None, status=None, walltime=None, cputime=None, data=None)
Bases:
object
A class wrapping segment data that must be passed through the work manager or data manager. Most fields are self-explanatory. One item worth noting is that a negative parent ID means that the segment starts from the initial state with ID -(segment.parent_id+1); a decoding sketch follows this listing.
- SEG_STATUS_UNSET = 0
- SEG_STATUS_PREPARED = 1
- SEG_STATUS_COMPLETE = 2
- SEG_STATUS_FAILED = 3
- SEG_INITPOINT_UNSET = 0
- SEG_INITPOINT_CONTINUES = 1
- SEG_INITPOINT_NEWTRAJ = 2
- SEG_ENDPOINT_UNSET = 0
- SEG_ENDPOINT_CONTINUES = 1
- SEG_ENDPOINT_MERGED = 2
- SEG_ENDPOINT_RECYCLED = 3
- statuses = {'SEG_STATUS_COMPLETE': 2, 'SEG_STATUS_FAILED': 3, 'SEG_STATUS_PREPARED': 1, 'SEG_STATUS_UNSET': 0}
- initpoint_types = {'SEG_INITPOINT_CONTINUES': 1, 'SEG_INITPOINT_NEWTRAJ': 2, 'SEG_INITPOINT_UNSET': 0}
- endpoint_types = {'SEG_ENDPOINT_CONTINUES': 1, 'SEG_ENDPOINT_MERGED': 2, 'SEG_ENDPOINT_RECYCLED': 3, 'SEG_ENDPOINT_UNSET': 0}
- status_names = {0: 'SEG_STATUS_UNSET', 1: 'SEG_STATUS_PREPARED', 2: 'SEG_STATUS_COMPLETE', 3: 'SEG_STATUS_FAILED'}
- initpoint_type_names = {0: 'SEG_INITPOINT_UNSET', 1: 'SEG_INITPOINT_CONTINUES', 2: 'SEG_INITPOINT_NEWTRAJ'}
- endpoint_type_names = {0: 'SEG_ENDPOINT_UNSET', 1: 'SEG_ENDPOINT_CONTINUES', 2: 'SEG_ENDPOINT_MERGED', 3: 'SEG_ENDPOINT_RECYCLED'}
- static initial_pcoord(segment)
Return the initial progress coordinate point of this segment.
- static final_pcoord(segment)
Return the final progress coordinate point of this segment.
- property initpoint_type
- property initial_state_id
- property status_text
- property endpoint_type_text
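The negative parent-ID convention noted above is easy to misread; a toy sketch of decoding it:

def describe_parentage(parent_id):
    # Toy sketch of the Segment parent-ID convention described above:
    # a non-negative parent_id is the seg_id of the parent segment,
    # while a negative value encodes an initial state.
    if parent_id >= 0:
        return f'continues segment {parent_id}'
    return f'starts from initial state {-(parent_id + 1)}'

print(describe_parentage(12))   # continues segment 12
print(describe_parentage(-1))   # starts from initial state 0
print(describe_parentage(-3))   # starts from initial state 2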
- class westpa.cli.tools.w_dumpsegs.WDumpSegs
Bases:
WESTTool
- prog = 'w_dumpsegs'
- description = 'Dump segment data as text. This is very inefficient, so this tool should be used\nas a last resort (use hdfview/h5ls to look at data, and access HDF5 directly for\nsignificant analysis tasks).\n'
- add_args(parser)
Add arguments specific to this tool to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this tool and deal with them appropriately (setting instance variables, etc)
- go()
Perform the analysis associated with this tool.
- westpa.cli.tools.w_dumpsegs.entry_point()
w_postanalysis_matrix
westpa.cli.tools.w_postanalysis_matrix module
- class westpa.cli.tools.w_postanalysis_matrix.WESTMasterCommand
Bases:
WESTTool
Base class for command-line tools that employ subcommands
- subparsers_title = None
- subcommands = None
- include_help_command = True
- add_args(parser)
Add arguments specific to this tool to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this tool and deal with them appropriately (setting instance variables, etc)
- go()
Perform the analysis associated with this tool.
- class westpa.cli.tools.w_postanalysis_matrix.WESTParallelTool(wm_env=None)
Bases:
WESTTool
Base class for command-line tools parallelized with wwmgr. This automatically adds and processes wwmgr command-line arguments and creates a work manager at self.work_manager.
- make_parser_and_process(prog=None, usage=None, description=None, epilog=None, args=None)
A convenience function to create a parser, call add_all_args(), and then call process_all_args(). The argument namespace is returned.
- add_args(parser)
Add arguments specific to this tool to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this tool and deal with them appropriately (setting instance variables, etc)
- go()
Perform the analysis associated with this tool.
- main()
A convenience function to make a parser, parse and process arguments, then run self.go() in the master process.
- westpa.cli.tools.w_postanalysis_matrix.warn()
Issue a warning, or maybe ignore it or raise an exception.
- message
Text of the warning message.
- category
The Warning category subclass. Defaults to UserWarning.
- stacklevel
How far up the call stack to make this warning appear. A value of 2 for example attributes the warning to the caller of the code calling warn().
- source
If supplied, the destroyed object which emitted a ResourceWarning
- skip_file_prefixes
An optional tuple of module filename prefixes indicating frames to skip during stacklevel computations for stack frame attribution.
- class westpa.cli.tools.w_postanalysis_matrix.RWMatrix(parent)
Bases:
WESTKineticsBase
,FluxMatrix
- subcommand = 'init'
- default_kinetics_file = 'reweight.h5'
- default_output_file = 'reweight.h5'
- help_text = 'create a color-labeled transition matrix from a WESTPA simulation'
- description = 'Generate a colored transition matrix from a WE assignment file. The subsequent\nanalysis requires that the assignments are calculated using only the initial and\nfinal time points of each trajectory segment. This may require downsampling the\nh5file generated by a WE simulation. In the future w_assign may be enhanced to optionally\ngenerate the necessary assignment file from a h5file with intermediate time points.\nAdditionally, this analysis is currently only valid on simulations performed under\neither equilibrium or steady-state conditions without recycling target states.\n\n-----------------------------------------------------------------------------\nOutput format\n-----------------------------------------------------------------------------\n\nThe output file (-o/--output, by default "reweight.h5") contains the\nfollowing datasets:\n\n ``/bin_populations`` [window, bin]\n The reweighted populations of each bin based on windows. Bins contain\n one color each, so to recover the original un-colored spatial bins,\n one must sum over all states.\n\n ``/iterations`` [iteration]\n *(Structured -- see below)* Sparse matrix data from each\n iteration. They are reconstructed and averaged within the\n w_reweight {kinetics/probs} routines so that observables may\n be calculated. Each group contains 4 vectors of data:\n\n flux\n *(Floating-point)* The weight of a series of flux events\n rows\n *(Integer)* The bin from which a flux event began.\n cols\n *(Integer)* The bin into which the walker fluxed.\n obs\n *(Integer)* How many flux events were observed during this\n iteration.\n\n-----------------------------------------------------------------------------\nCommand-line options\n-----------------------------------------------------------------------------\n'
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- go()
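The per-iteration vectors described above amount to a COO-style sparse matrix. Below is a minimal sketch of rebuilding one iteration's dense flux matrix; the group path ('iterations/iter_00000001'), the vector names, and nfbins are assumptions for illustration:

import h5py
from scipy.sparse import coo_matrix

nfbins = 8  # number of colored bins (hypothetical)

with h5py.File('reweight.h5', 'r') as f:
    # Assumed per-iteration group name and vector names, following the
    # description above (flux weights, source bins, destination bins).
    grp = f['iterations/iter_00000001']
    fluxes = coo_matrix(
        (grp['flux'][...], (grp['rows'][...], grp['cols'][...])),
        shape=(nfbins, nfbins),
    ).toarray()

print('total flux out of bin 0:', fluxes[0].sum())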
- class westpa.cli.tools.w_postanalysis_matrix.PAMatrix(parent)
Bases:
RWMatrix
- subcommand = 'init'
- help_text = 'averages and CIs for path-tracing kinetics analysis'
- default_output_file = 'flux_matrices.h5'
- class westpa.cli.tools.w_postanalysis_matrix.WReweight
Bases:
WESTMasterCommand
,WESTParallelTool
- prog = 'w_postanalysis_matrix'
- subcommands = [<class 'westpa.cli.tools.w_postanalysis_matrix.PAMatrix'>]
- subparsers_title = 'calculate state-to-state kinetics by tracing trajectories'
- description = 'Generate a colored transition matrix from a WE assignment file. The subsequent\nanalysis requires that the assignments are calculated using only the initial and\nfinal time points of each trajectory segment. This may require downsampling the\nh5file generated by a WE simulation. In the future w_assign may be enhanced to optionally\ngenerate the necessary assignment file from a h5file with intermediate time points.\nAdditionally, this analysis is currently only valid on simulations performed under\neither equilibrium or steady-state conditions without recycling target states.\n\n-----------------------------------------------------------------------------\nOutput format\n-----------------------------------------------------------------------------\n\nThe output file (-o/--output, by default "reweight.h5") contains the\nfollowing datasets:\n\n ``/bin_populations`` [window, bin]\n The reweighted populations of each bin based on windows. Bins contain\n one color each, so to recover the original un-colored spatial bins,\n one must sum over all states.\n\n ``/iterations`` [iteration]\n *(Structured -- see below)* Sparse matrix data from each\n iteration. They are reconstructed and averaged within the\n w_reweight {kinetics/probs} routines so that observables may\n be calculated. Each group contains 4 vectors of data:\n\n flux\n *(Floating-point)* The weight of a series of flux events\n rows\n *(Integer)* The bin from which a flux event began.\n cols\n *(Integer)* The bin into which the walker fluxed.\n obs\n *(Integer)* How many flux events were observed during this\n iteration.\n\n-----------------------------------------------------------------------------\nCommand-line options\n-----------------------------------------------------------------------------\n'
- westpa.cli.tools.w_postanalysis_matrix.entry_point()
w_postanalysis_reweight
westpa.cli.tools.w_postanalysis_reweight module
- class westpa.cli.tools.w_postanalysis_reweight.WESTMasterCommand
Bases:
WESTTool
Base class for command-line tools that employ subcommands
- subparsers_title = None
- subcommands = None
- include_help_command = True
- add_args(parser)
Add arguments specific to this tool to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this tool and deal with them appropriately (setting instance variables, etc)
- go()
Perform the analysis associated with this tool.
- class westpa.cli.tools.w_postanalysis_reweight.WESTParallelTool(wm_env=None)
Bases:
WESTTool
Base class for command-line tools parallelized with wwmgr. This automatically adds and processes wwmgr command-line arguments and creates a work manager at self.work_manager.
- make_parser_and_process(prog=None, usage=None, description=None, epilog=None, args=None)
A convenience function to create a parser, call add_all_args(), and then call process_all_args(). The argument namespace is returned.
- add_args(parser)
Add arguments specific to this tool to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this tool and deal with them appropriately (setting instance variables, etc)
- go()
Perform the analysis associated with this tool.
- main()
A convenience function to make a parser, parse and process arguments, then run self.go() in the master process.
- westpa.cli.tools.w_postanalysis_reweight.warn()
Issue a warning, or maybe ignore it or raise an exception.
- message
Text of the warning message.
- category
The Warning category subclass. Defaults to UserWarning.
- stacklevel
How far up the call stack to make this warning appear. A value of 2 for example attributes the warning to the caller of the code calling warn().
- source
If supplied, the destroyed object which emitted a ResourceWarning
- skip_file_prefixes
An optional tuple of module filename prefixes indicating frames to skip during stacklevel computations for stack frame attribution.
- class westpa.cli.tools.w_postanalysis_reweight.RWAverage(parent)
Bases:
RWStateProbs
,RWRate
- subcommand = 'average'
- help_text = 'Averages and returns fluxes, rates, and color/state populations.'
- default_kinetics_file = 'reweight.h5'
- default_output_file = 'reweight.h5'
- description = 'A convenience function to run kinetics/probs. Bin assignments,\nincluding macrostate definitions, are required. (See\n"w_assign --help" for more information).\n\nFor more information on the individual subcommands this subs in for, run\nw_reweight {kinetics/probs} --help.\n\n-----------------------------------------------------------------------------\nCommand-line options\n-----------------------------------------------------------------------------\n'
- go()
- class westpa.cli.tools.w_postanalysis_reweight.PAAverage(parent)
Bases:
RWAverage
- subcommand = 'average'
- help_text = ''
- default_output_file = 'kinrw.h5'
- default_kinetics_file = 'flux_matrices.h5'
- class westpa.cli.tools.w_postanalysis_reweight.WReweight
Bases:
WESTMasterCommand
,WESTParallelTool
- prog = 'w_postanalysis_reweight'
- subcommands = [<class 'westpa.cli.tools.w_postanalysis_reweight.PAAverage'>]
- subparsers_title = 'calculate state-to-state kinetics by tracing trajectories'
- description = 'A convenience function to run kinetics/probs. Bin assignments,\nincluding macrostate definitions, are required. (See\n"w_assign --help" for more information).\n\nFor more information on the individual subcommands this subs in for, run\nw_reweight {kinetics/probs} --help.\n\n-----------------------------------------------------------------------------\nCommand-line options\n-----------------------------------------------------------------------------\n'
- westpa.cli.tools.w_postanalysis_reweight.entry_point()
w_reweight
westpa.cli.tools.w_reweight module
- class westpa.cli.tools.w_reweight.WESTMasterCommand
Bases:
WESTTool
Base class for command-line tools that employ subcommands
- subparsers_title = None
- subcommands = None
- include_help_command = True
- add_args(parser)
Add arguments specific to this tool to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this tool and deal with them appropriately (setting instance variables, etc)
- go()
Perform the analysis associated with this tool.
- class westpa.cli.tools.w_reweight.WESTParallelTool(wm_env=None)
Bases:
WESTTool
Base class for command-line tools parallelized with wwmgr. This automatically adds and processes wwmgr command-line arguments and creates a work manager at self.work_manager.
- make_parser_and_process(prog=None, usage=None, description=None, epilog=None, args=None)
A convenience function to create a parser, call add_all_args(), and then call process_all_args(). The argument namespace is returned.
- add_args(parser)
Add arguments specific to this tool to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this tool and deal with them appropriately (setting instance variables, etc)
- go()
Perform the analysis associated with this tool.
- main()
A convenience function to make a parser, parse and process arguments, then run self.go() in the master process.
- class westpa.cli.tools.w_reweight.WESTKineticsBase(parent)
Bases:
WESTSubcommand
Common argument processing for w_direct/w_reweight subcommands. Mostly limited to handling input and output from w_assign.
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- class westpa.cli.tools.w_reweight.AverageCommands(parent)
Bases:
WESTKineticsBase
- default_output_file = 'direct.h5'
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- stamp_mcbs_info(dataset)
- open_files()
- open_assignments()
- print_averages(dataset, header, dim=1)
- run_calculation(pi, nstates, start_iter, stop_iter, step_iter, dataset, eval_block, name, dim, do_averages=False, **extra)
- westpa.cli.tools.w_reweight.generate_future(work_manager, name, eval_block, kwargs)
- westpa.cli.tools.w_reweight.mcbs_ci_correl(estimator_datasets, estimator, alpha, n_sets=None, args=None, autocorrel_alpha=None, autocorrel_n_sets=None, subsample=None, do_correl=True, mcbs_enable=None, estimator_kwargs={})
Perform a Monte Carlo bootstrap estimate for the (1-alpha) confidence interval on the given dataset with the given estimator. This routine is appropriate for time-correlated data, using the method described in Huber & Kim, “Weighted-ensemble Brownian dynamics simulations for protein association reactions” (1996), doi:10.1016/S0006-3495(96)79552-8 to determine a statistically significant correlation time and then reducing the dataset by a factor of that correlation time before running a “classic” Monte Carlo bootstrap.
Returns (estimate, ci_lb, ci_ub, correl_time), where estimate is the application of the given estimator to the input dataset, ci_lb and ci_ub are the lower and upper limits, respectively, of the (1-alpha) confidence interval on estimate, and correl_time is the correlation time of the dataset, significant to (1-autocorrel_alpha).
estimator is called as estimator(dataset, *args, **kwargs). Common estimators include:
np.mean – calculate the confidence interval on the mean of dataset
np.median – calculate a confidence interval on the median of dataset
np.std – calculate a confidence interval on the standard deviation of dataset
n_sets is the number of synthetic data sets to generate using the given estimator; if not given, it will be chosen using get_bssize().
autocorrel_alpha (which defaults to alpha) can be used to adjust the significance level of the autocorrelation calculation. Note that too high a significance level (too low an alpha) for evaluating the significance of autocorrelation values can result in a failure to detect correlation if the autocorrelation function is noisy.
The given subsample function is used, if provided, to subsample the dataset prior to running the full Monte Carlo bootstrap. If none is provided, then a random entry from each correlated block is used as the value for that block. Other reasonable choices include np.mean, np.median, (lambda x: x[0]) or (lambda x: x[-1]). In particular, using subsample=np.mean will converge to the block-averaged mean and standard error, while accounting for any non-normality in the distribution of the mean.
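The procedure can be illustrated with a toy numpy sketch. This mimics the correlated-block scheme described above with the correlation time taken as given; it is not WESTPA's mcbs_ci_correl itself:

import numpy as np

def block_bootstrap_ci(data, estimator=np.mean, alpha=0.05,
                       n_sets=1000, correl_time=1, rng=None):
    # Toy sketch of the correlated-block bootstrap described above.
    # The dataset is first reduced by its (here, already-known)
    # correlation time, then resampled with replacement n_sets times.
    rng = rng or np.random.default_rng()
    blocks = [data[i:i + correl_time]
              for i in range(0, len(data), correl_time)]
    # One random entry per correlated block, as in the default
    # subsample choice described above.
    reduced = np.array([rng.choice(b) for b in blocks])
    estimate = estimator(reduced)
    synth = np.array([
        estimator(rng.choice(reduced, size=len(reduced), replace=True))
        for _ in range(n_sets)
    ])
    lb, ub = np.percentile(synth, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return estimate, lb, ub

data = np.random.default_rng(42).normal(size=400)
print(block_bootstrap_ci(data, correl_time=10))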
- westpa.cli.tools.w_reweight.reweight_for_c(rows, cols, obs, flux, insert, indices, nstates, nbins, state_labels, state_map, nfbins, istate, jstate, stride, bin_last_state_map, bin_state_map, return_obs, obs_threshold=1)
- class westpa.cli.tools.w_reweight.RWMatrix(parent)
Bases:
WESTKineticsBase
,FluxMatrix
- subcommand = 'init'
- default_kinetics_file = 'reweight.h5'
- default_output_file = 'reweight.h5'
- help_text = 'create a color-labeled transition matrix from a WESTPA simulation'
- description = 'Generate a colored transition matrix from a WE assignment file. The subsequent\nanalysis requires that the assignments are calculated using only the initial and\nfinal time points of each trajectory segment. This may require downsampling the\nh5file generated by a WE simulation. In the future w_assign may be enhanced to optionally\ngenerate the necessary assignment file from a h5file with intermediate time points.\nAdditionally, this analysis is currently only valid on simulations performed under\neither equilibrium or steady-state conditions without recycling target states.\n\n-----------------------------------------------------------------------------\nOutput format\n-----------------------------------------------------------------------------\n\nThe output file (-o/--output, by default "reweight.h5") contains the\nfollowing datasets:\n\n ``/bin_populations`` [window, bin]\n The reweighted populations of each bin based on windows. Bins contain\n one color each, so to recover the original un-colored spatial bins,\n one must sum over all states.\n\n ``/iterations`` [iteration]\n *(Structured -- see below)* Sparse matrix data from each\n iteration. They are reconstructed and averaged within the\n w_reweight {kinetics/probs} routines so that observables may\n be calculated. Each group contains 4 vectors of data:\n\n flux\n *(Floating-point)* The weight of a series of flux events\n rows\n *(Integer)* The bin from which a flux event began.\n cols\n *(Integer)* The bin into which the walker fluxed.\n obs\n *(Integer)* How many flux events were observed during this\n iteration.\n\n-----------------------------------------------------------------------------\nCommand-line options\n-----------------------------------------------------------------------------\n'
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- go()
- class westpa.cli.tools.w_reweight.RWReweight(parent)
Bases:
AverageCommands
- help_text = 'Parent class for all reweighting routines, as they all use the same estimator code.'
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- accumulate_statistics(start_iter, stop_iter)
This function pulls previously generated flux matrix data into memory. The data is assumed to exist within an HDF5 file that is available as a property. The data is kept as a single dimensional numpy array to use with the cython estimator.
- generate_reweight_data()
This function ensures all the appropriate files are loaded, sets appropriate attributes necessary for all calling functions/children, and then calls the function to load in the flux matrix data.
- class westpa.cli.tools.w_reweight.RWRate(parent)
Bases:
RWReweight
- subcommand = 'kinetics'
- help_text = 'Generates rate and flux values from a WESTPA simulation via reweighting.'
- default_kinetics_file = 'reweight.h5'
- default_output_file = 'reweight.h5'
- description = 'Calculate average rates from weighted ensemble data using the postanalysis\nreweighting scheme. Bin assignments (usually "assign.h5") and pre-calculated\niteration flux matrices (usually "reweight.h5") data files must have been\npreviously generated using w_reweight matrix (see "w_assign --help" and\n"w_reweight init --help" for information on generating these files).\n\n-----------------------------------------------------------------------------\nOutput format\n-----------------------------------------------------------------------------\nThe output file (-o/--output, usually "kinrw.h5") contains the following\ndataset:\n\n /avg_rates [state,state]\n (Structured -- see below) State-to-state rates based on entire window of\n iterations selected.\n\n /avg_total_fluxes [state]\n (Structured -- see below) Total fluxes into each state based on entire\n window of iterations selected.\n\n /avg_conditional_fluxes [state,state]\n (Structured -- see below) State-to-state fluxes based on entire window of\n iterations selected.\n\nIf --evolution-mode is specified, then the following additional datasets are\navailable:\n\n /rate_evolution [window][state][state]\n (Structured -- see below). State-to-state rates based on windows of\n iterations of varying width. If --evolution-mode=cumulative, then\n these windows all begin at the iteration specified with\n --start-iter and grow in length by --step-iter for each successive\n element. If --evolution-mode=blocked, then these windows are all of\n width --step-iter (excluding the last, which may be shorter), the first\n of which begins at iteration --start-iter.\n\n /target_flux_evolution [window,state]\n (Structured -- see below). Total flux into a given macro state based on\n windows of iterations of varying width, as in /rate_evolution.\n\n /conditional_flux_evolution [window,state,state]\n (Structured -- see below). State-to-state fluxes based on windows of\n varying width, as in /rate_evolution.\n\nThe structure of these datasets is as follows:\n\n iter_start\n (Integer) Iteration at which the averaging window begins (inclusive).\n\n iter_stop\n (Integer) Iteration at which the averaging window ends (exclusive).\n\n expected\n (Floating-point) Expected (mean) value of the observable as evaluated within\n this window, in units of inverse tau.\n\n ci_lbound\n (Floating-point) Lower bound of the confidence interval of the observable\n within this window, in units of inverse tau.\n\n ci_ubound\n (Floating-point) Upper bound of the confidence interval of the observable\n within this window, in units of inverse tau.\n\n stderr\n (Floating-point) The standard error of the mean of the observable\n within this window, in units of inverse tau.\n\n corr_len\n (Integer) Correlation length of the observable within this window, in units\n of tau.\n\nEach of these datasets is also stamped with a number of attributes:\n\n mcbs_alpha\n (Floating-point) Alpha value of confidence intervals. (For example,\n *alpha=0.05* corresponds to a 95% confidence interval.)\n\n mcbs_nsets\n (Integer) Number of bootstrap data sets used in generating confidence\n intervals.\n\n mcbs_acalpha\n (Floating-point) Alpha value for determining correlation lengths.\n\n\n-----------------------------------------------------------------------------\nCommand-line options\n-----------------------------------------------------------------------------\n '
- w_postanalysis_reweight()
This function ensures the data is ready to send in to the estimator and the bootstrapping routine, then does so. Much of this is simply setting up appropriate args and kwargs, then passing them in to the ‘run_calculation’ function, which sets up future objects to send to the work manager. The results are returned, and then written to the appropriate HDF5 dataset. This function is specific for the rates and fluxes from the reweighting method.
- go()
- class westpa.cli.tools.w_reweight.RWStateProbs(parent)
Bases:
RWReweight
- subcommand = 'probs'
- help_text = 'Calculates color and state probabilities via reweighting.'
- default_kinetics_file = 'reweight.h5'
- description = 'Calculate average populations from weighted ensemble data using the postanalysis\nreweighting scheme. Bin assignments (usually "assign.h5") and pre-calculated\niteration flux matrices (usually "reweight.h5") data files must have been\npreviously generated using w_reweight matrix (see "w_assign --help" and\n"w_reweight init --help" for information on generating these files).\n\n-----------------------------------------------------------------------------\nOutput format\n-----------------------------------------------------------------------------\n\nThe output file (-o/--output, usually "direct.h5") contains the following\ndataset:\n\n /avg_state_probs [state]\n (Structured -- see below) Population of each state across entire\n range specified.\n\n /avg_color_probs [state]\n (Structured -- see below) Population of each ensemble across entire\n range specified.\n\nIf --evolution-mode is specified, then the following additional datasets are\navailable:\n\n /state_pop_evolution [window][state]\n (Structured -- see below). State populations based on windows of\n iterations of varying width. If --evolution-mode=cumulative, then\n these windows all begin at the iteration specified with\n --start-iter and grow in length by --step-iter for each successive\n element. If --evolution-mode=blocked, then these windows are all of\n width --step-iter (excluding the last, which may be shorter), the first\n of which begins at iteration --start-iter.\n\n /color_prob_evolution [window][state]\n (Structured -- see below). Ensemble populations based on windows of\n iterations of varying width. If --evolution-mode=cumulative, then\n these windows all begin at the iteration specified with\n --start-iter and grow in length by --step-iter for each successive\n element. If --evolution-mode=blocked, then these windows are all of\n width --step-iter (excluding the last, which may be shorter), the first\n of which begins at iteration --start-iter.\n\nThe structure of these datasets is as follows:\n\n iter_start\n (Integer) Iteration at which the averaging window begins (inclusive).\n\n iter_stop\n (Integer) Iteration at which the averaging window ends (exclusive).\n\n expected\n (Floating-point) Expected (mean) value of the observable as evaluated within\n this window, in units of inverse tau.\n\n ci_lbound\n (Floating-point) Lower bound of the confidence interval of the observable\n within this window, in units of inverse tau.\n\n ci_ubound\n (Floating-point) Upper bound of the confidence interval of the observable\n within this window, in units of inverse tau.\n\n stderr\n (Floating-point) The standard error of the mean of the observable\n within this window, in units of inverse tau.\n\n corr_len\n (Integer) Correlation length of the observable within this window, in units\n of tau.\n\n\nEach of these datasets is also stamped with a number of attributes:\n\n mcbs_alpha\n (Floating-point) Alpha value of confidence intervals. (For example,\n *alpha=0.05* corresponds to a 95% confidence interval.)\n\n mcbs_nsets\n (Integer) Number of bootstrap data sets used in generating confidence\n intervals.\n\n mcbs_acalpha\n (Floating-point) Alpha value for determining correlation lengths.\n\n\n-----------------------------------------------------------------------------\nCommand-line options\n-----------------------------------------------------------------------------\n'
- w_postanalysis_stateprobs()
This function ensures the data is ready to send in to the estimator and the bootstrapping routine, then does so. Much of this is simply setting up appropriate args and kwargs, then passing them in to the ‘run_calculation’ function, which sets up future objects to send to the work manager. The results are returned, and then written to the appropriate HDF5 dataset. This function is specific for the color (steady-state) and macrostate probabilities from the reweighting method.
- go()
- class westpa.cli.tools.w_reweight.RWAll(parent)
Bases:
RWMatrix
,RWStateProbs
,RWRate
- subcommand = 'all'
- help_text = 'Runs the full suite, including the generation of the flux matrices.'
- default_kinetics_file = 'reweight.h5'
- default_output_file = 'reweight.h5'
- description = 'A convenience function to run init/kinetics/probs. Bin assignments,\nincluding macrostate definitions, are required. (See\n"w_assign --help" for more information).\n\nFor more information on the individual subcommands this subs in for, run\nw_reweight {init/kinetics/probs} --help.\n\n-----------------------------------------------------------------------------\nCommand-line options\n-----------------------------------------------------------------------------\n'
- go()
- class westpa.cli.tools.w_reweight.RWAverage(parent)
Bases:
RWStateProbs
,RWRate
- subcommand = 'average'
- help_text = 'Averages and returns fluxes, rates, and color/state populations.'
- default_kinetics_file = 'reweight.h5'
- default_output_file = 'reweight.h5'
- description = 'A convenience function to run kinetics/probs. Bin assignments,\nincluding macrostate definitions, are required. (See\n"w_assign --help" for more information).\n\nFor more information on the individual subcommands this subs in for, run\nw_reweight {kinetics/probs} --help.\n\n-----------------------------------------------------------------------------\nCommand-line options\n-----------------------------------------------------------------------------\n'
- go()
- class westpa.cli.tools.w_reweight.WReweight
Bases:
WESTMasterCommand
,WESTParallelTool
- prog = 'w_reweight'
- subcommands = [<class 'westpa.cli.tools.w_reweight.RWMatrix'>, <class 'westpa.cli.tools.w_reweight.RWAverage'>, <class 'westpa.cli.tools.w_reweight.RWRate'>, <class 'westpa.cli.tools.w_reweight.RWStateProbs'>, <class 'westpa.cli.tools.w_reweight.RWAll'>]
- subparsers_title = 'reweighting kinetics analysis scheme'
- westpa.cli.tools.w_reweight.entry_point()
w_fluxanl
w_fluxanl calculates the probability flux of a weighted ensemble simulation based on a pre-defined target state, along with the confidence interval of the average flux. Monte Carlo bootstrapping techniques are used to account for autocorrelation between fluxes and/or errors that are not normally distributed.
Overview
usage:
w_fluxanl [-h] [-r RCFILE] [--quiet | --verbose | --debug] [--version]
[-W WEST_H5FILE] [-o OUTPUT]
[--first-iter N_ITER] [--last-iter N_ITER]
[-a ALPHA] [--autocorrel-alpha ACALPHA] [-N NSETS] [--evol] [--evol-step ESTEP]
Note: All command-line arguments are optional for w_fluxanl.
Command-Line Options
See the general command-line tool reference for more information on the general options.
Input/output options
These arguments allow the user to specify where to read input simulation result data and where to output the calculated flux data.
Both input and output files are in HDF5 format:
-W, --west-data file
Read simulation result data from file *file*. (**Default:** The
*hdf5* file specified in the configuration file)
-o, --output file
Store this tool's output in *file*. (**Default:** The *hdf5* file
**pcpdist.h5**)
Iteration range options
Specify the range of iterations over which to calculate the flux:
--first-iter n_iter
Begin the flux calculation with iteration *n_iter*
(**Default:** 1)
--last-iter n_iter
Conclude the flux calculation with (and including) iteration *n_iter*
(**Default:** Last completed iteration)
Confidence interval and bootstrapping options
Specify alpha values for the constructed confidence intervals:
-a alpha
Calculate a (1 - *alpha*) confidence interval for the mean flux
(**Default:** 0.05)
--autocorrel-alpha ACalpha
Identify autocorrelation of fluxes at *ACalpha* significance level.
Note: Specifying an *ACalpha* level that is too small may result in
failure to find autocorrelation in noisy flux signals (**Default:**
Same level as *alpha*)
-N n_sets, --nsets n_sets
Use *n_sets* samples for bootstrapping (**Default:** Chosen based
on *alpha*)
--evol
Calculate the time evolution of flux confidence intervals
(**Warning:** computationally expensive calculation)
--evol-step estep
(if ``'--evol'`` specified) Calculate the time evolution of flux
confidence intervals for every *estep* iterations (**Default:** 1)
Examples
Calculate the time evolution of the flux every 5 iterations:
w_fluxanl --evol --evol-step 5
Calculate mean flux confidence intervals at the 0.01 significance level and calculate autocorrelations at 0.05 significance:
w_fluxanl --alpha 0.01 --autocorrel-alpha 0.05
Calculate the mean flux confidence intervals using a custom bootstrap sample size of 500:
w_fluxanl --nsets 500
westpa.cli.tools.w_fluxanl module
- westpa.cli.tools.w_fluxanl.fftconvolve(in1, in2, mode='full', axes=None)
Convolve two N-dimensional arrays using FFT.
Convolve in1 and in2 using the fast Fourier transform method, with the output size determined by the mode argument.
This is generally much faster than convolve for large arrays (n > ~500), but can be slower when only a few output values are needed, and can only output float arrays (int or object array inputs will be cast to float).
As of v0.19, convolve automatically chooses this method or the direct method based on an estimation of which is faster.
- Parameters:
in1 (array_like) – First input.
in2 (array_like) – Second input. Should have the same number of dimensions as in1.
mode (str {'full', 'valid', 'same'}, optional) –
A string indicating the size of the output:
full
The output is the full discrete linear convolution of the inputs. (Default)
valid
The output consists only of those elements that do not rely on the zero-padding. In ‘valid’ mode, either in1 or in2 must be at least as large as the other in every dimension.
same
The output is the same size as in1, centered with respect to the ‘full’ output.
axes (int or array_like of ints or None, optional) – Axes over which to compute the convolution. The default is over all axes.
- Returns:
out – An N-dimensional array containing a subset of the discrete linear convolution of in1 with in2.
- Return type:
array
See also
convolve
Uses the direct convolution or FFT convolution algorithm depending on which is faster.
oaconvolve
Uses the overlap-add method to do convolution, which is generally faster when the input arrays are large and significantly different in size.
Examples
Autocorrelation of white noise is an impulse.
>>> import numpy as np
>>> from scipy import signal
>>> rng = np.random.default_rng()
>>> sig = rng.standard_normal(1000)
>>> autocorr = signal.fftconvolve(sig, sig[::-1], mode='full')
>>> import matplotlib.pyplot as plt
>>> fig, (ax_orig, ax_mag) = plt.subplots(2, 1)
>>> ax_orig.plot(sig)
>>> ax_orig.set_title('White noise')
>>> ax_mag.plot(np.arange(-len(sig)+1, len(sig)), autocorr)
>>> ax_mag.set_title('Autocorrelation')
>>> fig.tight_layout()
>>> fig.show()
Gaussian blur implemented using FFT convolution. Notice the dark borders around the image, due to the zero-padding beyond its boundaries. The convolve2d function allows for other types of image boundaries, but is far slower.
>>> from scipy import datasets
>>> face = datasets.face(gray=True)
>>> kernel = np.outer(signal.windows.gaussian(70, 8),
...                   signal.windows.gaussian(70, 8))
>>> blurred = signal.fftconvolve(face, kernel, mode='same')
>>> fig, (ax_orig, ax_kernel, ax_blurred) = plt.subplots(3, 1,
...                                                      figsize=(6, 15))
>>> ax_orig.imshow(face, cmap='gray')
>>> ax_orig.set_title('Original')
>>> ax_orig.set_axis_off()
>>> ax_kernel.imshow(kernel, cmap='gray')
>>> ax_kernel.set_title('Gaussian kernel')
>>> ax_kernel.set_axis_off()
>>> ax_blurred.imshow(blurred, cmap='gray')
>>> ax_blurred.set_title('Blurred')
>>> ax_blurred.set_axis_off()
>>> fig.show()
- westpa.cli.tools.w_fluxanl.warn()
Issue a warning, or maybe ignore it or raise an exception.
- message
Text of the warning message.
- category
The Warning category subclass. Defaults to UserWarning.
- stacklevel
How far up the call stack to make this warning appear. A value of 2 for example attributes the warning to the caller of the code calling warn().
- source
If supplied, the destroyed object which emitted a ResourceWarning
- skip_file_prefixes
An optional tuple of module filename prefixes indicating frames to skip during stacklevel computations for stack frame attribution.
- westpa.cli.tools.w_fluxanl.weight_dtype
alias of
float64
- westpa.cli.tools.w_fluxanl.n_iter_dtype
alias of
uint32
- class westpa.cli.tools.w_fluxanl.NewWeightEntry(source_type, weight, prev_seg_id=None, prev_init_pcoord=None, prev_final_pcoord=None, new_init_pcoord=None, target_state_id=None, initial_state_id=None)
Bases:
object
- NW_SOURCE_RECYCLED = 0
- class westpa.cli.tools.w_fluxanl.WESTTool
Bases:
WESTToolComponent
Base class for WEST command line tools
- prog = None
- usage = None
- description = None
- epilog = None
- add_args(parser)
Add arguments specific to this tool to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this tool and deal with them appropriately (setting instance variables, etc)
- make_parser(prog=None, usage=None, description=None, epilog=None, args=None)
- make_parser_and_process(prog=None, usage=None, description=None, epilog=None, args=None)
A convenience function to create a parser, call add_all_args(), and then call process_all_args(). The argument namespace is returned.
- go()
Perform the analysis associated with this tool.
- main()
A convenience function to make a parser, parse and process arguments, then call self.go()
- class westpa.cli.tools.w_fluxanl.WESTDataReader
Bases:
WESTToolComponent
Tool for reading data from WEST-related HDF5 files. Coordinates finding the main HDF5 file from west.cfg or command line arguments, caching of certain kinds of data (eventually), and retrieving auxiliary data sets from various places.
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- open(mode='r')
- close()
- property weight_dsspec
- property parent_id_dsspec
- class westpa.cli.tools.w_fluxanl.IterRangeSelection(data_manager=None)
Bases:
WESTToolComponent
Select and record limits on iterations used in analysis and/or reporting. This class provides both the user-facing command-line options and parsing, and the application-side API for recording limits in HDF5.
HDF5 datasets calculated based on a restricted set of iterations should be tagged with the following attributes:
first_iter
The first iteration included in the calculation.
last_iter
One past the last iteration included in the calculation.
iter_step
Blocking or sampling period for iterations included in the calculation.
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- process_args(args, override_iter_start=None, override_iter_stop=None, default_iter_step=1)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- iter_block_iter()
Return an iterable of (block_start, block_end) over the blocks of iterations selected by --first-iter/--last-iter/--step-iter.
- n_iter_blocks()
Return the number of blocks of iterations (as returned by iter_block_iter()) selected by --first-iter/--last-iter/--step-iter.
- record_data_iter_range(h5object, iter_start=None, iter_stop=None)
Store attributes iter_start and iter_stop on the given HDF5 object (group/dataset).
- record_data_iter_step(h5object, iter_step=None)
Store attribute iter_step on the given HDF5 object (group/dataset).
- check_data_iter_range_least(h5object, iter_start=None, iter_stop=None)
Check that the given HDF5 object contains (as denoted by its iter_start/iter_stop attributes) data at least for the iteration range specified.
- check_data_iter_range_equal(h5object, iter_start=None, iter_stop=None)
Check that the given HDF5 object contains (as denoted by its iter_start/iter_stop attributes) data exactly for the iteration range specified.
- check_data_iter_step_conformant(h5object, iter_step=None)
Check that the given HDF5 object contains per-iteration data at an iteration stride suitable for extracting data with the given stride (in other words, the given iter_step is a multiple of the stride with which data was recorded).
- check_data_iter_step_equal(h5object, iter_step=None)
Check that the given HDF5 object contains per-iteration data at an iteration stride the same as that specified.
- slice_per_iter_data(dataset, iter_start=None, iter_stop=None, iter_step=None, axis=0)
Return the subset of the given dataset corresponding to the given iteration range and stride. Unless otherwise specified, the first dimension of the dataset is the one sliced.
- iter_range(iter_start=None, iter_stop=None, iter_step=None, dtype=None)
Return a sequence for the given iteration numbers and stride, filling in missing values from those stored on self. The smallest data type capable of holding iter_stop is returned unless otherwise specified using the dtype argument.
- westpa.cli.tools.w_fluxanl.extract_fluxes(iter_start=None, iter_stop=None, data_manager=None)
Extract flux values from the WEST HDF5 file for iterations >= iter_start and < iter_stop, optionally using another data manager instance instead of the global one returned by westpa.rc.get_data_manager(). Returns a dictionary mapping target names (if available, target index otherwise) to a 1-D array of type fluxentry_dtype, which contains columns for iteration number, flux, and count.
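A hedged usage sketch; the structured-array field name 'flux' is an assumption inferred from the column description above, and the call assumes a WESTPA runtime environment (westpa.rc) has already been initialized:

# Average the recorded flux into each target over iterations 1-100.
from westpa.cli.tools.w_fluxanl import extract_fluxes

fluxes = extract_fluxes(iter_start=1, iter_stop=101)
for target, entries in fluxes.items():
    # entries is a 1-D structured array of per-iteration records
    print(target, entries['flux'].mean())  # field name assumed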
- class westpa.cli.tools.w_fluxanl.WFluxanlTool
Bases:
WESTTool
- prog = 'w_fluxanl'
- description = 'Extract fluxes into pre-defined target states from WEST data,\naverage, and construct confidence intervals. Monte Carlo bootstrapping\nis used to account for the correlated and possibly non-Gaussian statistical\nerror in flux measurements.\n\nAll non-graphical output (including that to the terminal and HDF5) assumes that\nthe propagation/resampling period ``tau`` is equal to unity; to obtain results\nin familiar units, divide all fluxes and multiply all correlation lengths by\nthe true value of ``tau``.\n'
- output_format_version = 2
- add_args(parser)
Add arguments specific to this tool to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this tool and deal with them appropriately (setting instance variables, etc)
- calc_store_flux_data()
- calc_evol_flux()
- go()
Perform the analysis associated with this tool.
- westpa.cli.tools.w_fluxanl.entry_point()
westpa.core package
westpa.core.binning package
westpa.core.binning module
- class westpa.core.binning.NopMapper
Bases:
BinMapper
Put everything into one bin.
- assign(coords, mask=None, output=None)
- class westpa.core.binning.FuncBinMapper(func, nbins, args=None, kwargs=None)
Bases:
BinMapper
Binning using a custom function which must iterate over input coordinate sets itself.
- assign(coords, mask=None, output=None)
- class westpa.core.binning.PiecewiseBinMapper(functions)
Bases:
BinMapper
Binning using a set of functions returning boolean values; if the Nth function returns True for a coordinate tuple, then that coordinate is in the Nth bin.
- assign(coords, mask=None, output=None)
- class westpa.core.binning.RectilinearBinMapper(boundaries)
Bases:
BinMapper
Bin into a rectangular grid based on tuples of float values
- property boundaries
- assign(coords, mask=None, output=None)
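For example, a two-dimensional grid can be built from one boundary list per dimension (boundary values here are illustrative):

# Six bins total: three along dimension 0 times two along dimension 1.
import numpy as np
from westpa.core.binning import RectilinearBinMapper

mapper = RectilinearBinMapper([
    [0.0, 1.0, 2.0, float('inf')],  # dimension 0
    [0.0, 5.0, float('inf')],       # dimension 1
])
coords = np.array([[0.5, 1.0], [1.5, 7.0]], dtype=np.float32)
print(mapper.nbins)           # 6
print(mapper.assign(coords))  # one integer bin index per coordinate tuple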
- class westpa.core.binning.RecursiveBinMapper(base_mapper, start_index=0)
Bases:
BinMapper
Nest mappers one within another.
- property labels
- property start_index
- add_mapper(mapper, replaces_bin_at)
Replace the bin containing the coordinate tuple replaces_bin_at with the specified mapper.
- assign(coords, mask=None, output=None)
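A hedged sketch of nesting: replace one bin of a coarse grid with a finer mapper (boundary values are illustrative):

from westpa.core.binning import RectilinearBinMapper, RecursiveBinMapper

outer = RectilinearBinMapper([[0.0, 5.0, 10.0]])      # two coarse bins
inner = RectilinearBinMapper([[0.0, 1.0, 2.0, 5.0]])  # three fine bins

recursive = RecursiveBinMapper(outer)
# Replace the coarse bin containing the point [2.5] with the fine mapper.
recursive.add_mapper(inner, [2.5])
print(recursive.nbins)  # 4: one remaining coarse bin plus three fine bins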
- class westpa.core.binning.VectorizingFuncBinMapper(func, nbins, args=None, kwargs=None)
Bases:
BinMapper
Binning using a custom function which is evaluated once for each (unmasked) coordinate tuple provided.
- assign(coords, mask=None, output=None)
- class westpa.core.binning.VoronoiBinMapper(dfunc, centers, dfargs=None, dfkwargs=None)
Bases:
BinMapper
A one-dimensional mapper which assigns a multidimensional pcoord to the closest center based on a distance metric. Both the list of centers and the distance function must be supplied.
- assign(coords, mask=None, output=None)
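A hedged sketch; it assumes dfunc receives a single coordinate tuple together with the full set of centers and returns the distance to each center (verify the expected signature for your WESTPA version):

import numpy as np
from westpa.core.binning import VoronoiBinMapper

def dfunc(coordvec, centers):
    # Euclidean distance from one point to every center.
    return np.sqrt(((centers - coordvec) ** 2).sum(axis=1))

centers = np.array([[0.0, 0.0], [1.0, 1.0]], dtype=np.float32)
mapper = VoronoiBinMapper(dfunc, centers)
print(mapper.assign(np.array([[0.2, 0.1]], dtype=np.float32)))  # -> [0]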
- westpa.core.binning.map_mab(coords, mask, output, *args, **kwargs)
Binning which adaptively places bins based on the positions of extrema segments and bottleneck segments, which are where the difference in probability is the greatest along the progress coordinate. Operates per dimension and places a fixed number of evenly spaced bins between the segments with the min and max pcoord values. Extrema and bottleneck segments are assigned their own bins.
- Parameters:
coords (ndarray) – An array with pcoord and weight info.
mask (ndarray) – Array of 1 (True) and 0 (False), to filter out unwanted segment info.
output (list) – The main list that, for each segment, holds the bin assignment.
*args (list) – Variable length arguments.
**kwargs (dict) – Arbitrary keyword arguments. Contains most of the MAB-needed parameters.
- Returns:
output – The main list that, for each segment, holds the bin assignment.
- Return type:
list
- westpa.core.binning.map_binless(coords, mask, output, *args, **kwargs)
Adaptively groups walkers according to a user-defined grouping function that is defined externally. Very general implementation, but currently limited to a two-dimensional progress coordinate.
- class westpa.core.binning.MABBinMapper(nbins, direction=None, skip=None, bottleneck=True, pca=False, mab_log=False, bin_log=False, bin_log_path='$WEST_SIM_ROOT/binbounds.log')
Bases:
FuncBinMapper
Adaptively place bins in between minimum and maximum segments along the progress coordinate. Extrema and bottleneck segments are assigned to their own bins.
- Parameters:
nbins (list of int) – List of int for nbins in each dimension.
direction (Union(list of int, None), default: None) –
List of int for ‘direction’ in each dimension. Direction options are as follows:
0 : default, split at leading and lagging boundaries
1 : split at leading boundary only
-1 : split at lagging boundary only
86 : no splitting at either leading or lagging boundary
skip (Union(list of int, None), default: None) – List of int for each dimension. Default None for skip=0. Set to 1 to ‘skip’ running mab in a dimension.
bottleneck (bool, default: True) – Whether to turn on or off bottleneck walker splitting.
pca (bool, default: False) – Can be True or False (default) to run PCA on pcoords before bin assignment.
mab_log (bool, default: False) – Whether to output mab info to west.log.
bin_log (bool, default: False) – Whether to output mab bin boundaries to bin_log_path file.
bin_log_path (str, default: "$WEST_SIM_ROOT/binbounds.log") – Path to output bin boundaries.
- determine_total_bins(nbins_per_dim, direction, skip, bottleneck, **kwargs)
This is necessary because functional bin mappers need to “reserve” bins and tell the sim manager how many bins they will need to use; the total is determined by taking all direction/skip settings into account.
- Parameters:
nbins_per_dim (int) – Number of total bins in each direction.
direction (list of int) – Direction in each dimension. See __init__ for more information.
skip (list of int) – List of 0s and 1s indicating whether to skip each dimension.
bottleneck (bool) – Whether to include separate bin for bottleneck walker(s).
**kwargs (dict) – Arbitrary keyword arguments. Contains unneeded MAB parameters.
- Returns:
n_total_bins – Number of total bins.
- Return type:
int
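A hedged construction sketch for a one-dimensional progress coordinate, using the parameters documented above; whether nbins reports the full total including extrema/bottleneck bins is an assumption to verify:

from westpa.core.binning import MABBinMapper

mab = MABBinMapper(
    nbins=[5],        # five evenly spaced bins between min/max segments
    direction=[0],    # split at both leading and lagging boundaries
    bottleneck=True,  # reserve separate bins for bottleneck walkers
)
print(mab.nbins)      # total bin count, including extrema/bottleneck bins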
- class westpa.core.binning.BinlessMapper(ngroups, ndims, group_function, **group_function_kwargs)
Bases:
FuncBinMapper
Adaptively group walkers according to a user-defined grouping function that is defined externally.
- class westpa.core.binning.MABDriver(rc=None, system=None)
Bases:
WEDriver
- assign(segments, initializing=False)
Assign segments to initial and final bins, and update the (internal) lists of used and available initial states. This function is adapted to the MAB scheme, so that the initial and final segments are sent to the bin mapper at the same time; otherwise the initial and final bin boundaries can be inconsistent.
- class westpa.core.binning.MABSimManager(rc=None)
Bases:
WESimManager
Subclass of WESimManager, modifying it so bin assignments will be done after all segments are done propagating.
- initialize_simulation(basis_states, target_states, start_states, segs_per_state=1, suppress_we=False)
Makes sure that the MABBinMapper is not the outermost bin mapper.
- propagate()
- prepare_iteration()
- class westpa.core.binning.BinlessDriver(rc=None, system=None)
Bases:
WEDriver
- assign(segments, initializing=False)
Assign segments to initial and final bins, and update the (internal) lists of used and available initial states. This function is adapted to the MAB scheme, so that the initial and final segments are sent to the bin mapper at the same time; otherwise the initial and final bin boundaries can be inconsistent.
- class westpa.core.binning.BinlessSimManager(rc=None)
Bases:
WESimManager
- initialize_simulation(basis_states, target_states, start_states, segs_per_state=1, suppress_we=False)
Initialize a new weighted ensemble simulation, taking segs_per_state initial states from each of the given basis_states. w_init is the forward-facing version of this function.
- propagate()
- prepare_iteration()
- westpa.core.binning.accumulate_labeled_populations(weights, bin_assignments, label_assignments, labeled_bin_pops)
For a set of segments in one iteration, calculate the average population in each bin, with separation by last-visited macrostate.
- westpa.core.binning.assign_and_label(nsegs_lb, nsegs_ub, parent_ids, assign, nstates, state_map, last_labels, pcoords, subsample)
Assign trajectories to bins and last-visited macrostates for each timepoint.
- westpa.core.binning.accumulate_state_populations_from_labeled(labeled_bin_pops, state_map, state_pops, check_state_map=True)
- westpa.core.binning.assignments_list_to_table(nsegs, nbins, assignments)
Convert a list of bin assignments (integers) to a boolean table indicating whether a given segment is in a given bin.
- westpa.core.binning.coord_dtype
alias of
float32
- westpa.core.binning.index_dtype
alias of
uint16
westpa.core.binning.assign module
Bin assignment for WEST simulations. This module defines “bin mappers” which take vectors of coordinates (or rather, coordinate tuples), and assign each a definite integer value identifying a bin. Critical portions are implemented in a Cython extension module.
A number of pre-defined bin mappers are available here:
RectilinearBinMapper
, for bins divided by N-dimensional grids
FuncBinMapper
, for functions which directly calculate bin assignments for a number of coordinate values. This is best used with C/Cython/Numba functions, or intelligently tuned numpy-based Python functions.
VectorizingFuncBinMapper
, for functions which calculate a bin assignment for a single coordinate value. This is best used for arbitrary Python functions.
PiecewiseBinMapper
, for using a set of boolean-valued functions, one per bin, to determine assignments. This is likely to be much slower than a FuncBinMapper or VectorizingFuncBinMapper equipped with an appropriate function, and its use is discouraged.
One “super-mapper” is available, for assembling more complex bin spaces from simpler components:
RecursiveBinMapper
, for nesting one set of bins within another.
Users are also free to implement their own mappers. A bin mapper must implement, at least, an assign(coords, mask=None, output=None) method, which is responsible for mapping each of the vector of coordinate tuples coords to an integer (np.uint16) indicating what bin that coordinate tuple falls into. The optional mask (a numpy bool array) specifies that some coordinates are to be skipped; this is used, for instance, by the recursive (nested) bin mapper to minimize the number of calculations required to definitively assign a coordinate tuple to a bin. Similarly, the optional output must be an integer (uint16) array of the same length as coords, into which assignments are written. The assign() function must return a reference to output. (This is used to avoid allocating many temporary output arrays in complex binning scenarios.) A user-defined bin mapper must also make an nbins property available, containing the total number of bins within the mapper. A sketch of such a mapper follows.
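The sketch below is a toy mapper satisfying that contract; only the assign() signature, the uint16 output convention, and the nbins attribute come from the contract above, and the binning rule itself is illustrative:

# A toy user-defined mapper: bin 0 if the first pcoord dimension is
# negative, bin 1 otherwise (illustrative rule only).
import numpy as np
from westpa.core.binning.assign import BinMapper, index_dtype

class SignMapper(BinMapper):
    def __init__(self):
        super().__init__()
        self.nbins = 2  # total number of bins within the mapper

    def assign(self, coords, mask=None, output=None):
        coords = np.asarray(coords)
        if mask is None:
            mask = np.ones(len(coords), dtype=bool)
        if output is None:
            output = np.empty(len(coords), dtype=index_dtype)
        # Write assignments only for unmasked coordinate tuples.
        output[mask] = (coords[mask, 0] >= 0).astype(index_dtype)
        return output  # must return a reference to output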
- class westpa.core.binning.assign.Bin(iterable=None, label=None)
Bases:
set
- property weight
Total weight of all walkers in this bin
- reweight(new_weight)
Reweight all walkers in this bin so that the total weight is new_weight
- westpa.core.binning.assign.output_map(output, omap, mask)
For each output for which mask is true, execute output[i] = omap[output[i]]
- westpa.core.binning.assign.apply_down(func, args, kwargs, coords, mask, output)
Apply func(coord, *args, **kwargs) to each input coordinate tuple, skipping any for which mask is false and writing results to output.
- westpa.core.binning.assign.apply_down_argmin_across(func, args, kwargs, func_output_len, coords, mask, output)
Apply func(coord, *args, **kwargs) to each input coordinate tuple, skipping any for which mask is false and writing results to output.
- westpa.core.binning.assign.rectilinear_assign(coords, mask, output, boundaries, boundlens)
For bins delimited by sets of boundaries on a rectilinear grid (boundaries), assign coordinates to bins, assuming C ordering of indices within the grid. boundlens is the number of boundaries in each dimension.
- westpa.core.binning.assign.index_dtype
alias of
uint16
- westpa.core.binning.assign.coord_dtype
alias of
float32
- class westpa.core.binning.assign.BinMapper
Bases:
object
- hashfunc(*, usedforsecurity=True)
Returns a sha256 hash object; optionally initialized with a string
- construct_bins(type_=<class 'westpa.core.binning.bins.Bin'>)
Construct and return an array of bins of type type_.
- pickle_and_hash()
Pickle this mapper and calculate a hash of the result (thus identifying the contents of the pickled data), returning a tuple (pickled_data, hash). This will raise PickleError if this mapper cannot be pickled, in which case code that would otherwise rely on detecting a topology change must assume a topology change happened, even if one did not.
- class westpa.core.binning.assign.NopMapper
Bases:
BinMapper
Put everything into one bin.
- assign(coords, mask=None, output=None)
- class westpa.core.binning.assign.RectilinearBinMapper(boundaries)
Bases:
BinMapper
Bin into a rectangular grid based on tuples of float values
- property boundaries
- assign(coords, mask=None, output=None)
- class westpa.core.binning.assign.PiecewiseBinMapper(functions)
Bases:
BinMapper
Binning using a set of functions returning boolean values; if the Nth function returns True for a coordinate tuple, then that coordinate is in the Nth bin.
- assign(coords, mask=None, output=None)
- class westpa.core.binning.assign.FuncBinMapper(func, nbins, args=None, kwargs=None)
Bases:
BinMapper
Binning using a custom function which must iterate over input coordinate sets itself.
- assign(coords, mask=None, output=None)
- class westpa.core.binning.assign.VectorizingFuncBinMapper(func, nbins, args=None, kwargs=None)
Bases:
BinMapper
Binning using a custom function which is evaluated once for each (unmasked) coordinate tuple provided.
- assign(coords, mask=None, output=None)
- class westpa.core.binning.assign.VoronoiBinMapper(dfunc, centers, dfargs=None, dfkwargs=None)
Bases:
BinMapper
A one-dimensional mapper which assigns a multidimensional pcoord to the closest center based on a distance metric. Both the list of centers and the distance function must be supplied.
- assign(coords, mask=None, output=None)
- class westpa.core.binning.assign.RecursiveBinMapper(base_mapper, start_index=0)
Bases:
BinMapper
Nest mappers one within another.
- property labels
- property start_index
- add_mapper(mapper, replaces_bin_at)
Replace the bin containing the coordinate tuple replaces_bin_at with the specified mapper.
- assign(coords, mask=None, output=None)
westpa.core.binning.bins module
Minimal Adaptive Binning (MAB) Scheme
westpa.core.binning.mab module
- class westpa.core.binning.mab.FuncBinMapper(func, nbins, args=None, kwargs=None)
Bases:
BinMapper
Binning using a custom function which must iterate over input coordinate sets itself.
- assign(coords, mask=None, output=None)
- westpa.core.binning.mab.expandvars(path)
Expand shell variables of form $var and ${var}. Unknown variables are left unchanged.
- class westpa.core.binning.mab.MABBinMapper(nbins, direction=None, skip=None, bottleneck=True, pca=False, mab_log=False, bin_log=False, bin_log_path='$WEST_SIM_ROOT/binbounds.log')
Bases:
FuncBinMapper
Adaptively place bins in between minimum and maximum segments along the progress coordinate. Extrema and bottleneck segments are assigned to their own bins.
- Parameters:
nbins (list of int) – List of int for nbins in each dimension.
direction (Union(list of int, None), default: None) –
List of int for ‘direction’ in each dimension. Direction options are as follows:
0 : default, split at leading and lagging boundaries
1 : split at leading boundary only
-1 : split at lagging boundary only
86 : no splitting at either leading or lagging boundary
skip (Union(list of int, None), default: None) – List of int for each dimension. Default None for skip=0. Set to 1 to ‘skip’ running mab in a dimension.
bottleneck (bool, default: True) – Whether to turn on or off bottleneck walker splitting.
pca (bool, default: False) – Can be True or False (default) to run PCA on pcoords before bin assignment.
mab_log (bool, default: False) – Whether to output mab info to west.log.
bin_log (bool, default: False) – Whether to output mab bin boundaries to bin_log_path file.
bin_log_path (str, default: "$WEST_SIM_ROOT/binbounds.log") – Path to output bin boundaries.
- determine_total_bins(nbins_per_dim, direction, skip, bottleneck, **kwargs)
This is necessary because functional bin mappers need to “reserve” bins and tell the sim manager how many bins they will need to use; the total is determined by taking all direction/skip settings into account.
- Parameters:
nbins_per_dim (int) – Number of total bins in each direction.
direction (list of int) – Direction in each dimension. See __init__ for more information.
skip (list of int) – List of 0s and 1s indicating whether to skip each dimension.
bottleneck (bool) – Whether to include separate bin for bottleneck walker(s).
**kwargs (dict) – Arbitrary keyword arguments. Contains unneeded MAB parameters.
- Returns:
n_total_bins – Number of total bins.
- Return type:
int
- westpa.core.binning.mab.map_mab(coords, mask, output, *args, **kwargs)
Binning which adaptively places bins based on the positions of extrema segments and bottleneck segments, which are where the difference in probability is the greatest along the progress coordinate. Operates per dimension and places a fixed number of evenly spaced bins between the segments with the min and max pcoord values. Extrema and bottleneck segments are assigned their own bins.
- Parameters:
coords (ndarray) – An array with pcoord and weight info.
mask (ndarray) – Array of 1 (True) and 0 (False), to filter out unwanted segment info.
output (list) – The main list that, for each segment, holds the bin assignment.
*args (list) – Variable length arguments.
**kwargs (dict) – Arbitrary keyword arguments. Contains most of the MAB-needed parameters.
- Returns:
output – The main list that, for each segment, holds the bin assignment.
- Return type:
list
westpa.core.binning.mab_driver
- class westpa.core.binning.mab_driver.WEDriver(rc=None, system=None)
Bases:
object
A class implementing Huber & Kim’s weighted ensemble algorithm over Segment objects. This class handles all binning, recycling, and preparation of new Segment objects for the next iteration. Binning is accomplished using system.bin_mapper, and per-bin target counts are from system.bin_target_counts.
The workflow is as follows:
Call new_iteration() every new iteration, providing any recycling targets that are in force and any available initial states for recycling.
Call assign() to assign segments to bins based on their initial and end points. This returns the number of walkers that were recycled.
Call run_we(), optionally providing a set of initial states that will be used to recycle walkers.
Note the presence of flux_matrix, transition_matrix, current_iter_segments, next_iter_segments, recycling_segments, initial_binning, final_binning, next_iter_binning, and new_weights (to be documented soon).
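A hedged sketch of one pass through that workflow; the driver instance, the segments, and the initial-state/target-state bookkeeping are assumed to be set up elsewhere (normally by the sim manager), so this is an orientation aid rather than how the library literally invokes itself:

# One weighted-ensemble iteration, following the documented workflow.
# `driver`, `segments`, `avail_istates`, and `recycling_targets` are
# assumed to exist in the surrounding code.
driver.new_iteration(initial_states=avail_istates,
                     target_states=recycling_targets)
n_recycled = driver.assign(segments)  # bin on initial and final points
driver.run_we()                       # recycle, then split/merge
next_segments = list(driver.next_iter_segments)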
- weight_split_threshold = 2.0
- weight_merge_cutoff = 1.0
- largest_allowed_weight = 1.0
- smallest_allowed_weight = 1e-310
- process_config()
- property next_iter_segments
Newly-created segments for the next iteration
- property current_iter_segments
Segments for the current iteration
- property next_iter_assignments
Bin assignments (indices) for initial points of next iteration.
- property current_iter_assignments
Bin assignments (indices) for endpoints of current iteration.
- property recycling_segments
Segments designated for recycling
- property n_recycled_segs
Number of segments recycled this iteration
- property n_istates_needed
Number of initial states needed to support recycling for this iteration
- check_threshold_configs()
Check to see if weight threshold parameters are valid
- clear()
Explicitly delete all Segment-related state.
- new_iteration(initial_states=None, target_states=None, new_weights=None, bin_mapper=None, bin_target_counts=None)
Prepare for a new iteration.
initial_states is a sequence of all InitialState objects valid for use in generating new segments for the next iteration (after the one being begun with the call to new_iteration); that is, these are states available to recycle to. Target states which generate recycling events are specified in target_states, a sequence of TargetState objects. Both initial_states and target_states may be empty as required.
The optional new_weights is a sequence of NewWeightEntry objects which will be used to construct the initial flux matrix.
The given bin_mapper will be used for assignment, and bin_target_counts used for splitting/merging target counts; each will be obtained from the system object if omitted or None.
- add_initial_states(initial_states)
Add newly-prepared initial states to the pool available for recycling.
- property all_initial_states
Return an iterator over all initial states (available or used)
- assign(segments, initializing=False)
Assign segments to initial and final bins, and update the (internal) lists of used and available initial states. If initializing is True, then the “final” bin assignments will be identical to the initial bin assignments, a condition required for seeding a new iteration from pre-existing segments.
- populate_initial(initial_states, weights, system=None)
Create walkers for a new weighted ensemble simulation.
One segment is created for each provided initial state, then binned and split/merged as necessary. After this function is called, next_iter_segments will yield the new segments to create, used_initial_states will contain data about which of the provided initial states were used, and avail_initial_states will contain data about which initial states were unused (because their corresponding walkers were merged out of existence).
- rebin_current(parent_segments)
Reconstruct walkers for the current iteration based on (presumably) new binning. The previous iteration’s segments must be provided (as parent_segments) in order to update endpoint types appropriately.
- construct_next()
Construct walkers for the next iteration, by running weighted ensemble recycling and bin/split/merge on the segments previously assigned to bins using assign. Enough unused initial states must be present in self.avail_initial_states for every recycled walker to be assigned an initial state.
After this function completes, self.flux_matrix contains a valid flux matrix for this iteration (including any contributions from recycling from the previous iteration), and self.next_iter_segments contains a list of segments ready for the next iteration, with appropriate values set for weight, endpoint type, parent walkers, and so on.
- class westpa.core.binning.mab_driver.MABDriver(rc=None, system=None)
Bases:
WEDriver
- assign(segments, initializing=False)
Assign segments to initial and final bins, and update the (internal) lists of used and available initial states. This function is adapted to the MAB scheme, so that the initial and final segments are sent to the bin mapper at the same time; otherwise the initial and final bin boundaries can be inconsistent.
westpa.core.binning.mab_manager
- class westpa.core.binning.mab_manager.MABBinMapper(nbins, direction=None, skip=None, bottleneck=True, pca=False, mab_log=False, bin_log=False, bin_log_path='$WEST_SIM_ROOT/binbounds.log')
Bases:
FuncBinMapper
Adaptively place bins in between minimum and maximum segments along the progress coordinate. Extrema and bottleneck segments are assigned to their own bins.
- Parameters:
nbins (list of int) – List of int for nbins in each dimension.
direction (Union(list of int, None), default: None) –
List of int for ‘direction’ in each dimension. Direction options are as follows:
0 : default, split at leading and lagging boundaries
1 : split at leading boundary only
-1 : split at lagging boundary only
86 : no splitting at either leading or lagging boundary
skip (Union(list of int, None), default: None) – List of int for each dimension. Default None for skip=0. Set to 1 to ‘skip’ running mab in a dimension.
bottleneck (bool, default: True) – Whether to turn on or off bottleneck walker splitting.
pca (bool, default: False) – Can be True or False (default) to run PCA on pcoords before bin assignment.
mab_log (bool, default: False) – Whether to output mab info to west.log.
bin_log (bool, default: False) – Whether to output mab bin boundaries to bin_log_path file.
bin_log_path (str, default: "$WEST_SIM_ROOT/binbounds.log") – Path to output bin boundaries.
- determine_total_bins(nbins_per_dim, direction, skip, bottleneck, **kwargs)
This is necessary because functional bin mappers need to “reserve” bins and tell the sim manager how many bins they will need to use; the total is determined by taking all direction/skip settings into account.
- Parameters:
nbins_per_dim (int) – Number of total bins in each direction.
direction (list of int) – Direction in each dimension. See __init__ for more information.
skip (list of int) – List of 0s and 1s indicating whether to skip each dimension.
bottleneck (bool) – Whether to include separate bin for bottleneck walker(s).
**kwargs (dict) – Arbitrary keyword arguments. Contains unneeded MAB parameters.
- Returns:
n_total_bins – Number of total bins.
- Return type:
int
- class westpa.core.binning.mab_manager.WESimManager(rc=None)
Bases:
object
- process_config()
- register_callback(hook, function, priority=0)
Registers a callback to execute during the given hook into the simulation loop. The optional priority is used to order when the function is called relative to other registered callbacks.
- invoke_callbacks(hook, *args, **kwargs)
- load_plugins(plugins=None)
- report_bin_statistics(bins, target_states, save_summary=False)
- get_bstate_pcoords(basis_states, label='basis')
For each of the given basis_states, calculate progress coordinate values as necessary. The HDF5 file is not updated.
- report_basis_states(basis_states, label='basis')
- report_target_states(target_states)
- initialize_simulation(basis_states, target_states, start_states, segs_per_state=1, suppress_we=False)
Initialize a new weighted ensemble simulation, taking segs_per_state initial states from each of the given basis_states. w_init is the forward-facing version of this function.
- prepare_iteration()
- finalize_iteration()
Clean up after an iteration and prepare for the next.
- get_istate_futures()
Add n_states initial states to the internal list of initial states assigned to recycled particles. Spare states are used if available; otherwise new states are created. If creating new initial states requires generation, then a set of futures is returned representing work manager tasks corresponding to the necessary generation work.
- propagate()
- save_bin_data()
Calculate and write flux and transition count matrices to HDF5. Population and rate matrices are likely useless at the single-tau level and are no longer written.
- check_propagation()
Check for failures in propagation or initial state generation, and raise an exception if any are found.
- run_we()
Run the weighted ensemble algorithm based on the binning in self.final_bins and the recycled particles in self.to_recycle, creating and committing the next iteration’s segments to storage as well.
- prepare_new_iteration()
Commit data for the coming iteration to the HDF5 file.
- run()
- prepare_run()
Prepare a new run.
- finalize_run()
Perform cleanup at the normal end of a run
- pre_propagation()
- post_propagation()
- pre_we()
- post_we()
- westpa.core.binning.mab_manager.grouper(n, iterable, fillvalue=None)
Collect data into fixed-length chunks or blocks
- class westpa.core.binning.mab_manager.InitialState(state_id, basis_state_id, iter_created, iter_used=None, istate_type=None, istate_status=None, pcoord=None, basis_state=None, basis_auxref=None)
Bases:
object
Describes an initial state for a new trajectory. These are generally constructed by appropriate modification of a basis state.
- Variables:
state_id – Integer identifier of this state, usually set by the data manager.
basis_state_id – Identifier of the basis state from which this state was generated, or None.
basis_state – The BasisState from which this state was generated, or None.
iter_created – Iteration in which this state was generated (0 for simulation initialization).
iter_used – Iteration in which this state was used to initiate a trajectory (None for unused).
istate_type – Integer describing the type of this initial state (ISTATE_TYPE_BASIS for direct use of a basis state, ISTATE_TYPE_GENERATED for a state generated from a basis state, ISTATE_TYPE_RESTART for a state corresponding to the endpoint of a segment in another simulation, or ISTATE_TYPE_START for a state generated from a start state).
istate_status – Integer describing whether this initial state has been properly prepared.
pcoord – The representative progress coordinate of this state.
- ISTATE_TYPE_UNSET = 0
- ISTATE_TYPE_BASIS = 1
- ISTATE_TYPE_GENERATED = 2
- ISTATE_TYPE_RESTART = 3
- ISTATE_TYPE_START = 4
- ISTATE_UNUSED = 0
- ISTATE_STATUS_PENDING = 0
- ISTATE_STATUS_PREPARED = 1
- ISTATE_STATUS_FAILED = 2
- istate_types = {'ISTATE_TYPE_BASIS': 1, 'ISTATE_TYPE_GENERATED': 2, 'ISTATE_TYPE_RESTART': 3, 'ISTATE_TYPE_START': 4, 'ISTATE_TYPE_UNSET': 0}
- istate_type_names = {0: 'ISTATE_TYPE_UNSET', 1: 'ISTATE_TYPE_BASIS', 2: 'ISTATE_TYPE_GENERATED', 3: 'ISTATE_TYPE_RESTART', 4: 'ISTATE_TYPE_START'}
- istate_statuses = {'ISTATE_STATUS_FAILED': 2, 'ISTATE_STATUS_PENDING': 0, 'ISTATE_STATUS_PREPARED': 1}
- istate_status_names = {0: 'ISTATE_STATUS_PENDING', 1: 'ISTATE_STATUS_PREPARED', 2: 'ISTATE_STATUS_FAILED'}
- as_numpy_record()
- westpa.core.binning.mab_manager.pare_basis_initial_states(basis_states, initial_states, segments=None)
Given iterables of basis and initial states (and optionally segments that use them), return minimal sets (as in __builtins__.set) of states needed to describe the history of the given segments and initial states.
- class westpa.core.binning.mab_manager.Segment(n_iter=None, seg_id=None, weight=None, endpoint_type=None, parent_id=None, wtg_parent_ids=None, pcoord=None, status=None, walltime=None, cputime=None, data=None)
Bases:
object
A class wrapping segment data that must be passed through the work manager or data manager. Most fields are self-explanatory. One item worth noting is that a negative parent ID means that the segment starts from the initial state with ID -(segment.parent_id+1)
- SEG_STATUS_UNSET = 0
- SEG_STATUS_PREPARED = 1
- SEG_STATUS_COMPLETE = 2
- SEG_STATUS_FAILED = 3
- SEG_INITPOINT_UNSET = 0
- SEG_INITPOINT_CONTINUES = 1
- SEG_INITPOINT_NEWTRAJ = 2
- SEG_ENDPOINT_UNSET = 0
- SEG_ENDPOINT_CONTINUES = 1
- SEG_ENDPOINT_MERGED = 2
- SEG_ENDPOINT_RECYCLED = 3
- statuses = {'SEG_STATUS_COMPLETE': 2, 'SEG_STATUS_FAILED': 3, 'SEG_STATUS_PREPARED': 1, 'SEG_STATUS_UNSET': 0}
- initpoint_types = {'SEG_INITPOINT_CONTINUES': 1, 'SEG_INITPOINT_NEWTRAJ': 2, 'SEG_INITPOINT_UNSET': 0}
- endpoint_types = {'SEG_ENDPOINT_CONTINUES': 1, 'SEG_ENDPOINT_MERGED': 2, 'SEG_ENDPOINT_RECYCLED': 3, 'SEG_ENDPOINT_UNSET': 0}
- status_names = {0: 'SEG_STATUS_UNSET', 1: 'SEG_STATUS_PREPARED', 2: 'SEG_STATUS_COMPLETE', 3: 'SEG_STATUS_FAILED'}
- initpoint_type_names = {0: 'SEG_INITPOINT_UNSET', 1: 'SEG_INITPOINT_CONTINUES', 2: 'SEG_INITPOINT_NEWTRAJ'}
- endpoint_type_names = {0: 'SEG_ENDPOINT_UNSET', 1: 'SEG_ENDPOINT_CONTINUES', 2: 'SEG_ENDPOINT_MERGED', 3: 'SEG_ENDPOINT_RECYCLED'}
- static initial_pcoord(segment)
Return the initial progress coordinate point of this segment.
- static final_pcoord(segment)
Return the final progress coordinate point of this segment.
- property initpoint_type
- property initial_state_id
- property status_text
- property endpoint_type_text
- class westpa.core.binning.mab_manager.MABSimManager(rc=None)
Bases:
WESimManager
Subclass of WESimManager, modifying it so bin assignments will be done after all segments are done propagating.
- initialize_simulation(basis_states, target_states, start_states, segs_per_state=1, suppress_we=False)
Makes sure that the MABBinMapper is not the outermost bin mapper.
- propagate()
- prepare_iteration()
westpa.core.kinetics package
westpa.core.kinetics module
Kinetics analysis library
- class westpa.core.kinetics.RateAverager(bin_mapper, system=None, data_manager=None, work_manager=None)
Bases:
object
Calculate bin-to-bin kinetic properties (fluxes, rates, populations) at 1-tau resolution
- extract_data(iter_indices)
Extract data from the data_manager and place it in a dict mirroring the same underlying layout.
- task_generator(iter_start, iter_stop, block_size)
- calculate(iter_start=None, iter_stop=None, n_blocks=1, queue_size=1)
Read the HDF5 file and collect flux matrices and population vectors for each bin for each iteration in the range [iter_start, iter_stop). Break the calculation into n_blocks blocks. If the calculation is broken up into more than one block, queue_size specifies the maximum number of tasks in the work queue.
- westpa.core.kinetics.calculate_labeled_fluxes(nstates, weights, parent_ids, micro_assignments, traj_assignments, fluxes)
- westpa.core.kinetics.labeled_flux_to_rate(labeled_fluxes, labeled_pops, output=None)
Convert a labeled flux matrix and corresponding labeled bin populations to a labeled rate matrix.
- westpa.core.kinetics.calculate_labeled_fluxes_alllags(nstates, weights, parent_ids, micro_assignments, traj_assignments, fluxes)
- westpa.core.kinetics.nested_to_flat_matrix(input)
Convert nested flux/rate matrix into a flat supermatrix.
- westpa.core.kinetics.nested_to_flat_vector(input)
Convert nested labeled population vector into a flat vector.
- westpa.core.kinetics.flat_to_nested_matrix(nstates, nbins, input)
Convert flat supermatrix into nested matrix.
- westpa.core.kinetics.flat_to_nested_vector(nstates, nbins, input)
Convert flat “supervector” into nested vector.
- westpa.core.kinetics.find_macrostate_transitions(nstates, weights, label_assignments, state_assignments, dt, state, macro_fluxes, macro_counts, target_fluxes, target_counts, durations)
- westpa.core.kinetics.sequence_macro_flux_to_rate(dataset, pops, istate, jstate, pairwise=True, stride=None)
Convert a sequence of macrostate fluxes and corresponding list of trajectory ensemble populations to a sequence of rate matrices.
If the optional pairwise is true (the default), then rates are normalized according to the relative probability of the initial state among the pair of states (initial, final); this is probably what you want, as these rates will then depend only on the definitions of the states involved (and never the remaining states). Otherwise (pairwise is false), the rates are normalized according to the probability of the initial state among all other states.
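The two normalizations can be written out directly; this arithmetic sketch is for a single flux value and is not the library’s internal code:

def rate_from_flux(flux_ij, p_i, p_j, pairwise=True):
    if pairwise:
        # Normalize by the probability of state i relative to the
        # (i, j) pair only: rate = flux / (p_i / (p_i + p_j)).
        return flux_ij / (p_i / (p_i + p_j))
    # Otherwise normalize by the probability of state i alone.
    return flux_ij / p_i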
westpa.core.kinetics.events module
- westpa.core.kinetics.events.weight_dtype
alias of
float64
- westpa.core.kinetics.events.index_dtype
alias of
uint16
- westpa.core.kinetics.events.find_macrostate_transitions(nstates, weights, label_assignments, state_assignments, dt, state, macro_fluxes, macro_counts, target_fluxes, target_counts, durations)
westpa.core.kinetics.matrates module
Routines for implementing Lettieri et al.’s macrostate-to-macrostate rate calculations using extrapolation to steady-state populations from average rate matrices
Internally, “labeled” objects (bin populations labeled by history, rate matrix elements labeled by history) are stored as nested arrays – e.g. rates[initial_label, final_label, initial_bin, final_bin]. These are converted to the flat forms required for, say, eigenvalue calculations internally, and the results converted back. This is because these conversions are not expensive, and saves users of this code from having to know how the flattened indexing works (something I screwed up all too easily during development) – mcz
- westpa.core.kinetics.matrates.weight_dtype
alias of
float64
- westpa.core.kinetics.matrates.calculate_labeled_fluxes(nstates, weights, parent_ids, micro_assignments, traj_assignments, fluxes)
- westpa.core.kinetics.matrates.calculate_labeled_fluxes_alllags(nstates, weights, parent_ids, micro_assignments, traj_assignments, fluxes)
- westpa.core.kinetics.matrates.labeled_flux_to_rate(labeled_fluxes, labeled_pops, output=None)
Convert a labeled flux matrix and corresponding labeled bin populations to a labeled rate matrix.
- westpa.core.kinetics.matrates.nested_to_flat_matrix(input)
Convert nested flux/rate matrix into a flat supermatrix.
- westpa.core.kinetics.matrates.nested_to_flat_vector(input)
Convert nested labeled population vector into a flat vector.
- westpa.core.kinetics.matrates.flat_to_nested_vector(nstates, nbins, input)
Convert flat “supervector” into nested vector.
- exception westpa.core.kinetics.matrates.ConsistencyWarning
Bases:
UserWarning
- westpa.core.kinetics.matrates.get_steady_state(rates)
Get steady state solution for a rate matrix. As an optimization, returns the flattened labeled population vector (of length nstates*nbins); to convert to the nested vector used for storage, use nested_to_flat_vector().
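For orientation, the steady state of a generic rate matrix is the left null vector of the matrix once its diagonal is set so that rows sum to zero. This numpy sketch is a generic illustration of that linear algebra, not westpa’s implementation:

import numpy as np

def steady_state(K):
    # K[i, j] is the rate from bin/state i to j (off-diagonal entries).
    K = np.array(K, dtype=float)
    np.fill_diagonal(K, 0.0)
    np.fill_diagonal(K, -K.sum(axis=1))  # rows now sum to zero
    # The stationary distribution p satisfies p @ K = 0.
    vals, vecs = np.linalg.eig(K.T)
    p = np.real(vecs[:, np.argmin(np.abs(vals))])
    return p / p.sum()

print(steady_state([[0.0, 1.0], [2.0, 0.0]]))  # -> [0.6667, 0.3333]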
- westpa.core.kinetics.matrates.get_macrostate_rates(labeled_rates, labeled_pops, extrapolate=True)
Using a labeled rate matrix and labeled bin populations, calculate the steady state probability distribution and consequent state-to-state rates.
Returns (ss, macro_rates), where ss is the steady-state probability distribution and macro_rates is the state-to-state rate matrix.
- westpa.core.kinetics.matrates.estimate_rates(nbins, state_labels, weights, parent_ids, bin_assignments, label_assignments, state_map, labeled_pops, all_lags=False, labeled_fluxes=None, labeled_rates=None, unlabeled_rates=None)
Estimate fluxes and rates over multiple iterations. The number of iterations is determined by how many vectors of weights, parent IDs, bin assignments, and label assignments are passed.
If all_lags is true, then the average is over all possible lags within the length-N window given; otherwise simply the length-N lag.
Returns labeled flux matrix, labeled rate matrix, and unlabeled rate matrix.
westpa.core.kinetics.rate_averaging module
- westpa.core.kinetics.rate_averaging.namedtuple(typename, field_names, *, rename=False, defaults=None, module=None)
Returns a new subclass of tuple with named fields.
>>> Point = namedtuple('Point', ['x', 'y'])
>>> Point.__doc__                   # docstring for the new class
'Point(x, y)'
>>> p = Point(11, y=22)             # instantiate with positional args or keywords
>>> p[0] + p[1]                     # indexable like a plain tuple
33
>>> x, y = p                        # unpack like a regular tuple
>>> x, y
(11, 22)
>>> p.x + p.y                       # fields also accessible by name
33
>>> d = p._asdict()                 # convert to a dictionary
>>> d['x']
11
>>> Point(**d)                      # convert from a dictionary
Point(x=11, y=22)
>>> p._replace(x=100)               # _replace() is like str.replace() but targets named fields
Point(x=100, y=22)
- class westpa.core.kinetics.rate_averaging.zip_longest
Bases:
object
zip_longest(iter1 [, iter2 [...]], [fillvalue=None]) --> zip_longest object
Return a zip_longest object whose .__next__() method returns a tuple where the i-th element comes from the i-th iterable argument. The .__next__() method continues until the longest iterable in the argument sequence is exhausted and then it raises StopIteration. When the shorter iterables are exhausted, the fillvalue is substituted in their place. The fillvalue defaults to None or can be specified by a keyword argument.
- westpa.core.kinetics.rate_averaging.flux_assign(weights, init_assignments, final_assignments, flux_matrix)
- westpa.core.kinetics.rate_averaging.pop_assign(weights, assignments, populations)
- westpa.core.kinetics.rate_averaging.calc_rates(fluxes, populations, rates, mask)
Calculate rate matrices from flux and population matrices. A matrix of the same shape as fluxes is also produced, to be used for generating a mask for the rate matrices where initial state populations are zero.
- class westpa.core.kinetics.rate_averaging.StreamingStats1D
Bases:
object
Calculate mean and variance of a series of one-dimensional arrays of shape (nbins,) using an online algorithm. The statistics are accumulated along what would be axis=0 if the input arrays were stacked vertically.
This code has been adapted from: http://www.johndcook.com/skewness_kurtosis.html
- M1
- M2
- mean
- n
- update(x, mask)
Update the running set of statistics given
- Parameters:
x (1d ndarray) – values from a single observation
mask (1d ndarray) – A uint8 array to exclude entries from the accumulated statistics.
- var
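A hedged usage sketch; the mask convention (which value excludes an entry) is an assumption based on the parameter description above, so verify it against your WESTPA version:

import numpy as np
from westpa.core.kinetics.rate_averaging import StreamingStats1D

nbins = 4
stats = StreamingStats1D()
keep_all = np.zeros(nbins, dtype=np.uint8)  # assumed: 0 = keep entry

for _ in range(10):  # e.g., one population vector per iteration
    stats.update(np.random.random(nbins), keep_all)

print(stats.n, stats.mean, stats.var)  # running count, mean, variance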
- class westpa.core.kinetics.rate_averaging.StreamingStats2D
Bases:
object
Calculate mean and variance of a series of two-dimensional arrays of shape (nbins, nbins) using an online algorithm. The statistics are accumulated along what would be axis=0 if the input arrays were stacked vertically.
This code has been adapted from: http://www.johndcook.com/skewness_kurtosis.html
- M1
- M2
- mean
- n
- update(x, mask)
Update the running set of statistics given
- Parameters:
x (2d ndarray) – values from a single observation
mask (2d ndarray) – A uint8 array to exclude entries from the accumulated statistics.
- var
- class westpa.core.kinetics.rate_averaging.StreamingStatsTuple(M1, M2, n)
Bases:
tuple
Create new instance of StreamingStatsTuple(M1, M2, n)
- M1
Alias for field number 0
- M2
Alias for field number 1
- n
Alias for field number 2
- westpa.core.kinetics.rate_averaging.grouper(n, iterable, fillvalue=None)
Collect data into fixed-length chunks or blocks
- westpa.core.kinetics.rate_averaging.tuple2stats(stat_tuple)
- westpa.core.kinetics.rate_averaging.process_iter_chunk(bin_mapper, iter_indices, iter_data=None)
Calculate the flux matrices and populations of a set of iterations specified by iter_indices. Optionally provide the necessary arrays to perform the calculation in iter_data. Otherwise get data from the data_manager directly.
- class westpa.core.kinetics.rate_averaging.RateAverager(bin_mapper, system=None, data_manager=None, work_manager=None)
Bases:
object
Calculate bin-to-bin kinetic properties (fluxes, rates, populations) at 1-tau resolution
- extract_data(iter_indices)
Extract data from the data_manager and place it in a dict mirroring the same underlying layout.
- task_generator(iter_start, iter_stop, block_size)
- calculate(iter_start=None, iter_stop=None, n_blocks=1, queue_size=1)
Read the HDF5 file and collect flux matrices and population vectors for each bin for each iteration in the range [iter_start, iter_stop). Break the calculation into n_blocks blocks. If the calculation is broken up into more than one block, queue_size specifies the maximum number of tasks in the work queue.
westpa.core.propagators package
westpa.core.propagators module
- westpa.core.propagators.blocked_iter(blocksize, iterable, fillvalue=None)
- class westpa.core.propagators.WESTPropagator(rc=None)
Bases:
object
- prepare_iteration(n_iter, segments)
Perform any necessary per-iteration preparation. This is run by the work manager.
- finalize_iteration(n_iter, segments)
Perform any necessary post-iteration cleanup. This is run by the work manager.
- get_pcoord(state)
Get the progress coordinate of the given basis or initial state.
- gen_istate(basis_state, initial_state)
Generate a new initial state from the given basis state.
- propagate(segments)
Propagate one or more segments, including any necessary per-iteration setup and teardown for this propagator.
- clear_basis_initial_states()
- update_basis_initial_states(basis_states, initial_states)
westpa.core.propagators.executable module
- class westpa.core.propagators.executable.BytesIO(initial_bytes=b'')
Bases:
_BufferedIOBase
Buffered I/O implementation using an in-memory bytes buffer.
- close()
Disable all I/O operations.
- closed
True if the file is closed.
- flush()
Does nothing.
- getbuffer()
Get a read-write view over the contents of the BytesIO object.
- getvalue()
Retrieve the entire contents of the BytesIO object.
- isatty()
Always returns False.
BytesIO objects are not connected to a TTY-like device.
- read(size=-1, /)
Read at most size bytes, returned as a bytes object.
If the size argument is negative, read until EOF is reached. Return an empty bytes object at EOF.
- read1(size=-1, /)
Read at most size bytes, returned as a bytes object.
If the size argument is negative or omitted, read until EOF is reached. Return an empty bytes object at EOF.
- readable()
Returns True if the IO object can be read.
- readinto(buffer, /)
Read bytes into buffer.
Returns number of bytes read (0 for EOF), or None if the object is set not to block and has no data to read.
- readline(size=-1, /)
Next line from the file, as a bytes object.
Retain newline. A non-negative size argument limits the maximum number of bytes to return (an incomplete line may be returned then). Return an empty bytes object at EOF.
- readlines(size=None, /)
List of bytes objects, each a line from the file.
Call readline() repeatedly and return a list of the lines so read. The optional size argument, if given, is an approximate bound on the total number of bytes in the lines returned.
- seek(pos, whence=0, /)
Change stream position.
- Seek to byte offset pos relative to position indicated by whence:
0: Start of stream (the default); pos should be >= 0.
1: Current position; pos may be negative.
2: End of stream; pos is usually negative.
Returns the new absolute position.
- seekable()
Returns True if the IO object can be seeked.
- tell()
Current file position, an integer.
- truncate(size=None, /)
Truncate the file to at most size bytes.
Size defaults to the current file position, as returned by tell(). The current file position is unchanged. Returns the new size.
- writable()
Returns True if the IO object can be written.
- write(b, /)
Write bytes to file.
Return the number of bytes written.
- writelines(lines, /)
Write lines to the file.
Note that newlines are not added. lines can be any iterable object producing bytes-like objects. This is equivalent to calling write() for each element.
- westpa.core.propagators.executable.get_object(object_name, path=None)
Attempt to load the given object, using additional path information if given.
- class westpa.core.propagators.executable.WESTPropagator(rc=None)
Bases:
object
- prepare_iteration(n_iter, segments)
Perform any necessary per-iteration preparation. This is run by the work manager.
- finalize_iteration(n_iter, segments)
Perform any necessary post-iteration cleanup. This is run by the work manager.
- get_pcoord(state)
Get the progress coordinate of the given basis or initial state.
- gen_istate(basis_state, initial_state)
Generate a new initial state from the given basis state.
- propagate(segments)
Propagate one or more segments, including any necessary per-iteration setup and teardown for this propagator.
- clear_basis_initial_states()
- update_basis_initial_states(basis_states, initial_states)
- class westpa.core.propagators.executable.BasisState(label, probability, pcoord=None, auxref=None, state_id=None)
Bases:
object
Describes a basis (micro)state. These basis states are used to generate initial states for new trajectories, either at the beginning of the simulation (i.e., at w_init) or due to recycling.
- Variables:
state_id – Integer identifier of this state, usually set by the data manager.
label – A descriptive label for this microstate (may be empty)
probability – Probability of this state to be selected when creating a new trajectory.
pcoord – The representative progress coordinate of this state.
auxref – A user-provided (string) reference for locating data associated with this state (usually a filesystem path).
- classmethod states_to_file(states, fileobj)
Write a file defining basis states, which may then be read by states_from_file().
- classmethod states_from_file(statefile)
Read a file defining basis states. Each line defines a state, and contains a label, the probability, and optionally a data reference, separated by whitespace, as in:
unbound 1.0
or:
unbound_0 0.6 state0.pdb
unbound_1 0.4 state1.pdb
- as_numpy_record()
Return the data for this state as a numpy record array.
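A hedged round-trip sketch of that file format; the file name and state definitions are illustrative, and it is assumed that states_from_file() accepts a path:

from westpa.core.propagators.executable import BasisState

with open('bstates.txt', 'w') as f:
    f.write('unbound_0 0.6 state0.pdb\n')
    f.write('unbound_1 0.4 state1.pdb\n')

for state in BasisState.states_from_file('bstates.txt'):
    print(state.label, state.probability, state.auxref)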
- class westpa.core.propagators.executable.InitialState(state_id, basis_state_id, iter_created, iter_used=None, istate_type=None, istate_status=None, pcoord=None, basis_state=None, basis_auxref=None)
Bases:
object
Describes an initial state for a new trajectory. These are generally constructed by appropriate modification of a basis state.
- Variables:
state_id – Integer identifier of this state, usually set by the data manager.
basis_state_id – Identifier of the basis state from which this state was generated, or None.
basis_state – The BasisState from which this state was generated, or None.
iter_created – Iteration in which this state was generated (0 for simulation initialization).
iter_used – Iteration in which this state was used to initiate a trajectory (None for unused).
istate_type – Integer describing the type of this initial state (ISTATE_TYPE_BASIS for direct use of a basis state, ISTATE_TYPE_GENERATED for a state generated from a basis state, ISTATE_TYPE_RESTART for a state corresponding to the endpoint of a segment in another simulation, or ISTATE_TYPE_START for a state generated from a start state).
istate_status – Integer describing whether this initial state has been properly prepared.
pcoord – The representative progress coordinate of this state.
- ISTATE_TYPE_UNSET = 0
- ISTATE_TYPE_BASIS = 1
- ISTATE_TYPE_GENERATED = 2
- ISTATE_TYPE_RESTART = 3
- ISTATE_TYPE_START = 4
- ISTATE_UNUSED = 0
- ISTATE_STATUS_PENDING = 0
- ISTATE_STATUS_PREPARED = 1
- ISTATE_STATUS_FAILED = 2
- istate_types = {'ISTATE_TYPE_BASIS': 1, 'ISTATE_TYPE_GENERATED': 2, 'ISTATE_TYPE_RESTART': 3, 'ISTATE_TYPE_START': 4, 'ISTATE_TYPE_UNSET': 0}
- istate_type_names = {0: 'ISTATE_TYPE_UNSET', 1: 'ISTATE_TYPE_BASIS', 2: 'ISTATE_TYPE_GENERATED', 3: 'ISTATE_TYPE_RESTART', 4: 'ISTATE_TYPE_START'}
- istate_statuses = {'ISTATE_STATUS_FAILED': 2, 'ISTATE_STATUS_PENDING': 0, 'ISTATE_STATUS_PREPARED': 1}
- istate_status_names = {0: 'ISTATE_STATUS_PENDING', 1: 'ISTATE_STATUS_PREPARED', 2: 'ISTATE_STATUS_FAILED'}
- as_numpy_record()
- westpa.core.propagators.executable.return_state_type(state_obj)
Convenience function for returning the state ID and type of the state_obj pointer.
- class westpa.core.propagators.executable.Segment(n_iter=None, seg_id=None, weight=None, endpoint_type=None, parent_id=None, wtg_parent_ids=None, pcoord=None, status=None, walltime=None, cputime=None, data=None)
Bases:
object
A class wrapping segment data that must be passed through the work manager or data manager. Most fields are self-explanatory. One item worth noting is that a negative parent ID means that the segment starts from the initial state with ID -(segment.parent_id+1)
- SEG_STATUS_UNSET = 0
- SEG_STATUS_PREPARED = 1
- SEG_STATUS_COMPLETE = 2
- SEG_STATUS_FAILED = 3
- SEG_INITPOINT_UNSET = 0
- SEG_INITPOINT_CONTINUES = 1
- SEG_INITPOINT_NEWTRAJ = 2
- SEG_ENDPOINT_UNSET = 0
- SEG_ENDPOINT_CONTINUES = 1
- SEG_ENDPOINT_MERGED = 2
- SEG_ENDPOINT_RECYCLED = 3
- statuses = {'SEG_STATUS_COMPLETE': 2, 'SEG_STATUS_FAILED': 3, 'SEG_STATUS_PREPARED': 1, 'SEG_STATUS_UNSET': 0}
- initpoint_types = {'SEG_INITPOINT_CONTINUES': 1, 'SEG_INITPOINT_NEWTRAJ': 2, 'SEG_INITPOINT_UNSET': 0}
- endpoint_types = {'SEG_ENDPOINT_CONTINUES': 1, 'SEG_ENDPOINT_MERGED': 2, 'SEG_ENDPOINT_RECYCLED': 3, 'SEG_ENDPOINT_UNSET': 0}
- status_names = {0: 'SEG_STATUS_UNSET', 1: 'SEG_STATUS_PREPARED', 2: 'SEG_STATUS_COMPLETE', 3: 'SEG_STATUS_FAILED'}
- initpoint_type_names = {0: 'SEG_INITPOINT_UNSET', 1: 'SEG_INITPOINT_CONTINUES', 2: 'SEG_INITPOINT_NEWTRAJ'}
- endpoint_type_names = {0: 'SEG_ENDPOINT_UNSET', 1: 'SEG_ENDPOINT_CONTINUES', 2: 'SEG_ENDPOINT_MERGED', 3: 'SEG_ENDPOINT_RECYCLED'}
- static initial_pcoord(segment)
Return the initial progress coordinate point of this segment.
- static final_pcoord(segment)
Return the final progress coordinate point of this segment.
- property initpoint_type
- property initial_state_id
- property status_text
- property endpoint_type_text
- westpa.core.propagators.executable.check_bool(value, action='warn')
Check that the given value is boolean in type. If not, either raise a warning (if action == 'warn') or an exception (if action == 'raise').
- westpa.core.propagators.executable.load_trajectory(folder)
Load trajectory from folder using mdtraj and return an mdtraj.Trajectory object. The folder should contain a trajectory and a topology file (with a recognizable extension) that is supported by mdtraj. The topology file is optional if the trajectory file contains topology data (e.g., HDF5 format).
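A hedged usage sketch; the folder path is hypothetical and should contain one trajectory file (plus a topology file if the trajectory format lacks topology data):

from westpa.core.propagators.executable import load_trajectory

traj = load_trajectory('traj_segs/000001/000002')  # hypothetical layout
print(traj.n_frames, traj.n_atoms)                 # mdtraj.Trajectory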
- westpa.core.propagators.executable.safe_extract(tar, path='.', members=None, *, numeric_owner=False)
- westpa.core.propagators.executable.pcoord_loader(fieldname, pcoord_return_filename, destobj, single_point)
Read progress coordinate data into the pcoord field on destobj. An exception will be raised if the data is malformed. If single_point is true, then only one (N-dimensional) point will be read; otherwise system.pcoord_len points will be read.
- westpa.core.propagators.executable.aux_data_loader(fieldname, data_filename, segment, single_point)
- westpa.core.propagators.executable.npy_data_loader(fieldname, coord_file, segment, single_point)
- westpa.core.propagators.executable.pickle_data_loader(fieldname, coord_file, segment, single_point)
- westpa.core.propagators.executable.trajectory_loader(fieldname, coord_folder, segment, single_point)
Load data from the trajectory return.
coord_folder should be the path to a folder containing trajectory files. segment is the Segment object that the data is associated with. Please see load_trajectory for more details. single_point is not used by this loader.
- westpa.core.propagators.executable.restart_loader(fieldname, restart_folder, segment, single_point)
Load data from the restart return. The loader will tar all files in restart_folder and store it in the per-iteration HDF5 file. segment is the Segment object that the data is associated with. single_point is not used by this loader.
- westpa.core.propagators.executable.restart_writer(path, segment)
Prepare the necessary files from the per-iteration HDF5 file to run segment.
- westpa.core.propagators.executable.seglog_loader(fieldname, log_file, segment, single_point)
Load data from the log return. The loader will tar all files in log_file and store the archive in the per-iteration HDF5 file. segment is the Segment object that the data is associated with. single_point is not used by this loader.
- class westpa.core.propagators.executable.ExecutablePropagator(rc=None)
Bases:
WESTPropagator
- ENV_CURRENT_ITER = 'WEST_CURRENT_ITER'
- ENV_CURRENT_SEG_ID = 'WEST_CURRENT_SEG_ID'
- ENV_CURRENT_SEG_DATA_REF = 'WEST_CURRENT_SEG_DATA_REF'
- ENV_CURRENT_SEG_INITPOINT = 'WEST_CURRENT_SEG_INITPOINT_TYPE'
- ENV_PARENT_SEG_ID = 'WEST_PARENT_ID'
- ENV_PARENT_DATA_REF = 'WEST_PARENT_DATA_REF'
- ENV_BSTATE_ID = 'WEST_BSTATE_ID'
- ENV_BSTATE_DATA_REF = 'WEST_BSTATE_DATA_REF'
- ENV_ISTATE_ID = 'WEST_ISTATE_ID'
- ENV_ISTATE_DATA_REF = 'WEST_ISTATE_DATA_REF'
- ENV_STRUCT_DATA_REF = 'WEST_STRUCT_DATA_REF'
- ENV_RAND16 = 'WEST_RAND16'
- ENV_RAND32 = 'WEST_RAND32'
- ENV_RAND64 = 'WEST_RAND64'
- ENV_RAND128 = 'WEST_RAND128'
- ENV_RANDFLOAT = 'WEST_RANDFLOAT'
- static makepath(template, template_args=None, expanduser=True, expandvars=True, abspath=False, realpath=False)
- random_val_env_vars()
Return a set of environment variables containing random seeds. These are returned as a dictionary, suitable for use in os.environ.update() or as the env argument to subprocess.Popen(). Every child process executed by exec_child() gets these.
- exec_child(executable, environ=None, stdin=None, stdout=None, stderr=None, cwd=None)
Execute a child process with the environment set from the current environment, the values of self.addtl_child_environ, the random numbers returned by self.random_val_env_vars, and the given environ (applied in that order). stdin/stdout/stderr are optionally redirected. This function waits on the child process to finish, then returns (rc, rusage), where rc is the child’s return code and rusage is the resource usage tuple from os.wait4().
- exec_child_from_child_info(child_info, template_args, environ)
- update_args_env_basis_state(template_args, environ, basis_state)
- update_args_env_initial_state(template_args, environ, initial_state)
- update_args_env_iter(template_args, environ, n_iter)
- update_args_env_segment(template_args, environ, segment)
- template_args_for_segment(segment)
- exec_for_segment(child_info, segment, addtl_env=None)
Execute a child process with environment and template expansion from the given segment.
- exec_for_iteration(child_info, n_iter, addtl_env=None)
Execute a child process with environment and template expansion from the given iteration number.
- exec_for_basis_state(child_info, basis_state, addtl_env=None)
Execute a child process with environment and template expansion from the given basis state
- exec_for_initial_state(child_info, initial_state, addtl_env=None)
Execute a child process with environment and template expansion from the given initial state.
- prepare_file_system(segment, environ)
- setup_dataset_return(segment=None, subset_keys=None)
Set up temporary files and environment variables that point to them for segment runners to return data. segment is the Segment object that the return data is associated with. subset_keys specifies the names of a subset of data to be returned.
- retrieve_dataset_return(state, return_files, del_return_files, single_point)
Retrieve returned data from the temporary locations directed by the environment variables. state is a Segment, BasisState, or InitialState object that the return data is associated with. return_files is a dict where the keys are the dataset names and the values are the paths to the temporary files that contain the returned data. del_return_files is a dict where the keys are the names of datasets to be deleted (if the corresponding value is set to True) once the data is retrieved.
- get_pcoord(state)
Get the progress coordinate of the given basis or initial state.
- gen_istate(basis_state, initial_state)
Generate a new initial state from the given basis state.
- prepare_iteration(n_iter, segments)
Perform any necessary per-iteration preparation. This is run by the work manager.
- finalize_iteration(n_iter, segments)
Perform any necessary post-iteration cleanup. This is run by the work manager.
- propagate(segments)
Propagate one or more segments, including any necessary per-iteration setup and teardown for this propagator.
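The WEST_* environment variables above form the contract between the propagator and the executed child process. The following is a minimal sketch, not part of the WESTPA API, of how a Python segment script might read them:

import os

# These variables are set by ExecutablePropagator before the child is executed.
n_iter = int(os.environ['WEST_CURRENT_ITER'])
seg_id = int(os.environ['WEST_CURRENT_SEG_ID'])
seg_ref = os.environ['WEST_CURRENT_SEG_DATA_REF']   # where this segment's data lives
parent_ref = os.environ['WEST_PARENT_DATA_REF']     # where the parent's data lives
seed = int(os.environ['WEST_RAND32'])               # per-process random seed

print(f'iter {n_iter}, seg {seg_id}: {parent_ref} -> {seg_ref} (seed {seed})')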
westpa.core.reweight package
westpa.core.reweight module
Function(s) for the postanalysis toolkit
- westpa.core.reweight.stats_process(bin_assignments, weights, fluxes, populations, trans, mask, interval='timepoint')
- westpa.core.reweight.reweight_for_c(rows, cols, obs, flux, insert, indices, nstates, nbins, state_labels, state_map, nfbins, istate, jstate, stride, bin_last_state_map, bin_state_map, return_obs, obs_threshold=1)
westpa.core.reweight.matrix module
- westpa.core.reweight.matrix.weight_dtype
alias of
float64
- westpa.core.reweight.matrix.index_dtype
alias of
uint16
- westpa.core.reweight.matrix.stats_process(bin_assignments, weights, fluxes, populations, trans, mask, interval='timepoint')
- westpa.core.reweight.matrix.calc_stats(bin_assignments, weights, fluxes, populations, trans, mask, sampling_frequency)
westpa.core modules
westpa.core module
westpa.core.data_manager module
HDF5 data manager for WEST.
Original HDF5 implementation: Joseph W. Kaus. Current implementation: Matthew C. Zwier.
WEST exclusively uses the cross-platform, self-describing file format HDF5 for data storage. This ensures that data is stored efficiently and portably in a manner that is relatively straightforward for other analysis tools (perhaps written in C/C++/Fortran) to access.
- The data is laid out in HDF5 as follows:
summary – overall summary data for the simulation
- /iterations/ – data for individual iterations, one group per iteration under /iterations
- iter_00000001/ – data for iteration 1
seg_index – overall information about segments in the iteration, including weight
pcoord – progress coordinate data organized as [seg_id][time][dimension]
wtg_parents – data used to reconstruct the split/merge history of trajectories
recycling – flux and event count for recycled particles, on a per-target-state basis
auxdata/ – auxiliary datasets (data stored on the ‘data’ field of Segment objects)
The file root object has an integer attribute ‘west_file_format_version’ which can be used to determine how to access data even as the file format (i.e. organization of data within HDF5 file) evolves.
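As an illustration of the layout above, here is a small sketch that walks west.h5 with plain h5py (not a WESTPA API; group and dataset names follow the listing):

import h5py

with h5py.File('west.h5', 'r') as f:
    print('format version:', f.attrs['west_file_format_version'])
    iter_group = f['iterations/iter_00000001']
    weights = iter_group['seg_index']['weight']  # per-segment weights
    pcoords = iter_group['pcoord']               # [seg_id][time][dimension]
    print(pcoords.shape, weights[:5])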
- Version history:
- Version 9
Basis states are now saved as iter_segid instead of just segid as a pointer label.
Initial states are also saved in the iteration 0 file, with a negative sign.
- Version 8
Added external links to trajectory files in iterations/iter_* groups, if the HDF5 framework was used.
Added an iter group for iteration 0 to store conformations of basis states.
- Version 7
Removed bin_assignments, bin_populations, and bin_rates from iteration group.
Added new_segments subgroup to iteration group
- Version 6
???
- Version 5
moved iter_* groups into a top-level iterations/ group,
added in-HDF5 storage for basis states, target states, and generated states
- class westpa.core.data_manager.attrgetter(attr, /, *attrs)
Bases:
object
Return a callable object that fetches the given attribute(s) from its operand. After f = attrgetter(‘name’), the call f(r) returns r.name. After g = attrgetter(‘name’, ‘date’), the call g(r) returns (r.name, r.date). After h = attrgetter(‘name.first’, ‘name.last’), the call h(r) returns (r.name.first, r.name.last).
- westpa.core.data_manager.relpath(path, start=None)
Return a relative version of a path
- westpa.core.data_manager.dirname(p)
Returns the directory component of a pathname
- class westpa.core.data_manager.Segment(n_iter=None, seg_id=None, weight=None, endpoint_type=None, parent_id=None, wtg_parent_ids=None, pcoord=None, status=None, walltime=None, cputime=None, data=None)
Bases:
object
A class wrapping segment data that must be passed through the work manager or data manager. Most fields are self-explanatory. One item worth noting is that a negative parent ID means that the segment starts from the initial state with ID -(segment.parent_id+1)
- SEG_STATUS_UNSET = 0
- SEG_STATUS_PREPARED = 1
- SEG_STATUS_COMPLETE = 2
- SEG_STATUS_FAILED = 3
- SEG_INITPOINT_UNSET = 0
- SEG_INITPOINT_CONTINUES = 1
- SEG_INITPOINT_NEWTRAJ = 2
- SEG_ENDPOINT_UNSET = 0
- SEG_ENDPOINT_CONTINUES = 1
- SEG_ENDPOINT_MERGED = 2
- SEG_ENDPOINT_RECYCLED = 3
- statuses = {'SEG_STATUS_COMPLETE': 2, 'SEG_STATUS_FAILED': 3, 'SEG_STATUS_PREPARED': 1, 'SEG_STATUS_UNSET': 0}
- initpoint_types = {'SEG_INITPOINT_CONTINUES': 1, 'SEG_INITPOINT_NEWTRAJ': 2, 'SEG_INITPOINT_UNSET': 0}
- endpoint_types = {'SEG_ENDPOINT_CONTINUES': 1, 'SEG_ENDPOINT_MERGED': 2, 'SEG_ENDPOINT_RECYCLED': 3, 'SEG_ENDPOINT_UNSET': 0}
- status_names = {0: 'SEG_STATUS_UNSET', 1: 'SEG_STATUS_PREPARED', 2: 'SEG_STATUS_COMPLETE', 3: 'SEG_STATUS_FAILED'}
- initpoint_type_names = {0: 'SEG_INITPOINT_UNSET', 1: 'SEG_INITPOINT_CONTINUES', 2: 'SEG_INITPOINT_NEWTRAJ'}
- endpoint_type_names = {0: 'SEG_ENDPOINT_UNSET', 1: 'SEG_ENDPOINT_CONTINUES', 2: 'SEG_ENDPOINT_MERGED', 3: 'SEG_ENDPOINT_RECYCLED'}
- static initial_pcoord(segment)
Return the initial progress coordinate point of this segment.
- static final_pcoord(segment)
Return the final progress coordinate point of this segment.
- property initpoint_type
- property initial_state_id
- property status_text
- property endpoint_type_text
- class westpa.core.data_manager.BasisState(label, probability, pcoord=None, auxref=None, state_id=None)
Bases:
object
Describes a basis (micro)state. These basis states are used to generate initial states for new trajectories, either at the beginning of the simulation (i.e. at w_init) or due to recycling.
- Variables:
state_id – Integer identifier of this state, usually set by the data manager.
label – A descriptive label for this microstate (may be empty)
probability – Probability of this state to be selected when creating a new trajectory.
pcoord – The representative progress coordinate of this state.
auxref – A user-provided (string) reference for locating data associated with this state (usually a filesystem path).
- classmethod states_to_file(states, fileobj)
Write a file defining basis states, which may then be read by states_from_file().
- classmethod states_from_file(statefile)
Read a file defining basis states. Each line defines a state, and contains a label, the probability, and optionally a data reference, separated by whitespace, as in:
unbound 1.0
or:
unbound_0 0.6 state0.pdb
unbound_1 0.4 state1.pdb
- as_numpy_record()
Return the data for this state as a numpy record array.
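A short sketch of the round trip through this file format (assuming states_from_file returns a list of BasisState objects; file and label names are hypothetical):

from westpa.core.data_manager import BasisState

# Write a two-state basis-state file in the "label probability [auxref]" format
with open('bstates.txt', 'w') as fo:
    fo.write('unbound_0 0.6 state0.pdb\n')
    fo.write('unbound_1 0.4 state1.pdb\n')

for bstate in BasisState.states_from_file('bstates.txt'):
    print(bstate.label, bstate.probability, bstate.auxref)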
- class westpa.core.data_manager.TargetState(label, pcoord, state_id=None)
Bases:
object
Describes a target state.
- Variables:
state_id – Integer identifier of this state, usually set by the data manager.
label – A descriptive label for this microstate (may be empty)
pcoord – The representative progress coordinate of this state.
- classmethod states_to_file(states, fileobj)
Write a file defining target states, which may then be read by states_from_file().
- classmethod states_from_file(statefile, dtype)
Read a file defining target states. Each line defines a state, and contains a label followed by a representative progress coordinate value, separated by whitespace, as in:
bound 0.02
for a single target and one-dimensional progress coordinates or:
bound 2.7 0.0
drift 100 50.0
for two targets and a two-dimensional progress coordinate.
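A corresponding sketch for target states (the file name is hypothetical; the dtype argument is assumed to be the numeric type of the progress coordinate):

import numpy as np
from westpa.core.data_manager import TargetState

# Write a two-target file in the "label pcoord..." format shown above
with open('tstates.txt', 'w') as fo:
    fo.write('bound 2.7 0.0\n')
    fo.write('drift 100 50.0\n')

for tstate in TargetState.states_from_file('tstates.txt', np.float64):
    print(tstate.label, tstate.pcoord)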
- class westpa.core.data_manager.InitialState(state_id, basis_state_id, iter_created, iter_used=None, istate_type=None, istate_status=None, pcoord=None, basis_state=None, basis_auxref=None)
Bases:
object
Describes an initial state for a new trajectory. These are generally constructed by appropriate modification of a basis state.
- Variables:
state_id – Integer identifier of this state, usually set by the data manager.
basis_state_id – Identifier of the basis state from which this state was generated, or None.
basis_state – The BasisState from which this state was generated, or None.
iter_created – Iteration in which this state was generated (0 for simulation initialization).
iter_used – Iteration in which this state was used to initiate a trajectory (None for unused).
istate_type – Integer describing the type of this initial state (ISTATE_TYPE_BASIS for direct use of a basis state, ISTATE_TYPE_GENERATED for a state generated from a basis state, ISTATE_TYPE_RESTART for a state corresponding to the endpoint of a segment in another simulation, or ISTATE_TYPE_START for a state generated from a start state).
istate_status – Integer describing whether this initial state has been properly prepared.
pcoord – The representative progress coordinate of this state.
- ISTATE_TYPE_UNSET = 0
- ISTATE_TYPE_BASIS = 1
- ISTATE_TYPE_GENERATED = 2
- ISTATE_TYPE_RESTART = 3
- ISTATE_TYPE_START = 4
- ISTATE_UNUSED = 0
- ISTATE_STATUS_PENDING = 0
- ISTATE_STATUS_PREPARED = 1
- ISTATE_STATUS_FAILED = 2
- istate_types = {'ISTATE_TYPE_BASIS': 1, 'ISTATE_TYPE_GENERATED': 2, 'ISTATE_TYPE_RESTART': 3, 'ISTATE_TYPE_START': 4, 'ISTATE_TYPE_UNSET': 0}
- istate_type_names = {0: 'ISTATE_TYPE_UNSET', 1: 'ISTATE_TYPE_BASIS', 2: 'ISTATE_TYPE_GENERATED', 3: 'ISTATE_TYPE_RESTART', 4: 'ISTATE_TYPE_START'}
- istate_statuses = {'ISTATE_STATUS_FAILED': 2, 'ISTATE_STATUS_PENDING': 0, 'ISTATE_STATUS_PREPARED': 1}
- istate_status_names = {0: 'ISTATE_STATUS_PENDING', 1: 'ISTATE_STATUS_PREPARED', 2: 'ISTATE_STATUS_FAILED'}
- as_numpy_record()
- class westpa.core.data_manager.NewWeightEntry(source_type, weight, prev_seg_id=None, prev_init_pcoord=None, prev_final_pcoord=None, new_init_pcoord=None, target_state_id=None, initial_state_id=None)
Bases:
object
- NW_SOURCE_RECYCLED = 0
- class westpa.core.data_manager.ExecutablePropagator(rc=None)
Bases:
WESTPropagator
- ENV_CURRENT_ITER = 'WEST_CURRENT_ITER'
- ENV_CURRENT_SEG_ID = 'WEST_CURRENT_SEG_ID'
- ENV_CURRENT_SEG_DATA_REF = 'WEST_CURRENT_SEG_DATA_REF'
- ENV_CURRENT_SEG_INITPOINT = 'WEST_CURRENT_SEG_INITPOINT_TYPE'
- ENV_PARENT_SEG_ID = 'WEST_PARENT_ID'
- ENV_PARENT_DATA_REF = 'WEST_PARENT_DATA_REF'
- ENV_BSTATE_ID = 'WEST_BSTATE_ID'
- ENV_BSTATE_DATA_REF = 'WEST_BSTATE_DATA_REF'
- ENV_ISTATE_ID = 'WEST_ISTATE_ID'
- ENV_ISTATE_DATA_REF = 'WEST_ISTATE_DATA_REF'
- ENV_STRUCT_DATA_REF = 'WEST_STRUCT_DATA_REF'
- ENV_RAND16 = 'WEST_RAND16'
- ENV_RAND32 = 'WEST_RAND32'
- ENV_RAND64 = 'WEST_RAND64'
- ENV_RAND128 = 'WEST_RAND128'
- ENV_RANDFLOAT = 'WEST_RANDFLOAT'
- static makepath(template, template_args=None, expanduser=True, expandvars=True, abspath=False, realpath=False)
- random_val_env_vars()
Return a set of environment variables containing random seeds. These are returned as a dictionary, suitable for use in os.environ.update() or as the env argument to subprocess.Popen(). Every child process executed by exec_child() gets these.
- exec_child(executable, environ=None, stdin=None, stdout=None, stderr=None, cwd=None)
Execute a child process with the environment set from the current environment, the values of self.addtl_child_environ, the random numbers returned by self.random_val_env_vars, and the given environ (applied in that order). stdin/stdout/stderr are optionally redirected. This function waits on the child process to finish, then returns (rc, rusage), where rc is the child’s return code and rusage is the resource usage tuple from os.wait4().
- exec_child_from_child_info(child_info, template_args, environ)
- update_args_env_basis_state(template_args, environ, basis_state)
- update_args_env_initial_state(template_args, environ, initial_state)
- update_args_env_iter(template_args, environ, n_iter)
- update_args_env_segment(template_args, environ, segment)
- template_args_for_segment(segment)
- exec_for_segment(child_info, segment, addtl_env=None)
Execute a child process with environment and template expansion from the given segment.
- exec_for_iteration(child_info, n_iter, addtl_env=None)
Execute a child process with environment and template expansion from the given iteration number.
- exec_for_basis_state(child_info, basis_state, addtl_env=None)
Execute a child process with environment and template expansion from the given basis state
- exec_for_initial_state(child_info, initial_state, addtl_env=None)
Execute a child process with environment and template expansion from the given initial state.
- prepare_file_system(segment, environ)
- setup_dataset_return(segment=None, subset_keys=None)
Set up temporary files and environment variables that point to them for segment runners to return data. segment is the Segment object that the return data is associated with. subset_keys specifies the names of a subset of data to be returned.
- retrieve_dataset_return(state, return_files, del_return_files, single_point)
Retrieve returned data from the temporary locations directed by the environment variables. state is a Segment, BasisState, or InitialState object that the return data is associated with. return_files is a dict where the keys are the dataset names and the values are the paths to the temporary files that contain the returned data. del_return_files is a dict where the keys are the names of datasets to be deleted (if the corresponding value is set to True) once the data is retrieved.
- get_pcoord(state)
Get the progress coordinate of the given basis or initial state.
- gen_istate(basis_state, initial_state)
Generate a new initial state from the given basis state.
- prepare_iteration(n_iter, segments)
Perform any necessary per-iteration preparation. This is run by the work manager.
- finalize_iteration(n_iter, segments)
Perform any necessary post-iteration cleanup. This is run by the work manager.
- propagate(segments)
Propagate one or more segments, including any necessary per-iteration setup and teardown for this propagator.
- westpa.core.data_manager.makepath(template, template_args=None, expanduser=True, expandvars=True, abspath=False, realpath=False)
- class westpa.core.data_manager.flushing_lock(lock, fileobj)
Bases:
object
- class westpa.core.data_manager.expiring_flushing_lock(lock, flush_method, nextsync)
Bases:
object
- westpa.core.data_manager.seg_id_dtype
alias of
int64
- westpa.core.data_manager.n_iter_dtype
alias of
uint32
- westpa.core.data_manager.weight_dtype
alias of
float64
- westpa.core.data_manager.utime_dtype
alias of
float64
- westpa.core.data_manager.seg_status_dtype
alias of
uint8
- westpa.core.data_manager.seg_initpoint_dtype
alias of
uint8
- westpa.core.data_manager.seg_endpoint_dtype
alias of
uint8
- westpa.core.data_manager.istate_type_dtype
alias of
uint8
- westpa.core.data_manager.istate_status_dtype
alias of
uint8
- westpa.core.data_manager.nw_source_dtype
alias of
uint8
- class westpa.core.data_manager.WESTDataManager(rc=None)
Bases:
object
Data manager for assisting the reading and writing of WEST data from/to HDF5 files.
- default_iter_prec = 8
- default_we_h5filename = 'west.h5'
- default_we_h5file_driver = None
- default_flush_period = 60
- default_aux_compression_threshold = 1048576
- binning_hchunksize = 4096
- table_scan_chunksize = 1024
- flushing_lock()
- expiring_flushing_lock()
- process_config()
- property system
- property closed
- iter_group_name(n_iter, absolute=True)
- require_iter_group(n_iter)
Get the group associated with n_iter, creating it if necessary.
- del_iter_group(n_iter)
- get_iter_group(n_iter)
- get_seg_index(n_iter)
- property current_iteration
- open_backing(mode=None)
Open the (already-created) HDF5 file named in self.west_h5filename.
- prepare_backing()
Create new HDF5 file
- close_backing()
- flush_backing()
- save_target_states(tstates, n_iter=None)
Save the given target states in the HDF5 file; they will be used for the next iteration to be propagated. A complete set is required, even if nominally appending to an existing set, which simplifies the mapping of IDs to the table.
- find_tstate_group(n_iter)
- find_ibstate_group(n_iter)
- get_target_states(n_iter)
Return a list of Target objects representing the target (sink) states that are in use for iteration n_iter. Future iterations are assumed to continue from the most recent set of states.
- create_ibstate_group(basis_states, n_iter=None)
Create the group used to store basis states and initial states (whose definitions are always coupled). This group is hard-linked into all iteration groups that use these basis and initial states.
- create_ibstate_iter_h5file(basis_states)
Create the per-iteration HDF5 file for the basis states (i.e., iteration 0). This special treatment is needed so that the analysis tools can access basis states more easily.
- update_iter_h5file(n_iter, segments)
Write out the per-iteration HDF5 file with given segments and add an external link to it in the main HDF5 file (west.h5) if the link is not present.
- get_basis_states(n_iter=None)
Return a list of BasisState objects representing the basis states that are in use for iteration n_iter.
- create_initial_states(n_states, n_iter=None)
Create storage for n_states initial states associated with iteration n_iter, and return bare InitialState objects with only state_id set.
- update_initial_states(initial_states, n_iter=None)
Save the given initial states in the HDF5 file
- get_initial_states(n_iter=None)
- get_segment_initial_states(segments, n_iter=None)
Retrieve all initial states referenced by the given segments.
- get_unused_initial_states(n_states=None, n_iter=None)
Retrieve any prepared but unused initial states applicable to the given iteration. Up to n_states states are returned; if n_states is None, then all unused states are returned.
- prepare_iteration(n_iter, segments)
Prepare for a new iteration by creating space to store the new iteration’s data. The number of segments, their IDs, and their lineage must be determined and included in the set of segments passed in.
- update_iter_group_links(n_iter)
Update the per-iteration hard links pointing to the tables of target and initial/basis states for the given iteration. These links are not used by this class, but are remarkably convenient for third-party analysis tools and hdfview.
- get_iter_summary(n_iter=None)
- update_iter_summary(summary, n_iter=None)
- del_iter_summary(min_iter)
- update_segments(n_iter, segments)
Update segment information in the HDF5 file; all prior information for each segment is overwritten, except for parent and weight transfer information.
- get_segments(n_iter=None, seg_ids=None, load_pcoords=True)
Return the given (or all) segments from a given iteration.
If the optional parameter load_auxdata is true, then all auxiliary datasets available are loaded and mapped onto the data dictionary of each segment. If load_auxdata is None, then use the default self.auto_load_auxdata, which can be set by the option load_auxdata in the [data] section of west.cfg. This essentially requires as much RAM as there is per-iteration auxiliary data, so this behavior is not on by default.
- prepare_segment_restarts(segments, basis_states=None, initial_states=None)
Prepare the necessary folder and files, given the data stored in the parent per-iteration HDF5 file, for propagating the simulation. basis_states and initial_states should be provided if the segments are newly created.
- get_all_parent_ids(n_iter)
- get_parent_ids(n_iter, seg_ids=None)
Return a sequence of the parent IDs of the given seg_ids.
- get_weights(n_iter, seg_ids)
Return the weights associated with the given seg_ids
- get_child_ids(n_iter, seg_id)
Return the seg_ids of segments who have the given segment as a parent.
- get_children(segment)
Return all segments which have the given segment as a parent
- prepare_run()
- finalize_run()
- save_new_weight_data(n_iter, new_weights)
Save a set of NewWeightEntry objects to HDF5. Note that this should be called for the iteration in which the weights appear in their new locations (e.g. for recycled walkers, the iteration following recycling).
- get_new_weight_data(n_iter)
- find_bin_mapper(hashval)
Check to see if the given hash value is in the binning table. Returns the index in the bin data tables if found, or raises KeyError if not.
- get_bin_mapper(hashval)
Look up the given hash value in the binning table, unpickling and returning the corresponding bin mapper if available, or raising KeyError if not.
- save_bin_mapper(hashval, pickle_data)
Store the given mapper in the table of saved mappers. If the mapper cannot be stored, PickleError will be raised. Returns the index in the bin data tables where the mapper is stored.
- save_iter_binning(n_iter, hashval, pickled_mapper, target_counts)
Save information about the binning used to generate segments for iteration n_iter.
- westpa.core.data_manager.normalize_dataset_options(dsopts, path_prefix='', n_iter=0)
- westpa.core.data_manager.create_dataset_from_dsopts(group, dsopts, shape=None, dtype=None, data=None, autocompress_threshold=None, n_iter=None)
- westpa.core.data_manager.require_dataset_from_dsopts(group, dsopts, shape=None, dtype=None, data=None, autocompress_threshold=None, n_iter=None)
- westpa.core.data_manager.calc_chunksize(shape, dtype, max_chunksize=262144)
Calculate a chunk size for HDF5 data, anticipating that access will slice along lower dimensions sooner than higher dimensions.
westpa.core.extloader module
- westpa.core.extloader.load_module(module_name, path=None)
Load and return the given module, recursively loading containing packages as necessary.
- westpa.core.extloader.get_object(object_name, path=None)
Attempt to load the given object, using additional path information if given.
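For example, a hedged sketch of resolving a dotted name to an object (RectilinearBinMapper is used here only as a plausible target):

from westpa.core.extloader import get_object

# Resolve a dotted name, importing intermediate modules as needed
mapper_cls = get_object('westpa.core.binning.RectilinearBinMapper')
print(mapper_cls)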
westpa.core.h5io module
Miscellaneous routines to help with HDF5 input and output of WEST-related data.
- class westpa.core.h5io.Trajectory(xyz, topology, time=None, unitcell_lengths=None, unitcell_angles=None)
Bases:
object
Container object for a molecular dynamics trajectory
A Trajectory represents a collection of one or more molecular structures, generally (but not necessarily) from a molecular dynamics trajectory. The Trajectory stores a number of fields describing the system through time, including the cartesian coordinates of each atom (xyz), the topology of the molecular system (topology), and information about the unitcell if appropriate (unitcell_vectors, unitcell_lengths, unitcell_angles). A Trajectory should generally be constructed by loading a file from disk. Trajectories can be loaded from (and saved to) the PDB, XTC, TRR, DCD, binpos, NetCDF or MDTraj HDF5 formats.
Trajectory supports fancy indexing, so you can extract one or more frames from a Trajectory as a separate trajectory. For example, to form a trajectory with every other frame, you can slice with traj[::2]. Trajectory uses the nanometer, degree & picosecond unit system.
Examples
>>> # loading a trajectory
>>> import mdtraj as md
>>> md.load('trajectory.xtc', top='native.pdb')
<mdtraj.Trajectory with 1000 frames, 22 atoms at 0x1058a73d0>
>>> # slicing a trajectory
>>> t = md.load('trajectory.h5')
>>> print(t)
<mdtraj.Trajectory with 100 frames, 22 atoms>
>>> print(t[::2])
<mdtraj.Trajectory with 50 frames, 22 atoms>
>>> # calculating the average distance between two atoms
>>> import mdtraj as md
>>> import numpy as np
>>> t = md.load('trajectory.h5')
>>> np.mean(np.sqrt(np.sum((t.xyz[:, 0, :] - t.xyz[:, 21, :])**2, axis=1)))
See also
mdtraj.load
High-level function that loads files and returns an
md.Trajectory
- n_frames
- Type:
int
- n_atoms
- Type:
int
- n_residues
- Type:
int
- time
- Type:
np.ndarray, shape=(n_frames,)
- timestep
- Type:
float
- topology
- Type:
md.Topology
- top
- Type:
md.Topology
- xyz
- Type:
np.ndarray, shape=(n_frames, n_atoms, 3)
- unitcell_vectors
- Type:
{np.ndarray, shape=(n_frames, 3, 3), None}
- unitcell_lengths
- Type:
{np.ndarray, shape=(n_frames, 3), None}
- unitcell_angles
- Type:
{np.ndarray, shape=(n_frames, 3), None}
- property n_frames
Number of frames in the trajectory
- Returns:
n_frames – The number of frames in the trajectory
- Return type:
int
- property n_atoms
Number of atoms in the trajectory
- Returns:
n_atoms – The number of atoms in the trajectory
- Return type:
int
- property n_residues
Number of residues (amino acids) in the trajectory
- Returns:
n_residues – The number of residues in the trajectory’s topology
- Return type:
int
- property n_chains
Number of chains in the trajectory
- Returns:
n_chains – The number of chains in the trajectory’s topology
- Return type:
int
- property top
Alias for self.topology, describing the organization of atoms into residues, bonds, etc
- Returns:
topology – The topology object, describing the organization of atoms into residues, bonds, etc
- Return type:
md.Topology
- property timestep
Timestep between frames, in picoseconds
- Returns:
timestep – The timestep between frames, in picoseconds.
- Return type:
float
- property unitcell_vectors
The vectors that define the shape of the unit cell in each frame
- Returns:
vectors – Vectors defining the shape of the unit cell in each frame. The semantics of this array are that the shape of the unit cell in frame i is given by the three vectors value[i, 0, :], value[i, 1, :], and value[i, 2, :].
- Return type:
np.ndarray, shape(n_frames, 3, 3)
- property unitcell_volumes
Volumes of unit cell for each frame.
- Returns:
volumes – Volumes of the unit cell in each frame, in nanometers^3, or None if the Trajectory contains no unitcell information.
- Return type:
{np.ndarray, shape=(n_frames), None}
- superpose(reference, frame=0, atom_indices=None, ref_atom_indices=None, parallel=True)
Superpose each conformation in this trajectory upon a reference
- Parameters:
reference (md.Trajectory) – Align self to a particular frame in reference
frame (int) – The index of the conformation in reference to align to.
atom_indices (array_like, or None) – The indices of the atoms to superpose. If not supplied, all atoms will be used.
ref_atom_indices (array_like, or None) – Use these atoms on the reference structure. If not supplied, the same atom indices will be used for this trajectory and the reference one.
parallel (bool) – Use OpenMP to run the superposition in parallel over multiple cores
- Return type:
self
- join(other, check_topology=True, discard_overlapping_frames=False)
Join two trajectories together along the time/frame axis.
This method joins trajectories along the time axis, giving a new trajectory of length equal to the sum of the lengths of self and other. It can also be called by using self + other
- Parameters:
other (Trajectory or list of Trajectory) – One or more trajectories to join with this one. These trajectories are appended to the end of this trajectory.
check_topology (bool) – Ensure that the topology of self and other are identical before joining them. If false, the resulting trajectory will have the topology of self.
discard_overlapping_frames (bool, optional) – If True, compare coordinates at trajectory edges to discard overlapping frames. Default: False.
See also
stack
join two trajectories along the atom axis
- stack(other, keep_resSeq=True)
Stack two trajectories along the atom axis
This method joins trajectories along the atom axis, giving a new trajectory with a number of atoms equal to the sum of the number of atoms in self and other.
Notes
The resulting trajectory will have the unitcell and time information of the left operand.
Examples
>>> t1 = md.load('traj1.h5')
>>> t2 = md.load('traj2.h5')
>>> # even when t2 contains no unitcell information
>>> t2.unitcell_vectors = None
>>> stacked = t1.stack(t2)
>>> # the stacked trajectory inherits the unitcell information
>>> # from the first trajectory
>>> np.all(stacked.unitcell_vectors == t1.unitcell_vectors)
True
- Parameters:
other (Trajectory) – The other trajectory to join
keep_resSeq (bool, optional, default=True) – see the mdtraj.core.topology.Topology.join method documentation
See also
join
join two trajectories along the time/frame axis.
- slice(key, copy=True)
Slice trajectory, by extracting one or more frames into a separate object
This method can also be called using index bracket notation, i.e. traj[1] == traj.slice(1)
- Parameters:
key ({int, np.ndarray, slice}) – The slice to take. Can be either an int, a list of ints, or a slice object.
copy (bool, default=True) – Copy the arrays after slicing. If you set this to false, then if you modify a slice, you’ll modify the original array since they point to the same data.
- property topology
Topology of the system, describing the organization of atoms into residues, bonds, etc
- Returns:
topology – The topology object, describing the organization of atoms into residues, bonds, etc
- Return type:
md.Topology
- property xyz
Cartesian coordinates of each atom in each simulation frame
- Returns:
xyz – A three-dimensional numpy array with the cartesian coordinates of each atom in each frame.
- Return type:
np.ndarray, shape=(n_frames, n_atoms, 3)
- property unitcell_lengths
Lengths that define the shape of the unit cell in each frame.
- Returns:
lengths – Lengths of the unit cell in each frame, in nanometers, or None if the Trajectory contains no unitcell information.
- Return type:
{np.ndarray, shape=(n_frames, 3), None}
- property unitcell_angles
Angles that define the shape of the unit cell in each frame.
- Returns:
lengths – The angles between the three unitcell vectors in each frame: alpha, beta, and gamma. alpha gives the angle between vectors b and c, beta gives the angle between vectors c and a, and gamma gives the angle between vectors a and b. The angles are in degrees.
- Return type:
np.ndarray, shape=(n_frames, 3)
- property time
The simulation time corresponding to each frame, in picoseconds
- Returns:
time – The simulation time corresponding to each frame, in picoseconds
- Return type:
np.ndarray, shape=(n_frames,)
- openmm_positions(frame)
OpenMM-compatible positions of a single frame.
Examples
>>> t = md.load('trajectory.h5')
>>> context.setPositions(t.openmm_positions(0))
- Parameters:
frame (int) – The index of the frame of the trajectory that you wish to extract
- Returns:
positions – The cartesian coordinates of the specified trajectory frame, formatted for input to OpenMM
- Return type:
list
- openmm_boxes(frame)
OpenMM-compatible box vectors of a single frame.
Examples
>>> t = md.load('trajectory.h5')
>>> context.setPeriodicBoxVectors(*t.openmm_boxes(0))
- Parameters:
frame (int) – Return box for this single frame.
- Returns:
box – The periodic box vectors for this frame, formatted for input to OpenMM.
- Return type:
tuple
- static load(filenames, **kwargs)
Load a trajectory from disk
- Parameters:
filenames ({path-like, [path-like]}) – Either a path or list of paths
kwargs – As requested by the various load functions; the accepted options depend on the file extension.
- save(filename, **kwargs)
Save trajectory to disk, in a format determined by the filename extension
- Parameters:
filename (path-like) – filesystem path in which to save the trajectory. The extension will be parsed and will control the format.
lossy (bool) – For .h5 or .lh5, whether or not to use compression.
no_models (bool) – For .pdb. TODO: Document this?
force_overwrite (bool) – If filename already exists, overwrite it.
- save_hdf5(filename, force_overwrite=True)
Save trajectory to MDTraj HDF5 format
- Parameters:
filename (path-like) – filesystem path in which to save the trajectory
force_overwrite (bool, default=True) – Overwrite anything that exists at filename, if it’s already there
- save_lammpstrj(filename, force_overwrite=True)
Save trajectory to LAMMPS custom dump format
- Parameters:
filename (path-like) – filesystem path in which to save the trajectory
force_overwrite (bool, default=True) – Overwrite anything that exists at filename, if it’s already there
- save_xyz(filename, force_overwrite=True)
Save trajectory to .xyz format.
- Parameters:
filename (path-like) – filesystem path in which to save the trajectory
force_overwrite (bool, default=True) – Overwrite anything that exists at filename, if it’s already there
- save_pdb(filename, force_overwrite=True, bfactors=None)
Save trajectory to RCSB PDB format
- Parameters:
filename (path-like) – filesystem path in which to save the trajectory
force_overwrite (bool, default=True) – Overwrite anything that exists at filename, if it’s already there
bfactors (array_like, default=None, shape=(n_frames, n_atoms) or (n_atoms,)) – Save bfactors with pdb file. If the array is two dimensional it should contain a bfactor for each atom in each frame of the trajectory. Otherwise, the same bfactor will be saved in each frame.
- save_xtc(filename, force_overwrite=True)
Save trajectory to Gromacs XTC format
- Parameters:
filename (path-like) – filesystem path in which to save the trajectory
force_overwrite (bool, default=True) – Overwrite anything that exists at filename, if it’s already there
- save_trr(filename, force_overwrite=True)
Save trajectory to Gromacs TRR format
Notes
Only the xyz coordinates and the time are saved; the velocities and forces in the TRR file will be zeros
- Parameters:
filename (path-like) – filesystem path in which to save the trajectory
force_overwrite (bool, default=True) – Overwrite anything that exists at filename, if it’s already there
- save_dcd(filename, force_overwrite=True)
Save trajectory to CHARMM/NAMD DCD format
- Parameters:
filename (path-like) – filesystem path in which to save the trajectory
force_overwrite (bool, default=True) – Overwrite anything that exists at filename, if it’s already there
- save_dtr(filename, force_overwrite=True)
Save trajectory to DESMOND DTR format
- Parameters:
filename (path-like) – filesystem path in which to save the trajectory
force_overwrite (bool, default=True) – Overwrite anything that exists at filename, if it’s already there
- save_binpos(filename, force_overwrite=True)
Save trajectory to AMBER BINPOS format
- Parameters:
filename (path-like) – filesystem path in which to save the trajectory
force_overwrite (bool, default=True) – Overwrite anything that exists at filename, if it’s already there
- save_mdcrd(filename, force_overwrite=True)
Save trajectory to AMBER mdcrd format
- Parameters:
filename (path-like) – filesystem path in which to save the trajectory
force_overwrite (bool, default=True) – Overwrite anything that exists at filename, if it’s already there
- save_netcdf(filename, force_overwrite=True)
Save trajectory in AMBER NetCDF format
- Parameters:
filename (path-like) – filesystem path in which to save the trajectory
force_overwrite (bool, default=True) – Overwrite anything that exists at filename, if it’s already there
- save_netcdfrst(filename, force_overwrite=True)
Save trajectory in AMBER NetCDF restart format
- Parameters:
filename (path-like) – filesystem path in which to save the restart
force_overwrite (bool, default=True) – Overwrite anything that exists at filename, if it’s already there
Notes
NetCDF restart files can only store a single frame. If only one frame exists, “filename” will be written. Otherwise, “filename.#” will be written, where # is a zero-padded number from 1 to the total number of frames in the trajectory
- save_amberrst7(filename, force_overwrite=True)
Save trajectory in AMBER ASCII restart format
- Parameters:
filename (path-like) – filesystem path in which to save the restart
force_overwrite (bool, default=True) – Overwrite anything that exists at filename, if it’s already there
Notes
Amber restart files can only store a single frame. If only one frame exists, “filename” will be written. Otherwise, “filename.#” will be written, where # is a zero-padded number from 1 to the total number of frames in the trajectory
- save_lh5(filename, force_overwrite=True)
Save trajectory in deprecated MSMBuilder2 LH5 (lossy HDF5) format.
- Parameters:
filename (path-like) – filesystem path in which to save the trajectory
force_overwrite (bool, default=True) – Overwrite anything that exists at filename, if it’s already there
- save_gro(filename, force_overwrite=True, precision=3)
Save trajectory in Gromacs .gro format
- Parameters:
filename (path-like) – Path to save the trajectory
force_overwrite (bool, default=True) – Overwrite anything that exists at that filename if it exists
precision (int, default=3) – The number of decimal places to use for coordinates in GRO file
- save_tng(filename, force_overwrite=True)
Save trajectory to Gromacs TNG format
- Parameters:
filename (path-like) – filesystem path in which to save the trajectory
force_overwrite (bool, default=True) – Overwrite anything that exists at filename, if it’s already there
- save_gsd(filename, force_overwrite=True)
Save trajectory to HOOMD GSD format
- Parameters:
filename (path-like) – filesystem path in which to save the trajectory
force_overwrite (bool, default=True) – Overwrite anything that exists at filename, if it’s already there
- center_coordinates(mass_weighted=False)
Center each trajectory frame at the origin (0,0,0).
This method acts in place on the trajectory. The centering can be either uniformly weighted (mass_weighted=False) or weighted by the mass of each atom (mass_weighted=True).
- Parameters:
mass_weighted (bool, optional (default = False)) – If True, weight atoms by mass when removing COM.
- Return type:
self
- restrict_atoms(**kwargs)
DEPRECATED: restrict_atoms was replaced by atom_slice and will be removed in 2.0
Retain only a subset of the atoms in a trajectory
Deletes atoms not in atom_indices, and re-indexes those that remain
- atom_indices (array-like, dtype=int, shape=(n_atoms))
List of atom indices to keep.
- inplace (bool, default=True)
If True, the operation is done in place, modifying self. Otherwise, a copy is returned with the restricted atoms, and self is not modified.
- traj (md.Trajectory)
The return value is either self or the new trajectory, depending on the value of inplace.
- atom_slice(atom_indices, inplace=False)
Create a new trajectory from a subset of atoms
- Parameters:
atom_indices (array-like, dtype=int, shape=(n_atoms)) – List of indices of atoms to retain in the new trajectory.
inplace (bool, default=False) – If True, the operation is done in place, modifying self. Otherwise, a copy is returned with the sliced atoms, and self is not modified.
- Returns:
traj – The return value is either self or the new trajectory, depending on the value of inplace.
- Return type:
md.Trajectory
See also
stack
stack multiple trajectories along the atom axis
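A brief usage sketch for atom_slice ('trajectory.h5' is a hypothetical input file):

import mdtraj as md

t = md.load('trajectory.h5')
# Keep only the first ten atoms; t itself is unmodified (inplace=False default)
sub = t.atom_slice(list(range(10)))
print(t.n_atoms, '->', sub.n_atoms)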
- remove_solvent(exclude=None, inplace=False)
Create a new trajectory without solvent atoms
- Parameters:
exclude (array-like, dtype=str, shape=(n_solvent_types)) – List of solvent residue names to retain in the new trajectory.
inplace (bool, default=False) – The return value is either self or the new trajectory, depending on the value of inplace.
- Returns:
traj – The return value is either self or the new trajectory, depending on the value of inplace.
- Return type:
md.Trajectory
- smooth(width, order=3, atom_indices=None, inplace=False)
Smooth a trajectory using a zero-delay Butterworth filter. Please note that for optimal results the trajectory should be properly aligned prior to smoothing (see md.Trajectory.superpose).
- Parameters:
width (int) – This acts very similarly to the window size in a moving-average smoother. In this implementation, the frequency of the low-pass filter is taken to be two over this width, so it’s like “half the period” of the sinusoid where the filter starts to kick in. Must be an integer greater than one.
order (int, optional, default=3) – The order of the filter. A small odd number is recommended. Higher order filters cutoff more quickly, but have worse numerical properties.
atom_indices (array-like, dtype=int, shape=(n_atoms), default=None) – List of indices of atoms to retain in the new trajectory. Default is set to None, which applies smoothing to all atoms.
inplace (bool, default=False) – The return value is either self or the new trajectory, depending on the value of inplace.
- Returns:
traj – The return value is either self or the new smoothed trajectory, depending on the value of inplace.
- Return type:
md.Trajectory
- make_molecules_whole(inplace=False, sorted_bonds=None)
Only make molecules whole
- Parameters:
inplace (bool) – If False, a new Trajectory is created and returned. If True, this Trajectory is modified directly.
sorted_bonds (array of shape (n_bonds, 2)) – Pairs of atom indices that define bonds, in sorted order. If not specified, these will be determined from the trajectory’s topology.
- image_molecules(inplace=False, anchor_molecules=None, other_molecules=None, sorted_bonds=None, make_whole=True)
Recenter and apply periodic boundary conditions to the molecules in each frame of the trajectory.
This method is useful for visualizing a trajectory in which molecules were not wrapped to the periodic unit cell, or in which the macromolecules are not centered with respect to the solvent. It tries to be intelligent in deciding what molecules to center, so you can simply call it and trust that it will “do the right thing”.
- Parameters:
inplace (bool, default=False) – If False, a new Trajectory is created and returned. If True, this Trajectory is modified directly.
anchor_molecules (list of atom sets, optional, default=None) – Molecule that should be treated as an “anchor”. These molecules will be centered in the box and put near each other. If not specified, anchor molecules are guessed using a heuristic.
other_molecules (list of atom sets, optional, default=None) – Molecules that are not anchors. If not specified, these will be molecules other than the anchor molecules
sorted_bonds (array of shape (n_bonds, 2)) – Pairs of atom indices that define bonds, in sorted order. If not specified, these will be determined from the trajectory’s topology. Only relevant if make_whole is True.
make_whole (bool) – Whether to make molecules whole.
- Returns:
traj – The return value is either self or the new trajectory, depending on the value of inplace.
- Return type:
md.Trajectory
See also
Topology.guess_anchor_molecules
- westpa.core.h5io.join_traj(trajs, check_topology=True, discard_overlapping_frames=False)
Concatenate multiple trajectories into one long trajectory
- Parameters:
trajs (iterable of trajectories) – Combine these into one trajectory
check_topology (bool) – Make sure topologies match before joining
discard_overlapping_frames (bool) – Check for overlapping frames and discard
- westpa.core.h5io.in_units_of(quantity, units_in, units_out, inplace=False)
Convert a numerical quantity between unit systems.
- Parameters:
quantity ({number, np.ndarray, openmm.unit.Quantity}) – quantity can either be a unitted quantity – i.e. instance of openmm.unit.Quantity, or just a bare number or numpy array
units_in (str) – If you supply a quantity that’s not an openmm.unit.Quantity, you should tell me what units it is in. If you don’t, I’m just going to echo you back your quantity without doing any unit checking.
units_out (str) – A string description of the units you want out. This should look like “nanometers/picosecond” or “nanometers**3” or whatever
inplace (bool) – Attempt to do the transformation inplace, by mutating the quantity argument and avoiding a copy. This is only possible if quantity is a writable numpy array.
- Returns:
rquantity – The resulting quantity, in the new unit system. If the function was called with inplace=True and quantity was a writable numpy array, rquantity will alias the same memory as the input quantity, which will have been changed inplace. Otherwise, if a copy was required, rquantity will point to new memory.
- Return type:
{number, np.ndarray}
Examples
>>> in_units_of(1, 'meter**2/second', 'nanometers**2/picosecond')
1000000.0
- westpa.core.h5io.import_(module)
Import a module, and issue a nice message to stderr if the module isn’t installed.
Currently, this function will print nice error messages for networkx, tables, netCDF4, and openmm.unit, which are optional MDTraj dependencies.
- Parameters:
module (str) – The module you’d like to import, as a string
- Returns:
module – The module object
- Return type:
{module, object}
Examples
>>> # the following two lines are equivalent. the difference is that the
>>> # second will check for an ImportError and print you a very nice
>>> # user-facing message about what's wrong (where you can install the
>>> # module from, etc) if the import fails
>>> import tables
>>> tables = import_('tables')
- westpa.core.h5io.ensure_type(val, dtype, ndim, name, length=None, can_be_none=False, shape=None, warn_on_cast=True, add_newaxis_on_deficient_ndim=False)
Typecheck the size, shape and dtype of a numpy array, with optional casting.
- Parameters:
val ({np.ndarray, None}) – The array to check
dtype ({nd.dtype, str}) – The dtype you’d like the array to have
ndim (int) – The number of dimensions you’d like the array to have
name (str) – name of the array. This is used when throwing exceptions, so that we can describe to the user which array is messed up.
length (int, optional) – How long should the array be?
can_be_none (bool) – Is val == None acceptable?
shape (tuple, optional) – What should the shape of the array be? If the provided tuple has Nones in it, those will be semantically interpreted as matching any length in that dimension. So, for example, using the shape spec (None, None, 3) will ensure that the last dimension is of length three without constraining the first two dimensions.
warn_on_cast (bool, default=True) – Raise a warning when the dtypes don’t match and a cast is done.
add_newaxis_on_deficient_ndim (bool, default=True) – Add a new axis to the beginning of the array if the number of dimensions is deficient by one compared to your specification. For instance, if you’re trying to get out an array of ndim == 3, but the user provides an array of shape == (10, 10), a new axis will be created with length 1 in front, so that the return value is of shape (1, 10, 10).
Notes
The returned value will always be C-contiguous.
- Returns:
typechecked_val – If val=None and can_be_none=True, then this will return None. Otherwise, it will return val (or a copy of val). If the dtype wasn’t right, it’ll be cast to the requested dtype. If the array was not C-contiguous, it’ll be copied as well.
- Return type:
np.ndarray, None
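A small sketch of ensure_type casting an integer array to the requested dtype (warn_on_cast=False suppresses the cast warning):

import numpy as np
from westpa.core.h5io import ensure_type

# Cast an int64 array to float32, checking that it is 1-D
x = ensure_type(np.arange(3), np.float32, ndim=1, name='x', warn_on_cast=False)
print(x.dtype, x.shape)  # float32 (3,)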
- class westpa.core.h5io.HDF5TrajectoryFile(filename, mode='r', force_overwrite=True, compression='zlib')
Bases:
object
Interface for reading and writing to a MDTraj HDF5 molecular dynamics trajectory file, whose format is described here.
This is a file-like object that supports both reading and writing, depending on the mode flag. It implements the context manager protocol, so you can also use it with the python ‘with’ statement.
The format is extremely flexible and high performance. It can hold a wide variety of information about a trajectory, including fields like the temperature and energies. Because it’s built on the fantastic HDF5 library, it’s easily extensible too.
- Parameters:
filename (path-like) – Path to the file to open
mode ({'r', 'w'}) – Mode in which to open the file. ‘r’ is for reading and ‘w’ is for writing
force_overwrite (bool) – In mode=’w’, how do you want to behave if a file by the name of filename already exists? if force_overwrite=True, it will be overwritten.
compression ({'zlib', None}) – Apply compression to the file? This will save space, and does not cost too many cpu cycles, so it’s recommended.
- root
- title
- application
- topology
- randomState
- forcefield
- reference
- constraints
See also
mdtraj.load_hdf5
High-level wrapper that returns a
md.Trajectory
- distance_unit = 'nanometers'
- property root
Direct access to the root group of the underlying Tables HDF5 file handle.
This can be used for random or specific access to the underlying arrays on disk
- property title
User-defined title for the data represented in the file
- property application
Suite of programs that created the file
- property topology
Get the topology out from the file
- Returns:
topology – A topology object
- Return type:
mdtraj.Topology
- property randomState
State of the creator’s internal random number generator at the start of the simulation
- property forcefield
Description of the Hamiltonian used. A short, human-readable string, like AMBER99sbildn.
- property reference
A published reference that documents the program or parameters used to generate the data
- property constraints
Constraints applied to the bond lengths
- Returns:
constraints – A one-dimensional array of (int, int, float) records giving the indices of the two atoms involved in each constraint and the constraint distance. If no constraint information is in the file, the return value is None.
- Return type:
{None, np.array, dtype=[(‘atom1’, ‘<i4’), (‘atom2’, ‘<i4’), (‘distance’, ‘<f4’)])}
- read_as_traj(n_frames=None, stride=None, atom_indices=None)
Read a trajectory from the HDF5 file
- Parameters:
n_frames ({int, None}) – The number of frames to read. If not supplied, all of the remaining frames will be read.
stride ({int, None}) – By default all of the frames will be read, but you can pass this flag to read a subset of the data by grabbing only every stride-th frame from disk.
atom_indices ({int, None}) – By default all of the atoms will be read, but you can pass this flag to read only a subset of the atoms for the coordinates and velocities fields. Note that you will have to carefully manage the indices and the offsets, since the i-th atom in the topology will not necessarily correspond to the i-th atom in your subset.
- Returns:
trajectory – A trajectory object containing the loaded portion of the file.
- Return type:
Trajectory
- read(n_frames=None, stride=None, atom_indices=None)
Read one or more frames of data from the file
- Parameters:
n_frames ({int, None}) – The number of frames to read. If not supplied, all of the remaining frames will be read.
stride ({int, None}) – By default all of the frames will be read, but you can pass this flag to read a subset of the data by grabbing only every stride-th frame from disk.
atom_indices ({int, None}) – By default all of the atoms will be read, but you can pass this flag to read only a subset of the atoms for the coordinates and velocities fields. Note that you will have to carefully manage the indices and the offsets, since the i-th atom in the topology will not necessarily correspond to the i-th atom in your subset.
Notes
If you’d like more flexible access to the data, that is available by using the pytables group directly, which is accessible via the root property on this class.
- Returns:
frames – The returned namedtuple will have the fields “coordinates”, “time”, “cell_lengths”, “cell_angles”, “velocities”, “kineticEnergy”, “potentialEnergy”, “temperature” and “alchemicalLambda”. Each of the fields in the returned namedtuple will either be a numpy array or None, depending on if that data was saved in the trajectory. All of the data shall be in units of “nanometers”, “picoseconds”, “kelvin”, “degrees” and “kilojoules_per_mole”.
- Return type:
namedtuple
- write(coordinates, time=None, cell_lengths=None, cell_angles=None, velocities=None, kineticEnergy=None, potentialEnergy=None, temperature=None, alchemicalLambda=None)
Write one or more frames of data to the file
This method saves data that is associated with one or more simulation frames. Note that all of the arguments can either be raw numpy arrays or unitted arrays (with openmm.unit.Quantity). If the arrays are unitted, a unit conversion will be automatically done from the supplied units into the proper units for saving on disk. You won’t have to worry about it.
Furthermore, if you wish to save a single frame of simulation data, you can do so naturally, for instance by supplying a 2d array for the coordinates and a single float for the time. This “shape deficiency” will be recognized, and handled appropriately.
- Parameters:
coordinates (np.ndarray, shape=(n_frames, n_atoms, 3)) – The cartesian coordinates of the atoms to write. By convention, the lengths should be in units of nanometers.
time (np.ndarray, shape=(n_frames,), optional) – You may optionally specify the simulation time, in picoseconds corresponding to each frame.
cell_lengths (np.ndarray, shape=(n_frames, 3), dtype=float32, optional) – You may optionally specify the unitcell lengths. The length of the periodic box in each frame, in each direction, a, b, c. By convention the lengths should be in units of angstroms.
cell_angles (np.ndarray, shape=(n_frames, 3), dtype=float32, optional) – You may optionally specify the unitcell angles in each frame. Organized analogously to cell_lengths. Gives the alpha, beta and gamma angles respectively. By convention, the angles should be in units of degrees.
velocities (np.ndarray, shape=(n_frames, n_atoms, 3), optional) – You may optionally specify the cartesian components of the velocity for each atom in each frame. By convention, the velocities should be in units of nanometers / picosecond.
kineticEnergy (np.ndarray, shape=(n_frames,), optional) – You may optionally specify the kinetic energy in each frame. By convention the kinetic energies should be in units of kilojoules per mole.
potentialEnergy (np.ndarray, shape=(n_frames,), optional) – You may optionally specify the potential energy in each frame. By convention the potential energies should be in units of kilojoules per mole.
temperature (np.ndarray, shape=(n_frames,), optional) – You may optionally specify the temperature in each frame. By convention the temperatures should be in units of kelvin.
alchemicalLambda (np.ndarray, shape=(n_frames,), optional) – You may optionally specify the alchemical lambda in each frame. These have no units, but are generally between zero and one.
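As a hedged illustration, here is a minimal sketch of writing frames with this interface, assuming the class documented here mirrors mdtraj’s HDF5TrajectoryFile (the filename and array shapes are illustrative):
import numpy as np
from mdtraj.formats import HDF5TrajectoryFile  # assumed backing class

n_frames, n_atoms = 10, 22
coords = np.random.rand(n_frames, n_atoms, 3).astype(np.float32)  # nanometers
times = np.arange(n_frames, dtype=np.float32)                     # picoseconds
with HDF5TrajectoryFile('traj.h5', mode='w') as f:
    # Omitted optional fields (velocities, energies, ...) are simply not stored.
    f.write(coordinates=coords, time=times)
Note that supplying a single 2-D coordinate array and a scalar time would exercise the “shape deficiency” handling described above.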
- seek(offset, whence=0)
Move to a new file position
- Parameters:
offset (int) – A number of frames.
whence ({0, 1, 2}) – 0: offset from start of file, offset should be >= 0. 1: move relative to the current position, positive or negative. 2: move relative to the end of file, offset should be <= 0. Seeking beyond the end of a file is not supported.
- tell()
Current file position
- Returns:
offset – The current frame in the file.
- Return type:
int
- close()
Close the HDF5 file handle
- flush()
Write all buffered data to the disk file.
- class westpa.core.h5io.Frames(coordinates, time, cell_lengths, cell_angles, velocities, kineticEnergy, potentialEnergy, temperature, alchemicalLambda)
Bases:
tuple
Create new instance of Frames(coordinates, time, cell_lengths, cell_angles, velocities, kineticEnergy, potentialEnergy, temperature, alchemicalLambda)
- alchemicalLambda
Alias for field number 8
- cell_angles
Alias for field number 3
- cell_lengths
Alias for field number 2
- coordinates
Alias for field number 0
- kineticEnergy
Alias for field number 5
- potentialEnergy
Alias for field number 6
- temperature
Alias for field number 7
- time
Alias for field number 1
- velocities
Alias for field number 4
- class westpa.core.h5io.WESTTrajectory(coordinates, topology=None, time=None, iter_labels=None, seg_labels=None, pcoords=None, parent_ids=None, unitcell_lengths=None, unitcell_angles=None)
Bases:
Trajectory
A subclass of mdtraj.Trajectory that contains the trajectory of atom coordinates, with pointers denoting the iteration number and segment index of each frame.
- iter_label_values()
- seg_label_values(iteration=None)
- property label_values
- property iter_labels
Iteration index corresponding to each frame
- Returns:
iter_labels – The iteration index corresponding to each frame
- Return type:
np.ndarray, shape=(n_frames,)
- property seg_labels
Segment index corresponding to each frame
- Returns:
seg_labels – The segment index corresponding to each frame
- Return type:
np.ndarray, shape=(n_frames,)
- property pcoords
- property parent_ids
- join(other, check_topology=True, discard_overlapping_frames=False)
Join two Trajectory objects. This overrides mdtraj.Trajectory.join so that it also handles WESTPA pointers. Please see mdtraj.Trajectory.join’s documentation for more details.
- slice(key, copy=True)
Slice the Trajectory. This overrides mdtraj.Trajectory.slice so that it also handles WESTPA pointers. Please see mdtraj.Trajectory.slice’s documentation for more details.
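For example, a minimal sketch of joining and slicing while preserving pointers (constructing WESTTrajectory directly from raw arrays, without a topology, is an assumption made for brevity):
import numpy as np
from westpa.core.h5io import WESTTrajectory

a = WESTTrajectory(np.zeros((3, 1, 3)), iter_labels=[1, 1, 1], seg_labels=[0, 0, 0])
b = WESTTrajectory(np.ones((2, 1, 3)), iter_labels=[2, 2], seg_labels=[4, 4])
ab = a.join(b, check_topology=False)  # frames and iteration/segment labels are concatenated
print(ab.iter_labels)                 # expected: [1 1 1 2 2]
sub = ab.slice(slice(0, 2))           # labels stay aligned with the retained frames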
- westpa.core.h5io.resolve_filepath(path, constructor=<class 'h5py._hl.files.File'>, cargs=None, ckwargs=None, **addtlkwargs)
Use a combined filesystem and HDF5 path to open an HDF5 file and return the appropriate object. Returns (h5file, h5object). The file is opened using constructor(filename, *cargs, **ckwargs).
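A short usage sketch; the file and group names are illustrative, and the function is assumed to split the combined path into the longest existing filesystem prefix plus an internal HDF5 path:
from westpa.core.h5io import resolve_filepath

h5file, h5object = resolve_filepath('west.h5/iterations/iter_00000001')
print(h5file.filename, h5object.name)  # e.g. west.h5 /iterations/iter_00000001
h5file.close()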
- westpa.core.h5io.calc_chunksize(shape, dtype, max_chunksize=262144)
Calculate a chunk size for HDF5 data, anticipating that access will slice along lower dimensions sooner than higher dimensions.
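For instance, a sketch for a per-iteration pcoord-style dataset (the shape is illustrative):
import numpy as np
from westpa.core.h5io import calc_chunksize

# 1000 iterations x 500 segments x 2 time points x 1 pcoord dimension
chunks = calc_chunksize((1000, 500, 2, 1), np.float64)
print(chunks)  # a chunk shape whose total byte size stays under the 256 KiB default cap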
- westpa.core.h5io.tostr(b)
Convert a nonstandard string object b to str, handling the case where b is bytes.
- westpa.core.h5io.is_within_directory(directory, target)
- westpa.core.h5io.safe_extract(tar, path='.', members=None, *, numeric_owner=False)
- westpa.core.h5io.create_hdf5_group(parent_group, groupname, replace=False, creating_program=None)
Create (or delete and recreate) an HDF5 group named groupname within the enclosing Group (object) parent_group. If replace is True, then the group is replaced if present; if False, then an error is raised if the group is present. After the group is created, HDF5 attributes are set using stamp_creator_data.
- westpa.core.h5io.stamp_creator_data(h5group, creating_program=None)
Mark the following on the HDF5 group h5group:
- creation_program:
The name of the program that created the group
- creation_user:
The username of the user who created the group
- creation_hostname:
The hostname of the machine on which the group was created
- creation_time:
The date and time at which the group was created, in the current locale.
- creation_unix_time:
The Unix time (seconds from the epoch, UTC) at which the group was created.
This is meant to facilitate tracking the flow of data, but should not be considered a secure paper trail (after all, anyone with write access to the HDF5 file can modify these attributes).
- westpa.core.h5io.get_creator_data(h5group)
Read back creator data as written by stamp_creator_data, returning a dictionary with keys as described for stamp_creator_data. Missing fields are denoted with None. The creation_time field is returned as a string.
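A minimal sketch combining the two stamping helpers above (the file and group names are illustrative):
import h5py
from westpa.core.h5io import create_hdf5_group, get_creator_data

with h5py.File('analysis.h5', 'a') as f:
    create_hdf5_group(f, 'my_analysis', replace=True, creating_program='my_script.py')
    # The provenance attributes stamped at creation can be read back as a dict:
    print(get_creator_data(f['my_analysis'])['creation_program'])  # 'my_script.py'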
- westpa.core.h5io.load_west(filename)
Load WESTPA trajectory files from disk.
- Parameters:
filename (str) – String filename of HDF Trajectory file.
- westpa.core.h5io.stamp_iter_range(h5object, start_iter, stop_iter)
Mark that the HDF5 object h5object (dataset or group) contains data from iterations start_iter <= n_iter < stop_iter.
- westpa.core.h5io.get_iter_range(h5object)
Read back iteration range data written by
stamp_iter_range
- westpa.core.h5io.stamp_iter_step(h5group, iter_step)
Mark that the HDF5 object h5group (dataset or group) contains data with an iteration step (stride) of iter_step.
- westpa.core.h5io.get_iter_step(h5group)
Read back iteration step (stride) written by
stamp_iter_step
- westpa.core.h5io.check_iter_range_least(h5object, iter_start, iter_stop)
Return True if the iteration range [iter_start, iter_stop) is the same as or entirely contained within the iteration range stored on h5object.
- westpa.core.h5io.check_iter_range_equal(h5object, iter_start, iter_stop)
Return True if the iteration range [iter_start, iter_stop) is the same as the iteration range stored on h5object.
- westpa.core.h5io.get_iteration_entry(h5object, n_iter)
Create a slice for data corresponding to iteration n_iter in h5object.
- westpa.core.h5io.get_iteration_slice(h5object, iter_start, iter_stop=None, iter_stride=None)
Create a slice for data corresponding to iterations [iter_start, iter_stop), with stride iter_stride, in the given h5object.
- westpa.core.h5io.label_axes(h5object, labels, units=None)
Stamp the given HDF5 object with axis labels. This stores the axis labels in an array of strings in an attribute called axis_labels on the given object. units, if provided, is a corresponding list of units.
- class westpa.core.h5io.WESTPAH5File(*args, **kwargs)
Bases:
File
Generalized input/output for WESTPA simulation (or analysis) data.
Create a new file object.
See the h5py user guide for a detailed explanation of the options.
- name
Name of the file on disk, or file-like object. Note: for files created with the ‘core’ driver, HDF5 still requires this be non-empty.
- mode
r – Readonly, file must exist (default)
r+ – Read/write, file must exist
w – Create file, truncate if exists
w- or x – Create file, fail if exists
a – Read/write if exists, create otherwise
- driver
Name of the driver to use. Legal values are None (default, recommended), ‘core’, ‘sec2’, ‘direct’, ‘stdio’, ‘mpio’, ‘ros3’.
- libver
Library version bounds. Supported values: ‘earliest’, ‘v108’, ‘v110’, ‘v112’ and ‘latest’. The ‘v108’, ‘v110’ and ‘v112’ options can only be specified with the HDF5 1.10.2 library or later.
- userblock_size
Desired size of user block. Only allowed when creating a new file (mode w, w- or x).
- swmr
Open the file in SWMR read mode. Only used when mode = ‘r’.
- rdcc_nbytes
Total size of the dataset chunk cache in bytes. The default size is 1024**2 (1 MiB) per dataset. Applies to all datasets unless individually changed.
- rdcc_w0
The chunk preemption policy for all datasets. This must be between 0 and 1 inclusive and indicates the weighting according to which chunks which have been fully read or written are penalized when determining which chunks to flush from cache. A value of 0 means fully read or written chunks are treated no differently than other chunks (the preemption is strictly LRU) while a value of 1 means fully read or written chunks are always preempted before other chunks. If your application only reads or writes data once, this can be safely set to 1. Otherwise, this should be set lower depending on how often you re-read or re-write the same data. The default value is 0.75. Applies to all datasets unless individually changed.
- rdcc_nslots
The number of chunk slots in the raw data chunk cache for this file. Increasing this value reduces the number of cache collisions, but slightly increases the memory used. Due to the hashing strategy, this value should ideally be a prime number. As a rule of thumb, this value should be at least 10 times the number of chunks that can fit in rdcc_nbytes bytes. For maximum performance, this value should be set approximately 100 times that number of chunks. The default value is 521. Applies to all datasets unless individually changed.
- track_order
Track dataset/group/attribute creation order under root group if True. If None use global default h5.get_config().track_order.
- fs_strategy
The file space handling strategy to be used. Only allowed when creating a new file (mode w, w- or x). Defined as:
“fsm” – FSM, Aggregators, VFD
“page” – Paged FSM, VFD
“aggregate” – Aggregators, VFD
“none” – VFD
If None use HDF5 defaults.
- fs_page_size
File space page size in bytes. Only used when fs_strategy=”page”. If None use the HDF5 default (4096 bytes).
- fs_persist
A boolean value to indicate whether free space should be persistent or not. Only allowed when creating a new file. The default value is False.
- fs_threshold
The smallest free-space section size that the free space manager will track. Only allowed when creating a new file. The default value is 1.
- page_buf_size
Page buffer size in bytes. Only allowed for HDF5 files created with fs_strategy=”page”. Must be a power of two value and greater or equal than the file space page size when creating the file. It is not used by default.
- min_meta_keep
Minimum percentage of metadata to keep in the page buffer before allowing pages containing metadata to be evicted. Applicable only if page_buf_size is set. Default value is zero.
- min_raw_keep
Minimum percentage of raw data to keep in the page buffer before allowing pages containing raw data to be evicted. Applicable only if page_buf_size is set. Default value is zero.
- locking
The file locking behavior. Defined as:
False (or “false”) – Disable file locking
True (or “true”) – Enable file locking
“best-effort” – Enable file locking but ignore some errors
None – Use HDF5 defaults
Warning
The HDF5_USE_FILE_LOCKING environment variable can override this parameter.
Only available with HDF5 >= 1.12.1 or 1.10.x >= 1.10.7.
- alignment_threshold
Together with alignment_interval, this property ensures that any file object greater than or equal in size to the alignment threshold (in bytes) will be aligned on an address which is a multiple of the alignment interval.
- alignment_interval
This property should be used in conjunction with alignment_threshold. See the description above. For more details, see https://portal.hdfgroup.org/display/HDF5/H5P_SET_ALIGNMENT
- meta_block_size
Set the current minimum size, in bytes, of new metadata block allocations. See https://portal.hdfgroup.org/display/HDF5/H5P_SET_META_BLOCK_SIZE
- Additional keywords
Passed on to the selected file driver.
- default_iter_prec = 8
- replace_dataset(*args, **kwargs)
- iter_object_name(n_iter, prefix='', suffix='')
Return a properly-formatted per-iteration name for iteration n_iter. (This is used in create/require/get_iter_group, but may also be useful for naming datasets on a per-iteration basis.)
- create_iter_group(n_iter, group=None)
Create a per-iteration data storage group for iteration number n_iter in the group group (which is ‘/iterations’ by default).
- require_iter_group(n_iter, group=None)
Ensure that a per-iteration data storage group for iteration number n_iter is available in the group group (which is ‘/iterations’ by default).
- get_iter_group(n_iter, group=None)
Get the per-iteration data group for iteration number n_iter from within the group group (‘/iterations’ by default).
- class westpa.core.h5io.WESTIterationFile(file, mode='r', force_overwrite=True, compression='zlib', link=None)
Bases:
HDF5TrajectoryFile
- read(frame_indices=None, atom_indices=None)
Read one or more frames of data from the file
- Parameters:
frame_indices (array_like of int, or None) – If not supplied, all of the remaining frames will be read; otherwise, read only the frames with these indices.
atom_indices ({int, None}) – By default all of the atoms will be read, but you can pass this flag to read only a subset of the atoms for the coordinates and velocities fields. Note that you will have to carefully manage the indices and the offsets, since the i-th atom in the topology will not necessarily correspond to the i-th atom in your subset.
Notes
If you’d like more flexible access to the data, that is available by using the pytables group directly, which is accessible via the root property on this class.
- Returns:
frames – The returned namedtuple will have the fields “coordinates”, “time”, “cell_lengths”, “cell_angles”, “velocities”, “kineticEnergy”, “potentialEnergy”, “temperature” and “alchemicalLambda”. Each of the fields in the returned namedtuple will either be a numpy array or None, depending on whether that data was saved in the trajectory. All of the data will be in units of “nanometers”, “picoseconds”, “kelvin”, “degrees” and “kilojoules_per_mole”.
- Return type:
namedtuple
- has_topology()
- has_pointer()
- has_restart(segment)
- write_data(where, name, data)
- read_data(where, name)
- read_as_traj(iteration=None, segment=None, atom_indices=None)
Read a trajectory from the HDF5 file
- Parameters:
iteration (int, optional) – If given, read only the frames belonging to this WE iteration; otherwise read the whole file.
segment (int, optional) – If given (together with iteration), read only the frames belonging to this segment.
atom_indices ({int, None}) – By default all of the atoms will be read, but you can pass this flag to read only a subset of the atoms for the coordinates and velocities fields. Note that you will have to carefully manage the indices and the offsets, since the i-th atom in the topology will not necessarily correspond to the i-th atom in your subset.
- Returns:
trajectory – A trajectory object containing the loaded portion of the file.
- Return type:
- read_restart(segment)
- write_segment(segment, pop=False)
- class westpa.core.h5io.DSSpec
Bases:
object
Generalized WE dataset access
- get_iter_data(n_iter, seg_slice=(slice(None, None, None),))
- get_segment_data(n_iter, seg_id)
- class westpa.core.h5io.FileLinkedDSSpec(h5file_or_name)
Bases:
DSSpec
Provide facilities for accessing WESTPA HDF5 files, including auto-opening and the ability to pickle references to such files for transmission (through, e.g., the work manager), provided that the HDF5 file can be accessed by the same path on both the sender and receiver.
- property h5file
Lazily open HDF5 file. This is required because allowing an open HDF5 file to cross a fork() boundary generally corrupts the internal state of the HDF5 library.
- class westpa.core.h5io.SingleDSSpec(h5file_or_name, dsname, alias=None, slice=None)
Bases:
FileLinkedDSSpec
- classmethod from_string(dsspec_string, default_h5file)
- class westpa.core.h5io.SingleIterDSSpec(h5file_or_name, dsname, alias=None, slice=None)
Bases:
SingleDSSpec
- get_iter_data(n_iter, seg_slice=(slice(None, None, None),))
- class westpa.core.h5io.SingleSegmentDSSpec(h5file_or_name, dsname, alias=None, slice=None)
Bases:
SingleDSSpec
- get_iter_data(n_iter, seg_slice=(slice(None, None, None),))
- get_segment_data(n_iter, seg_id)
- class westpa.core.h5io.FnDSSpec(h5file_or_name, fn)
Bases:
FileLinkedDSSpec
- get_iter_data(n_iter, seg_slice=(slice(None, None, None),))
- class westpa.core.h5io.MultiDSSpec(dsspecs)
Bases:
DSSpec
- get_iter_data(n_iter, seg_slice=(slice(None, None, None),))
- class westpa.core.h5io.IterBlockedDataset(dataset_or_array, attrs=None)
Bases:
object
- classmethod empty_like(blocked_dataset)
- cache_data(max_size=None)
Cache this dataset in RAM. If max_size is given, then only cache if the entire dataset fits in max_size bytes. If max_size is the string ‘available’, then only cache if the entire dataset fits in available RAM, as defined by the psutil module.
- drop_cache()
- iter_entry(n_iter)
- iter_slice(start=None, stop=None)
westpa.core.progress module
- westpa.core.progress.linregress(x, y=None, alternative='two-sided')
Calculate a linear least-squares regression for two sets of measurements.
- Parameters:
x, y (array_like) – Two sets of measurements. Both arrays should have the same length. If only x is given (and y=None), then it must be a two-dimensional array where one dimension has length 2. The two sets of measurements are then found by splitting the array along the length-2 dimension. In the case where y=None and x is a 2x2 array, linregress(x) is equivalent to linregress(x[0], x[1]).
alternative ({'two-sided', 'less', 'greater'}, optional) –
Defines the alternative hypothesis. Default is ‘two-sided’. The following options are available:
’two-sided’: the slope of the regression line is nonzero
’less’: the slope of the regression line is less than zero
’greater’: the slope of the regression line is greater than zero
Added in version 1.7.0.
- Returns:
result – The return value is an object with the following attributes:
- slope (float)
Slope of the regression line.
- intercept (float)
Intercept of the regression line.
- rvalue (float)
The Pearson correlation coefficient. The square of rvalue is equal to the coefficient of determination.
- pvalue (float)
The p-value for a hypothesis test whose null hypothesis is that the slope is zero, using Wald Test with t-distribution of the test statistic. See alternative above for alternative hypotheses.
- stderr (float)
Standard error of the estimated slope (gradient), under the assumption of residual normality.
- intercept_stderr (float)
Standard error of the estimated intercept, under the assumption of residual normality.
- Return type:
LinregressResult instance
See also
scipy.optimize.curve_fit
Use non-linear least squares to fit a function to data.
scipy.optimize.leastsq
Minimize the sum of squares of a set of equations.
Notes
Missing values are considered pair-wise: if a value is missing in x, the corresponding value in y is masked.
For compatibility with older versions of SciPy, the return value acts like a namedtuple of length 5, with fields slope, intercept, rvalue, pvalue and stderr, so one can continue to write:
slope, intercept, r, p, se = linregress(x, y)
With that style, however, the standard error of the intercept is not available. To have access to all the computed values, including the standard error of the intercept, use the return value as an object with attributes, e.g.:
result = linregress(x, y)
print(result.intercept, result.intercept_stderr)
Examples
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> from scipy import stats
>>> rng = np.random.default_rng()
Generate some data:
>>> x = rng.random(10)
>>> y = 1.6*x + rng.random(10)
Perform the linear regression:
>>> res = stats.linregress(x, y)
Coefficient of determination (R-squared):
>>> print(f"R-squared: {res.rvalue**2:.6f}") R-squared: 0.717533
Plot the data along with the fitted line:
>>> plt.plot(x, y, 'o', label='original data')
>>> plt.plot(x, res.intercept + res.slope*x, 'r', label='fitted line')
>>> plt.legend()
>>> plt.show()
Calculate 95% confidence interval on slope and intercept:
>>> # Two-sided inverse Students t-distribution
>>> # p - probability, df - degrees of freedom
>>> from scipy.stats import t
>>> tinv = lambda p, df: abs(t.ppf(p/2, df))
>>> ts = tinv(0.05, len(x)-2)
>>> print(f"slope (95%): {res.slope:.6f} +/- {ts*res.stderr:.6f}")
slope (95%): 1.453392 +/- 0.743465
>>> print(f"intercept (95%): {res.intercept:.6f}"
...       f" +/- {ts*res.intercept_stderr:.6f}")
intercept (95%): 0.616950 +/- 0.544475
- westpa.core.progress.nop()
westpa.core.segment module
- class westpa.core.segment.Segment(n_iter=None, seg_id=None, weight=None, endpoint_type=None, parent_id=None, wtg_parent_ids=None, pcoord=None, status=None, walltime=None, cputime=None, data=None)
Bases:
object
A class wrapping segment data that must be passed through the work manager or data manager. Most fields are self-explanatory. One item worth noting is that a negative parent ID means that the segment starts from the initial state with ID -(segment.parent_id+1)
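For example, the negative-parent-ID convention can be decoded like this (a minimal sketch using only the documented constructor arguments):
from westpa.core.segment import Segment

seg = Segment(n_iter=2, seg_id=0, weight=0.5, parent_id=-3)
# A negative parent_id marks a segment that begins from an initial state;
# the initial state ID is recovered as -(parent_id + 1):
print(-(seg.parent_id + 1))   # 2
print(seg.initial_state_id)   # the property documented below is expected to return the same value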
- SEG_STATUS_UNSET = 0
- SEG_STATUS_PREPARED = 1
- SEG_STATUS_COMPLETE = 2
- SEG_STATUS_FAILED = 3
- SEG_INITPOINT_UNSET = 0
- SEG_INITPOINT_CONTINUES = 1
- SEG_INITPOINT_NEWTRAJ = 2
- SEG_ENDPOINT_UNSET = 0
- SEG_ENDPOINT_CONTINUES = 1
- SEG_ENDPOINT_MERGED = 2
- SEG_ENDPOINT_RECYCLED = 3
- statuses = {'SEG_STATUS_COMPLETE': 2, 'SEG_STATUS_FAILED': 3, 'SEG_STATUS_PREPARED': 1, 'SEG_STATUS_UNSET': 0}
- initpoint_types = {'SEG_INITPOINT_CONTINUES': 1, 'SEG_INITPOINT_NEWTRAJ': 2, 'SEG_INITPOINT_UNSET': 0}
- endpoint_types = {'SEG_ENDPOINT_CONTINUES': 1, 'SEG_ENDPOINT_MERGED': 2, 'SEG_ENDPOINT_RECYCLED': 3, 'SEG_ENDPOINT_UNSET': 0}
- status_names = {0: 'SEG_STATUS_UNSET', 1: 'SEG_STATUS_PREPARED', 2: 'SEG_STATUS_COMPLETE', 3: 'SEG_STATUS_FAILED'}
- initpoint_type_names = {0: 'SEG_INITPOINT_UNSET', 1: 'SEG_INITPOINT_CONTINUES', 2: 'SEG_INITPOINT_NEWTRAJ'}
- endpoint_type_names = {0: 'SEG_ENDPOINT_UNSET', 1: 'SEG_ENDPOINT_CONTINUES', 2: 'SEG_ENDPOINT_MERGED', 3: 'SEG_ENDPOINT_RECYCLED'}
- static initial_pcoord(segment)
Return the initial progress coordinate point of this segment.
- static final_pcoord(segment)
Return the final progress coordinate point of this segment.
- property initpoint_type
- property initial_state_id
- property status_text
- property endpoint_type_text
westpa.core.sim_manager module
- class westpa.core.sim_manager.timedelta
Bases:
object
Difference between two datetime values.
timedelta(days=0, seconds=0, microseconds=0, milliseconds=0, minutes=0, hours=0, weeks=0)
All arguments are optional and default to 0. Arguments may be integers or floats, and may be positive or negative.
- days
Number of days.
- max = datetime.timedelta(days=999999999, seconds=86399, microseconds=999999)
- microseconds
Number of microseconds (>= 0 and less than 1 second).
- min = datetime.timedelta(days=-999999999)
- resolution = datetime.timedelta(microseconds=1)
- seconds
Number of seconds (>= 0 and less than 1 day).
- total_seconds()
Total seconds in the duration.
- class westpa.core.sim_manager.zip_longest
Bases:
object
zip_longest(iter1 [,iter2 […]], [fillvalue=None]) –> zip_longest object
Return a zip_longest object whose .__next__() method returns a tuple where the i-th element comes from the i-th iterable argument. The .__next__() method continues until the longest iterable in the argument sequence is exhausted and then it raises StopIteration. When the shorter iterables are exhausted, the fillvalue is substituted in their place. The fillvalue defaults to None or can be specified by a keyword argument.
- exception westpa.core.sim_manager.PickleError
Bases:
Exception
- westpa.core.sim_manager.weight_dtype
alias of
float64
- class westpa.core.sim_manager.Segment(n_iter=None, seg_id=None, weight=None, endpoint_type=None, parent_id=None, wtg_parent_ids=None, pcoord=None, status=None, walltime=None, cputime=None, data=None)
Bases:
object
A class wrapping segment data that must be passed through the work manager or data manager. Most fields are self-explanatory. One item worth noting is that a negative parent ID means that the segment starts from the initial state with ID -(segment.parent_id+1)
- SEG_STATUS_UNSET = 0
- SEG_STATUS_PREPARED = 1
- SEG_STATUS_COMPLETE = 2
- SEG_STATUS_FAILED = 3
- SEG_INITPOINT_UNSET = 0
- SEG_INITPOINT_CONTINUES = 1
- SEG_INITPOINT_NEWTRAJ = 2
- SEG_ENDPOINT_UNSET = 0
- SEG_ENDPOINT_CONTINUES = 1
- SEG_ENDPOINT_MERGED = 2
- SEG_ENDPOINT_RECYCLED = 3
- statuses = {'SEG_STATUS_COMPLETE': 2, 'SEG_STATUS_FAILED': 3, 'SEG_STATUS_PREPARED': 1, 'SEG_STATUS_UNSET': 0}
- initpoint_types = {'SEG_INITPOINT_CONTINUES': 1, 'SEG_INITPOINT_NEWTRAJ': 2, 'SEG_INITPOINT_UNSET': 0}
- endpoint_types = {'SEG_ENDPOINT_CONTINUES': 1, 'SEG_ENDPOINT_MERGED': 2, 'SEG_ENDPOINT_RECYCLED': 3, 'SEG_ENDPOINT_UNSET': 0}
- status_names = {0: 'SEG_STATUS_UNSET', 1: 'SEG_STATUS_PREPARED', 2: 'SEG_STATUS_COMPLETE', 3: 'SEG_STATUS_FAILED'}
- initpoint_type_names = {0: 'SEG_INITPOINT_UNSET', 1: 'SEG_INITPOINT_CONTINUES', 2: 'SEG_INITPOINT_NEWTRAJ'}
- endpoint_type_names = {0: 'SEG_ENDPOINT_UNSET', 1: 'SEG_ENDPOINT_CONTINUES', 2: 'SEG_ENDPOINT_MERGED', 3: 'SEG_ENDPOINT_RECYCLED'}
- static initial_pcoord(segment)
Return the initial progress coordinate point of this segment.
- static final_pcoord(segment)
Return the final progress coordinate point of this segment.
- property initpoint_type
- property initial_state_id
- property status_text
- property endpoint_type_text
- class westpa.core.sim_manager.InitialState(state_id, basis_state_id, iter_created, iter_used=None, istate_type=None, istate_status=None, pcoord=None, basis_state=None, basis_auxref=None)
Bases:
object
Describes an initial state for a new trajectory. These are generally constructed by appropriate modification of a basis state.
- Variables:
state_id – Integer identifier of this state, usually set by the data manager.
basis_state_id – Identifier of the basis state from which this state was generated, or None.
basis_state – The BasisState from which this state was generated, or None.
iter_created – Iteration in which this state was generated (0 for simulation initialization).
iter_used – Iteration in which this state was used to initiate a trajectory (None for unused).
istate_type – Integer describing the type of this initial state (ISTATE_TYPE_BASIS for direct use of a basis state, ISTATE_TYPE_GENERATED for a state generated from a basis state, ISTATE_TYPE_RESTART for a state corresponding to the endpoint of a segment in another simulation, or ISTATE_TYPE_START for a state generated from a start state).
istate_status – Integer describing whether this initial state has been properly prepared.
pcoord – The representative progress coordinate of this state.
- ISTATE_TYPE_UNSET = 0
- ISTATE_TYPE_BASIS = 1
- ISTATE_TYPE_GENERATED = 2
- ISTATE_TYPE_RESTART = 3
- ISTATE_TYPE_START = 4
- ISTATE_UNUSED = 0
- ISTATE_STATUS_PENDING = 0
- ISTATE_STATUS_PREPARED = 1
- ISTATE_STATUS_FAILED = 2
- istate_types = {'ISTATE_TYPE_BASIS': 1, 'ISTATE_TYPE_GENERATED': 2, 'ISTATE_TYPE_RESTART': 3, 'ISTATE_TYPE_START': 4, 'ISTATE_TYPE_UNSET': 0}
- istate_type_names = {0: 'ISTATE_TYPE_UNSET', 1: 'ISTATE_TYPE_BASIS', 2: 'ISTATE_TYPE_GENERATED', 3: 'ISTATE_TYPE_RESTART', 4: 'ISTATE_TYPE_START'}
- istate_statuses = {'ISTATE_STATUS_FAILED': 2, 'ISTATE_STATUS_PENDING': 0, 'ISTATE_STATUS_PREPARED': 1}
- istate_status_names = {0: 'ISTATE_STATUS_PENDING', 1: 'ISTATE_STATUS_PREPARED', 2: 'ISTATE_STATUS_FAILED'}
- as_numpy_record()
- westpa.core.sim_manager.grouper(n, iterable, fillvalue=None)
Collect data into fixed-length chunks or blocks
- exception westpa.core.sim_manager.PropagationError
Bases:
RuntimeError
- class westpa.core.sim_manager.WESimManager(rc=None)
Bases:
object
- process_config()
- register_callback(hook, function, priority=0)
Registers a callback to execute during the given hook into the simulation loop. The optional priority is used to order when the function is called relative to other registered callbacks.
- invoke_callbacks(hook, *args, **kwargs)
- load_plugins(plugins=None)
- report_bin_statistics(bins, target_states, save_summary=False)
- get_bstate_pcoords(basis_states, label='basis')
For each of the given basis_states, calculate progress coordinate values as necessary. The HDF5 file is not updated.
- report_basis_states(basis_states, label='basis')
- report_target_states(target_states)
- initialize_simulation(basis_states, target_states, start_states, segs_per_state=1, suppress_we=False)
Initialize a new weighted ensemble simulation, taking segs_per_state initial states from each of the given basis_states.
w_init is the forward-facing version of this function.
- prepare_iteration()
- finalize_iteration()
Clean up after an iteration and prepare for the next.
- get_istate_futures()
Add n_states initial states to the internal list of initial states assigned to recycled particles. Spare states are used if available; otherwise, new states are created. If creating new initial states requires generation, then a set of futures is returned representing work manager tasks corresponding to the necessary generation work.
- propagate()
- save_bin_data()
Calculate and write flux and transition count matrices to HDF5. Population and rate matrices are likely useless at the single-tau level and are no longer written.
- check_propagation()
Check for failures in propagation or initial state generation, and raise an exception if any are found.
- run_we()
Run the weighted ensemble algorithm based on the binning in self.final_bins and the recycled particles in self.to_recycle, creating and committing the next iteration’s segments to storage as well.
- prepare_new_iteration()
Commit data for the coming iteration to the HDF5 file.
- run()
- prepare_run()
Prepare a new run.
- finalize_run()
Perform cleanup at the normal end of a run
- pre_propagation()
- post_propagation()
- pre_we()
- post_we()
westpa.core.states module
- class westpa.core.states.Segment(n_iter=None, seg_id=None, weight=None, endpoint_type=None, parent_id=None, wtg_parent_ids=None, pcoord=None, status=None, walltime=None, cputime=None, data=None)
Bases:
object
A class wrapping segment data that must be passed through the work manager or data manager. Most fields are self-explanatory. One item worth noting is that a negative parent ID means that the segment starts from the initial state with ID -(segment.parent_id+1)
- SEG_STATUS_UNSET = 0
- SEG_STATUS_PREPARED = 1
- SEG_STATUS_COMPLETE = 2
- SEG_STATUS_FAILED = 3
- SEG_INITPOINT_UNSET = 0
- SEG_INITPOINT_CONTINUES = 1
- SEG_INITPOINT_NEWTRAJ = 2
- SEG_ENDPOINT_UNSET = 0
- SEG_ENDPOINT_CONTINUES = 1
- SEG_ENDPOINT_MERGED = 2
- SEG_ENDPOINT_RECYCLED = 3
- statuses = {'SEG_STATUS_COMPLETE': 2, 'SEG_STATUS_FAILED': 3, 'SEG_STATUS_PREPARED': 1, 'SEG_STATUS_UNSET': 0}
- initpoint_types = {'SEG_INITPOINT_CONTINUES': 1, 'SEG_INITPOINT_NEWTRAJ': 2, 'SEG_INITPOINT_UNSET': 0}
- endpoint_types = {'SEG_ENDPOINT_CONTINUES': 1, 'SEG_ENDPOINT_MERGED': 2, 'SEG_ENDPOINT_RECYCLED': 3, 'SEG_ENDPOINT_UNSET': 0}
- status_names = {0: 'SEG_STATUS_UNSET', 1: 'SEG_STATUS_PREPARED', 2: 'SEG_STATUS_COMPLETE', 3: 'SEG_STATUS_FAILED'}
- initpoint_type_names = {0: 'SEG_INITPOINT_UNSET', 1: 'SEG_INITPOINT_CONTINUES', 2: 'SEG_INITPOINT_NEWTRAJ'}
- endpoint_type_names = {0: 'SEG_ENDPOINT_UNSET', 1: 'SEG_ENDPOINT_CONTINUES', 2: 'SEG_ENDPOINT_MERGED', 3: 'SEG_ENDPOINT_RECYCLED'}
- static initial_pcoord(segment)
Return the initial progress coordinate point of this segment.
- static final_pcoord(segment)
Return the final progress coordinate point of this segment.
- property initpoint_type
- property initial_state_id
- property status_text
- property endpoint_type_text
- class westpa.core.states.BasisState(label, probability, pcoord=None, auxref=None, state_id=None)
Bases:
object
Describes a basis (micro)state. These basis states are used to generate initial states for new trajectories, either at the beginning of the simulation (i.e. at w_init) or due to recycling.
- Variables:
state_id – Integer identifier of this state, usually set by the data manager.
label – A descriptive label for this microstate (may be empty)
probability – Probability of this state to be selected when creating a new trajectory.
pcoord – The representative progress coordinate of this state.
auxref – A user-provided (string) reference for locating data associated with this state (usually a filesystem path).
- classmethod states_to_file(states, fileobj)
Write a file defining basis states, which may then be read by states_from_file().
- classmethod states_from_file(statefile)
Read a file defining basis states. Each line defines a state, and contains a label, the probability, and optionally a data reference, separated by whitespace, as in:
unbound 1.0
or:
unbound_0 0.6 state0.pdb
unbound_1 0.4 state1.pdb
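A round-trip sketch using the two classmethods above (the filename is illustrative):
from westpa.core.states import BasisState

states = [BasisState('unbound_0', 0.6, auxref='state0.pdb'),
          BasisState('unbound_1', 0.4, auxref='state1.pdb')]
with open('bstates.txt', 'w') as fo:
    BasisState.states_to_file(states, fo)
for s in BasisState.states_from_file('bstates.txt'):
    print(s.label, s.probability, s.auxref)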
- as_numpy_record()
Return the data for this state as a numpy record array.
- class westpa.core.states.InitialState(state_id, basis_state_id, iter_created, iter_used=None, istate_type=None, istate_status=None, pcoord=None, basis_state=None, basis_auxref=None)
Bases:
object
Describes an initial state for a new trajectory. These are generally constructed by appropriate modification of a basis state.
- Variables:
state_id – Integer identifier of this state, usually set by the data manager.
basis_state_id – Identifier of the basis state from which this state was generated, or None.
basis_state – The BasisState from which this state was generated, or None.
iter_created – Iteration in which this state was generated (0 for simulation initialization).
iter_used – Iteration in which this state was used to initiate a trajectory (None for unused).
istate_type – Integer describing the type of this initial state (ISTATE_TYPE_BASIS for direct use of a basis state, ISTATE_TYPE_GENERATED for a state generated from a basis state, ISTATE_TYPE_RESTART for a state corresponding to the endpoint of a segment in another simulation, or ISTATE_TYPE_START for a state generated from a start state).
istate_status – Integer describing whether this initial state has been properly prepared.
pcoord – The representative progress coordinate of this state.
- ISTATE_TYPE_UNSET = 0
- ISTATE_TYPE_BASIS = 1
- ISTATE_TYPE_GENERATED = 2
- ISTATE_TYPE_RESTART = 3
- ISTATE_TYPE_START = 4
- ISTATE_UNUSED = 0
- ISTATE_STATUS_PENDING = 0
- ISTATE_STATUS_PREPARED = 1
- ISTATE_STATUS_FAILED = 2
- istate_types = {'ISTATE_TYPE_BASIS': 1, 'ISTATE_TYPE_GENERATED': 2, 'ISTATE_TYPE_RESTART': 3, 'ISTATE_TYPE_START': 4, 'ISTATE_TYPE_UNSET': 0}
- istate_type_names = {0: 'ISTATE_TYPE_UNSET', 1: 'ISTATE_TYPE_BASIS', 2: 'ISTATE_TYPE_GENERATED', 3: 'ISTATE_TYPE_RESTART', 4: 'ISTATE_TYPE_START'}
- istate_statuses = {'ISTATE_STATUS_FAILED': 2, 'ISTATE_STATUS_PENDING': 0, 'ISTATE_STATUS_PREPARED': 1}
- istate_status_names = {0: 'ISTATE_STATUS_PENDING', 1: 'ISTATE_STATUS_PREPARED', 2: 'ISTATE_STATUS_FAILED'}
- as_numpy_record()
- class westpa.core.states.TargetState(label, pcoord, state_id=None)
Bases:
object
Describes a target state.
- Variables:
state_id – Integer identifier of this state, usually set by the data manager.
label – A descriptive label for this microstate (may be empty)
pcoord – The representative progress coordinate of this state.
- classmethod states_to_file(states, fileobj)
Write a file defining basis states, which may then be read by states_from_file().
- classmethod states_from_file(statefile, dtype)
Read a file defining target states. Each line defines a state, and contains a label followed by a representative progress coordinate value, separated by whitespace, as in:
bound 0.02
for a single target and one-dimensional progress coordinates or:
bound 2.7 0.0
drift 100 50.0
for two targets and a two-dimensional progress coordinate.
- westpa.core.states.pare_basis_initial_states(basis_states, initial_states, segments=None)
Given iterables of basis and initial states (and optionally segments that use them), return minimal sets (as in __builtins__.set) of states needed to describe the history of the given segments and initial states.
- westpa.core.states.return_state_type(state_obj)
Convenience function for returning the state ID and type of the state_obj pointer.
westpa.core.systems module
- class westpa.core.systems.NopMapper
Bases:
BinMapper
Put everything into one bin.
- assign(coords, mask=None, output=None)
- class westpa.core.systems.WESTSystem(rc=None)
Bases:
object
A description of the system being simulated, including the dimensionality and data type of the progress coordinate, the number of progress coordinate entries expected from each segment, and binning. To construct a simulation, the user must subclass WESTSystem and set several instance variables.
At a minimum, the user must subclass WESTSystem and override initialize() to set the data type and dimensionality of progress coordinate data and define a bin mapper.
- Variables:
pcoord_ndim – The number of dimensions in the progress coordinate. Defaults to 1 (i.e. a one-dimensional progress coordinate).
pcoord_dtype – The data type of the progress coordinate, which must be callable (e.g. np.float32 and long will work, but '<f4' and '<i8' will not). Defaults to np.float64.
pcoord_len – The length of the progress coordinate time series generated by each segment, including both the initial and final values. Defaults to 2 (i.e. only the initial and final progress coordinate values for a segment are returned from propagation).
bin_mapper – A bin mapper describing the progress coordinate space.
bin_target_counts – A vector of target counts, one per bin.
- property bin_target_counts
- initialize()
Prepare this system object for use in simulation or analysis, creating a bin space, setting replicas per bin, and so on. This function is called whenever a WEST tool creates an instance of the system driver.
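As an illustration, a minimal system driver might look like the following sketch; the RectilinearBinMapper import path is an assumption, and the bin boundaries and target counts are illustrative:
import numpy as np
from westpa.core.systems import WESTSystem
from westpa.core.binning import RectilinearBinMapper  # assumed import path

class MySystem(WESTSystem):
    def initialize(self):
        # 1-D float32 progress coordinate; only initial and final values per segment.
        self.pcoord_ndim = 1
        self.pcoord_dtype = np.float32
        self.pcoord_len = 2
        # Bins on [0, 2), [2, 4), ..., [18, inf); 4 walkers targeted per bin.
        boundaries = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0, float('inf')]
        self.bin_mapper = RectilinearBinMapper([boundaries])
        self.bin_target_counts = np.full(self.bin_mapper.nbins, 4)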
- prepare_run()
Prepare this system for use in a simulation run. Called by w_run in all worker processes.
- finalize_run()
A hook for system-specific processing for the end of a simulation run (as defined by such things as maximum wallclock time, rather than perhaps more scientifically-significant definitions of “the end of a simulation run”)
- new_pcoord_array(pcoord_len=None)
Return an appropriately-sized and -typed pcoord array for a timepoint, segment, or number of segments. If pcoord_len is not specified (or None), then a length appropriate for a segment is returned.
- new_region_set()
westpa.core.textio module
Miscellaneous routines to help with input and output of WEST-related data in text format
- class westpa.core.textio.NumericTextOutputFormatter(output_file, mode='wt', emit_header=None)
Bases:
object
- comment_string = '# '
- emit_header = True
- close()
- write(str)
- writelines(sequence)
- write_comment(line)
Writes a line beginning with the comment string
- write_header(line)
Appends a line to those written when the file header is written. The appropriate comment string will be prepended, so
line
should not include a comment character.
westpa.core.we_driver module
- class westpa.core.we_driver.Segment(n_iter=None, seg_id=None, weight=None, endpoint_type=None, parent_id=None, wtg_parent_ids=None, pcoord=None, status=None, walltime=None, cputime=None, data=None)
Bases:
object
A class wrapping segment data that must be passed through the work manager or data manager. Most fields are self-explanatory. One item worth noting is that a negative parent ID means that the segment starts from the initial state with ID -(segment.parent_id+1)
- SEG_STATUS_UNSET = 0
- SEG_STATUS_PREPARED = 1
- SEG_STATUS_COMPLETE = 2
- SEG_STATUS_FAILED = 3
- SEG_INITPOINT_UNSET = 0
- SEG_INITPOINT_CONTINUES = 1
- SEG_INITPOINT_NEWTRAJ = 2
- SEG_ENDPOINT_UNSET = 0
- SEG_ENDPOINT_CONTINUES = 1
- SEG_ENDPOINT_MERGED = 2
- SEG_ENDPOINT_RECYCLED = 3
- statuses = {'SEG_STATUS_COMPLETE': 2, 'SEG_STATUS_FAILED': 3, 'SEG_STATUS_PREPARED': 1, 'SEG_STATUS_UNSET': 0}
- initpoint_types = {'SEG_INITPOINT_CONTINUES': 1, 'SEG_INITPOINT_NEWTRAJ': 2, 'SEG_INITPOINT_UNSET': 0}
- endpoint_types = {'SEG_ENDPOINT_CONTINUES': 1, 'SEG_ENDPOINT_MERGED': 2, 'SEG_ENDPOINT_RECYCLED': 3, 'SEG_ENDPOINT_UNSET': 0}
- status_names = {0: 'SEG_STATUS_UNSET', 1: 'SEG_STATUS_PREPARED', 2: 'SEG_STATUS_COMPLETE', 3: 'SEG_STATUS_FAILED'}
- initpoint_type_names = {0: 'SEG_INITPOINT_UNSET', 1: 'SEG_INITPOINT_CONTINUES', 2: 'SEG_INITPOINT_NEWTRAJ'}
- endpoint_type_names = {0: 'SEG_ENDPOINT_UNSET', 1: 'SEG_ENDPOINT_CONTINUES', 2: 'SEG_ENDPOINT_MERGED', 3: 'SEG_ENDPOINT_RECYCLED'}
- static initial_pcoord(segment)
Return the initial progress coordinate point of this segment.
- static final_pcoord(segment)
Return the final progress coordinate point of this segment.
- property initpoint_type
- property initial_state_id
- property status_text
- property endpoint_type_text
- class westpa.core.we_driver.InitialState(state_id, basis_state_id, iter_created, iter_used=None, istate_type=None, istate_status=None, pcoord=None, basis_state=None, basis_auxref=None)
Bases:
object
Describes an initial state for a new trajectory. These are generally constructed by appropriate modification of a basis state.
- Variables:
state_id – Integer identifier of this state, usually set by the data manager.
basis_state_id – Identifier of the basis state from which this state was generated, or None.
basis_state – The BasisState from which this state was generated, or None.
iter_created – Iteration in which this state was generated (0 for simulation initialization).
iter_used – Iteration in which this state was used to initiate a trajectory (None for unused).
istate_type – Integer describing the type of this initial state (ISTATE_TYPE_BASIS for direct use of a basis state, ISTATE_TYPE_GENERATED for a state generated from a basis state, ISTATE_TYPE_RESTART for a state corresponding to the endpoint of a segment in another simulation, or ISTATE_TYPE_START for a state generated from a start state).
istate_status – Integer describing whether this initial state has been properly prepared.
pcoord – The representative progress coordinate of this state.
- ISTATE_TYPE_UNSET = 0
- ISTATE_TYPE_BASIS = 1
- ISTATE_TYPE_GENERATED = 2
- ISTATE_TYPE_RESTART = 3
- ISTATE_TYPE_START = 4
- ISTATE_UNUSED = 0
- ISTATE_STATUS_PENDING = 0
- ISTATE_STATUS_PREPARED = 1
- ISTATE_STATUS_FAILED = 2
- istate_types = {'ISTATE_TYPE_BASIS': 1, 'ISTATE_TYPE_GENERATED': 2, 'ISTATE_TYPE_RESTART': 3, 'ISTATE_TYPE_START': 4, 'ISTATE_TYPE_UNSET': 0}
- istate_type_names = {0: 'ISTATE_TYPE_UNSET', 1: 'ISTATE_TYPE_BASIS', 2: 'ISTATE_TYPE_GENERATED', 3: 'ISTATE_TYPE_RESTART', 4: 'ISTATE_TYPE_START'}
- istate_statuses = {'ISTATE_STATUS_FAILED': 2, 'ISTATE_STATUS_PENDING': 0, 'ISTATE_STATUS_PREPARED': 1}
- istate_status_names = {0: 'ISTATE_STATUS_PENDING', 1: 'ISTATE_STATUS_PREPARED', 2: 'ISTATE_STATUS_FAILED'}
- as_numpy_record()
- exception westpa.core.we_driver.ConsistencyError
Bases:
RuntimeError
- exception westpa.core.we_driver.AccuracyError
Bases:
RuntimeError
- class westpa.core.we_driver.NewWeightEntry(source_type, weight, prev_seg_id=None, prev_init_pcoord=None, prev_final_pcoord=None, new_init_pcoord=None, target_state_id=None, initial_state_id=None)
Bases:
object
- NW_SOURCE_RECYCLED = 0
- class westpa.core.we_driver.WEDriver(rc=None, system=None)
Bases:
object
A class implementing Huber & Kim’s weighted ensemble algorithm over Segment objects. This class handles all binning, recycling, and preparation of new Segment objects for the next iteration. Binning is accomplished using system.bin_mapper, and per-bin target counts are from system.bin_target_counts.
The workflow is as follows:
Call new_iteration() every new iteration, providing any recycling targets that are in force and any available initial states for recycling.
Call assign() to assign segments to bins based on their initial and end points. This returns the number of walkers that were recycled.
Call run_we(), optionally providing a set of initial states that will be used to recycle walkers.
Note the presence of flux_matrix, transition_matrix, current_iter_segments, next_iter_segments, recycling_segments, initial_binning, final_binning, next_iter_binning, and new_weights (to be documented soon).
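In code, one pass through that workflow might look like the following sketch; system, segments, initial_states, and target_states are assumed to be supplied by the surrounding simulation machinery, and the documented construct_next() is used here for the split/merge step:
from westpa.core.we_driver import WEDriver

driver = WEDriver(system=system)
driver.new_iteration(initial_states=initial_states, target_states=target_states)
n_recycled = driver.assign(segments)   # bin on initial and final pcoord points
driver.construct_next()                # recycle, then split/merge per bin
next_segments = list(driver.next_iter_segments)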
- weight_split_threshold = 2.0
- weight_merge_cutoff = 1.0
- largest_allowed_weight = 1.0
- smallest_allowed_weight = 1e-310
- process_config()
- property next_iter_segments
Newly-created segments for the next iteration
- property current_iter_segments
Segments for the current iteration
- property next_iter_assignments
Bin assignments (indices) for initial points of next iteration.
- property current_iter_assignments
Bin assignments (indices) for endpoints of current iteration.
- property recycling_segments
Segments designated for recycling
- property n_recycled_segs
Number of segments recycled this iteration
- property n_istates_needed
Number of initial states needed to support recycling for this iteration
- check_threshold_configs()
Check to see if weight thresholds parameters are valid
- clear()
Explicitly delete all Segment-related state.
- new_iteration(initial_states=None, target_states=None, new_weights=None, bin_mapper=None, bin_target_counts=None)
Prepare for a new iteration.
initial_states is a sequence of all InitialState objects valid for use in generating new segments for the next iteration (after the one being begun with the call to new_iteration); that is, these are states available to recycle to. Target states which generate recycling events are specified in target_states, a sequence of TargetState objects. Both initial_states and target_states may be empty as required.
The optional new_weights is a sequence of NewWeightEntry objects which will be used to construct the initial flux matrix.
The given bin_mapper will be used for assignment, and bin_target_counts used for splitting/merging target counts; each will be obtained from the system object if omitted or None.
- add_initial_states(initial_states)
Add newly-prepared initial states to the pool available for recycling.
- property all_initial_states
Return an iterator over all initial states (available or used)
- assign(segments, initializing=False)
Assign segments to initial and final bins, and update the (internal) lists of used and available initial states. If initializing is True, then the “final” bin assignments will be identical to the initial bin assignments, a condition required for seeding a new iteration from pre-existing segments.
- populate_initial(initial_states, weights, system=None)
Create walkers for a new weighted ensemble simulation.
One segment is created for each provided initial state, then binned and split/merged as necessary. After this function is called, next_iter_segments will yield the new segments to create, used_initial_states will contain data about which of the provided initial states were used, and avail_initial_states will contain data about which initial states were unused (because their corresponding walkers were merged out of existence).
- rebin_current(parent_segments)
Reconstruct walkers for the current iteration based on (presumably) new binning. The previous iteration’s segments must be provided (as parent_segments) in order to update endpoint types appropriately.
- construct_next()
Construct walkers for the next iteration, by running weighted ensemble recycling and bin/split/merge on the segments previously assigned to bins using assign. Enough unused initial states must be present in self.avail_initial_states for every recycled walker to be assigned an initial state.
After this function completes, self.flux_matrix contains a valid flux matrix for this iteration (including any contributions from recycling from the previous iteration), and self.next_iter_segments contains a list of segments ready for the next iteration, with appropriate values set for weight, endpoint type, parent walkers, and so on.
westpa.core.wm_ops module
- westpa.core.wm_ops.get_pcoord(state)
- westpa.core.wm_ops.gen_istate(basis_state, initial_state)
- westpa.core.wm_ops.prep_iter(n_iter, segments)
- westpa.core.wm_ops.post_iter(n_iter, segments)
- westpa.core.wm_ops.propagate(basis_states, initial_states, segments)
westpa.core.yamlcfg module
YAML-based configuration files for WESTPA
- westpa.core.yamlcfg.YLoader
alias of
CLoader
- class westpa.core.yamlcfg.NopMapper
Bases:
BinMapper
Put everything into one bin.
- assign(coords, mask=None, output=None)
- exception westpa.core.yamlcfg.ConfigValueWarning
Bases:
UserWarning
- westpa.core.yamlcfg.warn_dubious_config_entry(entry, value, expected_type=None, category=<class 'westpa.core.yamlcfg.ConfigValueWarning'>, stacklevel=1)
- westpa.core.yamlcfg.check_bool(value, action='warn')
Check that the given value is boolean in type. If not, either raise a warning (if action=='warn') or an exception (if action=='raise').
- exception westpa.core.yamlcfg.ConfigItemMissing(key, message=None)
Bases:
KeyError
- exception westpa.core.yamlcfg.ConfigItemTypeError(key, expected_type, message=None)
Bases:
TypeError
- exception westpa.core.yamlcfg.ConfigValueError(key, value, message=None)
Bases:
ValueError
- class westpa.core.yamlcfg.YAMLConfig
Bases:
object
- preload_config_files = ['/etc/westpa/westrc', '/home/docs/.westrc']
- update_from_file(file, required=True)
- require(key, type_=None)
Ensure that a configuration item with the given key is present. If the optional type_ is given, additionally require that the item has that type.
- require_type_if_present(key, type_)
Ensure that the configuration item with the given key has the given type.
- coerce_type_if_present(key, type_)
- get(key, default=None)
- get_typed(key, type_, default=<object object>)
- get_path(key, default=<object object>, expandvars=True, expanduser=True, realpath=True, abspath=True)
- get_pathlist(key, default=<object object>, sep=':', expandvars=True, expanduser=True, realpath=True, abspath=True)
- get_python_object(key, default=<object object>, path=None)
- get_choice(key, choices, default=<object object>, value_transform=None)
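A usage sketch; dotted-string keys are an assumption here (the key naming convention follows your WESTPA configuration), and the specific keys shown are illustrative:
from westpa.core.yamlcfg import YAMLConfig

cfg = YAMLConfig()
cfg.update_from_file('west.cfg')
cfg.require('west.system.driver', type_=str)  # raises ConfigItemMissing if absent
max_iter = cfg.get_typed('west.propagation.max_total_iterations', int, default=100)
data_file = cfg.get_path('west.data.west_data_file', default='west.h5')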
- class westpa.core.yamlcfg.YAMLSystem(rc=None)
Bases:
object
A description of the system being simulated, including the dimensionality and data type of the progress coordinate, the number of progress coordinate entries expected from each segment, and binning. To construct a simulation, the user must subclass WESTSystem and set several instance variables.
At a minimum, the user must subclass WESTSystem and override initialize() to set the data type and dimensionality of progress coordinate data and define a bin mapper.
- Variables:
pcoord_ndim – The number of dimensions in the progress coordinate. Defaults to 1 (i.e. a one-dimensional progress coordinate).
pcoord_dtype – The data type of the progress coordinate, which must be callable (e.g. np.float32 and long will work, but '<f4' and '<i8' will not). Defaults to np.float64.
pcoord_len – The length of the progress coordinate time series generated by each segment, including both the initial and final values. Defaults to 2 (i.e. only the initial and final progress coordinate values for a segment are returned from propagation).
bin_mapper – A bin mapper describing the progress coordinate space.
bin_target_counts – A vector of target counts, one per bin.
- property bin_target_counts
- initialize()
Prepare this system object for use in simulation or analysis, creating a bin space, setting replicas per bin, and so on. This function is called whenever a WEST tool creates an instance of the system driver.
- prepare_run()
Prepare this system for use in a simulation run. Called by w_run in all worker processes.
- finalize_run()
A hook for system-specific processing for the end of a simulation run (as defined by such things as maximum wallclock time, rather than perhaps more scientifically-significant definitions of “the end of a simulation run”)
- new_pcoord_array(pcoord_len=None)
Return an appropriately-sized and -typed pcoord array for a timepoint, segment, or number of segments. If pcoord_len is not specified (or None), then a length appropriate for a segment is returned.
- new_region_set()
westpa.work_managers package
westpa.work_managers package
westpa.work_managers module
A system for parallel, remote execution of multiple arbitrary tasks.
Much of this, both in concept and execution, was inspired by (and in some cases based heavily on) the concurrent.futures package from Python 3.2, with some simplifications and adaptations (thanks to Brian Quinlan and his futures implementation).
- class westpa.work_managers.SerialWorkManager
Bases:
WorkManager
- classmethod from_environ(wmenv=None)
- submit(fn, args=None, kwargs=None)
Submit a task to the work manager, returning a WMFuture object representing the pending result.
fn(*args, **kwargs) will be executed by a worker, and the return value assigned as the result of the returned future. The function fn and all arguments must be picklable; note particularly that off-path modules (like the system module and any active plugins) are not picklable unless pre-loaded in the worker process (i.e. prior to forking the master).
- class westpa.work_managers.ThreadsWorkManager(n_workers=None)
Bases:
WorkManager
A work manager using threads.
- classmethod from_environ(wmenv=None)
- runtask(task_queue)
- submit(fn, args=None, kwargs=None)
Submit a task to the work manager, returning a WMFuture object representing the pending result.
fn(*args, **kwargs) will be executed by a worker, and the return value assigned as the result of the returned future. The function fn and all arguments must be picklable; note particularly that off-path modules (like the system module and any active plugins) are not picklable unless pre-loaded in the worker process (i.e. prior to forking the master).
- startup()
Perform any necessary startup work, such as spawning clients.
- shutdown()
Cleanly shut down any active workers.
- class westpa.work_managers.ProcessWorkManager(n_workers=None, shutdown_timeout=1)
Bases:
WorkManager
A work manager using the multiprocessing module.
Notes
On MacOS, as of Python 3.8 the default start method for multiprocessing launching new processes was changed from fork to spawn. In general, spawn is more robust and efficient, however it requires serializability of everything being passed to the child process. In contrast, fork is much less memory efficient, as it makes a full copy of everything in the parent process. However, it does not require picklability.
So, on MacOS, the method for launching new processes is explicitly changed to fork from the (MacOS-specific) default of spawn. Unix should default to fork.
See https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods and https://docs.python.org/3/library/multiprocessing.html#the-spawn-and-forkserver-start-methods for more details.
- classmethod from_environ(wmenv=None)
- task_loop()
- results_loop()
- submit(fn, args=None, kwargs=None)
Submit a task to the work manager, returning a WMFuture object representing the pending result.
fn(*args, **kwargs) will be executed by a worker, and the return value assigned as the result of the returned future. The function fn and all arguments must be picklable; note particularly that off-path modules (like the system module and any active plugins) are not picklable unless pre-loaded in the worker process (i.e. prior to forking the master).
- startup()
Perform any necessary startup work, such as spawning clients.
- shutdown()
Cleanly shut down any active workers.
- westpa.work_managers.make_work_manager()
Using cues from the environment, instantiate a pre-configured work manager.
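A minimal end-to-end sketch; the WM_WORK_MANAGER environment variable is assumed to select the backend, with the serial manager as a typical default:
from westpa.work_managers import make_work_manager

def square(x):
    return x * x

work_manager = make_work_manager()
work_manager.startup()
try:
    futures = work_manager.submit_many([(square, (i,), {}) for i in range(4)])
    for future in work_manager.as_completed(futures):
        print(future.get_result())  # get_result() is assumed from the WMFuture API
finally:
    work_manager.shutdown()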
westpa.work_managers.core module
- class westpa.work_managers.core.islice
Bases:
object
islice(iterable, stop) –> islice object
islice(iterable, start, stop[, step]) –> islice object
Return an iterator whose next() method returns selected values from an iterable. If start is specified, will skip all preceding elements; otherwise, start defaults to zero. Step defaults to one. If specified as another value, step determines how many values are skipped between successive calls. Works like a slice() on a list but returns an iterator.
- westpa.work_managers.core.contextmanager(func)
@contextmanager decorator.
Typical usage:

    @contextmanager
    def some_generator(<arguments>):
        <setup>
        try:
            yield <value>
        finally:
            <cleanup>

This makes this:

    with some_generator(<arguments>) as <variable>:
        <body>

equivalent to this:

    <setup>
    try:
        <variable> = <value>
        <body>
    finally:
        <cleanup>
- class westpa.work_managers.core.WorkManager
Bases:
object
Base class for all work managers. At a minimum, work managers must provide a submit() function and an n_workers attribute (which may be a property), though most will also override startup() and shutdown().
- classmethod from_environ(wmenv=None)
- classmethod add_wm_args(parser, wmenv=None)
- sigint_handler(signum, frame)
- install_sigint_handler()
- startup()
Perform any necessary startup work, such as spawning clients.
- shutdown()
Cleanly shut down any active workers.
- run()
Run the worker loop (in clients only).
- submit(fn, args=None, kwargs=None)
Submit a task to the work manager, returning a WMFuture object representing the pending result. fn(*args, **kwargs) will be executed by a worker, and the return value assigned as the result of the returned future. The function fn and all arguments must be picklable; note particularly that off-path modules (like the system module and any active plugins) are not picklable unless pre-loaded in the worker process (i.e. prior to forking the master).
- submit_many(tasks)
Submit a set of tasks to the work manager, returning a list of WMFuture objects representing pending results. Each entry in tasks should be a triple (fn, args, kwargs), which will result in fn(*args, **kwargs) being executed by a worker. The function fn and all arguments must be picklable; note particularly that off-path modules are not picklable unless pre-loaded in the worker process.
- as_completed(futures)
Return a generator which yields results from the given futures as they become available.
- submit_as_completed(task_generator, queue_size=None)
Return a generator which yields results from a set of futures as they become available. Futures are generated by the task_generator, which must return a triple of the form expected by submit. The method also accepts an int queue_size that dictates the maximum number of Futures that should be pending at any given time. The default value of None submits all of the tasks at once. (A usage sketch follows this class.)
- wait_any(futures)
Wait on any of the given futures and return the first one which has a result available. If more than one result is or becomes available simultaneously, any completed future may be returned.
- wait_all(futures)
A convenience function which waits on all the given futures in order. This function returns the same futures as submitted to the function as a list, indicating the order in which waits occurred.
- property is_master
True if this is the master process for task distribution. This is necessary, e.g., for MPI, where all processes start identically and then must branch depending on rank.
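As noted under submit_as_completed above, a sketch of throttled submission; the builtin pow stands in for a real task, and the queue size is arbitrary. This assumes the generator yields completed futures, per the description above:

    # Sketch: keep at most 10 tasks pending at any given time. Each
    # generated entry is the (fn, args, kwargs) triple expected by submit().
    from westpa.work_managers import make_work_manager

    wm = make_work_manager()
    wm.startup()
    task_generator = ((pow, (2, i), {}) for i in range(100))
    for future in wm.submit_as_completed(task_generator, queue_size=10):
        print(future.get_result())
    wm.shutdown()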
- class westpa.work_managers.core.FutureWatcher(futures, threshold=1)
Bases:
object
A device to wait on multiple results and/or exceptions with only one lock.
- signal(future)
Signal this watcher that the given future has results available. If this brings the number of available futures above signal_threshold, this watcher’s event object will be signalled as well.
- wait()
Wait on one or more futures.
- reset()
Reset this watcher’s list of completed futures, returning the list of completed futures prior to resetting it.
- add(futures)
Add watchers to all futures in the iterable of futures.
- class westpa.work_managers.core.WMFuture(task_id=None)
Bases:
object
A “future”, representing work which has been dispatched for completion asynchronously.
- static all_acquired(futures)
Context manager to acquire all locks on the given futures. Primarily for internal use.
- get_result(discard=True)
Get the result associated with this future, blocking until it is available. If discard is true, then removes the reference to the result contained in this instance, so that a collection of futures need not turn into a cache of all associated results.
- property result
- wait()
Wait until this future has a result or exception available.
- get_exception()
Get the exception associated with this future, blocking until it is available.
- property exception
Get the exception associated with this future, blocking until it is available.
- get_traceback()
Get the traceback object associated with this future, if any.
- property traceback
Get the traceback object associated with this future, if any.
- is_done()
Indicates whether this future is done executing (may block if this future is being updated).
- property done
Indicates whether this future is done executing (may block if this future is being updated).
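A sketch of consuming a WMFuture, including the exception path; might_fail is a hypothetical task, and it is assumed here that get_exception() returns None when the task succeeded:

    # Sketch: wait() blocks until a result or exception is available.
    from westpa.work_managers import make_work_manager

    def might_fail(x):  # hypothetical task used for illustration
        if x < 0:
            raise ValueError('negative input')
        return x ** 0.5

    wm = make_work_manager()
    wm.startup()
    future = wm.submit(might_fail, args=(-1.0,))
    future.wait()
    if future.get_exception() is not None:
        print('task failed:', future.get_exception())
    else:
        print('result:', future.get_result())
    wm.shutdown()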
westpa.work_managers.environment module
Routines for configuring the work manager environment
- class westpa.work_managers.environment.WMEnvironment(use_arg_prefixes=False, valid_work_managers=None)
Bases:
object
A class to encapsulate the environment in which work managers are instantiated; this controls how environment variables and command-line arguments are used to set up work managers. This could be used to cleanly instantiate two work managers within one application, but is really more about providing facilities to make it easier for individual work managers to configure themselves according to the following precedence of configuration information:
command-line arguments
environment variables
defaults
- group_title = 'parallelization options'
- group_description = None
- env_prefix = 'WM'
- arg_prefix = 'wm'
- default_work_manager = 'serial'
- default_parallel_work_manager = 'processes'
- valid_work_managers = ['serial', 'threads', 'processes', 'zmq', 'mpi']
- env_name(name)
- arg_name(name)
- arg_flag(name)
- get_val(name, default=None, type_=None)
- add_wm_args(parser)
- process_wm_args(args)
- make_work_manager()
Using cues from the environment, instantiate a pre-configured work manager.
- westpa.work_managers.environment.make_work_manager()
Using cues from the environment, instantiate a pre-configured work manager.
- westpa.work_managers.environment.add_wm_args(parser)
- westpa.work_managers.environment.process_wm_args(args)
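Putting the precedence rules together, a hedged sketch: the WM_ spellings follow from env_prefix above (WM_WORK_MANAGER and WM_N_WORKERS are the conventional names, treated here as assumptions), and command-line arguments such as the corresponding wm-prefixed flags override them when present:

    # Sketch: select and size a work manager purely through the environment.
    import os

    os.environ['WM_WORK_MANAGER'] = 'threads'  # one of valid_work_managers
    os.environ['WM_N_WORKERS'] = '8'

    from westpa.work_managers.environment import make_work_manager

    wm = make_work_manager()  # expected: a ThreadsWorkManager with 8 workers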
westpa.work_managers.mpi module
A work manager which uses MPI to distribute tasks and collect results.
- class westpa.work_managers.mpi.deque
Bases:
object
deque([iterable[, maxlen]]) --> deque object
A list-like sequence optimized for data accesses near its endpoints.
- append()
Add an element to the right side of the deque.
- appendleft()
Add an element to the left side of the deque.
- clear()
Remove all elements from the deque.
- copy()
Return a shallow copy of a deque.
- count()
D.count(value) – return number of occurrences of value
- extend()
Extend the right side of the deque with elements from the iterable
- extendleft()
Extend the left side of the deque with elements from the iterable
- index()
D.index(value, [start, [stop]]) – return first index of value. Raises ValueError if the value is not present.
- insert()
D.insert(index, object) – insert object before index
- maxlen
maximum size of a deque or None if unbounded
- pop()
Remove and return the rightmost element.
- popleft()
Remove and return the leftmost element.
- remove()
D.remove(value) – remove first occurrence of value.
- reverse()
D.reverse() – reverse IN PLACE
- rotate()
Rotate the deque n steps to the right (default n=1). If n is negative, rotates left.
- class westpa.work_managers.mpi.WorkManager
Bases:
object
Base class for all work managers. At a minimum, work managers must provide a submit() function and an n_workers attribute (which may be a property), though most will also override startup() and shutdown().
- classmethod from_environ(wmenv=None)
- classmethod add_wm_args(parser, wmenv=None)
- sigint_handler(signum, frame)
- install_sigint_handler()
- startup()
Perform any necessary startup work, such as spawning clients.
- shutdown()
Cleanly shut down any active workers.
- run()
Run the worker loop (in clients only).
- submit(fn, args=None, kwargs=None)
Submit a task to the work manager, returning a WMFuture object representing the pending result. fn(*args, **kwargs) will be executed by a worker, and the return value assigned as the result of the returned future. The function fn and all arguments must be picklable; note particularly that off-path modules (like the system module and any active plugins) are not picklable unless pre-loaded in the worker process (i.e. prior to forking the master).
- submit_many(tasks)
Submit a set of tasks to the work manager, returning a list of WMFuture objects representing pending results. Each entry in tasks should be a triple (fn, args, kwargs), which will result in fn(*args, **kwargs) being executed by a worker. The function fn and all arguments must be picklable; note particularly that off-path modules are not picklable unless pre-loaded in the worker process.
- as_completed(futures)
Return a generator which yields results from the given futures as they become available.
- submit_as_completed(task_generator, queue_size=None)
Return a generator which yields results from a set of futures as they become available. Futures are generated by the task_generator, which must return a triple of the form expected by submit. The method also accepts an int queue_size that dictates the maximum number of Futures that should be pending at any given time. The default value of None submits all of the tasks at once.
- wait_any(futures)
Wait on any of the given futures and return the first one which has a result available. If more than one result is or becomes available simultaneously, any completed future may be returned.
- wait_all(futures)
A convenience function which waits on all the given futures in order. This function returns the same futures as submitted to the function as a list, indicating the order in which waits occurred.
- property is_master
True if this is the master process for task distribution. This is necessary, e.g., for MPI, where all processes start identically and then must branch depending on rank.
- class westpa.work_managers.mpi.WMFuture(task_id=None)
Bases:
object
A “future”, representing work which has been dispatched for completion asynchronously.
- static all_acquired(futures)
Context manager to acquire all locks on the given futures. Primarily for internal use.
- get_result(discard=True)
Get the result associated with this future, blocking until it is available. If discard is true, then removes the reference to the result contained in this instance, so that a collection of futures need not turn into a cache of all associated results.
- property result
- wait()
Wait until this future has a result or exception available.
- get_exception()
Get the exception associated with this future, blocking until it is available.
- property exception
Get the exception associated with this future, blocking until it is available.
- get_traceback()
Get the traceback object associated with this future, if any.
- property traceback
Get the traceback object associated with this future, if any.
- is_done()
Indicates whether this future is done executing (may block if this future is being updated).
- property done
Indicates whether this future is done executing (may block if this future is being updated).
- class westpa.work_managers.mpi.Task(task_id, fn, args, kwargs)
Bases:
object
Tasks are tuples of (task_id, function, args, keyword args)
- class westpa.work_managers.mpi.MPIWorkManager
Bases:
WorkManager
MPIWorkManager factory.
Initialize info shared by Manager and Worker classes.
- classmethod from_environ(wmenv=None)
- submit(fn, args=None, kwargs=None)
Adhere to WorkManager interface. This method should never be called.
- class westpa.work_managers.mpi.Serial
Bases:
MPIWorkManager
Replication of the serial work manager. This is a fallback for MPI runs that request only 1 (size=1) processor.
Initialize info shared by Manager and Worker classes.
- submit(fn, args=None, kwargs=None)
Adhere to WorkManager interface. This method should never be called.
- class westpa.work_managers.mpi.Manager
Bases:
MPIWorkManager
Manager of the MPIWorkManager. Distributes tasks to Workers as they are received from the sim_manager. In addition to the main thread, this class spawns two threads, a receiver and a dispatcher.
Initialize different state variables used by Manager.
- startup()
Spawns the dispatcher and receiver threads.
- submit(fn, args=None, kwargs=None)
Receive task from simulation manager and add it to pending_futures.
- shutdown()
Send shutdown tag to all worker processes, and set the shutdown sentinel to stop the receiver and dispatcher loops.
- class westpa.work_managers.mpi.Worker
Bases:
MPIWorkManager
Client class for executing tasks as distributed by the Manager in the MPI Work Manager
Initialize info shared by Manager and Worker classes.
- startup()
Clock the worker in for work.
- clockIn()
Do each task as it comes in. Completing a task signals to the manager that more work is welcome.
- property is_master
Worker processes need to be marked as not being the manager. This ensures that the proper branching is followed in w_run.py.
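A hedged sketch of the branching these classes imply: MPIWorkManager acts as a factory, so rank 0 should receive a Manager (is_master True) and other ranks Workers. This assumes mpi4py is installed and that worker ranks remain in their loop until the manager shuts down; a launch would resemble mpiexec -n 4 python script.py:

    # Sketch only: every rank constructs the work manager identically,
    # then branches on is_master, as described for WorkManager.is_master.
    import os

    os.environ['WM_WORK_MANAGER'] = 'mpi'

    from westpa.work_managers import make_work_manager

    wm = make_work_manager()
    wm.startup()
    if wm.is_master:
        futures = wm.submit_many([(pow, (2, i), {}) for i in range(8)])
        wm.wait_all(futures)
        print([f.get_result() for f in futures])
        wm.shutdown()
    else:
        wm.run()  # worker loop; returns once the manager shuts down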
westpa.work_managers.processes module
- exception westpa.work_managers.processes.Empty
Bases:
Exception
Exception raised by Queue.get(block=0)/get_nowait().
- class westpa.work_managers.processes.WorkManager
Bases:
object
Base class for all work managers. At a minimum, work managers must provide a submit() function and an n_workers attribute (which may be a property), though most will also override startup() and shutdown().
- classmethod from_environ(wmenv=None)
- classmethod add_wm_args(parser, wmenv=None)
- sigint_handler(signum, frame)
- install_sigint_handler()
- startup()
Perform any necessary startup work, such as spawning clients.
- shutdown()
Cleanly shut down any active workers.
- run()
Run the worker loop (in clients only).
- submit(fn, args=None, kwargs=None)
Submit a task to the work manager, returning a WMFuture object representing the pending result. fn(*args, **kwargs) will be executed by a worker, and the return value assigned as the result of the returned future. The function fn and all arguments must be picklable; note particularly that off-path modules (like the system module and any active plugins) are not picklable unless pre-loaded in the worker process (i.e. prior to forking the master).
- submit_many(tasks)
Submit a set of tasks to the work manager, returning a list of WMFuture objects representing pending results. Each entry in tasks should be a triple (fn, args, kwargs), which will result in fn(*args, **kwargs) being executed by a worker. The function fn and all arguments must be picklable; note particularly that off-path modules are not picklable unless pre-loaded in the worker process.
- as_completed(futures)
Return a generator which yields results from the given futures as they become available.
- submit_as_completed(task_generator, queue_size=None)
Return a generator which yields results from a set of futures as they become available. Futures are generated by the task_generator, which must return a triple of the form expected by submit. The method also accepts an int queue_size that dictates the maximum number of Futures that should be pending at any given time. The default value of None submits all of the tasks at once.
- wait_any(futures)
Wait on any of the given futures and return the first one which has a result available. If more than one result is or becomes available simultaneously, any completed future may be returned.
- wait_all(futures)
A convenience function which waits on all the given futures in order. This function returns the same futures as submitted to the function as a list, indicating the order in which waits occurred.
- property is_master
True if this is the master process for task distribution. This is necessary, e.g., for MPI, where all processes start identically and then must branch depending on rank.
- class westpa.work_managers.processes.WMFuture(task_id=None)
Bases:
object
A “future”, representing work which has been dispatched for completion asynchronously.
- static all_acquired(futures)
Context manager to acquire all locks on the given futures. Primarily for internal use.
- get_result(discard=True)
Get the result associated with this future, blocking until it is available. If discard is true, then removes the reference to the result contained in this instance, so that a collection of futures need not turn into a cache of all associated results.
- property result
- wait()
Wait until this future has a result or exception available.
- get_exception()
Get the exception associated with this future, blocking until it is available.
- property exception
Get the exception associated with this future, blocking until it is available.
- get_traceback()
Get the traceback object associated with this future, if any.
- property traceback
Get the traceback object associated with this future, if any.
- is_done()
Indicates whether this future is done executing (may block if this future is being updated).
- property done
Indicates whether this future is done executing (may block if this future is being updated).
- class westpa.work_managers.processes.ProcessWorkManager(n_workers=None, shutdown_timeout=1)
Bases:
WorkManager
A work manager using the multiprocessing module.
Notes
On MacOS, as of Python 3.8, the default start method that multiprocessing uses to launch new processes was changed from fork to spawn. In general, spawn is more robust and efficient; however, it requires that everything passed to the child process be serializable. In contrast, fork is much less memory efficient, as it makes a full copy of everything in the parent process, but it does not require picklability.
So, on MacOS, the method for launching new processes is explicitly changed to fork from the (MacOS-specific) default of spawn; other Unix platforms already default to fork.
See https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods and https://docs.python.org/3/library/multiprocessing.html#the-spawn-and-forkserver-start-methods for more details.
- classmethod from_environ(wmenv=None)
- task_loop()
- results_loop()
- submit(fn, args=None, kwargs=None)
Submit a task to the work manager, returning a WMFuture object representing the pending result. fn(*args, **kwargs) will be executed by a worker, and the return value assigned as the result of the returned future. The function fn and all arguments must be picklable; note particularly that off-path modules (like the system module and any active plugins) are not picklable unless pre-loaded in the worker process (i.e. prior to forking the master).
- startup()
Perform any necessary startup work, such as spawning clients.
- shutdown()
Cleanly shut down any active workers.
westpa.work_managers.serial module
- class westpa.work_managers.serial.WorkManager
Bases:
object
Base class for all work managers. At a minimum, work managers must provide a submit() function and an n_workers attribute (which may be a property), though most will also override startup() and shutdown().
- classmethod from_environ(wmenv=None)
- classmethod add_wm_args(parser, wmenv=None)
- sigint_handler(signum, frame)
- install_sigint_handler()
- startup()
Perform any necessary startup work, such as spawning clients.
- shutdown()
Cleanly shut down any active workers.
- run()
Run the worker loop (in clients only).
- submit(fn, args=None, kwargs=None)
Submit a task to the work manager, returning a WMFuture object representing the pending result. fn(*args, **kwargs) will be executed by a worker, and the return value assigned as the result of the returned future. The function fn and all arguments must be picklable; note particularly that off-path modules (like the system module and any active plugins) are not picklable unless pre-loaded in the worker process (i.e. prior to forking the master).
- submit_many(tasks)
Submit a set of tasks to the work manager, returning a list of WMFuture objects representing pending results. Each entry in tasks should be a triple (fn, args, kwargs), which will result in fn(*args, **kwargs) being executed by a worker. The function fn and all arguments must be picklable; note particularly that off-path modules are not picklable unless pre-loaded in the worker process.
- as_completed(futures)
Return a generator which yields results from the given futures as they become available.
- submit_as_completed(task_generator, queue_size=None)
Return a generator which yields results from a set of futures as they become available. Futures are generated by the task_generator, which must return a triple of the form expected by submit. The method also accepts an int queue_size that dictates the maximum number of Futures that should be pending at any given time. The default value of None submits all of the tasks at once.
- wait_any(futures)
Wait on any of the given futures and return the first one which has a result available. If more than one result is or becomes available simultaneously, any completed future may be returned.
- wait_all(futures)
A convenience function which waits on all the given futures in order. This function returns the same futures as submitted to the function as a list, indicating the order in which waits occurred.
- property is_master
True if this is the master process for task distribution. This is necessary, e.g., for MPI, where all processes start identically and then must branch depending on rank.
- class westpa.work_managers.serial.WMFuture(task_id=None)
Bases:
object
A “future”, representing work which has been dispatched for completion asynchronously.
- static all_acquired(futures)
Context manager to acquire all locks on the given futures. Primarily for internal use.
- get_result(discard=True)
Get the result associated with this future, blocking until it is available. If discard is true, then removes the reference to the result contained in this instance, so that a collection of futures need not turn into a cache of all associated results.
- property result
- wait()
Wait until this future has a result or exception available.
- get_exception()
Get the exception associated with this future, blocking until it is available.
- property exception
Get the exception associated with this future, blocking until it is available.
- get_traceback()
Get the traceback object associated with this future, if any.
- property traceback
Get the traceback object associated with this future, if any.
- is_done()
Indicates whether this future is done executing (may block if this future is being updated).
- property done
Indicates whether this future is done executing (may block if this future is being updated).
- class westpa.work_managers.serial.SerialWorkManager
Bases:
WorkManager
- classmethod from_environ(wmenv=None)
- submit(fn, args=None, kwargs=None)
Submit a task to the work manager, returning a WMFuture object representing the pending result. fn(*args, **kwargs) will be executed by a worker, and the return value assigned as the result of the returned future. The function fn and all arguments must be picklable; note particularly that off-path modules (like the system module and any active plugins) are not picklable unless pre-loaded in the worker process (i.e. prior to forking the master).
westpa.work_managers.threads module
- class westpa.work_managers.threads.WorkManager
Bases:
object
Base class for all work managers. At a minimum, work managers must provide a submit() function and an n_workers attribute (which may be a property), though most will also override startup() and shutdown().
- classmethod from_environ(wmenv=None)
- classmethod add_wm_args(parser, wmenv=None)
- sigint_handler(signum, frame)
- install_sigint_handler()
- startup()
Perform any necessary startup work, such as spawning clients.
- shutdown()
Cleanly shut down any active workers.
- run()
Run the worker loop (in clients only).
- submit(fn, args=None, kwargs=None)
Submit a task to the work manager, returning a WMFuture object representing the pending result. fn(*args, **kwargs) will be executed by a worker, and the return value assigned as the result of the returned future. The function fn and all arguments must be picklable; note particularly that off-path modules (like the system module and any active plugins) are not picklable unless pre-loaded in the worker process (i.e. prior to forking the master).
- submit_many(tasks)
Submit a set of tasks to the work manager, returning a list of WMFuture objects representing pending results. Each entry in tasks should be a triple (fn, args, kwargs), which will result in fn(*args, **kwargs) being executed by a worker. The function fn and all arguments must be picklable; note particularly that off-path modules are not picklable unless pre-loaded in the worker process.
- as_completed(futures)
Return a generator which yields results from the given futures as they become available.
- submit_as_completed(task_generator, queue_size=None)
Return a generator which yields results from a set of futures as they become available. Futures are generated by the task_generator, which must return a triple of the form expected by submit. The method also accepts an int queue_size that dictates the maximum number of Futures that should be pending at any given time. The default value of None submits all of the tasks at once.
- wait_any(futures)
Wait on any of the given futures and return the first one which has a result available. If more than one result is or becomes available simultaneously, any completed future may be returned.
- wait_all(futures)
A convenience function which waits on all the given futures in order. This function returns the same futures as submitted to the function as a list, indicating the order in which waits occurred.
- property is_master
True if this is the master process for task distribution. This is necessary, e.g., for MPI, where all processes start identically and then must branch depending on rank.
- class westpa.work_managers.threads.WMFuture(task_id=None)
Bases:
object
A “future”, representing work which has been dispatched for completion asynchronously.
- static all_acquired(futures)
Context manager to acquire all locks on the given futures. Primarily for internal use.
- get_result(discard=True)
Get the result associated with this future, blocking until it is available. If discard is true, then removes the reference to the result contained in this instance, so that a collection of futures need not turn into a cache of all associated results.
- property result
- wait()
Wait until this future has a result or exception available.
- get_exception()
Get the exception associated with this future, blocking until it is available.
- property exception
Get the exception associated with this future, blocking until it is available.
- get_traceback()
Get the traceback object associated with this future, if any.
- property traceback
Get the traceback object associated with this future, if any.
- is_done()
Indicates whether this future is done executing (may block if this future is being updated).
- property done
Indicates whether this future is done executing (may block if this future is being updated).
- class westpa.work_managers.threads.ThreadsWorkManager(n_workers=None)
Bases:
WorkManager
A work manager using threads.
- classmethod from_environ(wmenv=None)
- runtask(task_queue)
- submit(fn, args=None, kwargs=None)
Submit a task to the work manager, returning a WMFuture object representing the pending result. fn(*args, **kwargs) will be executed by a worker, and the return value assigned as the result of the returned future. The function fn and all arguments must be picklable; note particularly that off-path modules (like the system module and any active plugins) are not picklable unless pre-loaded in the worker process (i.e. prior to forking the master).
- startup()
Perform any necessary startup work, such as spawning clients.
- shutdown()
Cleanly shut down any active workers.
westpa.work_managers.zeromq package
westpa.work_managers.zeromq module
- exception westpa.work_managers.zeromq.ZMQWMError
Bases:
RuntimeError
Base class for errors related to the ZeroMQ work manager itself
- exception westpa.work_managers.zeromq.ZMQWMTimeout
Bases:
ZMQWMEnvironmentError
A timeout of a sort that indicates that a master or worker has failed or never started.
- exception westpa.work_managers.zeromq.ZMQWMEnvironmentError
Bases:
ZMQWMError
Class representing an error in the environment in which the ZeroMQ work manager is running. This includes such things as master/worker ID mismatches.
- exception westpa.work_managers.zeromq.ZMQWorkerMissing
Bases:
ZMQWMError
Exception representing that a worker processing a task died or disappeared
- class westpa.work_managers.zeromq.ZMQCore
Bases:
object
- PROTOCOL_MAJOR = 3
- PROTOCOL_MINOR = 0
- PROTOCOL_UPDATE = 0
- PROTOCOL_VERSION = (3, 0, 0)
- internal_transport = 'ipc'
- default_comm_mode = 'ipc'
- default_master_heartbeat = 20.0
- default_worker_heartbeat = 20.0
- default_timeout_factor = 5.0
- default_startup_timeout = 120.0
- default_shutdown_timeout = 5.0
- classmethod make_ipc_endpoint()
- classmethod remove_ipc_endpoints()
- classmethod make_tcp_endpoint(address='127.0.0.1')
- classmethod make_internal_endpoint()
- get_identification()
- validate_message(message)
Validate incoming message. Raises an exception if the message is improperly formatted (TypeError) or does not correspond to the appropriate master (ZMQWMEnvironmentError).
- message_validation(msg)
A context manager for message validation. The instance variable validation_fail_action controls the behavior of this context manager:
‘raise’: re-raise the exception that indicated failed validation. Useful for development.
‘exit’ (default): report the error and exit the program.
‘warn’: report the error and continue.
- recv_message(socket, flags=0, validate=True, timeout=None)
Receive a message object from the given socket, using the given flags. Message validation is performed if validate is true. If timeout is given, then it is the number of milliseconds to wait prior to raising a ZMQWMTimeout exception. timeout is ignored if flags includes zmq.NOBLOCK.
- recv_all(socket, flags=0, validate=True)
Receive all messages currently available from the given socket.
- recv_ack(socket, flags=0, validate=True, timeout=None)
- send_message(socket, message, payload=None, flags=0)
Send a message object. Subclasses may override this to decorate the message with appropriate IDs, then delegate upward to actually send the message.
message may either be a pre-constructed Message object or a message identifier, in which (latter) case payload will become the message payload. payload is ignored if message is a Message object.
- send_reply(socket, original_message, reply='ok', payload=None, flags=0)
Send a reply to original_message on socket. The reply message is a Message object or a message identifier. The reply master_id and worker_id are set from original_message, unless master_id is not set, in which case it is set from self.master_id.
- send_ack(socket, original_message)
Send an acknowledgement message, which is mostly just to respect REQ/REP recv/send patterns.
- send_nak(socket, original_message)
Send a negative acknowledgement message.
- send_inproc_message(message, payload=None, flags=0)
- signal_shutdown()
- shutdown_handler(signal=None, frame=None)
- install_signal_handlers(signals=None)
- install_sigint_handler()
- startup()
- shutdown()
- join()
- class westpa.work_managers.zeromq.ZMQNode(upstream_rr_endpoint, upstream_ann_endpoint, n_local_workers=None)
Bases:
ZMQCore
,IsNode
- run()
- property is_master
- comm_loop()
- startup()
- class westpa.work_managers.zeromq.ZMQWorker(rr_endpoint, ann_endpoint)
Bases:
ZMQCore
This is the outward-facing worker component of the ZMQ work manager. This forms the interface to the master. This process cannot hang or crash due to an error in tasks it executes, so tasks are isolated in ZMQExecutor, which communicates with ZMQWorker via (what else?) ZeroMQ.
- property is_master
- update_master_info(msg)
- identify(rr_socket)
- request_task(rr_socket, task_socket)
- handle_reconfigure_timeout(msg, timers)
- handle_result(result_socket, rr_socket)
- comm_loop()
Master communication loop for the worker process.
- shutdown_executor()
- install_signal_handlers(signals=None)
- startup(process_index=None)
- class westpa.work_managers.zeromq.ZMQWorkManager(n_local_workers=1)
Bases:
ZMQCore
,WorkManager
,IsNode
- classmethod add_wm_args(parser, wmenv=None)
- classmethod from_environ(wmenv=None)
- classmethod read_host_info(filename)
- classmethod canonicalize_endpoint(endpoint, allow_wildcard_host=True)
- property n_workers
- submit(fn, args=None, kwargs=None)
Submit a task to the work manager, returning a WMFuture object representing the pending result. fn(*args, **kwargs) will be executed by a worker, and the return value assigned as the result of the returned future. The function fn and all arguments must be picklable; note particularly that off-path modules (like the system module and any active plugins) are not picklable unless pre-loaded in the worker process (i.e. prior to forking the master).
- submit_many(tasks)
Submit a set of tasks to the work manager, returning a list of WMFuture objects representing pending results. Each entry in tasks should be a triple (fn, args, kwargs), which will result in fn(*args, **kwargs) being executed by a worker. The function fn and all arguments must be picklable; note particularly that off-path modules are not picklable unless pre-loaded in the worker process.
- send_message(socket, message, payload=None, flags=0)
Send a message object. Subclasses may override this to decorate the message with appropriate IDs, then delegate upward to actually send the message.
message may either be a pre-constructed Message object or a message identifier, in which (latter) case payload will become the message payload. payload is ignored if message is a Message object.
- handle_result(socket, msg)
- handle_task_request(socket, msg)
- update_worker_information(msg)
- check_workers()
- remove_worker(worker_id)
- shutdown_clear_tasks()
Abort pending tasks with error on shutdown.
- comm_loop()
- startup()
Perform any necessary startup work, such as spawning clients.
- shutdown()
Cleanly shut down any active workers.
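A hedged sketch of a single-node ZeroMQ run; distributed (multi-node) setups additionally exchange endpoint information, e.g. via the host-info file consumed by read_host_info, which is omitted here:

    # Sketch: ZMQWorkManager with local workers only; divmod is a
    # picklable builtin standing in for a real task.
    import os

    os.environ['WM_WORK_MANAGER'] = 'zmq'
    os.environ['WM_N_WORKERS'] = '4'

    from westpa.work_managers import make_work_manager

    wm = make_work_manager()
    wm.startup()
    future = wm.submit(divmod, args=(7, 3))
    print(future.get_result())  # (2, 1)
    wm.shutdown()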
westpa.work_managers.zeromq.core module
Created on May 29, 2015
@author: mzwier
- westpa.work_managers.zeromq.core.randport(address='127.0.0.1')
Select a random unused TCP port number on the given address.
- exception westpa.work_managers.zeromq.core.ZMQWMError
Bases:
RuntimeError
Base class for errors related to the ZeroMQ work manager itself
- exception westpa.work_managers.zeromq.core.ZMQWorkerMissing
Bases:
ZMQWMError
Exception representing that a worker processing a task died or disappeared
- exception westpa.work_managers.zeromq.core.ZMQWMEnvironmentError
Bases:
ZMQWMError
Class representing an error in the environment in which the ZeroMQ work manager is running. This includes such things as master/worker ID mismatches.
- exception westpa.work_managers.zeromq.core.ZMQWMTimeout
Bases:
ZMQWMEnvironmentError
A timeout of a sort that indicates that a master or worker has failed or never started.
- class westpa.work_managers.zeromq.core.Message(message=None, payload=None, master_id=None, src_id=None)
Bases:
object
- SHUTDOWN = 'shutdown'
- ACK = 'ok'
- NAK = 'no'
- IDENTIFY = 'identify'
- TASKS_AVAILABLE = 'tasks_available'
- TASK_REQUEST = 'task_request'
- MASTER_BEACON = 'master_alive'
- RECONFIGURE_TIMEOUT = 'reconfigure_timeout'
- TASK = 'task'
- RESULT = 'result'
- idempotent_announcement_messages = {'master_alive', 'shutdown', 'tasks_available'}
- classmethod coalesce_announcements(messages)
- class westpa.work_managers.zeromq.core.Task(fn, args, kwargs, task_id=None)
Bases:
object
- execute()
Run this task, returning a Result object.
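For orientation, a sketch of the Task/Result pairing used internally; the attribute access on the Result is an assumption based on its constructor signature below:

    # Sketch of internal machinery: Task bundles a callable with its
    # arguments; execute() runs it and wraps the outcome in a Result.
    from westpa.work_managers.zeromq.core import Task

    task = Task(max, (3, 7), {})
    result = task.execute()
    print(result.result)  # expected: 7 (assuming Result stores the return value)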
- class westpa.work_managers.zeromq.core.Result(task_id, result=None, exception=None, traceback=None)
Bases:
object
- class westpa.work_managers.zeromq.core.PassiveTimer(duration, started=None)
Bases:
object
- started
- duration
- property expired
- property expires_in
- reset(at=None)
- start(at=None)
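A sketch of PassiveTimer, a polled (rather than callback-driven) timer; the seconds unit and start-on-construction behavior are assumptions:

    # Sketch of internal utility: the timer is checked via its expired
    # property instead of firing an event.
    import time

    from westpa.work_managers.zeromq.core import PassiveTimer

    timer = PassiveTimer(duration=0.5)  # duration in seconds (assumed)
    time.sleep(0.6)
    print(timer.expired)  # expected: True once the duration has elapsed
    timer.reset()         # restart the countdown from now
    print(timer.expired)  # expected: False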
- class westpa.work_managers.zeromq.core.PassiveMultiTimer
Bases:
object
- add_timer(identifier, duration)
- remove_timer(identifier)
- change_duration(identifier, duration)
- reset(identifier=None, at=None)
- expired(identifier, at=None)
- next_expiration()
- next_expiration_in()
- which_expired(at=None)
- class westpa.work_managers.zeromq.core.ZMQCore
Bases:
object
- PROTOCOL_MAJOR = 3
- PROTOCOL_MINOR = 0
- PROTOCOL_UPDATE = 0
- PROTOCOL_VERSION = (3, 0, 0)
- internal_transport = 'ipc'
- default_comm_mode = 'ipc'
- default_master_heartbeat = 20.0
- default_worker_heartbeat = 20.0
- default_timeout_factor = 5.0
- default_startup_timeout = 120.0
- default_shutdown_timeout = 5.0
- classmethod make_ipc_endpoint()
- classmethod remove_ipc_endpoints()
- classmethod make_tcp_endpoint(address='127.0.0.1')
- classmethod make_internal_endpoint()
- get_identification()
- validate_message(message)
Validate incoming message. Raises an exception if the message is improperly formatted (TypeError) or does not correspond to the appropriate master (ZMQWMEnvironmentError).
- message_validation(msg)
A context manager for message validation. The instance variable validation_fail_action controls the behavior of this context manager:
‘raise’: re-raise the exception that indicated failed validation. Useful for development.
‘exit’ (default): report the error and exit the program.
‘warn’: report the error and continue.
- recv_message(socket, flags=0, validate=True, timeout=None)
Receive a message object from the given socket, using the given flags. Message validation is performed if validate is true. If timeout is given, then it is the number of milliseconds to wait prior to raising a ZMQWMTimeout exception. timeout is ignored if flags includes zmq.NOBLOCK.
- recv_all(socket, flags=0, validate=True)
Receive all messages currently available from the given socket.
- recv_ack(socket, flags=0, validate=True, timeout=None)
- send_message(socket, message, payload=None, flags=0)
Send a message object. Subclasses may override this to decorate the message with appropriate IDs, then delegate upward to actually send the message.
message may either be a pre-constructed Message object or a message identifier, in which (latter) case payload will become the message payload. payload is ignored if message is a Message object.
- send_reply(socket, original_message, reply='ok', payload=None, flags=0)
Send a reply to original_message on socket. The reply message is a Message object or a message identifier. The reply master_id and worker_id are set from original_message, unless master_id is not set, in which case it is set from self.master_id.
- send_ack(socket, original_message)
Send an acknowledgement message, which is mostly just to respect REQ/REP recv/send patterns.
- send_nak(socket, original_message)
Send a negative acknowledgement message.
- send_inproc_message(message, payload=None, flags=0)
- signal_shutdown()
- shutdown_handler(signal=None, frame=None)
- install_signal_handlers(signals=None)
- install_sigint_handler()
- startup()
- shutdown()
- join()
- westpa.work_managers.zeromq.core.shutdown_process(process, timeout=1.0)
westpa.work_managers.zeromq.node module
Created on Jun 11, 2015
@author: mzwier
- class westpa.work_managers.zeromq.node.ZMQCore
Bases:
object
- PROTOCOL_MAJOR = 3
- PROTOCOL_MINOR = 0
- PROTOCOL_UPDATE = 0
- PROTOCOL_VERSION = (3, 0, 0)
- internal_transport = 'ipc'
- default_comm_mode = 'ipc'
- default_master_heartbeat = 20.0
- default_worker_heartbeat = 20.0
- default_timeout_factor = 5.0
- default_startup_timeout = 120.0
- default_shutdown_timeout = 5.0
- classmethod make_ipc_endpoint()
- classmethod remove_ipc_endpoints()
- classmethod make_tcp_endpoint(address='127.0.0.1')
- classmethod make_internal_endpoint()
- get_identification()
- validate_message(message)
Validate incoming message. Raises an exception if the message is improperly formatted (TypeError) or does not correspond to the appropriate master (ZMQWMEnvironmentError).
- message_validation(msg)
A context manager for message validation. The instance variable validation_fail_action controls the behavior of this context manager:
‘raise’: re-raise the exception that indicated failed validation. Useful for development.
‘exit’ (default): report the error and exit the program.
‘warn’: report the error and continue.
- recv_message(socket, flags=0, validate=True, timeout=None)
Receive a message object from the given socket, using the given flags. Message validation is performed if validate is true. If timeout is given, then it is the number of milliseconds to wait prior to raising a ZMQWMTimeout exception. timeout is ignored if flags includes zmq.NOBLOCK.
- recv_all(socket, flags=0, validate=True)
Receive all messages currently available from the given socket.
- recv_ack(socket, flags=0, validate=True, timeout=None)
- send_message(socket, message, payload=None, flags=0)
Send a message object. Subclasses may override this to decorate the message with appropriate IDs, then delegate upward to actually send the message.
message may either be a pre-constructed Message object or a message identifier, in which (latter) case payload will become the message payload. payload is ignored if message is a Message object.
- send_reply(socket, original_message, reply='ok', payload=None, flags=0)
Send a reply to original_message on socket. The reply message is a Message object or a message identifier. The reply master_id and worker_id are set from original_message, unless master_id is not set, in which case it is set from self.master_id.
- send_ack(socket, original_message)
Send an acknowledgement message, which is mostly just to respect REQ/REP recv/send patterns.
- send_nak(socket, original_message)
Send a negative acknowledgement message.
- send_inproc_message(message, payload=None, flags=0)
- signal_shutdown()
- shutdown_handler(signal=None, frame=None)
- install_signal_handlers(signals=None)
- install_sigint_handler()
- startup()
- shutdown()
- join()
- class westpa.work_managers.zeromq.node.Message(message=None, payload=None, master_id=None, src_id=None)
Bases:
object
- SHUTDOWN = 'shutdown'
- ACK = 'ok'
- NAK = 'no'
- IDENTIFY = 'identify'
- TASKS_AVAILABLE = 'tasks_available'
- TASK_REQUEST = 'task_request'
- MASTER_BEACON = 'master_alive'
- RECONFIGURE_TIMEOUT = 'reconfigure_timeout'
- TASK = 'task'
- RESULT = 'result'
- idempotent_announcement_messages = {'master_alive', 'shutdown', 'tasks_available'}
- classmethod coalesce_announcements(messages)
- class westpa.work_managers.zeromq.node.PassiveMultiTimer
Bases:
object
- add_timer(identifier, duration)
- remove_timer(identifier)
- change_duration(identifier, duration)
- reset(identifier=None, at=None)
- expired(identifier, at=None)
- next_expiration()
- next_expiration_in()
- which_expired(at=None)
- class westpa.work_managers.zeromq.node.IsNode(n_local_workers=None)
Bases:
object
- write_host_info(filename=None)
- startup()
- shutdown()
- class westpa.work_managers.zeromq.node.ThreadProxy(in_type, out_type, mon_type=SocketType.PUB)
Bases:
ProxyBase
,ThreadDevice
Proxy in a Thread. See Proxy for more.
westpa.work_managers.zeromq.work_manager module
- class westpa.work_managers.zeromq.work_manager.ZMQCore
Bases:
object
- PROTOCOL_MAJOR = 3
- PROTOCOL_MINOR = 0
- PROTOCOL_UPDATE = 0
- PROTOCOL_VERSION = (3, 0, 0)
- internal_transport = 'ipc'
- default_comm_mode = 'ipc'
- default_master_heartbeat = 20.0
- default_worker_heartbeat = 20.0
- default_timeout_factor = 5.0
- default_startup_timeout = 120.0
- default_shutdown_timeout = 5.0
- classmethod make_ipc_endpoint()
- classmethod remove_ipc_endpoints()
- classmethod make_tcp_endpoint(address='127.0.0.1')
- classmethod make_internal_endpoint()
- get_identification()
- validate_message(message)
Validate incoming message. Raises an exception if the message is improperly formatted (TypeError) or does not correspond to the appropriate master (ZMQWMEnvironmentError).
- message_validation(msg)
A context manager for message validation. The instance variable validation_fail_action controls the behavior of this context manager:
‘raise’: re-raise the exception that indicated failed validation. Useful for development.
‘exit’ (default): report the error and exit the program.
‘warn’: report the error and continue.
- recv_message(socket, flags=0, validate=True, timeout=None)
Receive a message object from the given socket, using the given flags. Message validation is performed if validate is true. If timeout is given, then it is the number of milliseconds to wait prior to raising a ZMQWMTimeout exception. timeout is ignored if flags includes zmq.NOBLOCK.
- recv_all(socket, flags=0, validate=True)
Receive all messages currently available from the given socket.
- recv_ack(socket, flags=0, validate=True, timeout=None)
- send_message(socket, message, payload=None, flags=0)
Send a message object. Subclasses may override this to decorate the message with appropriate IDs, then delegate upward to actually send the message.
message may either be a pre-constructed Message object or a message identifier, in which (latter) case payload will become the message payload. payload is ignored if message is a Message object.
- send_reply(socket, original_message, reply='ok', payload=None, flags=0)
Send a reply to original_message on socket. The reply message is a Message object or a message identifier. The reply master_id and worker_id are set from original_message, unless master_id is not set, in which case it is set from self.master_id.
- send_ack(socket, original_message)
Send an acknowledgement message, which is mostly just to respect REQ/REP recv/send patterns.
- send_nak(socket, original_message)
Send a negative acknowledgement message.
- send_inproc_message(message, payload=None, flags=0)
- signal_shutdown()
- shutdown_handler(signal=None, frame=None)
- install_signal_handlers(signals=None)
- install_sigint_handler()
- startup()
- shutdown()
- join()
- class westpa.work_managers.zeromq.work_manager.Message(message=None, payload=None, master_id=None, src_id=None)
Bases:
object
- SHUTDOWN = 'shutdown'
- ACK = 'ok'
- NAK = 'no'
- IDENTIFY = 'identify'
- TASKS_AVAILABLE = 'tasks_available'
- TASK_REQUEST = 'task_request'
- MASTER_BEACON = 'master_alive'
- RECONFIGURE_TIMEOUT = 'reconfigure_timeout'
- TASK = 'task'
- RESULT = 'result'
- idempotent_announcement_messages = {'master_alive', 'shutdown', 'tasks_available'}
- classmethod coalesce_announcements(messages)
- class westpa.work_managers.zeromq.work_manager.Task(fn, args, kwargs, task_id=None)
Bases:
object
- execute()
Run this task, returning a Result object.
- class westpa.work_managers.zeromq.work_manager.Result(task_id, result=None, exception=None, traceback=None)
Bases:
object
- exception westpa.work_managers.zeromq.work_manager.ZMQWorkerMissing
Bases:
ZMQWMError
Exception representing that a worker processing a task died or disappeared
- exception westpa.work_managers.zeromq.work_manager.ZMQWMEnvironmentError
Bases:
ZMQWMError
Class representing an error in the environment in which the ZeroMQ work manager is running. This includes such things as master/worker ID mismatches.
- class westpa.work_managers.zeromq.work_manager.IsNode(n_local_workers=None)
Bases:
object
- write_host_info(filename=None)
- startup()
- shutdown()
- class westpa.work_managers.zeromq.work_manager.PassiveMultiTimer
Bases:
object
- add_timer(identifier, duration)
- remove_timer(identifier)
- change_duration(identifier, duration)
- reset(identifier=None, at=None)
- expired(identifier, at=None)
- next_expiration()
- next_expiration_in()
- which_expired(at=None)
- westpa.work_managers.zeromq.work_manager.randport(address='127.0.0.1')
Select a random unused TCP port number on the given address.
- class westpa.work_managers.zeromq.work_manager.ZMQWorker(rr_endpoint, ann_endpoint)
Bases:
ZMQCore
This is the outward-facing worker component of the ZMQ work manager. This forms the interface to the master. This process cannot hang or crash due to an error in tasks it executes, so tasks are isolated in ZMQExecutor, which communicates with ZMQWorker via (what else?) ZeroMQ.
- property is_master
- update_master_info(msg)
- identify(rr_socket)
- request_task(rr_socket, task_socket)
- handle_reconfigure_timeout(msg, timers)
- handle_result(result_socket, rr_socket)
- comm_loop()
Master communication loop for the worker process.
- shutdown_executor()
- install_signal_handlers(signals=None)
- startup(process_index=None)
- class westpa.work_managers.zeromq.work_manager.ZMQNode(upstream_rr_endpoint, upstream_ann_endpoint, n_local_workers=None)
Bases:
ZMQCore
,IsNode
- run()
- property is_master
- comm_loop()
- startup()
- class westpa.work_managers.zeromq.work_manager.WorkManager
Bases:
object
Base class for all work managers. At a minimum, work managers must provide a submit() function and an n_workers attribute (which may be a property), though most will also override startup() and shutdown().
- classmethod from_environ(wmenv=None)
- classmethod add_wm_args(parser, wmenv=None)
- sigint_handler(signum, frame)
- install_sigint_handler()
- startup()
Perform any necessary startup work, such as spawning clients.
- shutdown()
Cleanly shut down any active workers.
- run()
Run the worker loop (in clients only).
- submit(fn, args=None, kwargs=None)
Submit a task to the work manager, returning a WMFuture object representing the pending result. fn(*args, **kwargs) will be executed by a worker, and the return value assigned as the result of the returned future. The function fn and all arguments must be picklable; note particularly that off-path modules (like the system module and any active plugins) are not picklable unless pre-loaded in the worker process (i.e. prior to forking the master).
- submit_many(tasks)
Submit a set of tasks to the work manager, returning a list of WMFuture objects representing pending results. Each entry in tasks should be a triple (fn, args, kwargs), which will result in fn(*args, **kwargs) being executed by a worker. The function fn and all arguments must be picklable; note particularly that off-path modules are not picklable unless pre-loaded in the worker process.
- as_completed(futures)
Return a generator which yields results from the given futures as they become available.
- submit_as_completed(task_generator, queue_size=None)
Return a generator which yields results from a set of futures as they become available. Futures are generated by the task_generator, which must return a triple of the form expected by submit. The method also accepts an int queue_size that dictates the maximum number of Futures that should be pending at any given time. The default value of None submits all of the tasks at once.
- wait_any(futures)
Wait on any of the given futures and return the first one which has a result available. If more than one result is or becomes available simultaneously, any completed future may be returned.
- wait_all(futures)
A convenience function which waits on all the given futures in order. This function returns the same futures as submitted to the function as a list, indicating the order in which waits occurred.
- property is_master
True if this is the master process for task distribution. This is necessary, e.g., for MPI, where all processes start identically and then must branch depending on rank.
- class westpa.work_managers.zeromq.work_manager.WMFuture(task_id=None)
Bases:
object
A “future”, representing work which has been dispatched for completion asynchronously.
- static all_acquired(futures)
Context manager to acquire all locks on the given futures. Primarily for internal use.
- get_result(discard=True)
Get the result associated with this future, blocking until it is available. If discard is true, then removes the reference to the result contained in this instance, so that a collection of futures need not turn into a cache of all associated results.
- property result
- wait()
Wait until this future has a result or exception available.
- get_exception()
Get the exception associated with this future, blocking until it is available.
- property exception
Get the exception associated with this future, blocking until it is available.
- get_traceback()
Get the traceback object associated with this future, if any.
- property traceback
Get the traceback object associated with this future, if any.
- is_done()
Indicates whether this future is done executing (may block if this future is being updated).
- property done
Indicates whether this future is done executing (may block if this future is being updated).
- class westpa.work_managers.zeromq.work_manager.deque
Bases:
object
deque([iterable[, maxlen]]) --> deque object
A list-like sequence optimized for data accesses near its endpoints.
- append()
Add an element to the right side of the deque.
- appendleft()
Add an element to the left side of the deque.
- clear()
Remove all elements from the deque.
- copy()
Return a shallow copy of a deque.
- count()
D.count(value) – return number of occurrences of value
- extend()
Extend the right side of the deque with elements from the iterable
- extendleft()
Extend the left side of the deque with elements from the iterable
- index()
D.index(value, [start, [stop]]) – return first index of value. Raises ValueError if the value is not present.
- insert()
D.insert(index, object) – insert object before index
- maxlen
maximum size of a deque or None if unbounded
- pop()
Remove and return the rightmost element.
- popleft()
Remove and return the leftmost element.
- remove()
D.remove(value) – remove first occurrence of value.
- reverse()
D.reverse() – reverse IN PLACE
- rotate()
Rotate the deque n steps to the right (default n=1). If n is negative, rotates left.
- class westpa.work_managers.zeromq.work_manager.ZMQWorkManager(n_local_workers=1)
Bases:
ZMQCore
,WorkManager
,IsNode
- classmethod add_wm_args(parser, wmenv=None)
- classmethod from_environ(wmenv=None)
- classmethod read_host_info(filename)
- classmethod canonicalize_endpoint(endpoint, allow_wildcard_host=True)
- property n_workers
- submit(fn, args=None, kwargs=None)
Submit a task to the work manager, returning a WMFuture object representing the pending result. fn(*args, **kwargs) will be executed by a worker, and the return value assigned as the result of the returned future. The function fn and all arguments must be picklable; note particularly that off-path modules (like the system module and any active plugins) are not picklable unless pre-loaded in the worker process (i.e. prior to forking the master).
- submit_many(tasks)
Submit a set of tasks to the work manager, returning a list of WMFuture objects representing pending results. Each entry in tasks should be a triple (fn, args, kwargs), which will result in fn(*args, **kwargs) being executed by a worker. The function fn and all arguments must be picklable; note particularly that off-path modules are not picklable unless pre-loaded in the worker process.
- send_message(socket, message, payload=None, flags=0)
Send a message object. Subclasses may override this to decorate the message with appropriate IDs, then delegate upward to actually send the message.
message may either be a pre-constructed Message object or a message identifier, in which (latter) case payload will become the message payload. payload is ignored if message is a Message object.
- handle_result(socket, msg)
- handle_task_request(socket, msg)
- update_worker_information(msg)
- check_workers()
- remove_worker(worker_id)
- shutdown_clear_tasks()
Abort pending tasks with error on shutdown.
- comm_loop()
- startup()
Perform any necessary startup work, such as spawning clients.
- shutdown()
Cleanly shut down any active workers.
westpa.work_managers.zeromq.worker module
Created on May 29, 2015
@author: mzwier
- class westpa.work_managers.zeromq.worker.ZMQCore
Bases:
object
- PROTOCOL_MAJOR = 3
- PROTOCOL_MINOR = 0
- PROTOCOL_UPDATE = 0
- PROTOCOL_VERSION = (3, 0, 0)
- internal_transport = 'ipc'
- default_comm_mode = 'ipc'
- default_master_heartbeat = 20.0
- default_worker_heartbeat = 20.0
- default_timeout_factor = 5.0
- default_startup_timeout = 120.0
- default_shutdown_timeout = 5.0
- classmethod make_ipc_endpoint()
- classmethod remove_ipc_endpoints()
- classmethod make_tcp_endpoint(address='127.0.0.1')
- classmethod make_internal_endpoint()
- get_identification()
- validate_message(message)
Validate incoming message. Raises an exception if the message is improperly formatted (TypeError) or does not correspond to the appropriate master (ZMQWMEnvironmentError).
- message_validation(msg)
A context manager for message validation. The instance variable validation_fail_action controls the behavior of this context manager:
‘raise’: re-raise the exception that indicated failed validation. Useful for development.
‘exit’ (default): report the error and exit the program.
‘warn’: report the error and continue.
- recv_message(socket, flags=0, validate=True, timeout=None)
Receive a message object from the given socket, using the given flags. Message validation is performed if validate is true. If timeout is given, then it is the number of milliseconds to wait prior to raising a ZMQWMTimeout exception. timeout is ignored if flags includes zmq.NOBLOCK.
- recv_all(socket, flags=0, validate=True)
Receive all messages currently available from the given socket.
- recv_ack(socket, flags=0, validate=True, timeout=None)
- send_message(socket, message, payload=None, flags=0)
Send a message object. Subclasses may override this to decorate the message with appropriate IDs, then delegate upward to actually send the message.
message
may either be a pre-constructedMessage
object or a message identifier, in which (latter) casepayload
will become the message payload.payload
is ignored ifmessage
is aMessage
object.
- send_reply(socket, original_message, reply='ok', payload=None, flags=0)
Send a reply to original_message on socket. The reply message is a Message object or a message identifier. The reply master_id and worker_id are set from original_message, unless master_id is not set, in which case it is set from self.master_id.
- send_ack(socket, original_message)
Send an acknowledgement message, which is mostly just to respect REQ/REP recv/send patterns.
- send_nak(socket, original_message)
Send a negative acknowledgement message.
- send_inproc_message(message, payload=None, flags=0)
- signal_shutdown()
- shutdown_handler(signal=None, frame=None)
- install_signal_handlers(signals=None)
- install_sigint_handler()
- startup()
- shutdown()
- join()
- class westpa.work_managers.zeromq.worker.Message(message=None, payload=None, master_id=None, src_id=None)
Bases:
object
- SHUTDOWN = 'shutdown'
- ACK = 'ok'
- NAK = 'no'
- IDENTIFY = 'identify'
- TASKS_AVAILABLE = 'tasks_available'
- TASK_REQUEST = 'task_request'
- MASTER_BEACON = 'master_alive'
- RECONFIGURE_TIMEOUT = 'reconfigure_timeout'
- TASK = 'task'
- RESULT = 'result'
- idempotent_announcement_messages = {'master_alive', 'shutdown', 'tasks_available'}
- classmethod coalesce_announcements(messages)
- exception westpa.work_managers.zeromq.worker.ZMQWMTimeout
Bases:
ZMQWMEnvironmentError
A timeout of a sort that indicates that a master or worker has failed or never started.
- class westpa.work_managers.zeromq.worker.PassiveMultiTimer
Bases:
object
- add_timer(identifier, duration)
- remove_timer(identifier)
- change_duration(identifier, duration)
- reset(identifier=None, at=None)
- expired(identifier, at=None)
- next_expiration()
- next_expiration_in()
- which_expired(at=None)
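The methods above are listed without descriptions; the following hedged sketch shows the apparent intent (a passive, poll-based set of countdown timers, such as the heartbeat timers used by the master and workers). The return values noted in the comments are assumptions, not documented behavior.

import time
from westpa.work_managers.zeromq.worker import PassiveMultiTimer

timers = PassiveMultiTimer()
timers.add_timer('master_beacon', 20.0)   # identifier, duration in seconds
timers.add_timer('worker_beacon', 5.0)

time.sleep(0.1)
if not timers.expired('worker_beacon'):   # poll rather than block
    timers.reset('worker_beacon')         # restart that timer's countdown
print(timers.which_expired())             # assumed: identifiers of expired timers
print(timers.next_expiration_in())        # assumed: seconds until the next expiration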
- class westpa.work_managers.zeromq.worker.Task(fn, args, kwargs, task_id=None)
Bases:
object
- execute()
Run this task, returning a Result object.
- class westpa.work_managers.zeromq.worker.Result(task_id, result=None, exception=None, traceback=None)
Bases:
object
- class westpa.work_managers.zeromq.worker.ZMQWorker(rr_endpoint, ann_endpoint)
Bases:
ZMQCore
This is the outward facing worker component of the ZMQ work manager. This forms the interface to the master. This process cannot hang or crash due to an error in tasks it executes, so tasks are isolated in ZMQExecutor, which communicates with ZMQWorker via (what else?) ZeroMQ.
- property is_master
- update_master_info(msg)
- identify(rr_socket)
- request_task(rr_socket, task_socket)
- handle_reconfigure_timeout(msg, timers)
- handle_result(result_socket, rr_socket)
- comm_loop()
Master communication loop for the worker process.
- shutdown_executor()
- install_signal_handlers(signals=None)
- startup(process_index=None)
westpa.tools package
westpa.tools module
tools – classes for implementing command-line tools for WESTPA
- class westpa.tools.WESTTool
Bases:
WESTToolComponent
Base class for WEST command line tools
- prog = None
- usage = None
- description = None
- epilog = None
- add_args(parser)
Add arguments specific to this tool to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this tool and deal with them appropriately (setting instance variables, etc)
- make_parser(prog=None, usage=None, description=None, epilog=None, args=None)
- make_parser_and_process(prog=None, usage=None, description=None, epilog=None, args=None)
A convenience function to create a parser, call add_all_args(), and then call process_all_args(). The argument namespace is returned.
- go()
Perform the analysis associated with this tool.
- main()
A convenience function to make a parser, parse and process arguments, then call self.go()
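To make the life cycle concrete, here is a minimal, hypothetical WESTTool subclass; the tool name and arguments are illustrative only. main() wires together make_parser_and_process() and go() as described above.

from westpa.tools import WESTTool

class ExampleTool(WESTTool):
    prog = 'example_tool'
    description = 'A minimal illustrative WESTPA command-line tool.'

    def add_args(self, parser):
        # Arguments specific to this tool.
        parser.add_argument('-n', '--n-samples', type=int, default=10)

    def process_args(self, args):
        # Record processed arguments as instance variables.
        self.n_samples = args.n_samples

    def go(self):
        # The analysis itself goes here.
        print('analyzing', self.n_samples, 'samples')

if __name__ == '__main__':
    ExampleTool().main()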
- class westpa.tools.WESTParallelTool(wm_env=None)
Bases:
WESTTool
Base class for command-line tools parallelized with wwmgr. This automatically adds and processes wwmgr command-line arguments and creates a work manager at self.work_manager.
- make_parser_and_process(prog=None, usage=None, description=None, epilog=None, args=None)
A convenience function to create a parser, call add_all_args(), and then call process_all_args(). The argument namespace is returned.
- add_args(parser)
Add arguments specific to this tool to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this tool and deal with them appropriately (setting instance variables, etc)
- go()
Perform the analysis associated with this tool.
- main()
A convenience function to make a parser, parse and process arguments, then run self.go() in the master process.
- class westpa.tools.WESTToolComponent
Bases:
object
Base class for WEST command line tools and components used in constructing tools
- include_arg(argname)
- exclude_arg(argname)
- set_arg_default(argname, value)
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- add_all_args(parser)
Add arguments for all components from which this class derives to the given parser, starting with the class highest up the inheritance chain (most distant ancestor).
- process_all_args(args)
- class westpa.tools.WESTSubcommand(parent)
Bases:
WESTToolComponent
Base class for command-line tool subcommands. A little sugar for making this more uniform.
- subcommand = None
- help_text = None
- description = None
- add_to_subparsers(subparsers)
- go()
- property work_manager
The work manager for this tool. Raises AttributeError if this is not a parallel tool.
- class westpa.tools.WESTMasterCommand
Bases:
WESTTool
Base class for command-line tools that employ subcommands
- subparsers_title = None
- subcommands = None
- include_help_command = True
- add_args(parser)
Add arguments specific to this tool to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this tool and deal with them appropriately (setting instance variables, etc)
- go()
Perform the analysis associated with this tool.
- class westpa.tools.WESTMultiTool(wm_env=None)
Bases:
WESTParallelTool
Base class for command-line tools which work with multiple simulations. Automatically parses for and gives commands to load multiple files.
- make_parser_and_process(prog=None, usage=None, description=None, epilog=None, args=None)
A convenience function to create a parser, call add_all_args(), and then call process_all_args(). The argument namespace is returned.
- parse_from_yaml(yamlfilepath)
Parse options from YAML input file. Command line arguments take precedence over options specified in the YAML hierarchy. TODO: add description on how YAML files should be constructed.
- add_args(parser)
Add arguments specific to this tool to the given argparse parser.
- exception NoSimulationsException
Bases:
Exception
- generate_file_list(key_list)
A convenience function which takes in a list of keys that are filenames, and returns a dictionary which contains all the individual files loaded inside of a dictionary keyed to the filename.
- process_args(args)
Take argparse-processed arguments associated with this tool and deal with them appropriately (setting instance variables, etc)
- go()
Perform the analysis associated with this tool.
- main()
A convenience function to make a parser, parse and process arguments, then run self.go() in the master process.
- class westpa.tools.WESTDataReader
Bases:
WESTToolComponent
Tool for reading data from WEST-related HDF5 files. Coordinates finding the main HDF5 file from west.cfg or command line arguments, caching of certain kinds of data (eventually), and retrieving auxiliary data sets from various places.
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- open(mode='r')
- close()
- property weight_dsspec
- property parent_id_dsspec
- class westpa.tools.WESTDSSynthesizer(default_dsname=None, h5filename=None)
Bases:
WESTToolComponent
Tool for synthesizing a dataset for analysis from other datasets. This may be done using a custom function, or a list of “data set specifications”. It is anticipated that if several source datasets are required, then a tool will have multiple instances of this class.
- group_name = 'input dataset options'
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- class westpa.tools.WESTWDSSynthesizer(default_dsname=None, h5filename=None)
Bases:
WESTToolComponent
- group_name = 'weight dataset options'
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- class westpa.tools.IterRangeSelection(data_manager=None)
Bases:
WESTToolComponent
Select and record limits on iterations used in analysis and/or reporting. This class provides both the user-facing command-line options and parsing, and the application-side API for recording limits in HDF5.
HDF5 datasets calculated based on a restricted set of iterations should be tagged with the following attributes:
first_iter
The first iteration included in the calculation.
last_iter
One past the last iteration included in the calculation.
iter_step
Blocking or sampling period for iterations included in the calculation.
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- process_args(args, override_iter_start=None, override_iter_stop=None, default_iter_step=1)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- iter_block_iter()
Return an iterable of (block_start, block_end) over the blocks of iterations selected by --first-iter/--last-iter/--step-iter.
- n_iter_blocks()
Return the number of blocks of iterations (as returned by iter_block_iter) selected by --first-iter/--last-iter/--step-iter.
- record_data_iter_range(h5object, iter_start=None, iter_stop=None)
Store attributes iter_start and iter_stop on the given HDF5 object (group/dataset).
- record_data_iter_step(h5object, iter_step=None)
Store attribute iter_step on the given HDF5 object (group/dataset).
- check_data_iter_range_least(h5object, iter_start=None, iter_stop=None)
Check that the given HDF5 object contains (as denoted by its
iter_start
/iter_stop
attributes) data at least for the iteration range specified.
- check_data_iter_range_equal(h5object, iter_start=None, iter_stop=None)
Check that the given HDF5 object contains (as denoted by its
iter_start
/iter_stop
attributes) data exactly for the iteration range specified.
- check_data_iter_step_conformant(h5object, iter_step=None)
Check that the given HDF5 object contains per-iteration data at an iteration stride suitable for extracting data with the given stride (in other words, the given iter_step is a multiple of the stride with which data was recorded).
- check_data_iter_step_equal(h5object, iter_step=None)
Check that the given HDF5 object contains per-iteration data at an iteration stride the same as that specified.
- slice_per_iter_data(dataset, iter_start=None, iter_stop=None, iter_step=None, axis=0)
Return the subset of the given dataset corresponding to the given iteration range and stride. Unless otherwise specified, the first dimension of the dataset is the one sliced.
- iter_range(iter_start=None, iter_stop=None, iter_step=None, dtype=None)
Return a sequence for the given iteration numbers and stride, filling in missing values from those stored on self. The smallest data type capable of holding iter_stop is returned unless otherwise specified using the dtype argument.
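The tagging convention above amounts to a few HDF5 attributes. Here is a hedged sketch in plain h5py of what record_data_iter_range()/record_data_iter_step() store (the file and dataset names are hypothetical; note that the class description names the attributes first_iter/last_iter while the record methods store iter_start/iter_stop, and the sketch follows the methods):

import h5py
import numpy as np

with h5py.File('analysis.h5', 'a') as h5file:
    dset = h5file.require_dataset('rates', shape=(100,), dtype=np.float64)
    dset.attrs['iter_start'] = 1    # first iteration included
    dset.attrs['iter_stop'] = 101   # one past the last iteration included
    dset.attrs['iter_step'] = 1     # blocking/sampling period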
- class westpa.tools.SegSelector
Bases:
WESTToolComponent
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- parse_segsel_file(filename)
- class westpa.tools.BinMappingComponent
Bases:
WESTToolComponent
Component for obtaining a bin mapper from one of several places based on command-line arguments. Such locations include an HDF5 file that contains pickled mappers (including the primary WEST HDF5 file), the system object, an external function, or (in the common case of rectilinear bins) a list of lists of bin boundaries.
Some configuration is necessary prior to calling process_args() if loading a mapper from HDF5. Specifically, either set_we_h5file_info() or set_other_h5file_info() must be called to describe where to find the appropriate mapper. In the case of set_we_h5file_info(), the mapper used for WE at the end of a given iteration will be loaded. In the case of set_other_h5file_info(), an arbitrary group and hash value are specified; the mapper corresponding to that hash in the given group will be returned.
In the absence of arguments, the mapper contained in an existing HDF5 file is preferred; if that is not available, the mapper from the system driver is used.
This component adds the following arguments to argument parsers:
- --bins-from-system
Obtain bins from the system driver.
- --bins-from-expr=EXPR
Construct rectilinear bins by parsing EXPR and calling RectilinearBinMapper() with the result. EXPR must therefore be a list of lists (see the sketch following this class listing).
- --bins-from-function=[PATH:]MODULE.FUNC
Call an external function FUNC in module MODULE (optionally adding PATH to the search path when loading MODULE) which, when called, returns a fully-constructed bin mapper.
- --bins-from-file
Load bin definitions from a YAML configuration file.
- --bins-from-h5file
Load bins from the file being considered; this is intended to mean the master WEST HDF5 file or results of other binning calculations, as appropriate.
- add_args(parser, description='binning options', suppress=[])
Add arguments specific to this component to the given argparse parser.
- add_target_count_args(parser, description='bin target count options')
Add options to the given parser corresponding to target counts.
- process_args(args)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- set_we_h5file_info(n_iter=None, data_manager=None, required=False)
Set up to load a bin mapper from the master WEST HDF5 file. The mapper is actually loaded from the file when self.load_bin_mapper() is called, if and only if command line arguments direct this. If required is true, then a mapper must be available at iteration n_iter, or else an exception will be raised.
- set_other_h5file_info(topology_group, hashval)
Set up to load a bin mapper from (any) open HDF5 file, where bin topologies are stored in topology_group (an h5py Group object) and the desired mapper has hash value hashval. The mapper itself is loaded when self.load_bin_mapper() is called.
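For the common rectilinear case, the EXPR accepted by --bins-from-expr is a Python expression that evaluates to a list of lists of boundaries, one inner list per progress-coordinate dimension. A hedged sketch using the mapper_from_expr() helper documented under westpa.tools.binning below, assuming it accepts the expression as a string:

from westpa.tools.binning import mapper_from_expr

# One progress-coordinate dimension with bin boundaries at 0, 2.8, 7, and 10000:
mapper = mapper_from_expr('[[0.0, 2.80, 7, 10000]]')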
- westpa.tools.mapper_from_dict(ybins)
- class westpa.tools.ProgressIndicatorComponent
Bases:
WESTToolComponent
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- class westpa.tools.Plotter(h5file, h5key, iteration=-1, interface='matplotlib')
Bases:
object
This is a semi-generic plotting interface that has a built in curses based terminal plotter. It’s fairly specific to what we’re using it for here, but we could (and maybe should) build it out into a little library that we can use via the command line to plot things. Might be useful for looking at data later. That would also cut the size of this tool down by a good bit.
- plot(i=0, j=1, tau=1, iteration=None, dim=0, interface=None)
- class westpa.tools.KineticsIteration(kin_h5file, index, assign, iteration=-1)
Bases:
object
- keys()
- class westpa.tools.WIPIScheme(scheme, name, parent, settings)
Bases:
object
- property scheme
- property list_schemes
Lists what schemes are configured in the west.cfg file. Schemes should be structured as follows, in west.cfg:

west:
  system:
  analysis:
    directory: analysis
    analysis_schemes:
      scheme.1:
        enabled: True
        states:
          - label: unbound
            coords: [[7.0]]
          - label: bound
            coords: [[2.7]]
        bins:
          - type: RectilinearBinMapper
            boundaries: [[0.0, 2.80, 7, 10000]]
- property iteration
- property assign
- property direct
The output from w_direct.py from the current scheme.
- property state_labels
- property bin_labels
- property west
- property reweight
- property current
The current iteration. See help for __get_data_for_iteration__
- property past
The previous iteration. See help for __get_data_for_iteration__
westpa.tools.binning module
- class westpa.tools.binning.count(start=0, step=1)
Bases:
object
Return a count object whose .__next__() method returns consecutive values.
Equivalent to:

def count(firstval=0, step=1):
    x = firstval
    while 1:
        yield x
        x += step
- exception westpa.tools.binning.PickleError
Bases:
Exception
- class westpa.tools.binning.RectilinearBinMapper(boundaries)
Bases:
BinMapper
Bin into a rectangular grid based on tuples of float values
- property boundaries
- assign(coords, mask=None, output=None)
- westpa.tools.binning.weight_dtype
alias of
float64
- westpa.tools.binning.get_object(object_name, path=None)
Attempt to load the given object, using additional path information if given.
- class westpa.tools.binning.WESTToolComponent
Bases:
object
Base class for WEST command line tools and components used in constructing tools
- include_arg(argname)
- exclude_arg(argname)
- set_arg_default(argname, value)
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- add_all_args(parser)
Add arguments for all components from which this class derives to the given parser, starting with the class highest up the inheritance chain (most distant ancestor).
- process_all_args(args)
- westpa.tools.binning.mapper_from_expr(expr)
- westpa.tools.binning.mapper_from_system()
- westpa.tools.binning.mapper_from_function(funcspec)
Return a mapper constructed by calling a function in a named module. funcspec should be formatted as [PATH]:MODULE.FUNC. This function loads MODULE, optionally adding PATH to the search path, then returns MODULE.FUNC().
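For example, a funcspec of mymapper.construct_mapper (or /some/path:mymapper.construct_mapper) would point at a module like the following sketch; the module and function names are hypothetical:

# mymapper.py
from westpa.tools.binning import RectilinearBinMapper

def construct_mapper():
    # Must return a fully-constructed bin mapper.
    return RectilinearBinMapper([[0.0, 2.80, 7.0, 10000.0]])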
- westpa.tools.binning.mapper_from_hdf5(topol_group, hashval)
Retrieve the mapper identified by hashval from the given bin topology group topol_group. Returns (mapper, pickle, hashval).
- westpa.tools.binning.mapper_from_yaml(yamlfilename)
- westpa.tools.binning.mapper_from_dict(ybins)
- westpa.tools.binning.write_bin_info(mapper, assignments, weights, n_target_states, outfile=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, detailed=False)
Write information about binning to outfile, given a mapper (mapper) and the weights (weights) and bin assignments (assignments) of a set of segments, along with a target state count (n_target_states). If detailed is true, then per-bin information is written as well as summary information about all bins.
- westpa.tools.binning.write_bin_labels(mapper, dest, header='# bin labels:\n', fmt='# bin {index:{max_iwidth}d} -- {label!s}\n')
Print labels for all bins in mapper to the file-like object dest.

If provided, header is printed prior to any labels. A number of expansions are available in header:
mapper – the mapper itself (from which most of the following can be obtained)
classname – the class name of the mapper
nbins – the number of bins in the mapper

The fmt string specifies how bin labels are to be printed. A number of expansions are available in fmt:
index – the zero-based index of the bin
label – the label of the bin
max_iwidth – the maximum width (in characters) of the bin index, for pretty alignment
- class westpa.tools.binning.BinMappingComponent
Bases:
WESTToolComponent
Component for obtaining a bin mapper from one of several places based on command-line arguments. Such locations include an HDF5 file that contains pickled mappers (including the primary WEST HDF5 file), the system object, an external function, or (in the common case of rectilinear bins) a list of lists of bin boundaries.
Some configuration is necessary prior to calling process_args() if loading a mapper from HDF5. Specifically, either set_we_h5file_info() or set_other_h5file_info() must be called to describe where to find the appropriate mapper. In the case of set_we_h5file_info(), the mapper used for WE at the end of a given iteration will be loaded. In the case of set_other_h5file_info(), an arbitrary group and hash value are specified; the mapper corresponding to that hash in the given group will be returned.
In the absence of arguments, the mapper contained in an existing HDF5 file is preferred; if that is not available, the mapper from the system driver is used.
This component adds the following arguments to argument parsers:
- --bins-from-system
Obtain bins from the system driver.
- --bins-from-expr=EXPR
Construct rectilinear bins by parsing EXPR and calling RectilinearBinMapper() with the result. EXPR must therefore be a list of lists.
- --bins-from-function=[PATH:]MODULE.FUNC
Call an external function FUNC in module MODULE (optionally adding PATH to the search path when loading MODULE) which, when called, returns a fully-constructed bin mapper.
- --bins-from-file
Load bin definitions from a YAML configuration file.
- --bins-from-h5file
Load bins from the file being considered; this is intended to mean the master WEST HDF5 file or results of other binning calculations, as appropriate.
- add_args(parser, description='binning options', suppress=[])
Add arguments specific to this component to the given argparse parser.
- add_target_count_args(parser, description='bin target count options')
Add options to the given parser corresponding to target counts.
- process_args(args)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- set_we_h5file_info(n_iter=None, data_manager=None, required=False)
Set up to load a bin mapper from the master WEST HDF5 file. The mapper is actually loaded from the file when self.load_bin_mapper() is called, if and only if command line arguments direct this. If required is true, then a mapper must be available at iteration n_iter, or else an exception will be raised.
- set_other_h5file_info(topology_group, hashval)
Set up to load a bin mapper from (any) open HDF5 file, where bin topologies are stored in topology_group (an h5py Group object) and the desired mapper has hash value hashval. The mapper itself is loaded when self.load_bin_mapper() is called.
westpa.tools.core module
Core classes for creating WESTPA command-line tools
- class westpa.tools.core.WESTToolComponent
Bases:
object
Base class for WEST command line tools and components used in constructing tools
- include_arg(argname)
- exclude_arg(argname)
- set_arg_default(argname, value)
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- add_all_args(parser)
Add arguments for all components from which this class derives to the given parser, starting with the class highest up the inheritance chain (most distant ancestor).
- process_all_args(args)
- class westpa.tools.core.WESTTool
Bases:
WESTToolComponent
Base class for WEST command line tools
- prog = None
- usage = None
- description = None
- epilog = None
- add_args(parser)
Add arguments specific to this tool to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this tool and deal with them appropriately (setting instance variables, etc)
- make_parser(prog=None, usage=None, description=None, epilog=None, args=None)
- make_parser_and_process(prog=None, usage=None, description=None, epilog=None, args=None)
A convenience function to create a parser, call add_all_args(), and then call process_all_args(). The argument namespace is returned.
- go()
Perform the analysis associated with this tool.
- main()
A convenience function to make a parser, parse and process arguments, then call self.go()
- class westpa.tools.core.WESTParallelTool(wm_env=None)
Bases:
WESTTool
Base class for command-line tools parallelized with wwmgr. This automatically adds and processes wwmgr command-line arguments and creates a work manager at self.work_manager.
- make_parser_and_process(prog=None, usage=None, description=None, epilog=None, args=None)
A convenience function to create a parser, call add_all_args(), and then call process_all_args(). The argument namespace is returned.
- add_args(parser)
Add arguments specific to this tool to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this tool and deal with them appropriately (setting instance variables, etc)
- go()
Perform the analysis associated with this tool.
- main()
A convenience function to make a parser, parse and process arguments, then run self.go() in the master process.
- class westpa.tools.core.WESTMultiTool(wm_env=None)
Bases:
WESTParallelTool
Base class for command-line tools which work with multiple simulations. Automatically parses for and gives commands to load multiple files.
- make_parser_and_process(prog=None, usage=None, description=None, epilog=None, args=None)
A convenience function to create a parser, call add_all_args(), and then call process_all_args(). The argument namespace is returned.
- parse_from_yaml(yamlfilepath)
Parse options from YAML input file. Command line arguments take precedence over options specified in the YAML hierarchy. TODO: add description on how YAML files should be constructed.
- add_args(parser)
Add arguments specific to this tool to the given argparse parser.
- exception NoSimulationsException
Bases:
Exception
- generate_file_list(key_list)
A convenience function which takes in a list of keys that are filenames, and returns a dictionary which contains all the individual files loaded inside of a dictionary keyed to the filename.
- process_args(args)
Take argparse-processed arguments associated with this tool and deal with them appropriately (setting instance variables, etc)
- go()
Perform the analysis associated with this tool.
- main()
A convenience function to make a parser, parse and process arguments, then run self.go() in the master process.
- class westpa.tools.core.WESTSubcommand(parent)
Bases:
WESTToolComponent
Base class for command-line tool subcommands. A little sugar for making this more uniform.
- subcommand = None
- help_text = None
- description = None
- add_to_subparsers(subparsers)
- go()
- property work_manager
The work manager for this tool. Raises AttributeError if this is not a parallel tool.
- class westpa.tools.core.WESTMasterCommand
Bases:
WESTTool
Base class for command-line tools that employ subcommands
- subparsers_title = None
- subcommands = None
- include_help_command = True
- add_args(parser)
Add arguments specific to this tool to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this tool and deal with them appropriately (setting instance variables, etc)
- go()
Perform the analysis associated with this tool.
westpa.tools.data_reader module
- class westpa.tools.data_reader.WESTToolComponent
Bases:
object
Base class for WEST command line tools and components used in constructing tools
- include_arg(argname)
- exclude_arg(argname)
- set_arg_default(argname, value)
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- add_all_args(parser)
Add arguments for all components from which this class derives to the given parser, starting with the class highest up the inheritance chain (most distant ancestor).
- process_all_args(args)
- westpa.tools.data_reader.get_object(object_name, path=None)
Attempt to load the given object, using additional path information if given.
- class westpa.tools.data_reader.FnDSSpec(h5file_or_name, fn)
Bases:
FileLinkedDSSpec
- get_iter_data(n_iter, seg_slice=(slice(None, None, None),))
- class westpa.tools.data_reader.MultiDSSpec(dsspecs)
Bases:
DSSpec
- get_iter_data(n_iter, seg_slice=(slice(None, None, None),))
- class westpa.tools.data_reader.SingleSegmentDSSpec(h5file_or_name, dsname, alias=None, slice=None)
Bases:
SingleDSSpec
- get_iter_data(n_iter, seg_slice=(slice(None, None, None),))
- get_segment_data(n_iter, seg_id)
- class westpa.tools.data_reader.SingleIterDSSpec(h5file_or_name, dsname, alias=None, slice=None)
Bases:
SingleDSSpec
- get_iter_data(n_iter, seg_slice=(slice(None, None, None),))
- class westpa.tools.data_reader.WESTDataReader
Bases:
WESTToolComponent
Tool for reading data from WEST-related HDF5 files. Coordinates finding the main HDF5 file from west.cfg or command line arguments, caching of certain kinds of data (eventually), and retrieving auxiliary data sets from various places.
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- open(mode='r')
- close()
- property weight_dsspec
- property parent_id_dsspec
- class westpa.tools.data_reader.WESTDSSynthesizer(default_dsname=None, h5filename=None)
Bases:
WESTToolComponent
Tool for synthesizing a dataset for analysis from other datasets. This may be done using a custom function, or a list of “data set specifications”. It is anticipated that if several source datasets are required, then a tool will have multiple instances of this class.
- group_name = 'input dataset options'
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- class westpa.tools.data_reader.WESTWDSSynthesizer(default_dsname=None, h5filename=None)
Bases:
WESTToolComponent
- group_name = 'weight dataset options'
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
westpa.tools.dtypes module
Numpy/HDF5 data types shared among several WESTPA tools
- westpa.tools.dtypes.n_iter_dtype
alias of
uint32
- westpa.tools.dtypes.seg_id_dtype
alias of
int64
- westpa.tools.dtypes.weight_dtype
alias of
float64
westpa.tools.iter_range module
- class westpa.tools.iter_range.WESTToolComponent
Bases:
object
Base class for WEST command line tools and components used in constructing tools
- include_arg(argname)
- exclude_arg(argname)
- set_arg_default(argname, value)
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- add_all_args(parser)
Add arguments for all components from which this class derives to the given parser, starting with the class highest up the inheritance chain (most distant ancestor).
- process_all_args(args)
- class westpa.tools.iter_range.IterRangeSelection(data_manager=None)
Bases:
WESTToolComponent
Select and record limits on iterations used in analysis and/or reporting. This class provides both the user-facing command-line options and parsing, and the application-side API for recording limits in HDF5.
HDF5 datasets calculated based on a restricted set of iterations should be tagged with the following attributes:
first_iter
The first iteration included in the calculation.
last_iter
One past the last iteration included in the calculation.
iter_step
Blocking or sampling period for iterations included in the calculation.
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- process_args(args, override_iter_start=None, override_iter_stop=None, default_iter_step=1)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- iter_block_iter()
Return an iterable of (block_start, block_end) over the blocks of iterations selected by --first-iter/--last-iter/--step-iter.
- n_iter_blocks()
Return the number of blocks of iterations (as returned by iter_block_iter) selected by --first-iter/--last-iter/--step-iter.
- record_data_iter_range(h5object, iter_start=None, iter_stop=None)
Store attributes iter_start and iter_stop on the given HDF5 object (group/dataset).
- record_data_iter_step(h5object, iter_step=None)
Store attribute iter_step on the given HDF5 object (group/dataset).
- check_data_iter_range_least(h5object, iter_start=None, iter_stop=None)
Check that the given HDF5 object contains (as denoted by its iter_start/iter_stop attributes) data at least for the iteration range specified.
- check_data_iter_range_equal(h5object, iter_start=None, iter_stop=None)
Check that the given HDF5 object contains (as denoted by its iter_start/iter_stop attributes) data exactly for the iteration range specified.
- check_data_iter_step_conformant(h5object, iter_step=None)
Check that the given HDF5 object contains per-iteration data at an iteration stride suitable for extracting data with the given stride (in other words, the given iter_step is a multiple of the stride with which data was recorded).
- check_data_iter_step_equal(h5object, iter_step=None)
Check that the given HDF5 object contains per-iteration data at an iteration stride the same as that specified.
- slice_per_iter_data(dataset, iter_start=None, iter_stop=None, iter_step=None, axis=0)
Return the subset of the given dataset corresponding to the given iteration range and stride. Unless otherwise specified, the first dimension of the dataset is the one sliced.
- iter_range(iter_start=None, iter_stop=None, iter_step=None, dtype=None)
Return a sequence for the given iteration numbers and stride, filling in missing values from those stored on self. The smallest data type capable of holding iter_stop is returned unless otherwise specified using the dtype argument.
westpa.tools.kinetics_tool module
- class westpa.tools.kinetics_tool.WESTDataReader
Bases:
WESTToolComponent
Tool for reading data from WEST-related HDF5 files. Coordinates finding the main HDF5 file from west.cfg or command line arguments, caching of certain kinds of data (eventually), and retrieving auxiliary data sets from various places.
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- open(mode='r')
- close()
- property weight_dsspec
- property parent_id_dsspec
- class westpa.tools.kinetics_tool.IterRangeSelection(data_manager=None)
Bases:
WESTToolComponent
Select and record limits on iterations used in analysis and/or reporting. This class provides both the user-facing command-line options and parsing, and the application-side API for recording limits in HDF5.
HDF5 datasets calculated based on a restricted set of iterations should be tagged with the following attributes:
first_iter
The first iteration included in the calculation.
last_iter
One past the last iteration included in the calculation.
iter_step
Blocking or sampling period for iterations included in the calculation.
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- process_args(args, override_iter_start=None, override_iter_stop=None, default_iter_step=1)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- iter_block_iter()
Return an iterable of (block_start, block_end) over the blocks of iterations selected by --first-iter/--last-iter/--step-iter.
- n_iter_blocks()
Return the number of blocks of iterations (as returned by iter_block_iter) selected by --first-iter/--last-iter/--step-iter.
- record_data_iter_range(h5object, iter_start=None, iter_stop=None)
Store attributes iter_start and iter_stop on the given HDF5 object (group/dataset).
- record_data_iter_step(h5object, iter_step=None)
Store attribute iter_step on the given HDF5 object (group/dataset).
- check_data_iter_range_least(h5object, iter_start=None, iter_stop=None)
Check that the given HDF5 object contains (as denoted by its iter_start/iter_stop attributes) data at least for the iteration range specified.
- check_data_iter_range_equal(h5object, iter_start=None, iter_stop=None)
Check that the given HDF5 object contains (as denoted by its iter_start/iter_stop attributes) data exactly for the iteration range specified.
- check_data_iter_step_conformant(h5object, iter_step=None)
Check that the given HDF5 object contains per-iteration data at an iteration stride suitable for extracting data with the given stride (in other words, the given iter_step is a multiple of the stride with which data was recorded).
- check_data_iter_step_equal(h5object, iter_step=None)
Check that the given HDF5 object contains per-iteration data at an iteration stride the same as that specified.
- slice_per_iter_data(dataset, iter_start=None, iter_stop=None, iter_step=None, axis=0)
Return the subset of the given dataset corresponding to the given iteration range and stride. Unless otherwise specified, the first dimension of the dataset is the one sliced.
- iter_range(iter_start=None, iter_stop=None, iter_step=None, dtype=None)
Return a sequence for the given iteration numbers and stride, filling in missing values from those stored on self. The smallest data type capable of holding iter_stop is returned unless otherwise specified using the dtype argument.
- class westpa.tools.kinetics_tool.WESTSubcommand(parent)
Bases:
WESTToolComponent
Base class for command-line tool subcommands. A little sugar for making this more uniform.
- subcommand = None
- help_text = None
- description = None
- add_to_subparsers(subparsers)
- go()
- property work_manager
The work manager for this tool. Raises AttributeError if this is not a parallel tool.
- class westpa.tools.kinetics_tool.ProgressIndicatorComponent
Bases:
WESTToolComponent
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- westpa.tools.kinetics_tool.generate_future(work_manager, name, eval_block, kwargs)
- class westpa.tools.kinetics_tool.WESTKineticsBase(parent)
Bases:
WESTSubcommand
Common argument processing for w_direct/w_reweight subcommands. Mostly limited to handling input and output from w_assign.
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- class westpa.tools.kinetics_tool.AverageCommands(parent)
Bases:
WESTKineticsBase
- default_output_file = 'direct.h5'
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- stamp_mcbs_info(dataset)
- open_files()
- open_assignments()
- print_averages(dataset, header, dim=1)
- run_calculation(pi, nstates, start_iter, stop_iter, step_iter, dataset, eval_block, name, dim, do_averages=False, **extra)
westpa.tools.plot module
- class westpa.tools.plot.Plotter(h5file, h5key, iteration=-1, interface='matplotlib')
Bases:
object
This is a semi-generic plotting interface that has a built in curses based terminal plotter. It’s fairly specific to what we’re using it for here, but we could (and maybe should) build it out into a little library that we can use via the command line to plot things. Might be useful for looking at data later. That would also cut the size of this tool down by a good bit.
- plot(i=0, j=1, tau=1, iteration=None, dim=0, interface=None)
westpa.tools.progress module
- class westpa.tools.progress.ProgressIndicator(stream=None, interval=1)
Bases:
object
- draw_fancy()
- draw_simple()
- draw()
- clear()
- property operation
- property extent
- property progress
- new_operation(operation, extent=None, progress=0)
- start()
- stop()
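A hedged usage sketch, assuming progress is a settable property and that operations are declared via new_operation() as listed above:

from westpa.tools.progress import ProgressIndicator

pi = ProgressIndicator()
pi.new_operation('Reading iterations', extent=100)  # 100 units of work
pi.start()
for _ in range(100):
    pi.progress += 1   # assumed settable; redrawn every `interval` seconds
pi.stop()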
- class westpa.tools.progress.WESTToolComponent
Bases:
object
Base class for WEST command line tools and components used in constructing tools
- include_arg(argname)
- exclude_arg(argname)
- set_arg_default(argname, value)
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- add_all_args(parser)
Add arguments for all components from which this class derives to the given parser, starting with the class highest up the inheritance chain (most distant ancestor).
- process_all_args(args)
- class westpa.tools.progress.ProgressIndicatorComponent
Bases:
WESTToolComponent
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
westpa.tools.selected_segs module
- class westpa.tools.selected_segs.WESTToolComponent
Bases:
object
Base class for WEST command line tools and components used in constructing tools
- include_arg(argname)
- exclude_arg(argname)
- set_arg_default(argname, value)
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- add_all_args(parser)
Add arguments for all components from which this class derives to the given parser, starting with the class highest up the inheritance chain (most distant ancestor).
- process_all_args(args)
- westpa.tools.selected_segs.seg_id_dtype
alias of
int64
- class westpa.tools.selected_segs.SegmentSelection(iterable=None)
Bases:
object
Initialize this segment selection from an iterable of (n_iter,seg_id) pairs.
- add(pair)
- from_iter(n_iter)
- property start_iter
- property stop_iter
- classmethod from_text(filename)
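A brief sketch of constructing a selection from (n_iter, seg_id) pairs, per the class docstring; the per-iteration return value of from_iter() is assumed to be the selected seg_ids:

from westpa.tools.selected_segs import SegmentSelection

segsel = SegmentSelection([(1, 0), (1, 3), (2, 7)])
segsel.add((2, 9))                           # one more (n_iter, seg_id) pair
print(segsel.start_iter, segsel.stop_iter)   # iteration range spanned
print(segsel.from_iter(1))                   # assumed: seg_ids selected at iteration 1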
- class westpa.tools.selected_segs.AllSegmentSelection(start_iter=None, stop_iter=None, data_manager=None)
Bases:
SegmentSelection
Initialize this segment selection from an iterable of (n_iter,seg_id) pairs.
- add(pair)
- from_iter(n_iter)
- class westpa.tools.selected_segs.SegSelector
Bases:
WESTToolComponent
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- parse_segsel_file(filename)
westpa.tools.wipi module
- class westpa.tools.wipi.Plotter(h5file, h5key, iteration=-1, interface='matplotlib')
Bases:
object
This is a semi-generic plotting interface that has a built in curses based terminal plotter. It’s fairly specific to what we’re using it for here, but we could (and maybe should) build it out into a little library that we can use via the command line to plot things. Might be useful for looking at data later. That would also cut the size of this tool down by a good bit.
- plot(i=0, j=1, tau=1, iteration=None, dim=0, interface=None)
- class westpa.tools.wipi.KineticsIteration(kin_h5file, index, assign, iteration=-1)
Bases:
object
- keys()
- class westpa.tools.wipi.WIPIScheme(scheme, name, parent, settings)
Bases:
object
- property scheme
- property list_schemes
Lists what schemes are configured in the west.cfg file. Schemes should be structured as follows, in west.cfg:

west:
  system:
  analysis:
    directory: analysis
    analysis_schemes:
      scheme.1:
        enabled: True
        states:
          - label: unbound
            coords: [[7.0]]
          - label: bound
            coords: [[2.7]]
        bins:
          - type: RectilinearBinMapper
            boundaries: [[0.0, 2.80, 7, 10000]]
- property iteration
- property assign
- property direct
The output from w_direct.py from the current scheme.
- property state_labels
- property bin_labels
- property west
- property reweight
- property current
The current iteration. See help for __get_data_for_iteration__
- property past
The previous iteration. See help for __get_data_for_iteration__
Other Packages
westpa.fasthist package
Module contents
- westpa.fasthist.histnd(values, binbounds, weights=1.0, out=None, binbound_check=True, ignore_out_of_range=False)
Generate an N-dimensional PDF (or contribution to a PDF) from the given values.
binbounds is a list of arrays of boundary values, with one entry for each dimension (values must have as many columns as there are entries in binbounds). weights, if provided, specifies the weight each value contributes to the histogram; this may be a scalar (for equal weights for all values) or a vector of the same length as values (for unequal weights). If binbound_check is True, then the boundaries are checked for strict positive monotonicity; set to False to shave a few microseconds if you know your bin boundaries to be monotonically increasing.
- westpa.fasthist.normhistnd(hist, binbounds)
Normalize the N-dimensional histogram hist with corresponding bin boundaries binbounds. Modifies hist in place and returns the normalization factor used.
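A hedged sketch of the two calls together: build a two-dimensional histogram with default (uniform) weights, then normalize it in place. Array shapes follow the description above:

import numpy as np
from westpa.fasthist import histnd, normhistnd

values = np.random.rand(1000, 2)             # 1000 samples, 2 dimensions
binbounds = [np.linspace(0.0, 1.0, 11),      # one boundary array per dimension
             np.linspace(0.0, 1.0, 6)]

hist = histnd(values, binbounds)             # shape (10, 5): one count per bin
norm = normhistnd(hist, binbounds)           # modifies hist in place
print(hist.shape, norm)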
westpa.mclib package
Module contents
A package for performing Monte Carlo bootstrap estimates of statistics.
- westpa.mclib.mcbs_correltime(dataset, alpha, n_sets=None)
Calculate the correlation time of the given dataset, significant to the (1-alpha) level, using the method described in Huber & Kim, “Weighted-ensemble Brownian dynamics simulations for protein association reactions” (1996), doi:10.1016/S0006-3495(96)79552-8. An appropriate balance between space and speed is chosen based on the size of the input data.

Returns 0 for data statistically uncorrelated with (1-alpha) confidence, otherwise the correlation length. (Thus, the appropriate stride for blocking is the result of this function plus one.)
- westpa.mclib.get_bssize(alpha)
Return a bootstrap data set size appropriate for the given confidence level.
- westpa.mclib.mcbs_ci(dataset, estimator, alpha, dlen, n_sets=None, args=None, kwargs=None, sort=<function msort>)
Perform a Monte Carlo bootstrap estimate for the (1-alpha) confidence interval on the given dataset with the given estimator. This routine is not appropriate for time-correlated data.

Returns (estimate, ci_lb, ci_ub) where estimate is the application of the given estimator to the input dataset, and ci_lb and ci_ub are the lower and upper limits, respectively, of the (1-alpha) confidence interval on estimate.

estimator is called as estimator(dataset, *args, **kwargs). Common estimators include:
numpy.mean – calculate the confidence interval on the mean of dataset
numpy.median – calculate a confidence interval on the median of dataset
numpy.std – calculate a confidence interval on the standard deviation of dataset

n_sets is the number of synthetic data sets to generate using the given estimator, which will be chosen using get_bssize() if n_sets is not given. sort can be used to override the sorting routine used to calculate the confidence interval, which should only be necessary for estimators returning vectors rather than scalars.
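For uncorrelated data the call is direct; a hedged sketch of a 95% confidence interval on the mean, with argument names taken from the signature above:

import numpy as np
from westpa.mclib import mcbs_ci

data = np.random.normal(loc=1.0, scale=0.2, size=500)
# alpha=0.05 gives the (1-alpha) = 95% confidence interval on the mean.
estimate, ci_lb, ci_ub = mcbs_ci(data, np.mean, alpha=0.05, dlen=len(data))
print(estimate, ci_lb, ci_ub)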
- westpa.mclib.mcbs_ci_correl(estimator_datasets, estimator, alpha, n_sets=None, args=None, autocorrel_alpha=None, autocorrel_n_sets=None, subsample=None, do_correl=True, mcbs_enable=None, estimator_kwargs={})
Perform a Monte Carlo bootstrap estimate for the (1-alpha) confidence interval on the given dataset with the given estimator. This routine is appropriate for time-correlated data, using the method described in Huber & Kim, “Weighted-ensemble Brownian dynamics simulations for protein association reactions” (1996), doi:10.1016/S0006-3495(96)79552-8 to determine a statistically-significant correlation time and then reducing the dataset by a factor of that correlation time before running a “classic” Monte Carlo bootstrap.

Returns (estimate, ci_lb, ci_ub, correl_time) where estimate is the application of the given estimator to the input dataset, ci_lb and ci_ub are the lower and upper limits, respectively, of the (1-alpha) confidence interval on estimate, and correl_time is the correlation time of the dataset, significant to (1-autocorrel_alpha).

estimator is called as estimator(dataset, *args, **kwargs). Common estimators include:
np.mean – calculate the confidence interval on the mean of dataset
np.median – calculate a confidence interval on the median of dataset
np.std – calculate a confidence interval on the standard deviation of dataset

n_sets is the number of synthetic data sets to generate using the given estimator, which will be chosen using get_bssize() if n_sets is not given.

autocorrel_alpha (which defaults to alpha) can be used to adjust the significance level of the autocorrelation calculation. Note that too high a significance level (too low an alpha) for evaluating the significance of autocorrelation values can result in a failure to detect correlation if the autocorrelation function is noisy.

The given subsample function is used, if provided, to subsample the dataset prior to running the full Monte Carlo bootstrap. If none is provided, then a random entry from each correlated block is used as the value for that block. Other reasonable choices include np.mean, np.median, (lambda x: x[0]) or (lambda x: x[-1]). In particular, using subsample=np.mean will converge to the block averaged mean and standard error, while accounting for any non-normality in the distribution of the mean.
westpa.trajtree package
westpa.trajtree module
westpa.trajtree.trajtree module
- class westpa.trajtree.trajtree.AllSegmentSelection(start_iter=None, stop_iter=None, data_manager=None)
Bases:
SegmentSelection
Initialize this segment selection from an iterable of (n_iter,seg_id) pairs.
- add(pair)
- from_iter(n_iter)
- class westpa.trajtree.trajtree.trajnode(n_iter, seg_id)
Bases:
tuple
Create new instance of trajnode(n_iter, seg_id)
- n_iter
Alias for field number 0
- seg_id
Alias for field number 1
- class westpa.trajtree.trajtree.TrajTreeSet(segsel=None, data_manager=None)
Bases:
_trajtree_base
- get_roots()
- get_root_indices()
- trace_trajectories(visit, get_visitor_state=None, set_visitor_state=None, vargs=None, vkwargs=None)
- class westpa.trajtree.trajtree.FakeTrajTreeSet
Bases:
TrajTreeSet
WESTPA Old Tools
westpa.oldtools package
westpa.oldtools module
westpa.oldtools.files module
- westpa.oldtools.files.load_npy_or_text(filename)
Load an array from an existing .npy file, or read a text file and convert to a NumPy array. In either case, return a NumPy array. If a pickled NumPy dataset is found, memory-map it read-only. If the specified file does not contain a pickled NumPy array, attempt to read the file using numpy.loadtxt(filename).
westpa.oldtools.miscfn module
Miscellaneous support functions for WEST and WEST tools
- westpa.oldtools.miscfn.parse_int_list(list_string)
Parse a simple list consisting of integers or ranges of integers separated by commas. Ranges are specified as min:max, and include the maximum value (unlike Python’s range). Duplicate values are ignored. Returns the result as a sorted list. Raises ValueError if the list cannot be parsed.
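For example, the inclusive-range and duplicate-dropping behavior described above looks like this:

from westpa.oldtools.miscfn import parse_int_list

print(parse_int_list('1,3:5,7,7'))   # [1, 3, 4, 5, 7]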
westpa.oldtools.aframe package
westpa.oldtools.aframe
WEST Analysis framework – an unholy mess of classes exploiting each other
- class westpa.oldtools.aframe.AnalysisMixin
Bases:
object
- add_args(parser, upcall=True)
- process_args(args, upcall=True)
- exception westpa.oldtools.aframe.ArgumentError(*args, **kwargs)
Bases:
RuntimeError
- class westpa.oldtools.aframe.WESTAnalysisTool
Bases:
object
- add_args(parser, upcall=True)
Add arguments to a parser common to all analyses of this type.
- process_args(args, upcall=True)
- open_analysis_backing()
- close_analysis_backing()
- require_analysis_group(groupname, replace=False)
- class westpa.oldtools.aframe.IterRangeMixin
Bases:
AnalysisMixin
A mixin for limiting the range of data considered for a given analysis. This should go after DataManagerMixin
- add_args(parser, upcall=True)
- process_args(args, upcall=True)
- check_iter_range()
- iter_block_iter()
Return an iterable of (block_first, block_last+1) over the blocks of iterations selected by --first/--last/--step. NOTE WELL that the second of the pair follows Python iterator conventions and returns one past the last element of the block.
- n_iter_blocks()
Return the number of blocks of iterations (as returned by iter_block_iter) selected by --first/--last/--step.
- record_data_iter_range(h5object, first_iter=None, last_iter=None)
Store attributes first_iter and last_iter on the given HDF5 object (group/dataset).
- record_data_iter_step(h5object, iter_step=None)
Store attribute iter_step on the given HDF5 object (group/dataset).
- check_data_iter_range_least(h5object, first_iter=None, last_iter=None)
Check that the given HDF5 object contains (as denoted by its first_iter/last_iter attributes) at least the data range specified.
- check_data_iter_range_equal(h5object, first_iter=None, last_iter=None)
Check that the given HDF5 object contains per-iteration data for exactly the specified iterations (as denoted by the object’s first_iter and last_iter attributes).
- check_data_iter_step_conformant(h5object, iter_step=None)
Check that the given HDF5 object contains per-iteration data at an iteration stride suitable for extracting data with the given stride. (In other words, is the given iter_step a multiple of the stride with which data was recorded?)
- check_data_iter_step_equal(h5object, iter_step=None)
Check that the given HDF5 object contains per-iteration data at an iteration stride the same as that specified.
- slice_per_iter_data(dataset, first_iter=None, last_iter=None, iter_step=None, axis=0)
Return the subset of the given dataset corresponding to the given iteration range and stride. Unless otherwise specified, the first dimension of the dataset is the one sliced.
- iter_range(first_iter=None, last_iter=None, iter_step=None)
- class westpa.oldtools.aframe.WESTDataReaderMixin
Bases:
AnalysisMixin
A mixin for analysis requiring access to the HDF5 files generated during a WEST run.
- add_args(parser, upcall=True)
- process_args(args, upcall=True)
- clear_run_cache()
- property cache_pcoords
Whether or not to cache progress coordinate data. While caching this data can significantly speed up some analysis operations, this requires copious RAM.
Setting this to False when it was formerly True will release any cached data.
- get_summary_table()
- get_iter_group(n_iter)
Return the HDF5 group corresponding to n_iter.
- get_segments(n_iter, include_pcoords=True)
Return all segments present in iteration n_iter
- get_segments_by_id(n_iter, seg_ids, include_pcoords=True)
Get segments from the data manager, employing caching where possible
- get_children(segment, include_pcoords=True)
- get_seg_index(n_iter)
- get_wtg_parent_array(n_iter)
- get_parent_array(n_iter)
- get_pcoord_array(n_iter)
- get_pcoord_dataset(n_iter)
- get_pcoords(n_iter, seg_ids)
- get_seg_ids(n_iter, bool_array=None)
- get_created_seg_ids(n_iter)
Return a list of seg_ids corresponding to segments which were created for the given iteration (are not continuations).
- max_iter_segs_in_range(first_iter, last_iter)
Return the maximum number of segments present in any iteration in the range selected
- total_segs_in_range(first_iter, last_iter)
Return the total number of segments present in all iterations in the range selected
- get_pcoord_len(n_iter)
Get the length of the progress coordinate array for the given iteration.
- get_total_time(first_iter=None, last_iter=None, dt=None)
Return the total amount of simulation time spanned between first_iter and last_iter (inclusive).
- class westpa.oldtools.aframe.ExtDataReaderMixin
Bases:
AnalysisMixin
An external data reader, primarily designed for reading brute force data, but also suitable for any auxiliary datasets required for analysis.
- default_chunksize = 8192
- add_args(parser, upcall=True)
- process_args(args, upcall=True)
- is_npy(filename)
- load_npy_or_text(filename)
Load an array from an existing .npy file, or read a text file and convert to a NumPy array. In either case, return a NumPy array. If a pickled NumPy dataset is found, memory-map it read-only. If the specified file does not contain a pickled NumPy array, attempt to read the file using numpy.loadtxt(filename).
- text_to_h5dataset(fileobj, group, dsname, dtype=<class 'numpy.float64'>, skiprows=0, usecols=None, chunksize=None)
Read text-format data from the given filename or file-like object fileobj and write it to a newly-created dataset called dsname in the HDF5 group group. The data is stored as type dtype. By default, the shape is taken as (number of lines, number of columns); columns can be omitted by specifying a list for usecols, and lines can be skipped by using skiprows. Data is read in chunks of chunksize rows.
- npy_to_h5dataset(array, group, dsname, usecols=None, chunksize=None)
Store the given array in a newly-created dataset named dsname in the HDF5 group group, optionally storing only a subset of columns. Data is written chunksize rows at a time, allowing very large memory-mapped arrays to be copied.
- class westpa.oldtools.aframe.BFDataManager
Bases:
AnalysisMixin
A class to manage brute force trajectory data. The primary purpose is to read in and manage brute force progress coordinate data for one or more trajectories. The trajectories need not be the same length, but they do need to have the same time spacing for progress coordinate values.
- traj_index_dtype = dtype([('pcoord_len', '<u8'), ('source_data', 'O')])
- add_args(parser, upcall=True)
- process_args(args, upcall=True)
- update_traj_index(traj_id, pcoord_len, source_data)
- get_traj_group(traj_id)
- create_traj_group()
- get_n_trajs()
- get_traj_len(traj_id)
- get_max_traj_len()
- get_pcoord_array(traj_id)
- get_pcoord_dataset(traj_id)
- require_bf_h5file()
- close_bf_h5file()
- class westpa.oldtools.aframe.BinningMixin
Bases:
AnalysisMixin
A mixin for performing binning on WEST data.
- add_args(parser, upcall=True)
- process_args(args, upcall=True)
- mapper_from_expr(expr)
- write_bin_labels(dest, header='# bin labels:\n', format='# bin {bin_index:{max_iwidth}d} -- {label!s}\n')
Print labels for all bins in self.mapper to dest. If provided, header is printed before any labels. The format string specifies how bin labels are to be printed. Valid entries are:
- bin_index – the zero-based index of the bin
- label – the label, as obtained by bin.label
- max_iwidth – the maximum width (in characters) of the bin index, for pretty alignment
- require_binning_group()
- delete_binning_group()
- record_data_binhash(h5object)
Record the identity hash for self.mapper as an attribute on the given HDF5 object (group or dataset)
- check_data_binhash(h5object)
Check whether the recorded bin identity hash on the given HDF5 object matches the identity hash for self.mapper
- assign_to_bins()
Assign WEST segment data to bins. Requires the DataReader mixin to be in the inheritance tree
- require_bin_assignments()
- get_bin_assignments(first_iter=None, last_iter=None)
- get_bin_populations(first_iter=None, last_iter=None)
- class westpa.oldtools.aframe.MCBSMixin
Bases:
AnalysisMixin
- add_args(parser, upcall=True)
- process_args(args, upcall=True)
- calc_mcbs_nsets(alpha=None)
- calc_ci_bound_indices(n_sets=None, alpha=None)
- class westpa.oldtools.aframe.TrajWalker(data_reader, history_chunksize=100)
Bases:
object
A class to perform analysis by walking the trajectory tree. A stack is used rather than recursion; otherwise, the maximum number of iterations that could be considered would be capped by the Python recursion limit.
- trace_to_root(n_iter, seg_id)
Trace the given segment back to its starting point, returning a list of Segment objects describing the entire trajectory.
- get_trajectory_roots(first_iter, last_iter, include_pcoords=True)
Get segments which start new trajectories. If first_iter or last_iter is specified, restrict the set of iterations within which the search is conducted.
- get_initial_nodes(first_iter, last_iter, include_pcoords=True)
Get segments with which to begin a tree walk – those alive or created within [first_iter,last_iter].
- trace_trajectories(first_iter, last_iter, callable, include_pcoords=True, cargs=None, ckwargs=None, get_state=None, set_state=None)
Walk the trajectory tree depth-first, calling callable(segment, children, history, *cargs, **ckwargs) for each segment visited. segment is the segment being visited, children is that segment’s children, and history is the chain of segments leading to segment (not including segment). get_state and set_state are used to record and reset, respectively, any state specific to callable when a new branch is traversed.
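As a rough usage sketch (data_reader is assumed to be an already-configured WEST data reader, e.g. from a tool using WESTDataReaderMixin):
from westpa.oldtools.aframe import TrajWalker

def visit(segment, children, history):
    # Called once per visited segment; history holds the chain of ancestors.
    print(segment.n_iter, segment.seg_id, len(children), len(history))

walker = TrajWalker(data_reader)  # data_reader: assumed, see lead-in
walker.trace_trajectories(1, 100, visit)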
- class westpa.oldtools.aframe.TransitionAnalysisMixin
Bases:
AnalysisMixin
- require_transitions_group()
- delete_transitions_group()
- get_transitions_ds()
- add_args(parser, upcall=True)
- process_args(args, upcall=True)
- require_transitions()
- find_transitions()
- class westpa.oldtools.aframe.TransitionEventAccumulator(n_bins, output_group, calc_fpts=True)
Bases:
object
- index_dtype
alias of
uint64
- count_dtype
alias of
uint64
- weight_dtype
alias of
float64
- output_tdat_chunksize = 4096
- tdat_buffersize = 524288
- max_acc = 32768
- clear()
- clear_state()
- get_state()
- set_state(state_dict)
- record_transition_data(tdat)
Update running statistics and write transition data to HDF5 (with buffering)
- flush_transition_data()
Flush any unwritten output that may be present
- start_accumulation(assignments, weights, bin_pops, traj=0, n_iter=0)
- continue_accumulation(assignments, weights, bin_pops, traj=0, n_iter=0)
- class westpa.oldtools.aframe.BFTransitionAnalysisMixin
Bases:
TransitionAnalysisMixin
- require_transitions()
- find_transitions(chunksize=65536)
- class westpa.oldtools.aframe.KineticsAnalysisMixin
Bases:
AnalysisMixin
- add_args(parser, upcall=True)
- process_args(args, upcall=True)
- parse_bin_range(range_string)
- check_bin_selection(n_bins=None)
Check to see that the bin ranges selected by the user conform to the available bins (i.e., bin indices are within the permissible range). Also assigns the complete bin range if the user has not explicitly limited the bins to be considered.
- property selected_bin_pair_iter
- class westpa.oldtools.aframe.CommonOutputMixin
Bases:
AnalysisMixin
- add_common_output_args(parser_or_group)
- process_common_output_args(args)
- class westpa.oldtools.aframe.PlottingMixin
Bases:
AnalysisMixin
- require_matplotlib()
westpa.oldtools.aframe.atool module
westpa.oldtools.aframe.base_mixin module
- exception westpa.oldtools.aframe.base_mixin.ArgumentError(*args, **kwargs)
Bases:
RuntimeError
westpa.oldtools.aframe.binning module
- class westpa.oldtools.aframe.binning.AnalysisMixin
Bases:
object
- add_args(parser, upcall=True)
- process_args(args, upcall=True)
- class westpa.oldtools.aframe.binning.BinningMixin
Bases:
AnalysisMixin
A mixin for performing binning on WEST data.
- add_args(parser, upcall=True)
- process_args(args, upcall=True)
- mapper_from_expr(expr)
- write_bin_labels(dest, header='# bin labels:\n', format='# bin {bin_index:{max_iwidth}d} -- {label!s}\n')
Print labels for all bins in self.mapper to dest. If provided, header is printed before any labels. The format string specifies how bin labels are to be printed. Valid entries are:
- bin_index – the zero-based index of the bin
- label – the label, as obtained by bin.label
- max_iwidth – the maximum width (in characters) of the bin index, for pretty alignment
- require_binning_group()
- delete_binning_group()
- record_data_binhash(h5object)
Record the identity hash for self.mapper as an attribute on the given HDF5 object (group or dataset)
- check_data_binhash(h5object)
Check whether the recorded bin identity hash on the given HDF5 object matches the identity hash for self.mapper
- assign_to_bins()
Assign WEST segment data to bins. Requires the DataReader mixin to be in the inheritance tree
- require_bin_assignments()
- get_bin_assignments(first_iter=None, last_iter=None)
- get_bin_populations(first_iter=None, last_iter=None)
westpa.oldtools.aframe.data_reader module
- class westpa.oldtools.aframe.data_reader.Segment(n_iter=None, seg_id=None, weight=None, endpoint_type=None, parent_id=None, wtg_parent_ids=None, pcoord=None, status=None, walltime=None, cputime=None, data=None)
Bases:
object
A class wrapping segment data that must be passed through the work manager or data manager. Most fields are self-explanatory. One item worth noting is that a negative parent ID means that the segment starts from the initial state with ID -(segment.parent_id+1)
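For example, applying this convention (a worked instance, not part of the API):
parent_id = -3
initial_state_id = -(parent_id + 1)  # == 2, i.e., this segment started from initial state 2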
- SEG_STATUS_UNSET = 0
- SEG_STATUS_PREPARED = 1
- SEG_STATUS_COMPLETE = 2
- SEG_STATUS_FAILED = 3
- SEG_INITPOINT_UNSET = 0
- SEG_INITPOINT_CONTINUES = 1
- SEG_INITPOINT_NEWTRAJ = 2
- SEG_ENDPOINT_UNSET = 0
- SEG_ENDPOINT_CONTINUES = 1
- SEG_ENDPOINT_MERGED = 2
- SEG_ENDPOINT_RECYCLED = 3
- statuses = {'SEG_STATUS_COMPLETE': 2, 'SEG_STATUS_FAILED': 3, 'SEG_STATUS_PREPARED': 1, 'SEG_STATUS_UNSET': 0}
- initpoint_types = {'SEG_INITPOINT_CONTINUES': 1, 'SEG_INITPOINT_NEWTRAJ': 2, 'SEG_INITPOINT_UNSET': 0}
- endpoint_types = {'SEG_ENDPOINT_CONTINUES': 1, 'SEG_ENDPOINT_MERGED': 2, 'SEG_ENDPOINT_RECYCLED': 3, 'SEG_ENDPOINT_UNSET': 0}
- status_names = {0: 'SEG_STATUS_UNSET', 1: 'SEG_STATUS_PREPARED', 2: 'SEG_STATUS_COMPLETE', 3: 'SEG_STATUS_FAILED'}
- initpoint_type_names = {0: 'SEG_INITPOINT_UNSET', 1: 'SEG_INITPOINT_CONTINUES', 2: 'SEG_INITPOINT_NEWTRAJ'}
- endpoint_type_names = {0: 'SEG_ENDPOINT_UNSET', 1: 'SEG_ENDPOINT_CONTINUES', 2: 'SEG_ENDPOINT_MERGED', 3: 'SEG_ENDPOINT_RECYCLED'}
- static initial_pcoord(segment)
Return the initial progress coordinate point of this segment.
- static final_pcoord(segment)
Return the final progress coordinate point of this segment.
- property initpoint_type
- property initial_state_id
- property status_text
- property endpoint_type_text
- class westpa.oldtools.aframe.data_reader.AnalysisMixin
Bases:
object
- add_args(parser, upcall=True)
- process_args(args, upcall=True)
- westpa.oldtools.aframe.data_reader.parse_int_list(list_string)
Parse a simple list consisting of integers or ranges of integers separated by commas. Ranges are specified as min:max, and include the maximum value (unlike Python’s range). Duplicate values are ignored. Returns the result as a sorted list. Raises ValueError if the list cannot be parsed.
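For example, following the rules above (the input string is illustrative):
>>> from westpa.oldtools.aframe.data_reader import parse_int_list
>>> parse_int_list('1,3,5:8,3')
[1, 3, 5, 6, 7, 8]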
- class westpa.oldtools.aframe.data_reader.WESTDataReaderMixin
Bases:
AnalysisMixin
A mixin for analysis requiring access to the HDF5 files generated during a WEST run.
- add_args(parser, upcall=True)
- process_args(args, upcall=True)
- clear_run_cache()
- property cache_pcoords
Whether or not to cache progress coordinate data. While caching this data can significantly speed up some analysis operations, this requires copious RAM.
Setting this to False when it was formerly True will release any cached data.
- get_summary_table()
- get_iter_group(n_iter)
Return the HDF5 group corresponding to n_iter.
- get_segments(n_iter, include_pcoords=True)
Return all segments present in iteration n_iter
- get_segments_by_id(n_iter, seg_ids, include_pcoords=True)
Get segments from the data manager, employing caching where possible
- get_children(segment, include_pcoords=True)
- get_seg_index(n_iter)
- get_wtg_parent_array(n_iter)
- get_parent_array(n_iter)
- get_pcoord_array(n_iter)
- get_pcoord_dataset(n_iter)
- get_pcoords(n_iter, seg_ids)
- get_seg_ids(n_iter, bool_array=None)
- get_created_seg_ids(n_iter)
Return a list of seg_ids corresponding to segments which were created for the given iteration (are not continuations).
- max_iter_segs_in_range(first_iter, last_iter)
Return the maximum number of segments present in any iteration in the range selected
- total_segs_in_range(first_iter, last_iter)
Return the total number of segments present in all iterations in the range selected
- get_pcoord_len(n_iter)
Get the length of the progress coordinate array for the given iteration.
- get_total_time(first_iter=None, last_iter=None, dt=None)
Return the total amount of simulation time spanned between first_iter and last_iter (inclusive).
- class westpa.oldtools.aframe.data_reader.ExtDataReaderMixin
Bases:
AnalysisMixin
An external data reader, primarily designed for reading brute force data, but also suitable for any auxiliary datasets required for analysis.
- default_chunksize = 8192
- add_args(parser, upcall=True)
- process_args(args, upcall=True)
- is_npy(filename)
- load_npy_or_text(filename)
Load an array from an existing .npy file, or read a text file and convert to a NumPy array. In either case, return a NumPy array. If a pickled NumPy dataset is found, memory-map it read-only. If the specified file does not contain a pickled NumPy array, attempt to read the file using numpy.loadtxt(filename).
- text_to_h5dataset(fileobj, group, dsname, dtype=<class 'numpy.float64'>, skiprows=0, usecols=None, chunksize=None)
Read text-format data from the given filename or file-like object fileobj and write it to a newly-created dataset called dsname in the HDF5 group group. The data is stored as type dtype. By default, the shape is taken as (number of lines, number of columns); columns can be omitted by specifying a list for usecols, and lines can be skipped by using skiprows. Data is read in chunks of chunksize rows.
- npy_to_h5dataset(array, group, dsname, usecols=None, chunksize=None)
Store the given array in a newly-created dataset named dsname in the HDF5 group group, optionally storing only a subset of columns. Data is written chunksize rows at a time, allowing very large memory-mapped arrays to be copied.
- class westpa.oldtools.aframe.data_reader.BFDataManager
Bases:
AnalysisMixin
A class to manage brute force trajectory data. The primary purpose is to read in and manage brute force progress coordinate data for one or more trajectories. The trajectories need not be the same length, but they do need to have the same time spacing for progress coordinate values.
- traj_index_dtype = dtype([('pcoord_len', '<u8'), ('source_data', 'O')])
- add_args(parser, upcall=True)
- process_args(args, upcall=True)
- update_traj_index(traj_id, pcoord_len, source_data)
- get_traj_group(traj_id)
- create_traj_group()
- get_n_trajs()
- get_traj_len(traj_id)
- get_max_traj_len()
- get_pcoord_array(traj_id)
- get_pcoord_dataset(traj_id)
- require_bf_h5file()
- close_bf_h5file()
westpa.oldtools.aframe.iter_range module
- class westpa.oldtools.aframe.iter_range.AnalysisMixin
Bases:
object
- add_args(parser, upcall=True)
- process_args(args, upcall=True)
- exception westpa.oldtools.aframe.iter_range.ArgumentError(*args, **kwargs)
Bases:
RuntimeError
- class westpa.oldtools.aframe.iter_range.IterRangeMixin
Bases:
AnalysisMixin
A mixin for limiting the range of data considered for a given analysis. This should go after DataManagerMixin
- add_args(parser, upcall=True)
- process_args(args, upcall=True)
- check_iter_range()
- iter_block_iter()
Return an iterable of (block_first, block_last+1) over the blocks of iterations selected by --first/--last/--step. NOTE WELL that the second of the pair follows Python iterator conventions and returns one past the last element of the block.
- n_iter_blocks()
Return the number of blocks of iterations (as returned by iter_block_iter) selected by --first/--last/--step.
- record_data_iter_range(h5object, first_iter=None, last_iter=None)
Store attributes first_iter and last_iter on the given HDF5 object (group/dataset).
- record_data_iter_step(h5object, iter_step=None)
Store attribute iter_step on the given HDF5 object (group/dataset).
- check_data_iter_range_least(h5object, first_iter=None, last_iter=None)
Check that the given HDF5 object contains (as denoted by its first_iter/last_iter attributes) at least the data range specified.
- check_data_iter_range_equal(h5object, first_iter=None, last_iter=None)
Check that the given HDF5 object contains per-iteration data for exactly the specified iterations (as denoted by the object’s first_iter and last_iter attributes).
- check_data_iter_step_conformant(h5object, iter_step=None)
Check that the given HDF5 object contains per-iteration data at an iteration stride suitable for extracting data with the given stride. (In other words, is the given iter_step a multiple of the stride with which data was recorded?)
- check_data_iter_step_equal(h5object, iter_step=None)
Check that the given HDF5 object contains per-iteration data at an iteration stride the same as that specified.
- slice_per_iter_data(dataset, first_iter=None, last_iter=None, iter_step=None, axis=0)
Return the subset of the given dataset corresponding to the given iteration range and stride. Unless otherwise specified, the first dimension of the dataset is the one sliced.
- iter_range(first_iter=None, last_iter=None, iter_step=None)
westpa.oldtools.aframe.kinetics module
- class westpa.oldtools.aframe.kinetics.AnalysisMixin
Bases:
object
- add_args(parser, upcall=True)
- process_args(args, upcall=True)
- class westpa.oldtools.aframe.kinetics.KineticsAnalysisMixin
Bases:
AnalysisMixin
- add_args(parser, upcall=True)
- process_args(args, upcall=True)
- parse_bin_range(range_string)
- check_bin_selection(n_bins=None)
Check to see that the bin ranges selected by the user conform to the available bins (i.e., bin indices are within the permissible range). Also assigns the complete bin range if the user has not explicitly limited the bins to be considered.
- property selected_bin_pair_iter
westpa.oldtools.aframe.mcbs module
Tools for Monte Carlo bootstrap error analysis
- class westpa.oldtools.aframe.mcbs.AnalysisMixin
Bases:
object
- add_args(parser, upcall=True)
- process_args(args, upcall=True)
- class westpa.oldtools.aframe.mcbs.MCBSMixin
Bases:
AnalysisMixin
- add_args(parser, upcall=True)
- process_args(args, upcall=True)
- calc_mcbs_nsets(alpha=None)
- calc_ci_bound_indices(n_sets=None, alpha=None)
- westpa.oldtools.aframe.mcbs.calc_mcbs_nsets(alpha)
Return a bootstrap data set size appropriate for the given confidence level.
- westpa.oldtools.aframe.mcbs.calc_ci_bound_indices(n_sets, alpha)
- westpa.oldtools.aframe.mcbs.bootstrap_ci_ll(estimator, data, alpha, n_sets, storage, sort, eargs=(), ekwargs={}, fhat=None)
Low-level routine for calculating bootstrap error estimates. Arguments and return values are as those for bootstrap_ci, except that no argument is optional except the additional arguments for the estimator (eargs, ekwargs). data must be an array (or subclass), and an additional array storage must be provided, which must be appropriately shaped and typed to hold n_sets results from estimator. Further, if the value fhat of the estimator must be pre-calculated to allocate storage, then its value may be passed; otherwise, estimator(data, *eargs, **ekwargs) will be called to calculate it.
- westpa.oldtools.aframe.mcbs.bootstrap_ci(estimator, data, alpha, n_sets=None, sort=<function msort>, eargs=(), ekwargs={})
Perform a Monte Carlo bootstrap of a (1-alpha) confidence interval for the given estimator. Returns (fhat, ci_lower, ci_upper), where fhat is the result of estimator(data, *eargs, **ekwargs), and ci_lower and ci_upper are the lower and upper bounds of the surrounding confidence interval, calculated by calling estimator(syndata, *eargs, **ekwargs) on each synthetic data set syndata. If n_sets is provided, that is the number of synthetic data sets generated; otherwise an appropriate size is selected automatically (see calc_mcbs_nsets()). sort, if given, is applied to sort the results of calling estimator on each synthetic data set prior to obtaining the confidence interval. This function must sort on the last index.
Individual entries in synthetic data sets are selected by the first index of data, allowing this function to be used on arrays of multidimensional data.
Returns (fhat, lb, ub, ub-lb, abs((ub-lb)/fhat), max(ub-fhat, fhat-lb)) – that is, the estimated value, the lower and upper bounds of the confidence interval, the width of the confidence interval, the relative width of the confidence interval, and the symmetrized error bar of the confidence interval.
westpa.oldtools.aframe.output module
- class westpa.oldtools.aframe.output.AnalysisMixin
Bases:
object
- add_args(parser, upcall=True)
- process_args(args, upcall=True)
- class westpa.oldtools.aframe.output.CommonOutputMixin
Bases:
AnalysisMixin
- add_common_output_args(parser_or_group)
- process_common_output_args(args)
westpa.oldtools.aframe.plotting module
- class westpa.oldtools.aframe.plotting.AnalysisMixin
Bases:
object
- add_args(parser, upcall=True)
- process_args(args, upcall=True)
- class westpa.oldtools.aframe.plotting.PlottingMixin
Bases:
AnalysisMixin
- require_matplotlib()
westpa.oldtools.aframe.trajwalker module
- class westpa.oldtools.aframe.trajwalker.TrajWalker(data_reader, history_chunksize=100)
Bases:
object
A class to perform analysis by walking the trajectory tree. A stack is used rather than recursion; otherwise, the maximum number of iterations that could be considered would be capped by the Python recursion limit.
- trace_to_root(n_iter, seg_id)
Trace the given segment back to its starting point, returning a list of Segment objects describing the entire trajectory.
- get_trajectory_roots(first_iter, last_iter, include_pcoords=True)
Get segments which start new trajectories. If first_iter or last_iter is specified, restrict the set of iterations within which the search is conducted.
- get_initial_nodes(first_iter, last_iter, include_pcoords=True)
Get segments with which to begin a tree walk – those alive or created within [first_iter,last_iter].
- trace_trajectories(first_iter, last_iter, callable, include_pcoords=True, cargs=None, ckwargs=None, get_state=None, set_state=None)
Walk the trajectory tree depth-first, calling callable(segment, children, history, *cargs, **ckwargs) for each segment visited. segment is the segment being visited, children is that segment’s children, and history is the chain of segments leading to segment (not including segment). get_state and set_state are used to record and reset, respectively, any state specific to callable when a new branch is traversed.
westpa.oldtools.aframe.transitions module
- class westpa.oldtools.aframe.transitions.AnalysisMixin
Bases:
object
- add_args(parser, upcall=True)
- process_args(args, upcall=True)
- class westpa.oldtools.aframe.transitions.TrajWalker(data_reader, history_chunksize=100)
Bases:
object
A class to perform analysis by walking the trajectory tree. A stack is used rather than recursion; otherwise, the maximum number of iterations that could be considered would be capped by the Python recursion limit.
- trace_to_root(n_iter, seg_id)
Trace the given segment back to its starting point, returning a list of Segment objects describing the entire trajectory.
- get_trajectory_roots(first_iter, last_iter, include_pcoords=True)
Get segments which start new trajectories. If first_iter or last_iter is specified, restrict the set of iterations within which the search is conducted.
- get_initial_nodes(first_iter, last_iter, include_pcoords=True)
Get segments with which to begin a tree walk – those alive or created within [first_iter,last_iter].
- trace_trajectories(first_iter, last_iter, callable, include_pcoords=True, cargs=None, ckwargs=None, get_state=None, set_state=None)
Walk the trajectory tree depth-first, calling callable(segment, children, history, *cargs, **ckwargs) for each segment visited. segment is the segment being visited, children is that segment’s children, and history is the chain of segments leading to segment (not including segment). get_state and set_state are used to record and reset, respectively, any state specific to callable when a new branch is traversed.
- class westpa.oldtools.aframe.transitions.TransitionEventAccumulator(n_bins, output_group, calc_fpts=True)
Bases:
object
- index_dtype
alias of
uint64
- count_dtype
alias of
uint64
- weight_dtype
alias of
float64
- output_tdat_chunksize = 4096
- tdat_buffersize = 524288
- max_acc = 32768
- clear()
- clear_state()
- get_state()
- set_state(state_dict)
- record_transition_data(tdat)
Update running statistics and write transition data to HDF5 (with buffering)
- flush_transition_data()
Flush any unwritten output that may be present
- start_accumulation(assignments, weights, bin_pops, traj=0, n_iter=0)
- continue_accumulation(assignments, weights, bin_pops, traj=0, n_iter=0)
- class westpa.oldtools.aframe.transitions.TransitionAnalysisMixin
Bases:
AnalysisMixin
- require_transitions_group()
- delete_transitions_group()
- get_transitions_ds()
- add_args(parser, upcall=True)
- process_args(args, upcall=True)
- require_transitions()
- find_transitions()
- class westpa.oldtools.aframe.transitions.BFTransitionAnalysisMixin
Bases:
TransitionAnalysisMixin
- require_transitions()
- find_transitions(chunksize=65536)
westpa.oldtools.cmds package
westpa.oldtools.cmds module
westpa.oldtools.cmds.w_ttimes module
westpa.oldtools.stats package
westpa.oldtools.stats module
westpa.oldtools.stats.accumulator module
westpa.oldtools.stats.edfs module
- class westpa.oldtools.stats.edfs.EDF(values, weights=None)
Bases:
object
A class for creating and manipulating empirical distribution functions (cumulative distribution functions derived from sample data).
Construct a new EDF from the given values and (optionally) weights.
- static from_array(array)
- static from_arrays(x, F)
- as_array()
Return this EDF as an (N, 2) array, where N is the number of unique values passed to the constructor. NumPy type-casting rules are applied (so, for instance, integral abscissae are converted to floating-point values).
- quantiles(p)
Treating the EDF as a quantile function, return the values of the (statistical) variable whose probabilities are at least p. That is, Q(p) = inf {x: p <= F(x) }.
- quantile(p)
- median()
- moment(n)
Calculate the nth moment of this probability distribution
<x^n> = int_{-inf}^{inf} x^n dF(x)
- cmoment(n)
Calculate the nth central moment of this probability distribution
- mean()
- var()
Return the second central moment of this probability distribution.
- std()
Return the standard deviation (root of the variance) of this probability distribution.
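A minimal usage sketch (the sample data are illustrative):
import numpy as np
from westpa.oldtools.stats.edfs import EDF

values = np.random.normal(size=1000)  # illustrative samples
edf = EDF(values)                     # equal weights when weights=None
print(edf.median(), edf.mean(), edf.std())
print(edf.as_array()[:5])             # first few (x, F(x)) pairs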
westpa.oldtools.stats.mcbs module
Tools for Monte Carlo bootstrap error analysis
- westpa.oldtools.stats.mcbs.add_mcbs_options(parser)
Add arguments concerning Monte Carlo bootstrap (confidence and bssize) to the given parser.
- westpa.oldtools.stats.mcbs.get_bssize(alpha)
Return a bootstrap data set size appropriate for the given confidence level
- westpa.oldtools.stats.mcbs.bootstrap_ci(estimator, data, alpha, n_sets=None, args=(), kwargs={}, sort=<function msort>, extended_output=False)
Perform a Monte Carlo bootstrap of a (1-alpha) confidence interval for the given estimator. Returns (fhat, ci_lower, ci_upper), where fhat is the result of estimator(data, *args, **kwargs), and ci_lower and ci_upper are the lower and upper bounds of the surrounding confidence interval, calculated by calling estimator(syndata, *args, **kwargs) on each synthetic data set syndata. If n_sets is provided, that is the number of synthetic data sets generated; otherwise an appropriate size is selected automatically (see get_bssize()). sort, if given, is applied to sort the results of calling estimator on each synthetic data set prior to obtaining the confidence interval.
Individual entries in synthetic data sets are selected by the first index of data, allowing this function to be used on arrays of multidimensional data.
If extended_output is True (default False), instead of returning (fhat, lb, ub), this function returns (fhat, lb, ub, ub-lb, abs((ub-lb)/fhat), max(ub-fhat, fhat-lb)) – that is, the estimated value, the lower and upper bounds of the confidence interval, the width of the confidence interval, the relative width of the confidence interval, and the symmetrized error bar of the confidence interval.
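A minimal usage sketch (the sample data and the np.mean estimator are illustrative):
import numpy as np
from westpa.oldtools.stats.mcbs import bootstrap_ci

data = np.random.exponential(scale=2.0, size=500)  # illustrative data
# 95% confidence interval on the mean; n_sets is chosen automatically
fhat, ci_lower, ci_upper = bootstrap_ci(np.mean, data, alpha=0.05)
print(fhat, ci_lower, ci_upper)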
westpa.westext package
Currently Supported
westpa.westext.adaptvoronoi package
Submodules
westpa.westext.adaptvoronoi.adaptVor_driver module
- westpa.westext.adaptvoronoi.adaptVor_driver.check_bool(value, action='warn')
Check that the given value is boolean in type. If not, either raise a warning (if action=='warn') or an exception (if action=='raise').
- exception westpa.westext.adaptvoronoi.adaptVor_driver.ConfigItemMissing(key, message=None)
Bases:
KeyError
- class westpa.westext.adaptvoronoi.adaptVor_driver.VoronoiBinMapper(dfunc, centers, dfargs=None, dfkwargs=None)
Bases:
BinMapper
A one-dimensional mapper which assigns a multidimensional pcoord to the closest center based on a distance metric. Both the list of centers and the distance function must be supplied.
- assign(coords, mask=None, output=None)
- class westpa.westext.adaptvoronoi.adaptVor_driver.AdaptiveVoronoiDriver(sim_manager, plugin_config)
Bases:
object
This plugin implements an adaptive scheme using Voronoi bins from Zhang 2010, J Chem Phys, 132. The options exposed to the configuration file are:
- av_enabled (bool, default False): enables adaptive binning
- max_centers (int, default 10): the maximum number of Voronoi centers to be placed
- walk_count (int, default 5): number of walkers per Voronoi center
- center_freq (int, default 1): frequency of center placement
- priority (int, default 1): priority in the plugin order
- dfunc_method (function, required, no default): user-defined function that will be used to calculate distances between Voronoi centers and data points
- mapper_func (function, optional): user-defined function for building bin mappers for more complicated binning schemes, e.g., embedding the Voronoi binning in a portion of the state space. If not defined, the plugin will build a VoronoiBinMapper with the information it has.
- dfunc()
Distance function to be used by the plugin. This function will be used to calculate the distance between each data point and the Voronoi centers.
- get_dfunc_method(plugin_config)
- get_mapper_func(plugin_config)
- get_initial_centers()
This function pulls the initial centers either from the previous bin mapper or, failing that, uses the definition from the system to calculate the number of centers.
- update_bin_mapper()
Update the bin_mapper using the current set of voronoi centers
- update_centers(iter_group)
Update the set of Voronoi centers according to Zhang 2010, J Chem Phys, 132. A short description of the algorithm:
1) The first reference structure is chosen randomly from the first set of given structures.
2) Given a set of n reference structures, the distance of each configuration in the iteration to each reference structure is calculated, and the minimum distance is found.
3) The configuration with the minimum distance is selected as the next reference.
- prepare_new_iteration()
Module contents
- class westpa.westext.adaptvoronoi.AdaptiveVoronoiDriver(sim_manager, plugin_config)
Bases:
object
This plugin implements an adaptive scheme using Voronoi bins from Zhang 2010, J Chem Phys, 132. The options exposed to the configuration file are:
- av_enabled (bool, default False): enables adaptive binning
- max_centers (int, default 10): the maximum number of Voronoi centers to be placed
- walk_count (int, default 5): number of walkers per Voronoi center
- center_freq (int, default 1): frequency of center placement
- priority (int, default 1): priority in the plugin order
- dfunc_method (function, required, no default): user-defined function that will be used to calculate distances between Voronoi centers and data points
- mapper_func (function, optional): user-defined function for building bin mappers for more complicated binning schemes, e.g., embedding the Voronoi binning in a portion of the state space. If not defined, the plugin will build a VoronoiBinMapper with the information it has.
- dfunc()
Distance function to be used by the plugin. This function will be used to calculate the distance between each data point and the Voronoi centers.
- get_dfunc_method(plugin_config)
- get_mapper_func(plugin_config)
- get_initial_centers()
This function pulls the initial centers either from the previous bin mapper or, failing that, uses the definition from the system to calculate the number of centers.
- update_bin_mapper()
Update the bin_mapper using the current set of voronoi centers
- update_centers(iter_group)
Update the set of Voronoi centers according to Zhang 2010, J Chem Phys, 132. A short description of the algorithm:
1) The first reference structure is chosen randomly from the first set of given structures.
2) Given a set of n reference structures, the distance of each configuration in the iteration to each reference structure is calculated, and the minimum distance is found.
3) The configuration with the minimum distance is selected as the next reference.
- prepare_new_iteration()
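For concreteness, a plugin section in west.cfg enabling this driver might look like the following sketch (values are illustrative, and system.dfunc is a hypothetical module path to the user-defined distance function):
west:
  plugins:
    - plugin: westpa.westext.adaptvoronoi.AdaptiveVoronoiDriver
      av_enabled: True
      max_centers: 10
      walk_count: 5
      center_freq: 1
      priority: 1
      dfunc_method: system.dfunc # hypothetical path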
westpa.westext.stringmethod package
Submodules
westpa.westext.stringmethod.fourier_fitting module
westpa.westext.stringmethod.string_driver module
westpa.westext.stringmethod.string_method module
Module contents
westpa.westext.hamsm_restarting package
Description
This plugin leverages haMSM analysis [1] to provide simulation post-analysis. This post-analysis can be used on its own, or can be used to initialize and run new WESTPA simulations using structures in the haMSM’s best estimate of steady-state as described in [2], which may accelerate convergence to steady-state.
haMSM analysis is performed using the msm_we library.
Sample files necessary to run the restarting plugin (as described below) can be found in the WESTPA GitHub Repo.
Usage
Configuration
west.cfg
This plugin requires the following section in west.cfg (or whatever your WE configuration file is named):
west:
  plugins:
    - plugin: westpa.westext.hamsm_restarting.restart_driver.RestartDriver
      n_restarts: 0 # Number of restarts to perform
      n_runs: 5 # Number of runs within each restart
      n_restarts_to_use: 0.5 # Amount of prior restarts' data to use. -1, a decimal in (0,1), or an integer. Details below.
      extension_iters: 5 # Number of iterations to continue runs for, if target is not reached by first restart period
      coord_len: 2 # Length of pcoords returned
      initialization_file: restart_initialization.json # JSON describing w_run parameters for new runs
      ref_pdb_file: common_files/bstate.pdb # File containing reference structure/topology
      model_name: NaClFlux # Name for msm_we model
      n_clusters: 2 # Number of clusters in haMSM building
      we_folder: . # Should point to the same directory as WEST_SIM_ROOT
      target_pcoord_bounds: [[-inf, 2.60]] # Progress coordinate boundaries for the target state
      basis_pcoord_bounds: [[12.0, inf]] # Progress coordinate boundaries for the basis state
      tau: 5e-13 # Resampling time, i.e. length of a WE iteration in physical units
      pcoord_ndim0: 1 # Dimensionality of progress coordinate
      dim_reduce_method: pca # Dimensionality reduction scheme, either "pca", "vamp", or "none"
      parent_traj_filename: parent.xml # Name of parent file in each segment
      child_traj_filename: seg.xml # Name of child file in each segment
      user_functions: westpa_scripts/restart_overrides.py # Python file defining coordinate processing
      struct_filetype: mdtraj.formats.PDBTrajectoryFile # Filetype for output start-structures
      debug: False # Optional, defaults to False. If true, enables debug-mode logging.
      streaming: True # Does clustering in a streaming fashion, versus trying to load all coords in memory
      n_cpus: 1 # Number of CPUs to use for parallel calculations
Some sample parameters are provided above, but they should of course be modified for your specific system.
Note about n_restarts_to_use: this option can be specified in a few different ways. A value of -1 means to use all available data. A decimal 0 < n_restarts_to_use < 1 will use the last n_restarts_to_use * current_restart iterations of data – so, for example, set it to 0.5 to use the last half of the data, or 0.75 to use the last 3/4. Finally, an integer value will just use the last n_restarts_to_use iterations.
Note that ref_pdb_file can be any filetype supported by msm_we.initialize()’s structure loading. At the time of writing, this is limited to PDB; however, extending that is planned. Also at the time of writing, it is only used to set model.nAtoms, so if you’re using some weird topology that’s unsupported, you should be able to scrap that and manually set nAtoms on the object.
Also in this file, west.data.data_refs.basis_state MUST point to $WEST_SIM_ROOT/{basis_state.auxref} and not a subdirectory if restarts are being used. This is because when the plugin initiates a restart, start_state references in $WEST_SIM_ROOT/restartXX/start_states.txt are set relative to $WEST_SIM_ROOT. All basis/start state references are defined relative to west.data.data_refs.basis_state, so if that points to a subdirectory of $WEST_SIM_ROOT, those paths will not be accurate.
Running
Once configured, just run your WESTPA simulation normally with w_run, and the plugin will automatically handle performing restarts, and extensions if necessary.
Extensions
To be clear: these are extensions in the sense of extending a simulation to be longer – not in the sense of “an extension to the WESTPA software package”!
Running with extension_iters greater than 0 will enable extensions before the first restart if the target state is not reached. This is useful to avoid restarting when you don’t yet have structures spanning all the way from your basis to target. At the time of writing, it’s not yet clear whether restarting from “incomplete” WE runs like this will help or hinder the total number of iterations it takes to reach the target.
Extensions are simple and work as follows: before doing the first restart, after all runs are complete, the output WESTPA HDF5 files are scanned to see if any recycling has occurred. If it hasn’t, then each run is extended by extension_iters iterations.
restart_initialization.json
{
"bstates":["start,1,bstates/bstate.pdb"],
"tstates":["bound,2.6"],
"bstate-file":"bstates/bstates.txt",
"tstate-file" :"tstate.file",
"segs-per-state": 1
}
It is not necessary to specify both in-line states and a state-file for each, but that is shown in the sample for completeness.
It is important that bstates and tstates are lists of strings, and not just strings, even if only one bstate/tstate is being used!
With n_runs > 1, multiple independent runs are performed before doing any restart. However, before the first restart (and this applies if no restarts are performed as well), the plugin has no way of accessing the parameters that were initially passed to w_init and w_run. Therefore, it is necessary to store those parameters in a file so that the plugin can read them and initiate subsequent runs. After the first restart is performed, the plugin writes this file itself, so it is only necessary to configure this file manually for the first set of runs.
Featurization overrides
import logging

import numpy as np
import mdtraj as md

log = logging.getLogger(__name__)


def processCoordinates(self, coords):
    log.debug("Processing coordinates")

    if self.dimReduceMethod == "none":
        # No dimensionality reduction: flatten each frame to a 3N-dimensional vector
        nC = np.shape(coords)[0]
        data = coords.reshape(nC, 3 * self.nAtoms)
        return data

    if self.dimReduceMethod == "pca" or self.dimReduceMethod == "vamp":
        ### NaCl RMSD dimensionality reduction
        log.warning("Hardcoded selection: Doing dim reduction for Na, Cl. This is only for testing!")

        indNA = self.reference_structure.topology.select("element Na")
        indCL = self.reference_structure.topology.select("element Cl")

        # Na-Cl displacement for each frame, reduced to a single distance-like feature
        diff = np.subtract(coords[:, indNA], coords[:, indCL])
        dist = np.array(np.sqrt(np.mean(np.power(diff, 2), axis=-1)))
        return dist
This is the file whose path is provided in the configuration file in plugin.user_functions. It must be a Python file defining a function named processCoordinates(self, coords) which takes a NumPy array of coordinates, featurizes it, and returns the NumPy array of feature-coordinates. This is left to be user-provided because whatever featurization you do will be system-specific. The provided function is monkey-patched into the msm_we.modelWE class. The example above does a simple RMSD coordinate reduction for the NaCl association tutorial system.
Doing only post-analysis
If you want to ONLY use this for haMSM post-analysis, and not restarting, just set n_restarts: 0 in the configuration.
Work manager for restarting
If you’re using some parallelism (which you should), and you’re using the plugin to do restarts or multiple runs,
then your choice of work manager can be important.
This plugin handles starting new WESTPA runs using the Python API.
The process work manager, by default, uses fork to start new workers, which seems to eventually cause memory issues, since fork passes the entire contents of the parent to each child. Switching the start method to forkserver or spawn may introduce other issues.
Using the ZMQ work manager works well. The MPI work manager should also work well, though is untested. Both of these handle starting new workers in a more efficient way, without copying the full state of the parent.
Continuing a failed run
The restarting plugin has a few different things it expects to find when it runs. Crashes during the WE run should not affect this. However, if the plugin itself crashes while running, these may be left in a weird state.
If the plugin crashes while running, make sure:
- restart.dat contains the correct entries. restarts_completed is the number of restarts successfully completed, and the same goes for runs_completed within that restart.
- restart_initialization.json is pointing to the correct restart.
It may help to w_truncate the very last iteration and allow WESTPA to re-do it.
Potential Pitfalls/Troubleshooting
- Basis state calculation may take a LONG time with a large number of start-states. A simple RMSD calculation using cpptraj and 500,000 start-states took over 6 hours. Reducing the number of runs used through n_restarts_to_use will ameliorate this.
- If restart_driver.prepare_coordinates() has written a coordinate for an iteration, subsequent runs will NOT overwrite it, and will skip it.
- In general: verify that msm_we is installed.
- Verify that restart_initialization.json has been correctly set.
- This plugin does not yet attempt to resolve environment variables in the config, so things like, say, $WEST_SIM_ROOT will be interpreted literally in paths.
References
[1] Suárez, E., Adelman, J. L. & Zuckerman, D. M. Accurate Estimation of Protein Folding and Unfolding Times: Beyond Markov State Models. J Chem Theory Comput 12, 3473–3481 (2016).
[2] Copperman, J. & Zuckerman, D. M. Accelerated Estimation of Long-Timescale Kinetics from Weighted Ensemble Simulation via Non-Markovian “Microbin” Analysis. J Chem Theory Comput 16, 6763–6775 (2020).
Deprecated
westpa.westext.weed package
Submodules
westpa.westext.weed.BinCluster module
westpa.westext.weed.ProbAdjustEquil module
- westpa.westext.weed.ProbAdjustEquil.probAdjustEquil(binProb, rates, uncert, threshold=0.0, fullCalcClust=False, fullCalcBins=False)
This function adjusts bin populations in binProb using the rates and uncert matrices.
fullCalcBins --> True for weighted avg, False for simple calc
fullCalcClust --> True for weighted avg, False for simple calc
threshold --> minimum weight (relative to max) for another value to be averaged; only matters if fullCalcBins == True (or later perhaps if fullCalcClust == True)
westpa.westext.weed.UncertMath module
- class westpa.westext.weed.UncertMath.UncertContainer(vals, vals_dmin, vals_dmax, mask=False)
Bases:
object
Container to hold uncertainty measurements. Data is converted to NumPy masked arrays to avoid possible numerical problems.
- transpose()
- recip()
- update_mask()
- concatenate(value, axis=0)
Concatenate UncertContainer value to self. Assumes that if the dimensions of self and value do not match, a np.newaxis should be added along axis of value.
- weighted_average(axis=0, expaxis=None)
Calculate weighted average of data along axis after optionally inserting a new dimension into the shape array at position expaxis
westpa.westext.weed.weed_driver module
- westpa.westext.weed.weed_driver.check_bool(value, action='warn')
Check that the given value is boolean in type. If not, either raise a warning (if action=='warn') or an exception (if action=='raise').
- class westpa.westext.weed.weed_driver.RateAverager(bin_mapper, system=None, data_manager=None, work_manager=None)
Bases:
object
Calculate bin-to-bin kinetic properties (fluxes, rates, populations) at 1-tau resolution
- extract_data(iter_indices)
Extract data from the data_manager and place it in a dict mirroring the same underlying layout.
- task_generator(iter_start, iter_stop, block_size)
- calculate(iter_start=None, iter_stop=None, n_blocks=1, queue_size=1)
Read the HDF5 file and collect flux matrices and population vectors for each bin for each iteration in the range [iter_start, iter_stop). Break the calculation into n_blocks blocks. If the calculation is broken up into more than one block, queue_size specifies the maximum number of tasks in the work queue.
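A minimal usage sketch (bin_mapper, system, data_manager, and work_manager are assumed to be already constructed):
# Collect flux matrices and population vectors for iterations [1, 101) in 10 blocks.
averager = RateAverager(bin_mapper, system=system, data_manager=data_manager, work_manager=work_manager)
averager.calculate(iter_start=1, iter_stop=101, n_blocks=10, queue_size=1)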
- westpa.westext.weed.weed_driver.probAdjustEquil(binProb, rates, uncert, threshold=0.0, fullCalcClust=False, fullCalcBins=False)
This function adjusts bin populations in binProb using the rates and uncert matrices.
fullCalcBins --> True for weighted avg, False for simple calc
fullCalcClust --> True for weighted avg, False for simple calc
threshold --> minimum weight (relative to max) for another value to be averaged; only matters if fullCalcBins == True (or later perhaps if fullCalcClust == True)
- westpa.westext.weed.weed_driver.bins_from_yaml_dict(bin_dict)
Module contents
westext.weed – Support for weighted ensemble equilibrium dynamics
Initial code by Dan Zuckerman (May 2011), integration by Matt Zwier, and testing by Carsen Stringer. Re-factoring and optimization of probability adjustment routines by Joshua L. Adelman (January 2012).
- westpa.westext.weed.probAdjustEquil(binProb, rates, uncert, threshold=0.0, fullCalcClust=False, fullCalcBins=False)
This function adjusts bin populations in binProb using the rates and uncert matrices.
fullCalcBins --> True for weighted avg, False for simple calc
fullCalcClust --> True for weighted avg, False for simple calc
threshold --> minimum weight (relative to max) for another value to be averaged; only matters if fullCalcBins == True (or later perhaps if fullCalcClust == True)
westpa.westext.wess package
Submodules
westpa.westext.wess.ProbAdjust module
- westpa.westext.wess.ProbAdjust.solve_steady_state(T, U, target_bins_index)
- westpa.westext.wess.ProbAdjust.prob_adjust(binprob, rates, uncert, oldindex, targets=[])
westpa.westext.wess.wess_driver module
- westpa.westext.wess.wess_driver.check_bool(value, action='warn')
Check that the given value is boolean in type. If not, either raise a warning (if action=='warn') or an exception (if action=='raise').
- class westpa.westext.wess.wess_driver.RateAverager(bin_mapper, system=None, data_manager=None, work_manager=None)
Bases:
object
Calculate bin-to-bin kinetic properties (fluxes, rates, populations) at 1-tau resolution
- extract_data(iter_indices)
Extract data from the data_manager and place it in a dict mirroring the same underlying layout.
- task_generator(iter_start, iter_stop, block_size)
- calculate(iter_start=None, iter_stop=None, n_blocks=1, queue_size=1)
Read the HDF5 file and collect flux matrices and population vectors for each bin for each iteration in the range [iter_start, iter_stop). Break the calculation into n_blocks blocks. If the calculation is broken up into more than one block, queue_size specifies the maximum number of tasks in the work queue.
- westpa.westext.wess.wess_driver.prob_adjust(binprob, rates, uncert, oldindex, targets=[])
- westpa.westext.wess.wess_driver.bins_from_yaml_dict(bin_dict)
- westpa.westext.wess.wess_driver.reduce_array(Aij)
Remove empty rows and columns from an array Aij and return the reduced array Bij and the list of non-empty states
Module contents
- westpa.westext.wess.prob_adjust(binprob, rates, uncert, oldindex, targets=[])
Module contents
westpa.analysis package
This subpackage provides an API to facilitate the analysis of WESTPA simulation data. Its core abstraction is the Run class. A Run instance provides a read-only view of a WEST HDF5 (“west.h5”) file.
API reference: https://westpa.readthedocs.io/en/latest/documentation/analysis/
How To
Open a run:
>>> from westpa.analysis import Run
>>> run = Run.open('west.h5')
>>> run
<WESTPA Run with 500 iterations at 0x7fcaf8f0d5b0>
Iterate over iterations and walkers:
>>> for iteration in run:
... for walker in iteration:
... pass
...
Access a particular iteration:
>>> iteration = run.iteration(10)
>>> iteration
Iteration(10, <WESTPA Run with 500 iterations at 0x7fcaf8f0d5b0>)
Access a particular walker:
>>> walker = iteration.walker(4)
>>> walker
Walker(4, Iteration(10, <WESTPA Run with 500 iterations at 0x7fcaf8f0d5b0>))
Get the weight and progress coordinate values of a walker:
>>> walker.weight
9.876543209876543e-06
>>> walker.pcoords
array([[3.1283207],
[3.073721 ],
[2.959221 ],
[2.6756208],
[2.7888207]], dtype=float32)
Get the parent and children of a walker:
>>> walker.parent
Walker(2, Iteration(9, <WESTPA Run with 500 iterations at 0x7fcaf8f0d5b0>))
>>> for child in walker.children:
... print(child)
...
Walker(0, Iteration(11, <WESTPA Run with 500 iterations at 0x7fcaf8f0d5b0>))
Walker(1, Iteration(11, <WESTPA Run with 500 iterations at 0x7fcaf8f0d5b0>))
Walker(2, Iteration(11, <WESTPA Run with 500 iterations at 0x7fcaf8f0d5b0>))
Walker(3, Iteration(11, <WESTPA Run with 500 iterations at 0x7fcaf8f0d5b0>))
Walker(4, Iteration(11, <WESTPA Run with 500 iterations at 0x7fcaf8f0d5b0>))
Trace the ancestry of a walker:
>>> trace = walker.trace()
>>> trace
Trace(Walker(4, Iteration(10, <WESTPA Run with 500 iterations at 0x7fcaf8f0d5b0>)))
>>> for walker in trace:
... print(walker)
...
Walker(1, Iteration(1, <WESTPA Run with 500 iterations at 0x7fcaf8f0d5b0>))
Walker(4, Iteration(2, <WESTPA Run with 500 iterations at 0x7fcaf8f0d5b0>))
Walker(5, Iteration(3, <WESTPA Run with 500 iterations at 0x7fcaf8f0d5b0>))
Walker(6, Iteration(4, <WESTPA Run with 500 iterations at 0x7fcaf8f0d5b0>))
Walker(9, Iteration(5, <WESTPA Run with 500 iterations at 0x7fcaf8f0d5b0>))
Walker(8, Iteration(6, <WESTPA Run with 500 iterations at 0x7fcaf8f0d5b0>))
Walker(8, Iteration(7, <WESTPA Run with 500 iterations at 0x7fcaf8f0d5b0>))
Walker(13, Iteration(8, <WESTPA Run with 500 iterations at 0x7fcaf8f0d5b0>))
Walker(2, Iteration(9, <WESTPA Run with 500 iterations at 0x7fcaf8f0d5b0>))
Walker(4, Iteration(10, <WESTPA Run with 500 iterations at 0x7fcaf8f0d5b0>))
Close a run (and its underlying HDF5 file):
>>> run.close()
>>> run
<Closed WESTPA Run at 0x7fcaf8f0d5b0>
>>> run.h5file
<Closed HDF5 file>
Retrieving Trajectories
Built-in Reader
MD trajectory data stored in the same manner as in the Basic NaCl tutorial may be retrieved using the built-in BasicMDTrajectory reader with its default settings:
>>> from westpa.analysis import BasicMDTrajectory
>>> trajectory = BasicMDTrajectory()
Here trajectory is a callable object that takes either a Walker or a Trace instance as input and returns an MDTraj Trajectory:
>>> traj = trajectory(walker)
>>> traj
<mdtraj.Trajectory with 5 frames, 33001 atoms, 6625 residues, and unitcells at 0x7fcae484ad00>
>>> traj = trajectory(trace)
>>> traj
<mdtraj.Trajectory with 41 frames, 33001 atoms, 6625 residues, and unitcells at 0x7fcae487c790>
Minor variations of the “basic” trajectory storage protocol (e.g., use of different file formats) can be handled by changing the parameters of the BasicMDTrajectory reader. For example, suppose that instead of storing the coordinate and topology data for trajectory segments in separate files (“seg.dcd” and “bstate.pdb”), we store them together in an MDTraj HDF5 trajectory file (“seg.h5”). This change can be accommodated by explicitly setting the traj_ext and top parameters of the trajectory reader:
>>> trajectory = BasicMDTrajectory(traj_ext='.h5', top=None)
Trajectories that are saved with the HDF5 Framework can use the HDF5MDTrajectory reader instead.
Custom Readers
For users requiring greater flexibility, custom trajectory readers can be implemented using the westpa.analysis.Trajectory class. Implementing a custom reader requires two ingredients:
1. A function for retrieving individual trajectory segments. The function must take a Walker instance as its first argument and return a sequence (e.g., a list, NumPy array, or MDTraj Trajectory) representing the trajectory of the walker. Moreover, it must accept a Boolean keyword argument include_initpoint, which specifies whether the returned trajectory includes its initial point.
2. A function for concatenating trajectory segments. A default implementation is provided by the concatenate() function in the westpa.analysis.trajectories module.
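A minimal sketch (the pcoord-based fget below is illustrative, not a real MD reader):
from westpa.analysis import Run, Trajectory
from westpa.analysis.trajectories import concatenate

def fget(walker, include_initpoint=True):
    # Illustrative 'trajectory': the walker's stored progress-coordinate snapshots.
    pcoords = walker.pcoords
    return pcoords if include_initpoint else pcoords[1:]

trajectory = Trajectory(fget, fconcat=concatenate)

run = Run.open('west.h5')
traj = trajectory(run.iteration(10).walker(4))  # accepts a Walker or a Trace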
westpa.analysis.core module
- class westpa.analysis.core.Run(h5filename='west.h5')
A read-only view of a WESTPA simulation run.
- Parameters:
h5filename (str or file-like object, default 'west.h5') – Pathname or stream of a main WESTPA HDF5 data file.
- classmethod open(h5filename='west.h5')
Alternate constructor.
- Parameters:
h5filename (str or file-like object, default 'west.h5') – Pathname or stream of a main WESTPA HDF5 data file.
- close()
Close the Run instance by closing the underlying WESTPA HDF5 file.
- property closed
Whether the Run instance is closed.
- Type:
bool
- property summary
Summary data by iteration.
- Type:
pd.DataFrame
- property num_iterations
Number of completed iterations.
- Type:
int
- property num_walkers
Total number of walkers.
- Type:
int
- property num_segments
Total number of trajectory segments (alias self.num_walkers).
- Type:
int
- class westpa.analysis.core.Iteration(number, run)
An iteration of a WESTPA simulation.
- Parameters:
number (int) – Iteration number (1-based).
run (Run) – Simulation run to which the iteration belongs.
- property h5group
HDF5 group containing the iteration data.
- Type:
h5py.Group
- property summary
Iteration summary.
- Type:
pd.DataFrame
- property segment_summaries
Segment summary data for the iteration.
- Type:
pd.DataFrame
- property pcoords
Progress coordinate snapshots of each walker.
- Type:
3D ndarray
- property weights
Statistical weight of each walker.
- Type:
1D ndarray
- property bin_target_counts
Target count for each bin.
- Type:
1D ndarray, dtype=uint64
- property num_bins
Number of bins.
- Type:
int
- property num_walkers
Number of walkers in the iteration.
- Type:
int
- property num_segments
Number of trajectory segments (alias self.num_walkers).
- Type:
int
- property auxiliary_data
Auxiliary data stored for the iteration.
- Type:
h5py.Group or None
- property basis_state_summaries
Basis state summary data.
- Type:
pd.DataFrame
- property basis_state_pcoords
Progress coordinates of each basis state.
- Type:
2D ndarray
- property basis_states
Basis states in use for the iteration.
- Type:
list[BasisState]
- property has_target_states
Whether target (sink) states are defined for this iteration.
- Type:
bool
- property target_state_summaries
Target state summary data.
- Type:
pd.DataFrame or None
- property target_state_pcoords
Progress coordinates of each target state.
- Type:
2D ndarray or None
- property target_states
Target states in use for the iteration.
- Type:
list[TargetState]
- bin(index)
Return the bin with the given index.
- Parameters:
index (int) – Bin index (0-based).
- Returns:
The bin indexed by index.
- Return type:
Bin
- walker(index)
Return the walker with the given index.
- Parameters:
index (int) – Walker index (0-based).
- Returns:
The walker indexed by index.
- Return type:
Walker
- basis_state(index)
Return the basis state with the given index.
- Parameters:
index (int) – Basis state index (0-based).
- Returns:
The basis state indexed by index.
- Return type:
BasisState
- target_state(index)
Return the target state with the given index.
- Parameters:
index (int) – Target state index (0-based).
- Returns:
The target state indexed by index.
- Return type:
TargetState
- class westpa.analysis.core.Walker(index, iteration)
A walker in an iteration of a WESTPA simulation.
- Parameters:
index (int) – Walker index (0-based).
iteration (Iteration) – Iteration to which the walker belongs.
- property weight
Statistical weight of the walker.
- Type:
float64
- property pcoords
Progress coordinate snapshots.
- Type:
2D ndarray
- property num_snapshots
Number of snapshots.
- Type:
int
- property segment_summary
Segment summary data.
- Type:
pd.Series
- property parent
The parent of the walker.
- Type:
- property recycled
True if the walker stopped in the sink, False otherwise.
- Type:
bool
- property initial
True if the parent of the walker is an initial state, False otherwise.
- Type:
bool
- property auxiliary_data
Auxiliary data for the walker.
- Type:
dict
- class westpa.analysis.core.BinUnion(indices, mapper)
A (disjoint) union of bins defined by a common bin mapper.
- Parameters:
indices (iterable of int) – The indices of the bins comprising the union.
mapper (BinMapper) – The bin mapper defining the bins.
- union(*others)
Return the union of the bin union and all others.
- class westpa.analysis.core.Bin(index, mapper)
A bin defined by a bin mapper.
- Parameters:
index (int) – The index of the bin.
mapper (BinMapper) – The bin mapper defining the bin.
- class westpa.analysis.core.Trace(walker, source=None, max_length=None)
A trace of a walker’s ancestry.
- Parameters:
walker (Walker) – The terminal walker.
source (Bin, BinUnion, or collections.abc.Container, optional) – A source (macro)state, specified as a container object whose __contains__() method is the indicator function for the corresponding subset of progress coordinate space. The trace is stopped upon encountering a walker that stopped in source.
max_length (int, optional) – The maximum number of walkers in the trace.
westpa.analysis.trajectories module
- class westpa.analysis.trajectories.Trajectory(fget=None, *, fconcat=None)
A callable that returns the trajectory of a walker or trace.
- Parameters:
fget (callable) – Function for retrieving a single trajectory segment. Must take a Walker instance as its first argument and accept a Boolean keyword argument include_initpoint. The function should return a sequence (e.g., a list or ndarray) representing the trajectory of the walker. If include_initpoint is True, the trajectory segment should include its initial point; otherwise, it should exclude its initial point.
fconcat (callable, optional) – Function for concatenating trajectory segments. Must take a sequence of trajectory segments as input and return their concatenation. The default concatenation function is concatenate().
- property segment_collector
Segment retrieval manager.
- Type:
SegmentCollector
- property fget
Function for getting trajectory segments.
- Type:
callable
- property fconcat
Function for concatenating trajectory segments.
- Type:
callable
- class westpa.analysis.trajectories.SegmentCollector(trajectory, use_threads=False, max_workers=None, show_progress=False)
An object that manages the retrieval of trajectory segments.
- Parameters:
trajectory (Trajectory) – The trajectory to which the segment collector is attached.
use_threads (bool, default False) – Whether to use a pool of threads to retrieve trajectory segments asynchronously. Setting this parameter to True may be useful when segment retrieval is an I/O-bound task.
max_workers (int, optional) – Maximum number of threads to use. The default value is specified in the ThreadPoolExecutor documentation.
show_progress (bool, default False) – Whether to show a progress bar when retrieving multiple segments.
- get_segments(walkers, initpoint_mask=None, **kwargs)
Retrieve the trajectories of multiple walkers.
- Parameters:
walkers (sequence of Walker) – The walkers for which to retrieve trajectories.
initpoint_mask (sequence of bool, optional) – A Boolean mask indicating whether each trajectory segment should include (True) or exclude (False) its initial point. Default is all True.
- Returns:
The trajectory of each walker.
- Return type:
list of sequences
- class westpa.analysis.trajectories.BasicMDTrajectory(top='bstate.pdb', traj_ext='.dcd', state_ext='.xml', sim_root='.')
Trajectory reader for MD trajectories stored as in the Basic Tutorial.
- Parameters:
top (str or mdtraj.Topology, default 'bstate.pdb')
traj_ext (str, default '.dcd')
state_ext (str, default '.xml')
sim_root (str, default '.')
- class westpa.analysis.trajectories.HDF5MDTrajectory
Trajectory reader for MD trajectories stored by the HDF5 framework.
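For example, with the default settings of BasicMDTrajectory, the trajectory of a single walker might be retrieved as follows (a sketch, assuming a simulation laid out as in the Basic Tutorial)::

from westpa.analysis import Run, BasicMDTrajectory

run = Run.open('west.h5')
trajectory = BasicMDTrajectory()      # default topology and file extensions
walker = run.iteration(10).walker(4)  # walker 4 of iteration 10
seg_traj = trajectory(walker)         # an mdtraj.Trajectory
run.close()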
- westpa.analysis.trajectories.concatenate(segments)
Return the concatenation of a sequence of trajectory segments.
- Parameters:
segments (sequence of sequences) – A sequence of trajectory segments.
- Returns:
The concatenation of segments.
- Return type:
sequence
westpa.analysis.statistics module
- westpa.analysis.statistics.time_average(observable, iterations)
Compute the time average of an observable.
- Parameters:
observable (callable) – Function that takes a Walker instance and returns an array-like value.
iterations (sequence of Iteration) – The iterations over which to average.
- Returns:
The time average of observable over iterations.
- Return type:
ArrayLike
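For example, an observable could be averaged over a range of iterations as follows. This is a sketch: final_pcoord is a user-defined observable, and Run.iteration() and num_iterations (see the core module above) are assumed to provide the Iteration objects::

from westpa.analysis import Run
from westpa.analysis.statistics import time_average

def final_pcoord(walker):
    # Final progress coordinate value of the walker's segment
    return walker.pcoords[-1]

run = Run.open('west.h5')
iterations = [run.iteration(n) for n in range(1, run.num_iterations + 1)]
avg = time_average(final_pcoord, iterations)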
HDF5 File Schema
WESTPA stores all of its simulation data in the cross-platform, self-describing HDF5 file format. This file format can be read and written by a variety of languages and toolkits, including C/C++, Fortran, Python, Java, and Matlab so that analysis of weighted ensemble simulations is not tied to using the WESTPA framework. HDF5 files are organized like a filesystem, where arbitrarily-nested groups (i.e. directories) are used to organize datasets (i.e. files). The excellent HDFView program may be used to explore WEST data files.
The canonical file format reference for a given version of the WEST code is described in src/west/data_manager.py.
Overall structure
/
#ibstates/
index
naming
bstate_index
bstate_pcoord
istate_index
istate_pcoord
#tstates/
index
bin_topologies/
index
pickles
iterations/
iter_XXXXXXXX/
auxdata/
bin_target_counts
ibstates/
bstate_index
bstate_pcoord
istate_index
istate_pcoord
pcoord
seg_index
wtgraph
...
summary
The root group (/)
The root of the WEST HDF5 file contains the following entries (where a trailing “/” denotes a group):
Name | Type | Description
---|---|---
ibstates/ | Group | Initial and basis states for this simulation
tstates/ | Group | Target (recycling) states for this simulation; may be empty
bin_topologies/ | Group | Data pertaining to the binning scheme used in each iteration
iterations/ | Group | Iteration data
summary | Dataset (1-dimensional, compound) | Summary data by iteration
The iteration summary table (/summary)
Field | Description
---|---
n_particles | the total number of walkers in this iteration
norm | total probability, for stability monitoring
min_bin_prob | smallest probability contained in a bin
max_bin_prob | largest probability contained in a bin
min_seg_prob | smallest probability carried by a walker
max_seg_prob | largest probability carried by a walker
cputime | total CPU time (in seconds) spent on propagation for this iteration
walltime | total wallclock time (in seconds) spent on this iteration
binhash | a hex string identifying the binning used in this iteration
Per iteration data (/iterations/iter_XXXXXXXX)
Data for each iteration is stored in its own group, named according to the iteration number and zero-padded out to 8 digits, as in /iterations/iter_00000001 for iteration 1. This is done solely for convenience in dealing with the data in external utilities that sort output by group name lexicographically. The field width is configurable via the iter_prec entry under the data section of the WESTPA configuration file.
The HDF5 group for each iteration contains the following elements:
Name | Type | Description
---|---|---
auxdata/ | Group | All user-defined auxiliary data sets
bin_target_counts | Dataset (1-dimensional) | The per-bin target count for the iteration
ibstates/ | Group | Initial and basis state data for the iteration
pcoord | Dataset (3-dimensional) | Progress coordinate data for the iteration, stored as a (num of segments, pcoord_len, pcoord_ndim) array
seg_index | Dataset (1-dimensional, compound) | Summary data for each segment
wtgraph | Dataset (1-dimensional) |
The segment summary table (/iterations/iter_XXXXXXXX/seg_index)
Field | Description
---|---
weight | Segment weight
parent_id | Index of parent
wtg_n_parents |
wtg_offset |
cputime | Total CPU time required to run the segment
walltime | Total walltime required to run the segment
endpoint_type |
status |
Bin Topologies group (/bin_topologies)
Bin topologies used during a WE simulation are stored as a unique hash identifier and a serialized BinMapper object in Python pickle format. This group contains two datasets:

index: compound array containing the bin hash and pickle length
pickles: the pickled BinMapper objects for each unique mapper, stored in a (num unique mappers, max pickled size) array
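Because the file is ordinary HDF5, the schema described above can be explored without WESTPA at all; for example, using h5py::

import h5py

with h5py.File('west.h5', 'r') as f:
    # Per-iteration summary table (one row per iteration)
    print(f['summary']['n_particles'])
    # Data for iteration 1
    iter_group = f['iterations/iter_00000001']
    print(iter_group['seg_index']['weight'])  # per-segment weights
    print(iter_group['pcoord'].shape)  # (n_segs, pcoord_len, pcoord_ndim)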
Overview
Style Guide
Preface
The WESTPA documentation should help the user to understand how WESTPA works and how to use it. To aid in effective communication, a number of guidelines appear below.
When writing in the WESTPA documentation, please be:
Correct
Clear
Consistent
Concise
Articles in this documentation should follow the guidelines on this page. However, there may be cases when following these guidelines will make an article confusing: when in doubt, use your best judgment and ask for the opinions of those around you.
Style and Usage
Acronyms and abbreviations
Software documentation often involves extensive use of acronyms and abbreviations.
Acronym: A word formed from the initial letter or letters of each or most of the parts of a compound term
Abbreviation: A shortened form of a written word or name that is used in place of the full word or name
Define non-standard acronyms and abbreviations on their first use by using the full-length term, followed by the acronym or abbreviation in parentheses.
A potential of mean force (PMF) diagram may aid the user in visualizing the energy landscape of the simulation.
Only use acronyms and abbreviations when they make an idea more clear than spelling out the full term. Consider clarity from the point of view of a new user who is intelligent but may have little experience with computers.
Correct: The WESTPA wiki supports HyperText Markup Language (HTML). For example, the user may use HTML tags to give text special formatting. However, be sure to test that the HTML tag gives the desired effect by previewing edits before saving.
Avoid: The WESTPA wiki supports HyperText Markup Language. For example, the user may use HyperText Markup Language tags to give text special formatting. However, be sure to test that the HyperText Markup Language tag gives the desired effect by previewing edits before saving.
Avoid: For each iter, make sure to return the pcoord and any auxdata.
Use all capital letters for abbreviating file types. File extensions should be lowercase.
HDF5, PNG, MP4, GRO, XTC
west.h5, bound.png, unfolding.mp4, protein.gro, segment.xtc
Provide pronunciations for acronyms that may be difficult to sound out.
Do not use periods in acronyms and abbreviations except where it is customary:
Correct: HTML, U.S.
Avoid: H.T.M.L., US
Capitalization
Capitalize at the beginning of each sentence.
Do not capitalize after a semicolon.
Do not capitalize after a colon, unless multiple sentences follow the colon.
In this case, capitalize each sentence.
Preserve the capitalization of computer language elements (commands,
utilities, variables, modules, classes, and arguments).
Capitalize generic Python variables according to the PEP 0008 Python Style Guide. For example, generic class names should follow the CapWords convention, such as GenericClass.
Contractions
Do not use contractions. Contractions are a shortened version of a word characterized by the omission of internal letters.
Avoid: can’t, don’t, shouldn’t
Possessive nouns are not contractions. Use possessive nouns freely.
Internationalization
Use short sentences (less than 25 words). Although we do not maintain WESTPA documentation in languages other than English, some users may use automatic translation programs. These programs function best with short sentences.
Do not use technical terms where a common term would be equally or more clear.
Use multiple simple sentences in place of a single complicated sentence.
Italics
Use italics (surround the word with * * on each side) to highlight words that are not part of a sentence’s normal grammar.
Correct: The word istates refers to the initial states that WESTPA uses to begin trajectories.
Non-English words
Avoid Latin words and abbreviations.
Avoid: etc., et cetera, e.g., i.e.
Specially formatted characters
Never begin a sentence with a specially formatted character. This includes abbreviations, variable names, and anything else this guide instructs to use with special tags. Sentences may begin with WESTPA.
Correct: The program ls allows the user to see the contents of a directory.
Avoid: ls allows the user to see the contents of a directory.
Use the word and rather than an & ampersand. When a special character has a unique meaning to a program, first use the character surrounded by `` tags and then spell it out.
Correct: Append an & ampersand to a command to let it run in the background.
Avoid: Append an “&” to a command… Append an & to a command… Append an ampersand to a command…
There are many names for the # hash mark, including hash tag, number sign, pound sign, and octothorpe. Refer to this symbol as a “hash mark”.
Subject
Refer to the end WESTPA user as the user in software documentation.
Correct: The user should use the processes work manager to run segments in parallel on a single node.
Refer to the end WESTPA user as you in tutorials (you is the implied subject of commands). It is also acceptable to use personal pronouns such as we and our. Be consistent within the tutorial.
Correct: You should have two files in this directory, named system.py and west.cfg.
Tense
Use should to specify proper usage.
Correct: The user should run w_truncate -n iter to remove iterations after and including iter from the HDF5 file specified in the WESTPA configuration file.
Use will to specify expected results and output.
Correct: WESTPA will create an HDF5 file when the user runs w_init.
Voice
Use active voice. Passive voice can obscure a sentence and add unnecessary words.
Correct: WESTPA will return an error if the sum of the weights of segments does not equal one.
Avoid: An error will be returned if the sum of the weights of segments does not equal one.
Weighted ensemble
Refer to weighted ensemble in all lowercase, unless at the beginning of a sentence. Do not hyphenate.
Correct: WESTPA is an implementation of the weighted ensemble algorithm.
Avoid: WESTPA is an implementation of the weighted-ensemble algorithm.
Avoid: WESTPA is an implementation of the Weighted Ensemble algorithm.
WESTPA
Refer to WESTPA in all capitals. Do not use bold, italics, or other special formatting except when another guideline from this style guide applies.
Correct: Install the WESTPA software package.
The word WESTPA may refer to the software package or an entity of running software.
Correct: WESTPA includes a number of analysis utilities.
Correct: WESTPA will return an error if the user does not supply a configuration file.
Computer Language Elements
Classes, modules, and libraries
Display class names in fixed-width font using the `` tag.
Correct: WESTPropagator
Correct: The numpy library provides access to various low-level mathematical and scientific calculation routines.
Generic class names should be relevant to the properties of the class; do not use foo or bar:
class UserDefinedBinMapper(RectilinearBinMapper)
Methods and commands
Refer to a method by its name without parentheses, and without prepending the name of its class. Display methods in fixed-width font using the `` tag.
Correct: the arange method of the numpy library
Avoid: the arange() method of the numpy library
Avoid: the numpy.arange method
When referring to the arguments that a method expects, mention the method without arguments first, and then use the method’s name followed by parentheses and arguments.
Correct: WESTPA calls the assign method as assign(coords, mask=None, output=None)
Never use a method or command as a verb.
Correct: Run cd to change the current working directory.
Avoid: cd into the main simulation directory.
Programming languages
Some programming languages are both a language and a command. When referring to the language, capitalize the word and use standard font. When referring to the command, preserve capitalization as it would appear in a terminal and use the `` tag.
Using WESTPA requires some knowledge of Python.
Run python to launch an interactive session.
The Bash shell provides some handy capabilities, such as wildcard matching.
Use bash to run example.sh.
Scripts
Use the .. code-block:: directive for short scripts. Options are available for some languages, such as .. code-block:: bash and .. code-block:: python.
#!/bin/bash
# This is a generic Bash script.
BASHVAR="Hello, world!"
echo $BASHVAR
#!/usr/bin/env python
# This is a generic Python script.
def main():
    pythonstr = "Hello, world!"
    print(pythonstr)
    return

if __name__ == "__main__":
    main()
Begin a code snippet with a #! shebang (yes, this is the real term), followed by the usual path to a program. The line after the shebang should be an ellipsis, followed by lines of code. Use #!/bin/bash for Bash scripts, #!/bin/sh for generic shell scripts, and #!/usr/bin/env python for Python scripts. For Python code snippets that are not a stand-alone script, place any import commands between the shebang line and the ellipsis.
#!/usr/bin/env python
import numpy
...
def some_function(generic_vals):
    return 1 + numpy.mean(generic_vals)
Follow the PEP 0008 Python Style Guide for Python scripts.
Indents are four spaces.
For comments, use the # hash mark followed by a single space, and then the comment’s text.
Break lines after 80 characters.
For Bash scripts, consider following Google’s Shell Style Guide.
Indents are two spaces.
Use blank lines to improve readability.
Use ; do and ; then on the same line as while, for, and if.
Break lines after 80 characters.
For other languages, consider following a logical style guide. At minimum, be consistent.
Variables
Use the fixed-width `` tag when referring to a variable.
the ndim attribute
When explicitly referring to an attribute as well as its class, refer to an attribute as: the attr attribute of GenericClass, rather than GenericClass.attr.
Use the $ dollar sign before Bash variables.
WESTPA makes the variable $WEST_BSTATE_DATA_REF available to new trajectories.
Source Code Management
Documentation Practices
Introduction to Editing the Sphinx Documentation
Documentation for WESTPA is maintained using Sphinx. Docstrings are formatted in the Numpy style, which are converted to ReStructuredText using Sphinx’s Napoleon plugin, a feature included with Sphinx.
Make sure sphinx and sphinx_rtd_theme are installed on the system. The settings for the documentation are specified in /westpa/doc/conf.py. In order to successfully build the documentation, your system has to satisfy the minimum environment to install WESTPA.
The documentation may be built locally in the _build folder by navigating to the doc folder and running:
make html
to prepare an HTML version, or:
make latexpdf
to prepare a PDF. The latter requires latex to be available.
Uploading to ReadTheDocs
The online copy of the WESTPA Sphinx documentation is hosted on ReadTheDocs. The Sphinx documentation on the main branch is updated whenever the main branch is updated, via a webhook set up on ReadTheDocs and /westpa/.readthedocs.yml. The environment used to build the documentation on the RTD servers is described in /westpa/doc/doc_env.yaml.
In Cases of Major Revisions in Code Base
Currently, each .rst file contains pre-written descriptions and sections autogenerated from docstrings via automodule. In cases where the WESTPA code base has significantly changed, the structure of the code base can be regenerated into the test folder by running the following command in the doc folder:
sphinx-apidoc -f -o test ../src/westpa
WESTPA Modules API
Binning
Bin assignment for WEST simulations. This module defines “bin mappers” which take vectors of coordinates (or rather, coordinate tuples), and assign each a definite integer value identifying a bin. Critical portions are implemented in a Cython extension module.
A number of pre-defined bin mappers are available here:
RectilinearBinMapper
, for bins divided by N-dimensional grids
FuncBinMapper
, for functions which directly calculate bin assignments for a number of coordinate values. This is best used with C/Cython/Numba functions, or intelligently tuned numpy-based Python functions.
VectorizingFuncBinMapper
, for functions which calculate a bin assignment for a single coordinate value. This is best used for arbitrary Python functions.
PiecewiseBinMapper
, for using a set of boolean-valued functions, one per bin, to determine assignments. This is likely to be much slower than a FuncBinMapper or VectorizingFuncBinMapper equipped with an appropriate function, and its use is discouraged.
One “super-mapper” is available, for assembling more complex bin spaces from simpler components:
RecursiveBinMapper
, for nesting one set of bins within another.
Users are also free to implement their own mappers. A bin mapper must implement, at
least, an assign(coords, mask=None, output=None)
method, which is responsible
for mapping each of the vector of coordinate tuples coords
to an integer
(np.uint16) indicating what bin that coordinate tuple falls into. The optional
mask
(a numpy bool array) specifies that some coordinates are to be skipped; this is used,
for instance, by the recursive (nested) bin mapper to minimize the number of calculations
required to definitively assign a coordinate tuple to a bin. Similarly, the optional
output
must be an integer (uint16) array of the same length as coords
, into which
assignments are written. The assign()
function must return a reference to output
.
(This is used to avoid allocating many temporary output arrays in complex binning
scenarios.)
A user-defined bin mapper must also make an nbins
property available, containing
the total number of bins within the mapper.
YAMLCFG
YAML-based configuration files for WESTPA
RC
- class westpa.core._rc.WESTRC
A class, an instance of which is accessible as westpa.rc, to handle global issues for WESTPA code, such as loading modules and plugins, writing output based on verbosity level, adding default command line options, and so on.
WESTPA Tools
WEST
Setup
Defining and Calculating Progress Coordinates
Binning
The weighted ensemble method enhances sampling by partitioning the space defined by the progress coordinates into non-overlapping bins. WESTPA provides a number of pre-defined types of bins that the user must parameterize within the system.py file, which are detailed below.
Users are also free to implement their own mappers. A bin mapper must
implement, at least, an assign(coords, mask=None, output=None)
method,
which is responsible for mapping each of the vector of coordinate tuples
coords
to an integer (numpy.uint16
) indicating what bin that coordinate
tuple falls into. The optional mask
(a numpy bool array) specifies that
some coordinates are to be skipped; this is used, for instance, by the
recursive (nested) bin mapper to minimize the number of calculations required
to definitively assign a coordinate tuple to a bin. Similarly, the optional
output
must be an integer (uint16
) array of the same length as
coords
, into which assignments are written. The assign()
function must
return a reference to output
. (This is used to avoid allocating many
temporary output arrays in complex binning scenarios.)
A user-defined bin mapper must also make an nbins
property available,
containing the total number of bins within the mapper.
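As a minimal sketch of this interface, the following mapper partitions a one-dimensional progress coordinate into two bins at a value of 1 (a toy example; a production mapper would usually subclass one of the pre-defined mappers)::

import numpy as np

class TwoBinMapper:
    # Total number of bins within the mapper
    nbins = 2

    def assign(self, coords, mask=None, output=None):
        coords = np.asarray(coords)
        if mask is None:
            mask = np.ones(len(coords), dtype=bool)
        if output is None:
            output = np.empty(len(coords), dtype=np.uint16)
        # Assign each unmasked coordinate tuple to bin 0 or bin 1
        output[mask & (coords[:, 0] < 1.0)] = 0
        output[mask & (coords[:, 0] >= 1.0)] = 1
        return output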
RectilinearBinMapper
Creates an N-dimensional grid of bins. The Rectilinear bin mapper is initialized by defining a set of bin boundaries:
self.bin_mapper = RectilinearBinMapper(boundaries)
where boundaries
is a list or other iterable containing the bin boundaries
along each dimension. The bin boundaries must be monotonically increasing along
each dimension. It is important to note that a one-dimensional bin space must
still be represented as a list of lists as in the following example::
bounds = [-float('inf'), 0.0, 1.0, 2.0, 3.0, float('inf')]
self.bin_mapper = RectilinearBinMapper([bounds])
A two-dimensional system might look like::
boundaries = [(-1,-0.5,0,0.5,1), (-1,-0.5,0,0.5,1)]
self.bin_mapper = RectilinearBinMapper(boundaries)
where the first tuple in the list defines the boundaries along the first progress coordinate, and the second tuple defines the boundaries along the second. Of course a list of arbitrary dimensions can be defined to create an N-dimensional grid discretizing the progress coordinate space.
VoronoiBinMapper
A one-dimensional mapper which assigns a multidimensional progress coordinate
to the closest center based on a distance metric. The Voronoi bin mapper is
initialized with the following signature within the
WESTSystem.initialize
::
self.bin_mapper = VoronoiBinMapper(dfunc, centers, dfargs=None, dfkwargs=None)
centers is a (n_centers, pcoord_ndim) shaped numpy array defining the generators of the Voronoi cells.
dfunc is a method written in Python that returns an (n_centers,) shaped array containing the distance between a single set of progress coordinates for a segment and all of the centers defining the Voronoi tessellation. It takes the general form::

def dfunc(p, centers, *dfargs, **dfkwargs):
    ...
    return d
where p is the progress coordinates of a single segment at one time slice, of shape (pcoord_ndim,); centers is the full set of centers; dfargs is a tuple or list of positional arguments; and dfkwargs is a dictionary of keyword arguments. The bin mapper’s assign method then assigns the progress coordinates to the closest bin (minimum distance).
the user to ensure that the distance is calculated using the appropriate
metric.
dfargs is an optional list or tuple of positional arguments to pass into dfunc.
dfkwargs is an optional dict of keyword arguments to pass into dfunc.
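For example, a simple Euclidean distance metric could be written as follows (a sketch)::

import numpy as np

def dfunc(p, centers):
    # Euclidean distance from one pcoord tuple to every Voronoi center
    return np.sqrt(np.sum((centers - p)**2, axis=1))

self.bin_mapper = VoronoiBinMapper(dfunc, centers)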
FuncBinMapper
A bin mapper that employs a user-defined function, which directly calculates bin assignments for a number of coordinate values. The function is responsible for iterating over the entire coordinate set. This is best used with C/Cython/Numba methods, or intelligently tuned numpy-based Python functions.
The FuncBinMapper
is initialized as::
self.bin_mapper = FuncBinMapper(func, nbins, args=None, kwargs=None)
where func
is the user-defined method to assign coordinates to bins,
nbins
is the number of bins in the partitioning space, and args
and
kwargs
are optional positional and keyword arguments, respectively, that
are passed into func
when it is called.
The user-defined function should have the following form::
def func(coords, mask, output, *args, **kwargs)
....
where the assignments are returned in the output array, which is modified in place.
As a contrived example, the following function would assign all segments to bin
0 if the sum of the first two progress coordinates was less than s*0.5
, and
to bin 1 otherwise, where s=1.5
::
def func(coords, mask, output, s):
    output[coords[:,0] + coords[:,1] < s*0.5] = 0
    output[coords[:,0] + coords[:,1] >= s*0.5] = 1

....
self.bin_mapper = FuncBinMapper(func, 2, args=(1.5,))
VectorizingFuncBinMapper
Like the FuncBinMapper
, the VectorizingFuncBinMapper
uses a
user-defined method to calculate bin assignments. They differ, however, in that
while the user-defined method passed to an instance of the FuncBinMapper
is
responsible for iterating over all coordinate sets passed to it, the function
associated with the VectorizingFuncBinMapper
is evaluated once for each
unmasked coordinate tuple provided. It is not responsible explicitly for
iterating over multiple progress coordinate sets.
The VectorizingFuncBinMapper
is initialized as::
self.bin_mapper = VectorizingFuncBinMapper(func, nbins, args=None, kwargs=None)
where func
is the user-defined method to assign coordinates to bins,
nbins
is the number of bins in the partitioning space, and args
and
kwargs
are optional positional and keyword arguments, respectively, that
are passed into func
when it is called.
The user-defined function should have the following form::
def func(coords, *args, **kwargs)
....
Mirroring the simple example shown for the FuncBinMapper, the following should produce the same result for a given set of coordinates. Here segments would be assigned to bin 0 if the sum of the first two progress coordinates was less than s*0.5, and to bin 1 otherwise, where s=1.5::
def func(coords, s):
    if coords[0] + coords[1] < s*0.5:
        return 0
    else:
        return 1

....
self.bin_mapper = VectorizingFuncBinMapper(func, 2, args=(1.5,))
PiecewiseBinMapper
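The module overview above describes this mapper as using a set of boolean-valued functions, one per bin, to determine assignments. A minimal sketch under that assumption, splitting a one-dimensional progress coordinate at 0.5::

def in_left_bin(coord):
    return coord[0] < 0.5

def in_right_bin(coord):
    return coord[0] >= 0.5

self.bin_mapper = PiecewiseBinMapper([in_left_bin, in_right_bin])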
RecursiveBinMapper
The RecursiveBinMapper
is used for assembling more complex bin spaces from
simpler components and nesting one set of bins within another. It is
initialized as::
self.bin_mapper = RecursiveBinMapper(base_mapper, start_index=0)
The base_mapper
is an instance of one of the other bin mappers, and
start_index
is an (optional) offset for indexing the bins. Starting with
the base_mapper
, additional bins can be nested into it using the
add_mapper(mapper, replaces_bin_at) method. This method will replace the bin
containing the coordinate tuple replaces_bin_at
with the mapper specified
by mapper
.
As a simple example consider a bin space in which the base_mapper
assigns a
segment with progress coordinate with values <1 into one bin and >= 1 into
another. Within the former bin, we will nest a second mapper which partitions
progress coordinate space into one bin for progress coordinate values <0.5 and
another for progress coordinates with values >=0.5. The bin space would look
like the following with corresponding code::
'''
0 1 2
+----------------------------+----------------------+
| 0.5 | |
| +-----------+------------+ | |
| | | | | |
| | 1 | 2 | | 0 |
| | | | | |
| | | | | |
| +-----------+------------+ | |
+---------------------------------------------------+
'''
def fn1(coords, mask, output):
    test = coords[:,0] < 1
    output[mask & test] = 0
    output[mask & ~test] = 1

def fn2(coords, mask, output):
    test = coords[:,0] < 0.5
    output[mask & test] = 0
    output[mask & ~test] = 1

outer_mapper = FuncBinMapper(fn1,2)
inner_mapper = FuncBinMapper(fn2,2)
rmapper = RecursiveBinMapper(outer_mapper)
rmapper.add_mapper(inner_mapper, [0.5])
Examples of more complicated nesting schemes can be found in the tests for the WESTPA binning apparatus.
Initial/Basis States
A WESTPA simulation is initialized using w_init
with an initial
distribution of replicas generated from a set of basis states. These basis
states are used to generate initial states for new trajectories, either at the
beginning of the simulation or due to recycling. Basis states are specified
when running w_init
either in a file specified with --bstates-from
, or
by one or more --bstate
arguments. If neither --bstates-from
nor at
least one --bstate
argument is provided, then a default basis state of
probability one identified by the state ID zero and label “basis” will be
created (a warning will be printed in this case, to remind you of this
behavior, in case it is not what you wanted).
When using a file passed to w_init
using --bstates-from
, each line in
that file defines a state, and contains a label, the probability, and
optionally a data reference, separated by whitespace, as in::
unbound 1.0
or:
unbound_0 0.6 state0.pdb
unbound_1 0.4 state1.pdb
Basis states can also be supplied at the command line using one or more
--bstate
flags, where the argument matches the format used in the state
file above. The total probability summed over all basis states should equal unity; however, WESTPA will renormalize the distribution if this condition is not met.
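For example, the two-state distribution above could be supplied directly on the command line (a sketch of the equivalent w_init flags)::

w_init --bstate 'unbound_0,0.6,state0.pdb' --bstate 'unbound_1,0.4,state1.pdb'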
Initial states are then generated from the basis states by optionally applying some perturbation or modification to the basis state. For example, if WESTPA was
being used to simulate ligand binding, one might want to have a basis state
where the ligand was some set distance from the binding partner, and initial
states are generated by randomly orienting the ligand at that distance. When
using the executable propagator, this is done using the script specified under
the gen_istate
section of the executable
configuration. Otherwise, if
defining a custom propagator, the user must override the gen_istate
method
of WESTPropagator
.
When using the executable propagator, the script specified by
gen_istate
should take the data supplied by the environmental variable
$WEST_BSTATE_DATA_REF
and return the generated initial state to
$WEST_ISTATE_DATA_REF
. If no transform need be performed, the user may
simply copy the data directly without modification. This data will then be
available via $WEST_PARENT_DATA_REF
if $WEST_CURRENT_SEG_INITPOINT_TYPE
is SEG_INITPOINT_NEWTRAJ
.
Target States
WESTPA can be run in a recycling mode in which replicas reaching a target state are removed from the simulation and their weights are assigned to new replicas created from one of the initial states. This mode creates a non-equilibrium steady-state that isolates members of the trajectory ensemble originating in the set of initial states and transitioning to the target states. The flux of probability into the target state is then inversely proportional to the mean first passage time (MFPT) of the transition.
Target states are defined when initializing a WESTPA simulation when calling
w_init
. Target states are specified either in a file specified with
--tstates-from
, or by one or more --tstate
arguments. If neither
--tstates-from
nor at least one --tstate
argument is provided, then an
equilibrium simulation (without any sinks) will be performed.
Target states can be defined using a text file, where each line defines a state, and contains a label followed by a representative progress coordinate value, separated by whitespace, as in::
bound 0.02
for a single target and one-dimensional progress coordinates or::
bound 2.7 0.0
drift 100 50.0
for two targets and a two-dimensional progress coordinate.
The argument associated with --tstate
is a string of the form 'label,
pcoord0 [,pcoord1[,...]]'
, similar to a line in the example target state
definition file above. This argument may be specified more than once, in which
case the given states are appended to the list of target states for the
simulation in the order they appear on the command line, after those that are
specified by --tstates-from
, if any.
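For example, the two-target definition above could equivalently be supplied on the command line (a sketch)::

w_init --tstate 'bound,2.7,0.0' --tstate 'drift,100,50.0'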
WESTPA uses the representative progress coordinate of a target state and converts the entire bin containing that progress coordinate into a recycling sink.
Propagators
The Executable Propagator
Writing custom propagators
While most users will use the Executable propagator to run dynamics by calling out to an external piece of software, it is possible to write custom propagators that can be used to generate sampling directly through the python interface. This is particularly useful when simulating simple systems, where the overhead of starting up an external program is large compared to the actual cost of computing the trajectory segment. Other use cases might include running sampling with software that has a Python API (e.g. OpenMM).
In order to create a custom propagator, users must define a class that inherits
from WESTPropagator
and implement three methods:
get_pcoord(self, state): Get the progress coordinate of the given basis or initial state.
gen_istate(self, basis_state, initial_state): Generate a new initial state from the given basis state. This method is optional if gen_istates is set to False in the propagation section of the configuration file, which is the default setting.
propagate(self, segments): Propagate one or more segments, including any necessary per-iteration setup and teardown for this propagator.
There are also two stubs that, if overridden, provide a mechanism for modifying the simulation before or after the iteration:
prepare_iteration(self, n_iter, segments): Perform any necessary per-iteration preparation. This is run by the work manager.
finalize_iteration(self, n_iter, segments): Perform any necessary post-iteration cleanup. This is run by the work manager.
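As a rough sketch (a toy random walk in place of a real integrator, assuming pcoord_len = 5 and pcoord_ndim = 1), a custom propagator might look like::

import numpy as np
from westpa.core.propagators import WESTPropagator
from westpa.core.segment import Segment

class RandomWalkPropagator(WESTPropagator):
    def get_pcoord(self, state):
        # Start every basis/initial state at the origin
        state.pcoord = [0.0]

    def propagate(self, segments):
        for segment in segments:
            pcoord = np.empty((5, 1), dtype=np.float32)
            # Continue from the segment's initial point (assumed to be
            # present as the first entry of segment.pcoord)
            pcoord[0] = segment.pcoord[0]
            for i in range(1, 5):
                pcoord[i] = pcoord[i - 1] + np.random.normal(scale=0.1)
            segment.pcoord = pcoord
            segment.status = Segment.SEG_STATUS_COMPLETE
        return segments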
Several examples of custom propagators are available.
Configuration File
The configuration of a WESTPA simulation is specified using a plain text file written in YAML. This file specifies, among many other things, the length of the simulation, which modules should be loaded for specifying the system, how external data should be organized on the file system, and which plugins should be used. YAML is a hierarchical format, and WESTPA organizes the configuration settings into blocks for each component. While below the configuration file will be referred to as west.cfg, the user is free to name the configuration file something else. Most of the scripts and tools that WESTPA provides, however, require that the name of the configuration file be specified if the default name is not used.
The topmost heading in west.cfg should be specified as::
---
west:
...
with all sub-section specified below it. A complete example can be found for the NaCl example: https://github.com/westpa/westpa/blob/master/lib/examples/nacl_gmx/west.cfg
In the following section, the specifications for each section of the file can be found, along with default parameters and descriptions. Required parameters are indicated as REQUIRED.:
---
west:
...
system:
driver: REQUIRED
module_path: []
The driver parameter must be set to a subclass of WESTSystem, and given in the form module.class. The module_path parameter is appended to the system path and indicates where the class is defined.
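For example, with driver set to system.System and module_path pointing at the directory containing system.py (hypothetical names), a minimal system module might look like the following sketch, assuming a one-dimensional progress coordinate of length 5::

import numpy as np
from westpa.core.systems import WESTSystem
from westpa.core.binning import RectilinearBinMapper

class System(WESTSystem):
    def initialize(self):
        self.pcoord_ndim = 1
        self.pcoord_len = 5
        self.pcoord_dtype = np.float32
        bounds = [0.0, 1.0, 2.0, float('inf')]
        self.bin_mapper = RectilinearBinMapper([bounds])
        # Four target replicas per bin
        self.bin_target_counts = np.empty((self.bin_mapper.nbins,), int)
        self.bin_target_counts[...] = 4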
---
west:
...
we:
adjust_counts: True
weight_split_threshold: 2.0
weight_merge_cutoff: 1.0
The we section specifies parameters related to the Huber and Kim resampling algorithm. WESTPA implements a variation of the method, in which setting adjust_counts to True strictly enforces that the number of replicas per bin is exactly system.bin_target_counts. Otherwise, the number of replicas per bin is allowed to fluctuate as in the original implementation of the algorithm. Adjusting the counts can improve load balancing for parallel simulations. Replicas with weights greater than weight_split_threshold times the ideal weight per bin are tagged as candidates for splitting. Replicas with weights less than weight_merge_cutoff times the ideal weight per bin are candidates for merging.
---
west:
...
propagation:
gen_istates: False
block_size: 1
save_transition_matrices: False
max_run_wallclock: None
max_total_iterations: None
gen_istates: Boolean specifying whether to generate initial states from the basis states. The executable propagator defines a specific configuration block, and custom propagators should override the WESTPropagator.gen_istate() method.
block_size: An integer defining how many segments should be passed to a worker at a time. When using the serial work manager, this value should be set to the maximum number of segments per iteration to avoid significant overhead incurred by the locking mechanism in the WMFutures framework. Parallel work managers might benefit from setting this value greater than one in some instances to decrease network communication load.
save_transition_matrices:
max_run_wallclock: A time in dd:hh:mm:ss or hh:mm:ss specifying the maximum wallclock time of a particular WESTPA run. If running on a batch queuing system, this time should be set to less than the job allocation time to ensure that WESTPA shuts down cleanly.
max_total_iterations: An integer value specifying the number of iterations to run. This parameter is checked against the last completed iteration stored in the HDF5 file, not the number of iterations completed for a specific run. The default value of None only stops upon external termination of the code.

---
west:
  ...
  data:
    west_data_file: REQUIRED
    aux_compression_threshold: 1048576
    iter_prec: 8
    datasets:
      - name: REQUIRED
        h5path:
        store: True
        load: False
        dtype:
        scaleoffset: None
        compression: None
        chunks: None
    data_refs:
      segment:
      basis_state:
      initial_state:
west_data_file: The name of the main HDF5 data storage file for the WESTPA simulation.
aux_compression_threshold: The threshold in bytes for compressing the auxiliary data in a dataset on an iteration-by-iteration basis.
iter_prec: The length of the iteration index with zero-padding. For the default value, iteration 1 would be specified as iter_00000001.
datasets:
data_refs:
executable
Environmental Variables
There are a number of environmental variables that can be set by the user in order to configure a WESTPA simulation:
WEST_ROOT: path to the base directory containing the WESTPA install
WEST_SIM_ROOT: path to the base directory of the WESTPA simulation
WEST_PYTHON: path to python executable to run the WESTPA simulation
WEST_PYTHONPATH: path to any additional modules that WESTPA will require to run the simulation
WEST_KERNPROF: path to kernprof.py script to perform line-by-line profiling of a WESTPA simulation (see python line_profiler). This is only required for users who need to profile specific methods in a running WESTPA simulation.
Work manager related environmental variables:
WM_WORK_MANAGER
WM_N_WORKERS
WESTPA makes available to any script executed by it (e.g. runseg.sh) a number of environmental variables that are set dynamically by the executable propagator from the running simulation.
Programs executed for an iteration
The following environment variables are passed to programs executed on a per-iteration basis, notably pre-iteration and post-iteration scripts.
Variable | Possible values | Function
---|---|---
WEST_CURRENT_ITER | Integer >=1 | Current iteration number
Programs executed for a segment
The following environment variables are passed to programs executed on a per-segment basis, notably dynamics propagation.
Variable | Possible values | Function
---|---|---
WEST_CURRENT_ITER | Integer >=1 | Current iteration number
WEST_CURRENT_SEG_ID | Integer >=0 | Current segment ID
WEST_CURRENT_SEG_DATA_REF | String | General-purpose reference, based on current segment information, configured in west.cfg. Usually used for storage paths
WEST_CURRENT_SEG_INITPOINT_TYPE | Enumeration: SEG_INITPOINT_CONTINUES, SEG_INITPOINT_NEWTRAJ | Whether this segment continues a previous trajectory or initiates a new one
WEST_PARENT_ID | Integer | Segment ID of parent segment. Negative for initial points
WEST_PARENT_DATA_REF | String | General-purpose reference, based on parent segment information, configured in west.cfg. Usually used for storage paths
WEST_PCOORD_RETURN | Filename | Where progress coordinate data must be stored
WEST_RAND16 | Integer | 16-bit random integer
WEST_RAND32 | Integer | 32-bit random integer
WEST_RAND64 | Integer | 64-bit random integer
WEST_RAND128 | Integer | 128-bit random integer
WEST_RANDFLOAT | Floating-point | Random number in [0,1)
Additionally, for any further datasets specified in the configuration file, WESTPA automatically provides WEST_X_RETURN, where X is the uppercase name of the dataset. For example, if the configuration file contains the following:
data:
...
datasets: # dataset storage options
- name: energy
WESTPA would make WEST_ENERGY_RETURN
available.
Programs executed for a single point
Programs used for creating initial states from basis states (gen_istate.sh
)
or extracting progress coordinates from structures (e.g. get_pcoord.sh
) are
provided the following environment variables:
Variable | Available for | Possible values | Function
---|---|---|---
WEST_STRUCT_DATA_REF | All single-point calculations | String | General-purpose reference, usually a pathname, associated with the basis/initial state
WEST_BSTATE_ID | get_pcoord for basis state, gen_istate | Integer >= 0 | Basis state ID
WEST_BSTATE_DATA_REF | get_pcoord for basis state, gen_istate | String | Basis state data reference
WEST_ISTATE_ID | get_pcoord for initial state, gen_istate | Integer >= 0 | Initial state ID
WEST_ISTATE_DATA_REF | get_pcoord for initial state, gen_istate | String | Initial state data reference, usually a pathname
WEST_PCOORD_RETURN | get_pcoord for basis or initial state | Pathname | Where progress coordinate data is expected to be found after execution
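For example, a get_pcoord.sh script might look like the following (a sketch; my_distance_tool stands in for whatever command computes the progress coordinate of a structure)::

#!/bin/sh
# Compute the pcoord of a basis/initial state and write it
# where WESTPA expects to find it
my_distance_tool "$WEST_STRUCT_DATA_REF" > "$WEST_PCOORD_RETURN"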
Plugins
WESTPA has an extensible plugin architecture that allows the user to manipulate the simulation at specified points during an iteration.
Activating plugins in the config file
Plugin execution order/priority
Weighted Ensemble Algorithm (Resampling)
Running
Overview
The w_run command is used to run weighted ensemble simulations configured with w_init.
Setting simulation limits
Running a simulation
Running on a single node
Running on multiple nodes with MPI
Running on multiple nodes with ZeroMQ
Managing data
Recovering from errors
By default, information about simulation progress is stored in west-JOBID.log (where JOBID refers to the job ID given by the submission engine); any errors will be logged here.
The error “could not read pcoord from ‘tempfile’: progress coordinate has incorrect shape” may come about from multiple causes; it is possible that the progress coordinate length is incorrectly specified in system.py (self.pcoord_len), or that GROMACS (or whatever simulation package you are using) had an error during the simulation.
The first case will be obvious by what comes after the message: (XX, YY) (where XX is non-zero), expected (ZZ, GG) (whatever is in system.py). This can be corrected by adjusting system.py.
In the second case, the progress coordinate length is 0; this indicates that no progress coordinate data exists (null string), which implies that the simulation software did not complete successfully. By default, the simulation package (GROMACS or otherwise) terminal output is stored in a log file inside of seg_logs. Any error that occurred during the actual simulation will be logged here, and can be corrected as needed.
Analysis
Gauging simulation progress and convergence
Progress coordinate distribution (w_pcpdist)
w_pcpdist and plothist
Kinetics for source/sink simulations
w_fluxanl
Kinetics for arbitrary state definitions
In order to calculate rate constants, it is necessary to run three different tools:
- w_assign
- w_kinetics
- w_kinavg
The w_assign tool assigns trajectories to states (states which correspond to a target bin) at a sub-tau resolution. This allows w_kinetics to properly trace the trajectories and prepare the data for further analysis.
Although the bin and state definitions can be pulled from the system, it is frequently more convenient to specify custom bin boundaries and states; this eliminates the need to know what constitutes a state prior to starting the simulation. Both files must be in the YAML format, of which there are numerous examples online. A quick example for each file follows:
States:
---
states:
- label: unbound
coords:
- [25,0]
- label: bound
coords:
- [1.5,33.0]
Bins:
---
bins:
type: RectilinearBinMapper
boundaries: [[0.0,1.57,25.0,10000],[0.0,33.0,10000]]
This system has a two-dimensional progress coordinate and two definite states, as defined by the PMF. The binning used during the simulation was significantly more complex; defining a smaller progress coordinate (in which we have three regions: bound, unbound, and in between) is simply a matter of convenience. Note that these custom bins do not change the simulation in any fashion; you can adjust state definitions and bin boundaries at will without altering the way the simulation runs.
The help definition, included by running:
w_assign --help
usually contains the most up-to-date help information, and so more information about command line options can be obtained from there. To run with the above YAML files, assuming they are named STATES and BINS, you would run the following command:
w_assign --states-from-file STATES --bins-from-file BINS
By default, this produces a .h5 file (named assign.h5); this can be changed via the command line.
The w_kinetics tool uses the information generated from w_assign to trace through trajectories and calculate flux with included color information. There are two main methods to run w_kinetics:
w_kinetics trace
w_kinetics matrix
The matrix method is still in development; at this time, trace is the recommended method.
Once the w_kinetics analysis is complete, you can check for convergence of the rate constants. WESTPA includes two tools to help you do this: w_kinavg and ploterr. First, begin by running the following command (keep in mind that w_kinavg performs the same type of analysis as w_kinetics; whichever method you chose (trace or matrix) in the w_kinetics step should be used here as well):
w_kinavg trace -e cumulative
This instructs w_kinavg to produce a .h5 file with the cumulative rate information; by then using ploterr, you can determine whether the rates have stopped changing:
ploterr kinavg
By default, this produces a set of .pdf files containing cumulative rate and flux information for each state-to-state transition as a function of the WESTPA iteration. Determine at which iteration the rate stops changing; then, rerun w_kinavg with the following options:
w_kinavg trace --first-iter ITER
where ITER is the beginning of the unchanging region. This will then output information much like the following:
fluxes into macrostates:
unbound: mean=1.712580005863456e-02 CI=(1.596595628304422e-02, 1.808249529394858e-02) * tau^-1
bound : mean=5.944989301935855e-04 CI=(4.153556214886056e-04, 7.789568983584020e-04) * tau^-1
fluxes from state to state:
unbound -> bound : mean=5.944989301935855e-04 CI=(4.253003401668849e-04, 7.720997503648696e-04) * tau^-1
bound -> unbound: mean=1.712580005863456e-02 CI=(1.590547796439216e-02, 1.808154616175579e-02) * tau^-1
rates from state to state:
unbound -> bound : mean=9.972502012305491e-03 CI=(7.165030136921814e-03, 1.313767180582492e-02) * tau^-1
bound -> unbound: mean=1.819520888349874e-02 CI=(1.704608273094848e-02, 1.926165865735958e-02) * tau^-1
Divide by tau to calculate your rate constant.
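For example, assuming a resampling interval of tau = 100 ps, the unbound -> bound rate above would be (9.97e-03 tau^-1) / (100 ps) ≈ 1.0e+08 s^-1.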
WEST Tools
The command line tools included with the WESTPA software package are broadly separable into two categories: Tools for initializing a simulation and tools for analyzing results.
Command function can be user defined and modified. The particular parameters of different command line tools are specified, in order of precedence, by:
User specified command line arguments
User defined environmental variables
Package defaults
This page focuses on outlining the general functionality of the command line tools and providing an overview of command line arguments that are shared by multiple tools. See the index of command-line tools for a more comprehensive overview of each tool.
Overview
All tools are located in the $WEST_ROOT/bin
directory, where the shell
variable WEST_ROOT
points to the path where the WESTPA package is located
on your machine.
You may wish to set this variable automatically by adding the following to your
~/.bashrc
or ~/.profile
file:
export WEST_ROOT="$HOME/westpa"
where the path to the westpa suite is modified accordingly.
Tools for setting up and running a simulation
Use the following commands to initialize, configure, and run a weighted ensemble simulation. Command line arguments or environmental variables can be set to specify the work managers for running the simulation, where configuration data is read from, and the HDF5 file in which results are stored.
Command | Function
---|---
w_init | Initializes simulation configuration files and environment. Always run this command before starting a new simulation.
w_bins | Set up binning and progress coordinate.
w_run | Launches a simulation. Command arguments/environmental variables can be included to specify the work managers and simulation parameters.
w_truncate | Truncates the weighted ensemble simulation from a given iteration.
Tools for analyzing simulation results
The following command line tools are provided for analysis after running a weighted ensemble simulation (and collecting the results in an HDF5 file).
With the exception of the plotting tool plothist, all analysis tools read from and write to HDF5 type files.
Command | Function
---|---
w_assign | Assign walkers to bins and macrostates (using simulation output as input). Must be done before some other analysis tools (e.g. w_kinetics, w_kinavg).
w_trace | Trace the path of a given walker segment over a user-specified number of simulation iterations.
w_fluxanl | Calculate average probability flux into user-defined ‘target’ state with relevant statistics.
w_pdist | Construct a probability distribution of results (e.g. progress coordinate membership) for subsequent plotting with plothist.
plothist | Tool to plot output from other analysis tools (e.g. w_pdist).
General Command Line Options
The following arguments are shared by all command line tools:
-r config file, --rcfile config file
Use config file as the configuration file (Default: File named west.cfg)
--quiet, --verbose, --debug
Specify command tool output verbosity (Default: 'quiet' mode)
--version
Print WESTPA version number and exit
-h, --help
Output the help information for this command line tool and exit
A note on specifying a configuration file
A configuration file, which should be stored in your simulation root directory, is read by all command line tools. The configuration file specifies parameters for general simulation setup, as well as the hdf5 file name where simulation data is stored and read by analysis tools.
If not specified, the default configuration file is assumed to be named west.cfg.
You can override this and use a different configuration file by either:
Setting the environmental variable WESTRC equal to the path of the file:
export WESTRC=/path/to/westrcfile
Including the command line argument -r /path/to/westrcfile
Work Manager Options
Note: See wwmgr overview for a more detailed explanation of the work manager framework.
Work managers are used by a number of command-line tools to process more complex tasks, especially in setting up and running simulations (i.e. w_init and w_run); in general, work managers are involved in tasks that require multiprocessing and/or tasks distributed over multiple nodes in a cluster.
Overview
The following command-line tools make use of work managers:
General work manager options
The following are general options used for specifying the type of work manager and number of cores:
--wm-work-manager work_manager
Specify which type of work manager to use, where the possible choices for
work_manager are: {processes, serial, threads, mpi, or zmq}. See the
wwmgr overview page for more information on the different types of
work managers (Default: processes)
--wm-n-workers n_workers
Specify the number of cores to use as n_workers, if the work manager you
selected supports this option (work managers that do not will ignore this
option). If using an mpi or zmq work manager, specify --wm-n-workers=0
for a dedicated server (Default: Number of cores available on machine)
The mpi
work manager is generally sufficient for most tasks that make use
of multiple nodes on a cluster. The zmq
work manager is preferable if the
mpi
work manager does not work properly on your cluster or if you prefer to
have more explicit control over the distribution of communication tasks on your
cluster.
ZeroMQ (‘zmq’) work manager
The ZeroMQ work manager offers a number of additional options (all of which are optional and have default values). All of these options focus on whether the zmq work manager is set up as a server (i.e. task distributor/ventilator) or client (task processor):
--wm-zmq-mode mode
Options: {server or client}. Specify whether the ZMQ work manager on this
node will operate as a server or a client (Default: server)
--wm-zmq-info-file info_file
Specify the name of a temporary file to write (as a server) or read (as a
client) socket connection endpoints (Default: server_x.json, where x is a
unique identifier string)
--wm-zmq-task-endpoint task_endpoint
Explicitly use task_endpoint to bind to (as server) or connect to (as
client) for task distribution (Default: A randomly determined endpoint that
is written or read from the specified info_file)
--wm-zmq-result-endpoint result_endpoint
Explicitly use result_endpoint to bind to (as server) or connect to (as
client) to distribute and collect task results (Default: A randomly
determined endpoint that is written to or read from the specified
info_file)
--wm-zmq-announce-endpoint announce_endpoint
Explicitly use announce_endpoint to bind to (as server) or connect to (as
client) to distribute central announcements (Default: A randomly determined
endpoint that is written to or read from the specified info_file)
--wm-zmq-heartbeat-interval interval
If a server, send an "I'm alive" ping to connected clients every interval
seconds; if a client, expect to hear a server ping approximately every
interval seconds, or else assume the server has crashed and shut down
(Default: 600 seconds)
--wm-zmq-task-timeout timeout
Kill worker processes/jobs that take longer than timeout seconds to
complete (Default: no time limit)
--wm-zmq-client-comm-mode mode
Use the communication mode, mode, (options: {ipc for Unix sockets, or tcp
for TCP/IP sockets}) to communicate with worker processes (Default: ipc)
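As a sketch, a minimal server/client setup using these options might look like the following (the shared info-file path is an assumption; adapt it to your cluster's shared filesystem):
# on the server (task distributor) node:
w_run --wm-work-manager zmq --wm-zmq-mode server \
      --wm-zmq-info-file $WEST_SIM_ROOT/server_info.json &
# on each client (task processor) node, reading the same info file:
w_run --wm-work-manager zmq --wm-zmq-mode client \
      --wm-zmq-info-file $WEST_SIM_ROOT/server_info.json &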
Initializing/Running Simulations
For a more complete overview of all the files necessary for setting up a simulation, see the user guide for setting up a simulation
WEST Work Manager
Introduction
WWMGR is the parallel task distribution framework originally included as part of the WEMD source. It was extracted to permit independent development, and (more importantly) independent testing. A number of different schemes can be selected at run-time for distributing work across multiple cores/nodes, as follows:
Name | Implementation | Multi-Core | Multi-Node | Appropriate For
---|---|---|---|---
serial | None | No | No | Testing, minimizing overhead when dynamics is inexpensive
threads | Python “threading” module | Yes | No | Dynamics propagated by external executables, large amounts of data transferred per segment
processes | Python “multiprocessing” module | Yes | No | Dynamics propagated by Python routines, modest amounts of data transferred per segment
mpi | mpi4py compiled and linked against system MPI | Yes | Yes | Distributing calculations across multiple nodes. Start with this on your cluster of choice.
zmq | ZeroMQ (PyZMQ) | Yes | Yes | Distributing calculations across multiple nodes. Use this if MPI does not work properly on your cluster (particularly for spawning child processes).
Environment variables
For controlling task distribution
While the original WEMD work managers were controlled by command-line options and entries in wemd.cfg, the new work manager is controlled using command-line options or environment variables (much like OpenMP). These variables are as follows:
Variable | Applicable to | Default | Meaning
---|---|---|---
WM_WORK_MANAGER | (none) | processes | Use the given task distribution system: “serial”, “threads”, “processes”, or “zmq”
WM_N_WORKERS | threads, processes, zmq | number of cores in machine | Use this number of workers. In the case of zmq, use this many workers on the current machine only (can be set independently on different nodes).
WM_ZMQ_MODE | zmq | server | Start as a server (“server”) or a client (“client”). Servers coordinate a given calculation, and clients execute tasks related to that calculation.
WM_ZMQ_TASK_TIMEOUT | zmq | 60 | Time (in seconds) after which a worker will be considered hung, terminated, and restarted. This must be updated for long-running dynamics segments. Set to zero to disable hang checks entirely.
WM_ZMQ_TASK_ENDPOINT | zmq | Random port | Master distributes tasks at this address
WM_ZMQ_RESULT_ENDPOINT | zmq | Random port | Master receives task results at this address
WM_ZMQ_ANNOUNCE_ENDPOINT | zmq | Random port | Master publishes announcements (such as “shut down now”) at this address
WM_ZMQ_SERVER_INFO | zmq | (none) | A file describing the above endpoints can be found here (to ease cluster-wide startup)
For passing information to workers
One environment variable is made available by multi-process work managers (processes and ZMQ) to help clients configure themselves (e.g. select an appropriate GPU on a multi-GPU node):
Variable | Applicable to | Meaning
---|---|---
WM_PROCESS_INDEX | processes, zmq | Contains an integer, 0-based, identifying the process among the set of processes started on a given node.
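For example, a propagation script can use this variable to bind each worker to its own GPU. This is a sketch only; the script name (runseg.sh) and the CUDA_VISIBLE_DEVICES mechanism are assumptions that depend on your propagator and dynamics engine:
# in your executable propagation script (e.g. runseg.sh):
# pin this worker to the GPU matching its 0-based process index
export CUDA_VISIBLE_DEVICES=$WM_PROCESS_INDEX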
The ZeroMQ work manager for clusters
The ZeroMQ (“zmq”) work manager can be used for both single-machine and cluster-wide communication. Communication occurs over sockets using the ZeroMQ messaging protocol. Within nodes, Unix sockets are used for efficient communication, while between nodes, TCP sockets are used. This also minimizes the number of open sockets on the master node.
The quick and dirty guide to using this on a cluster is as follows:
source env.sh
export WM_WORK_MANAGER=zmq
export WM_ZMQ_COMM_MODE=tcp
export WM_ZMQ_SERVER_INFO=$WEST_SIM_ROOT/wemd_server_info.json
w_run &
# manually run w_run on each client node, as appropriate for your batch system
# e.g. qrsh -inherit for Grid Engine, or maybe just simple SSH
for host in $(cat $TMPDIR/machines | sort | uniq); do
qrsh -inherit -V $host $PWD/node-ltc1.sh &
done
WEST Extensions
Post-Analysis Reweighting
String Method
Weighted Ensemble Equilibrium Dynamics
Weighted Ensemble Steady State
Command Line Tool Index
w_init
usage:
w_init [-h] [-r RCFILE] [--quiet | --verbose | --debug] [--version] [--force]
[--bstate-file BSTATE_FILE] [--bstate BSTATES] [--tstate-file TSTATE_FILE]
[--tstate TSTATES] [--segs-per-state N] [--no-we]
[--serial | --parallel | --work-manager WORK_MANAGER] [--n-workers N_WORKERS]
[--zmq-mode MODE] [--zmq-comm-mode COMM_MODE] [--zmq-write-host-info INFO_FILE]
[--zmq-read-host-info INFO_FILE] [--zmq-upstream-rr-endpoint ENDPOINT]
[--zmq-upstream-ann-endpoint ENDPOINT] [--zmq-downstream-rr-endpoint ENDPOINT]
[--zmq-downstream-ann-endpoint ENDPOINT] [--zmq-master-heartbeat MASTER_HEARTBEAT]
[--zmq-worker-heartbeat WORKER_HEARTBEAT] [--zmq-timeout-factor FACTOR]
[--zmq-startup-timeout STARTUP_TIMEOUT] [--zmq-shutdown-timeout SHUTDOWN_TIMEOUT]
Initialize a new WEST simulation, creating the WEST HDF5 file and preparing the first iteration’s segments. Initial states are generated from one or more “basis states” which are specified either in a file specified with --bstates-from, or by one or more “--bstate” arguments. If neither --bstates-from nor at least one “--bstate” argument is provided, then a default basis state of probability one identified by the state ID zero and label “basis” will be created (a warning will be printed in this case, to remind you of this behavior, in case it is not what you wanted). Target states for (non-equilibrium) steady-state simulations are specified either in a file specified with --tstates-from, or by one or more --tstate arguments. If neither --tstates-from nor at least one --tstate argument is provided, then an equilibrium simulation (without any sinks) will be performed.
optional arguments:
-h, --help show this help message and exit
--force Overwrite any existing simulation data
--bstate-file BSTATE_FILE, --bstates-from BSTATE_FILE
Read basis state names, probabilities, and (optionally) data references from
BSTATE_FILE.
--bstate BSTATES Add the given basis state (specified as a string 'label,probability[,auxref]')
to the list of basis states (after those specified in --bstates-from, if any).
This argument may be specified more than once, in which case the given states
are appended in the order they are given on the command line.
--tstate-file TSTATE_FILE, --tstates-from TSTATE_FILE
Read target state names and representative progress coordinates from
TSTATE_FILE
--tstate TSTATES Add the given target state (specified as a string
'label,pcoord0[,pcoord1[,...]]') to the list of target states (after those
specified in the file given by --tstates-from, if any). This argument may be
specified more than once, in which case the given states are appended in the
order they appear on the command line.
--segs-per-state N Initialize N segments from each basis state (default: 1).
--no-we, --shotgun Do not run the weighted ensemble bin/split/merge algorithm on newly-created
segments.
general options:
-r RCFILE, --rcfile RCFILE
use RCFILE as the WEST run-time configuration file (default: west.cfg)
--quiet emit only essential information
--verbose emit extra information
--debug enable extra checks and emit copious information
--version show program's version number and exit
parallelization options:
--serial run in serial mode
--parallel run in parallel mode (using processes)
--work-manager WORK_MANAGER
use the given work manager for parallel task distribution. Available work
managers are ('serial', 'threads', 'processes', 'zmq'); default is 'serial'
--n-workers N_WORKERS
Use up to N_WORKERS on this host, for work managers which support this option.
Use 0 for a dedicated server. (Ignored by work managers which do not support
this option.)
options for ZeroMQ (“zmq”) work manager (master or node):
--zmq-mode MODE     Operate as a master (server) or a node (workers/client). "server" is a
                    deprecated synonym for "master" and "client" is a deprecated synonym for
                    "node".
--zmq-comm-mode COMM_MODE
                    Use the given communication mode -- TCP or IPC (Unix-domain) -- sockets for
                    communication within a node. IPC (the default) may be more efficient but is not
                    available on (exceptionally rare) systems without node-local storage (e.g.
                    /tmp); on such systems, TCP may be used instead.
--zmq-write-host-info INFO_FILE
                    Store hostname and port information needed to connect to this instance in
                    INFO_FILE. This allows the master and nodes assisting in coordinating the
                    communication of other nodes to choose ports randomly. Downstream nodes read
                    this file with --zmq-read-host-info and know where to connect.
--zmq-read-host-info INFO_FILE
                    Read hostname and port information needed to connect to the master (or other
                    coordinating node) from INFO_FILE. This allows the master and nodes assisting
                    in coordinating the communication of other nodes to choose ports randomly,
                    writing that information with --zmq-write-host-info for this instance to read.
--zmq-upstream-rr-endpoint ENDPOINT
                    ZeroMQ endpoint to which to send request/response (task and result) traffic
                    toward the master.
--zmq-upstream-ann-endpoint ENDPOINT
                    ZeroMQ endpoint on which to receive announcement (heartbeat and shutdown
                    notification) traffic from the master.
--zmq-downstream-rr-endpoint ENDPOINT
                    ZeroMQ endpoint on which to listen for request/response (task and result)
                    traffic from subsidiary workers.
--zmq-downstream-ann-endpoint ENDPOINT
                    ZeroMQ endpoint on which to send announcement (heartbeat and shutdown
                    notification) traffic toward workers.
--zmq-master-heartbeat MASTER_HEARTBEAT
                    Every MASTER_HEARTBEAT seconds, the master announces its presence to workers.
--zmq-worker-heartbeat WORKER_HEARTBEAT
                    Every WORKER_HEARTBEAT seconds, workers announce their presence to the master.
--zmq-timeout-factor FACTOR
                    Scaling factor for heartbeat timeouts. If the master doesn't hear from a worker
                    in WORKER_HEARTBEAT*FACTOR seconds, the worker is assumed to have crashed. If a
                    worker doesn't hear from the master in MASTER_HEARTBEAT*FACTOR seconds, the
                    master is assumed to have crashed. Both cases result in shutdown.
--zmq-startup-timeout STARTUP_TIMEOUT
                    Amount of time (in seconds) to wait for communication between the master and at
                    least one worker. This may need to be changed on very large, heavily-loaded
                    computer systems that start all processes simultaneously.
--zmq-shutdown-timeout SHUTDOWN_TIMEOUT
                    Amount of time (in seconds) to wait for workers to shut down.
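A hedged example of initializing a simulation; the state labels, probability, and target progress coordinate value below are illustrative, not recommendations:
w_init --bstate 'unbound,1.0' --tstate 'bound,1.2' \
       --segs-per-state 5 --work-manager processes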
w_bins
w_bins
deals with binning modification and statistics
Overview
Usage:
$WEST_ROOT/bin/w_bins [-h] [-r RCFILE] [--quiet | --verbose | --debug] [--version]
[-W WEST_H5FILE]
{info,rebin} ...
Display information and statistics about binning in a WEST simulation, or modify the binning for the current iteration of a WEST simulation.
Command-Line Options
See the general command-line tool reference for more information on the general options.
Options Under ‘info’
Usage:
$WEST_ROOT/bin/w_bins info [-h] [-n N_ITER] [--detail]
[--bins-from-system | --bins-from-expr BINS_FROM_EXPR | --bins-from-function BINS_FROM_FUNCTION | --bins-from-file]
Positional options:
info
Display information about binning.
Options for ‘info’:
-n N_ITER, --n-iter N_ITER
Consider initial points of segment N_ITER (default: current
iteration).
--detail
Display detailed per-bin information in addition to summary
information.
Binning options for ‘info’:
--bins-from-system
Bins are constructed by the system driver specified in the WEST
configuration file (default where stored bin definitions not
available).
--bins-from-expr BINS_FROM_EXPR, --binbounds BINS_FROM_EXPR
Construct bins on a rectilinear grid according to the given BINEXPR.
This must be a list of lists of bin boundaries (one list of bin
boundaries for each dimension of the progress coordinate), formatted
as a Python expression. E.g. "[[0,1,2,4,inf],[-inf,0,inf]]". The
numpy module and the special symbol "inf" (for floating-point
infinity) are available for use within BINEXPR.
--bins-from-function BINS_FROM_FUNCTION, --binfunc BINS_FROM_FUNCTION
Supply an external function which, when called, returns a properly
constructed bin mapper which will then be used for bin assignments.
This should be formatted as "[PATH:]MODULE.FUNC", where the function
FUNC in module MODULE will be used; the optional PATH will be
prepended to the module search path when loading MODULE.
--bins-from-file
Load bin specification from the data file being examined (default
where stored bin definitions available).
Options Under ‘rebin’
Usage:
$WEST_ROOT/bin/w_bins rebin [-h] [--confirm] [--detail]
[--bins-from-system | --bins-from-expr BINS_FROM_EXPR | --bins-from-function BINS_FROM_FUNCTION]
[--target-counts TARGET_COUNTS | --target-counts-from FILENAME]
Positional option:
rebin
Rebuild current iteration with new binning.
Options for ‘rebin’:
--confirm
Commit the revised iteration to HDF5; without this option, the
effects of the new binning are only calculated and printed.
--detail
Display detailed per-bin information in addition to summary
information.
Binning options for ‘rebin’:
Same as the binning options for ‘info’.
Bin target count options for ‘rebin’:
--target-counts TARGET_COUNTS
Use TARGET_COUNTS instead of stored or system driver target counts.
TARGET_COUNTS is a comma-separated list of integers. As a special
case, a single integer is acceptable, in which case the same target
count is used for all bins.
--target-counts-from FILENAME
Read target counts from the text file FILENAME instead of using
stored or system driver target counts. FILENAME must contain a list
of integers, separated by arbitrary whitespace (including newlines).
Input Options
-W WEST_H5FILE, --west-data WEST_H5FILE
Take WEST data from WEST_H5FILE (default: read from the HDF5 file
specified in west.cfg).
Examples
The examples below are illustrative sketches; the iteration number, bin boundaries, and target counts are assumptions, not recommendations.
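# show detailed per-bin statistics for iteration 10:
$WEST_ROOT/bin/w_bins info -n 10 --detail
# preview a new rectilinear binning, then commit it with --confirm:
$WEST_ROOT/bin/w_bins rebin --binbounds '[[0,1,2,4,inf]]' --target-counts 24
$WEST_ROOT/bin/w_bins rebin --binbounds '[[0,1,2,4,inf]]' --target-counts 24 --confirm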
w_run
usage:
w_run [-h]
Start/continue a WEST simulation
optional arguments:
-h, --help show this help message and exit
--oneseg only propagate one segment (useful for debugging propagators)
general options:
-r RCFILE, --rcfile RCFILE
use RCFILE as the WEST run-time configuration file (default: west.cfg)
--quiet emit only essential information
--verbose emit extra information
--debug enable extra checks and emit copious information
--version show program's version number and exit
parallelization options:
--serial run in serial mode
--parallel run in parallel mode (using processes)
--work-manager WORK_MANAGER
use the given work manager for parallel task distribution. Available work
managers are ('serial', 'threads', 'processes', 'zmq'); default is 'serial'
--n-workers N_WORKERS
Use up to N_WORKERS on this host, for work managers which support this option.
Use 0 for a dedicated server. (Ignored by work managers which do not support
this option.)
options for ZeroMQ (“zmq”) work manager (master or node):
--zmq-mode MODE Operate as a master (server) or a node (workers/client). "server" is a
deprecated synonym for "master" and "client" is a deprecated synonym for
"node".
--zmq-comm-mode COMM_MODE
Use the given communication mode -- TCP or IPC (Unix-domain) -- sockets for
communication within a node. IPC (the default) may be more efficient but is not
available on (exceptionally rare) systems without node-local storage (e.g.
/tmp); on such systems, TCP may be used instead.
--zmq-write-host-info INFO_FILE
Store hostname and port information needed to connect to this instance in
INFO_FILE. This allows the master and nodes assisting in coordinating the
communication of other nodes to choose ports randomly. Downstream nodes read
this file with --zmq-read-host-info and know where to connect.
--zmq-read-host-info INFO_FILE
Read hostname and port information needed to connect to the master (or other
coordinating node) from INFO_FILE. This allows the master and nodes assisting
in coordinating the communication of other nodes to choose ports randomly,
writing that information with --zmq-write-host-info for this instance to read.
--zmq-upstream-rr-endpoint ENDPOINT
ZeroMQ endpoint to which to send request/response (task and result) traffic
toward the master.
--zmq-upstream-ann-endpoint ENDPOINT
ZeroMQ endpoint on which to receive announcement (heartbeat and shutdown
notification) traffic from the master.
--zmq-downstream-rr-endpoint ENDPOINT
ZeroMQ endpoint on which to listen for request/response (task and result)
traffic from subsidiary workers.
--zmq-downstream-ann-endpoint ENDPOINT
ZeroMQ endpoint on which to send announcement (heartbeat and shutdown
notification) traffic toward workers.
--zmq-master-heartbeat MASTER_HEARTBEAT
Every MASTER_HEARTBEAT seconds, the master announces its presence to workers.
--zmq-worker-heartbeat WORKER_HEARTBEAT
Every WORKER_HEARTBEAT seconds, workers announce their presence to the master.
--zmq-timeout-factor FACTOR
Scaling factor for heartbeat timeouts. If the master doesn't hear from a worker
in WORKER_HEARTBEAT*FACTOR, the worker is assumed to have crashed. If a worker
doesn't hear from the master in MASTER_HEARTBEAT*FACTOR seconds, the master is
assumed to have crashed. Both cases result in shutdown.
--zmq-startup-timeout STARTUP_TIMEOUT
Amount of time (in seconds) to wait for communication between the master and at
least one worker. This may need to be changed on very large, heavily-loaded
computer systems that start all processes simultaneously.
--zmq-shutdown-timeout SHUTDOWN_TIMEOUT
Amount of time (in seconds) to wait for workers to shut down.
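A typical invocation, as a sketch (the worker count and log file name are illustrative):
w_run --work-manager processes --n-workers 8 &> west.log &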
w_truncate
NOTE: w_truncate only deletes iteration groups from the HDF5 data store. It is recommended that any iteration data saved to the file system (e.g. in the traj_segs directory) be deleted or moved for the corresponding iterations.
usage:
w_truncate [-h] [-r RCFILE] [--quiet | --verbose | --debug] [--version] [-n N_ITER]
Remove all iterations after a certain point in a WESTPA simulation.
optional arguments:
-h, --help show this help message and exit
-n N_ITER, --iter N_ITER
Truncate this iteration and those following.
general options:
-r RCFILE, --rcfile RCFILE
use RCFILE as the WEST run-time configuration file (default: west.cfg)
--quiet emit only essential information
--verbose emit extra information
--debug enable extra checks and emit copious information
--version show program's version number and exit
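For example, to discard iteration 100 and everything after it (the iteration number is illustrative; remember to also clean up the corresponding trajectory files on disk):
w_truncate -n 100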
w_fork
usage:
w_fork [-h] [-r RCFILE] [--quiet | --verbose | --debug] [--version] [-i INPUT_H5FILE]
[-I N_ITER] [-o OUTPUT_H5FILE] [--istate-map ISTATE_MAP] [--no-headers]
Prepare a new weighted ensemble simulation from an existing one at a particular point. A new HDF5 file is generated. In the case of executable propagation, it is the user’s responsibility to prepare the new simulation directory appropriately, particularly making the old simulation’s restart data from the appropriate iteration available as the new simulation’s initial state data; a mapping of old simulation segments to new simulation initial states is created, both in the new HDF5 file and as a flat text file, to aid in this. Target states and basis states for the new simulation are taken from those in the original simulation.
optional arguments:
-h, --help show this help message and exit
-i INPUT_H5FILE, --input INPUT_H5FILE
Create simulation from the given INPUT_H5FILE (default: read from configuration
file).
-I N_ITER, --iteration N_ITER
Take initial distribution for new simulation from iteration N_ITER (default:
last complete iteration).
-o OUTPUT_H5FILE, --output OUTPUT_H5FILE
Save new simulation HDF5 file as OUTPUT (default: forked.h5).
--istate-map ISTATE_MAP
Write text file describing mapping of existing segments to new initial states
in ISTATE_MAP (default: istate_map.txt).
--no-headers Do not write header to ISTATE_MAP
general options:
-r RCFILE, --rcfile RCFILE
use RCFILE as the WEST run-time configuration file (default: west.cfg)
--quiet emit only essential information
--verbose emit extra information
--debug enable extra checks and emit copious information
--version show program's version number and exit
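As a sketch, forking a new simulation from iteration 100 of an existing run (the file names and iteration number are illustrative):
w_fork -i west.h5 -I 100 -o forked.h5 --istate-map istate_map.txt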
w_assign
usage:
w_assign [-h] [-r RCFILE] [--quiet | --verbose | --debug] [--version]
[--max-queue-length MAX_QUEUE_LENGTH] [-W WEST_H5FILE]
[--bins-from-system | --bins-from-expr BINS_FROM_EXPR | --bins-from-function BINS_FROM_FUNCTION | --bins-from-file BINFILE | --bins-from-h5file]
[--construct-dataset CONSTRUCT_DATASET | --dsspecs DSSPEC [DSSPEC ...]]
[--states STATEDEF [STATEDEF ...] | --states-from-file STATEFILE |
--states-from-function STATEFUNC] [-o OUTPUT] [--subsample] [--config-from-file]
[--scheme-name SCHEME] [--serial | --parallel | --work-manager WORK_MANAGER]
[--n-workers N_WORKERS] [--zmq-mode MODE] [--zmq-comm-mode COMM_MODE]
[--zmq-write-host-info INFO_FILE] [--zmq-read-host-info INFO_FILE]
[--zmq-upstream-rr-endpoint ENDPOINT] [--zmq-upstream-ann-endpoint ENDPOINT]
[--zmq-downstream-rr-endpoint ENDPOINT] [--zmq-downstream-ann-endpoint ENDPOINT]
[--zmq-master-heartbeat MASTER_HEARTBEAT] [--zmq-worker-heartbeat WORKER_HEARTBEAT]
[--zmq-timeout-factor FACTOR] [--zmq-startup-timeout STARTUP_TIMEOUT]
[--zmq-shutdown-timeout SHUTDOWN_TIMEOUT]
Assign walkers to bins, producing a file (by default named “assign.h5”) which can be used in subsequent analysis.
For consistency in subsequent analysis operations, the entire dataset must be assigned, even if only a subset of the data will be used. This ensures that analyses that rely on tracing trajectories always know the originating bin of each trajectory.
Source data
Source data is provided either by a user-specified function (--construct-dataset) or a list of “data set specifications” (--dsspecs). If neither is provided, the progress coordinate dataset ‘pcoord’ is used.
To use a custom function to extract or calculate data whose probability distribution will be calculated, specify the function in standard Python MODULE.FUNCTION syntax as the argument to --construct-dataset. This function will be called as function(n_iter, iter_group), where n_iter is the iteration whose data are being considered and iter_group is the corresponding group in the main WEST HDF5 file (west.h5). The function must return data which can be indexed as [segment][timepoint][dimension].
To use a list of data set specifications, specify --dsspecs and then list the desired datasets one-by-one (space-separated in most shells). These data set specifications are formatted as NAME[,file=FILENAME,slice=SLICE], which will use the dataset called NAME in the HDF5 file FILENAME (defaulting to the main WEST HDF5 file west.h5), and slice it with the Python slice expression SLICE (as in [0:2] to select the first two elements of the first axis of the dataset). The slice option is most useful for selecting one column (or more) from a multi-column dataset, such as arises when using a progress coordinate of multiple dimensions.
Specifying macrostates
Optionally, kinetic macrostates may be defined in terms of sets of bins. Each trajectory will be labeled with the kinetic macrostate it was most recently in at each timepoint, for use in subsequent kinetic analysis. This is required for all kinetics analysis (w_kintrace and w_kinmat).
There are three ways to specify macrostates:
States corresponding to single bins may be identified on the command line using the --states option, which takes multiple arguments, one for each state (separated by spaces in most shells). Each state is specified as a coordinate tuple, with an optional label prepended, as in bound:1.0 or unbound:(2.5,2.5). Unlabeled states are named stateN, where N is the (zero-based) position in the list of states supplied to --states.
States corresponding to multiple bins may use a YAML input file specified with --states-from-file. This file defines a list of states, each with a name and a list of coordinate tuples; bins containing these coordinates will be mapped to the containing state. For instance, the following file:
---
states:
  - label: unbound
    coords:
      - [9.0, 1.0]
      - [9.0, 2.0]
  - label: bound
    coords:
      - [0.1, 0.0]
produces two macrostates: the first state is called “unbound” and consists of bins containing the (2-dimensional) progress coordinate values (9.0, 1.0) and (9.0, 2.0); the second state is called “bound” and consists of the single bin containing the point (0.1, 0.0).
Arbitrary state definitions may be supplied by a user-defined function, specified as --states-from-function=MODULE.FUNCTION. This function is called with the bin mapper as an argument (function(mapper)) and must return a list of dictionaries, one per state. Each dictionary must contain a vector of coordinate tuples with key “coords”; the bins into which each of these tuples falls define the state. An optional name for the state (with key “label”) may also be provided.
Output format
The output file (-o/--output, by default “assign.h5”) contains the following attributes and datasets:
``nbins`` attribute
*(Integer)* Number of valid bins. Bin assignments range from 0 to
*nbins*-1, inclusive.
``nstates`` attribute
*(Integer)* Number of valid macrostates (may be zero if no such states are
specified). Trajectory ensemble assignments range from 0 to *nstates*-1,
inclusive, when states are defined.
``/assignments`` [iteration][segment][timepoint]
*(Integer)* Per-segment and -timepoint assignments (bin indices).
``/npts`` [iteration]
*(Integer)* Number of timepoints in each iteration.
``/nsegs`` [iteration]
*(Integer)* Number of segments in each iteration.
``/labeled_populations`` [iterations][state][bin]
*(Floating-point)* Per-iteration and -timepoint bin populations, labeled
by most recently visited macrostate. The last state entry (*nstates-1*)
corresponds to trajectories initiated outside of a defined macrostate.
``/bin_labels`` [bin]
*(String)* Text labels of bins.
When macrostate assignments are given, the following additional datasets are present:
``/trajlabels`` [iteration][segment][timepoint]
*(Integer)* Per-segment and -timepoint trajectory labels, indicating the
macrostate which each trajectory last visited.
``/state_labels`` [state]
*(String)* Labels of states.
``/state_map`` [bin]
*(Integer)* Mapping of bin index to the macrostate containing that bin.
An entry will contain *nstates* if that bin does not fall into a
macrostate.
Datasets indexed by state and bin contain one more entry than the number of valid states or bins. For N bins, axes indexed by bin are of size N+1, and entry N (0-based indexing) corresponds to a walker outside of the defined bin space (which will cause most mappers to raise an error). More importantly, for M states (including the case M=0 where no states are specified), axes indexed by state are of size M+1 and entry M refers to trajectories initiated in a region not corresponding to a defined macrostate.
Thus, labeled_populations[:,:,:].sum(axis=1)[:,:-1] gives overall per-bin populations for all defined bins, and labeled_populations[:,:,:].sum(axis=2)[:,:-1] gives overall per-trajectory-ensemble populations for all defined states.
Parallelization
This tool supports parallelized binning, including reading/calculating input data.
Command-line options
optional arguments:
-h, --help show this help message and exit
general options:
-r RCFILE, --rcfile RCFILE
use RCFILE as the WEST run-time configuration file (default: west.cfg)
--quiet emit only essential information
--verbose emit extra information
--debug enable extra checks and emit copious information
--version show program's version number and exit
parallelization options:
--max-queue-length MAX_QUEUE_LENGTH
Maximum number of tasks that can be queued. Useful to limit RAM use for tasks
that have very large requests/responses. Default: no limit.
WEST input data options:
-W WEST_H5FILE, --west-data WEST_H5FILE
Take WEST data from WEST_H5FILE (default: read from the HDF5 file specified in
west.cfg).
binning options:
--bins-from-system  Bins are constructed by the system driver specified in the WEST configuration
                    file (default where stored bin definitions not available).
--bins-from-expr BINS_FROM_EXPR, --binbounds BINS_FROM_EXPR
                    Construct bins on a rectilinear grid according to the given BINEXPR. This must
                    be a list of lists of bin boundaries (one list of bin boundaries for each
                    dimension of the progress coordinate), formatted as a Python expression. E.g.
                    “[[0,1,2,4,inf],[-inf,0,inf]]”. The numpy module and the special symbol “inf”
                    (for floating-point infinity) are available for use within BINEXPR.
--bins-from-function BINS_FROM_FUNCTION, --binfunc BINS_FROM_FUNCTION
                    Supply an external function which, when called, returns a properly constructed
                    bin mapper which will then be used for bin assignments. This should be
                    formatted as “[PATH:]MODULE.FUNC”, where the function FUNC in module MODULE
                    will be used; the optional PATH will be prepended to the module search path
                    when loading MODULE.
--bins-from-file BINFILE, --binfile BINFILE
                    Load bin specification from the YAML file BINFILE. This currently takes the
                    form {‘bins’: {‘type’: ‘RectilinearBinMapper’, ‘boundaries’: [[boundset1],
                    [boundset2], ... ]}}; only rectilinear bin bounds are supported.
--bins-from-h5file  Load bin specification from the data file being examined (default where stored
                    bin definitions available).
input dataset options:
--construct-dataset CONSTRUCT_DATASET
Use the given function (as in module.function) to extract source data. This
function will be called once per iteration as function(n_iter, iter_group) to
construct data for one iteration. Data returned must be indexable as
[seg_id][timepoint][dimension]
--dsspecs DSSPEC [DSSPEC ...]
Construct source data from one or more DSSPECs.
macrostate definitions:
--states STATEDEF [STATEDEF ...]
Single-bin kinetic macrostate, specified by a coordinate tuple (e.g. '1.0' or
'[1.0,1.0]'), optionally labeled (e.g. 'bound:[1.0,1.0]'). States corresponding
to multiple bins must be specified with --states-from-file.
--states-from-file STATEFILE
Load kinetic macrostates from the YAML file STATEFILE. See description above
for the appropriate structure.
--states-from-function STATEFUNC
Load kinetic macrostates from the function STATEFUNC, specified as
module_name.func_name. This function is called with the bin mapper as an
argument, and must return a list of dictionaries {'label': state_label,
'coords': 2d_array_like} one for each macrostate; the 'coords' entry must
contain enough rows to identify all bins in the macrostate.
other options:
-o OUTPUT, --output OUTPUT
Store results in OUTPUT (default: assign.h5).
--subsample Determines whether or not the data should be subsampled. This is rather useful
for analysing steady state simulations.
--config-from-file Load bins/macrostates from a scheme specified in west.cfg.
--scheme-name SCHEME Name of scheme specified in west.cfg.
parallelization options:
--serial run in serial mode
--parallel run in parallel mode (using processes)
--work-manager WORK_MANAGER
use the given work manager for parallel task distribution. Available work
managers are ('serial', 'threads', 'processes', 'zmq'); default is 'processes'
--n-workers N_WORKERS
Use up to N_WORKERS on this host, for work managers which support this option.
Use 0 for a dedicated server. (Ignored by work managers which do not support
this option.)
options for ZeroMQ (“zmq”) work manager (master or node):
--zmq-mode MODE     Operate as a master (server) or a node (workers/client). "server" is a
                    deprecated synonym for "master" and "client" is a deprecated synonym for
                    "node".
--zmq-comm-mode COMM_MODE
                    Use the given communication mode -- TCP or IPC (Unix-domain) -- sockets for
                    communication within a node. IPC (the default) may be more efficient but is not
                    available on (exceptionally rare) systems without node-local storage (e.g.
                    /tmp); on such systems, TCP may be used instead.
--zmq-write-host-info INFO_FILE
                    Store hostname and port information needed to connect to this instance in
                    INFO_FILE. This allows the master and nodes assisting in coordinating the
                    communication of other nodes to choose ports randomly. Downstream nodes read
                    this file with --zmq-read-host-info and know where to connect.
--zmq-read-host-info INFO_FILE
                    Read hostname and port information needed to connect to the master (or other
                    coordinating node) from INFO_FILE. This allows the master and nodes assisting
                    in coordinating the communication of other nodes to choose ports randomly,
                    writing that information with --zmq-write-host-info for this instance to read.
--zmq-upstream-rr-endpoint ENDPOINT
                    ZeroMQ endpoint to which to send request/response (task and result) traffic
                    toward the master.
--zmq-upstream-ann-endpoint ENDPOINT
                    ZeroMQ endpoint on which to receive announcement (heartbeat and shutdown
                    notification) traffic from the master.
--zmq-downstream-rr-endpoint ENDPOINT
                    ZeroMQ endpoint on which to listen for request/response (task and result)
                    traffic from subsidiary workers.
--zmq-downstream-ann-endpoint ENDPOINT
                    ZeroMQ endpoint on which to send announcement (heartbeat and shutdown
                    notification) traffic toward workers.
--zmq-master-heartbeat MASTER_HEARTBEAT
                    Every MASTER_HEARTBEAT seconds, the master announces its presence to workers.
--zmq-worker-heartbeat WORKER_HEARTBEAT
                    Every WORKER_HEARTBEAT seconds, workers announce their presence to the master.
--zmq-timeout-factor FACTOR
                    Scaling factor for heartbeat timeouts. If the master doesn't hear from a worker
                    in WORKER_HEARTBEAT*FACTOR seconds, the worker is assumed to have crashed. If a
                    worker doesn't hear from the master in MASTER_HEARTBEAT*FACTOR seconds, the
                    master is assumed to have crashed. Both cases result in shutdown.
--zmq-startup-timeout STARTUP_TIMEOUT
                    Amount of time (in seconds) to wait for communication between the master and at
                    least one worker. This may need to be changed on very large, heavily-loaded
                    computer systems that start all processes simultaneously.
--zmq-shutdown-timeout SHUTDOWN_TIMEOUT
                    Amount of time (in seconds) to wait for workers to shut down.
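Putting the pieces together, a hedged sketch of a typical invocation (the states file is the YAML format described above; its name is illustrative):
w_assign -W west.h5 --states-from-file states.yaml -o assign.h5 --serial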
w_trace
usage:
w_trace [-h] [-r RCFILE] [--quiet | --verbose | --debug] [--version] [-W WEST_H5FILE]
[-d DSNAME] [--output-pattern OUTPUT_PATTERN] [-o OUTPUT]
N_ITER:SEG_ID [N_ITER:SEG_ID ...]
Trace individual WEST trajectories and emit (or calculate) quantities along the trajectory.
Trajectories are specified as N_ITER:SEG_ID pairs. Each segment is traced back to its initial point, and then various quantities (notably n_iter and seg_id) are printed in order from initial point up until the given segment in the given iteration.
Output is stored in several files, all named according to the pattern given by the -o/--output-pattern parameter. The default output pattern is “traj_%d_%d”, where the printf-style format codes are replaced by the iteration number and segment ID of the terminal segment of the trajectory being traced.
Individual datasets can be selected for writing using the -d/--dataset option (which may be specified more than once). The simplest form is -d dsname, which causes data from dataset dsname along the trace to be stored to HDF5. The dataset is assumed to be stored on a per-iteration basis, with the first dimension corresponding to seg_id and the second dimension corresponding to time within the segment. Further options are specified as comma-separated key=value pairs after the data set name, as in:
-d dsname,alias=newname,index=idsname,file=otherfile.h5,slice=[100,...]
The following options for datasets are supported:
alias=newname
When writing this data to HDF5 or text files, use ``newname``
instead of ``dsname`` to identify the dataset. This is mostly of
use in conjunction with the ``slice`` option in order, e.g., to
retrieve two different slices of a dataset and store then with
different names for future use.
index=idsname
The dataset is not stored on a per-iteration basis for all
segments, but instead is stored as a single dataset whose
first dimension indexes n_iter/seg_id pairs. The index to
these n_iter/seg_id pairs is ``idsname``.
file=otherfile.h5
Instead of reading data from the main WEST HDF5 file (usually
``west.h5``), read data from ``otherfile.h5``.
slice=[100,...]
Retrieve only the given slice from the dataset. This can be
used to pick a subset of interest to minimize I/O.
positional arguments
N_ITER:SEG_ID Trace trajectory ending (or at least alive at) N_ITER:SEG_ID.
optional arguments
-h, --help show this help message and exit
-d DSNAME, --dataset DSNAME
Include the dataset named DSNAME in trace output. An extended form like
DSNAME[,alias=ALIAS][,index=INDEX][,file=FILE][,slice=SLICE] will obtain the
dataset from the given FILE instead of the main WEST HDF5 file, slice it by
SLICE, call it ALIAS in output, and/or access per-segment data by a
n_iter,seg_id INDEX instead of a seg_id indexed dataset in the group for
n_iter.
general options
-r RCFILE, --rcfile RCFILE
use RCFILE as the WEST run-time configuration file (default: west.cfg)
--quiet emit only essential information
--verbose emit extra information
--debug enable extra checks and emit copious information
--version show program's version number and exit
WEST input data options
-W WEST_H5FILE, --west-data WEST_H5FILE
Take WEST data from WEST_H5FILE (default: read from the HDF5 file specified in
west.cfg).
output options
--output-pattern OUTPUT_PATTERN
Write per-trajectory data to output files/HDF5 groups whose names begin with
OUTPUT_PATTERN, which must contain two printf-style format flags which will be
replaced with the iteration number and segment ID of the terminal segment of
the trajectory being traced. (Default: traj_%d_%d.)
-o OUTPUT, --output OUTPUT
Store intermediate data and analysis results to OUTPUT (default: trajs.h5).
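For example, a minimal sketch tracing the trajectory that ends as segment 3 of iteration 50 (the N_ITER:SEG_ID pair is illustrative) and storing its progress coordinate along the trace:
w_trace 50:3 -d pcoord -o trajs.h5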
w_fluxanl
w_fluxanl calculates the probability flux of a weighted ensemble simulation
based on a pre-defined target state, along with the confidence interval of
the average flux. Monte Carlo bootstrapping techniques are used to account
for autocorrelation between fluxes and/or errors that are not normally
distributed.
Overview
usage:
$WEST_ROOT/bin/w_fluxanl [-h] [-r RCFILE] [--quiet | --verbose | --debug] [--version]
[-W WEST_H5FILE] [-o OUTPUT]
[--first-iter N_ITER] [--last-iter N_ITER]
[-a ALPHA] [--autocorrel-alpha ACALPHA] [-N NSETS] [--evol] [--evol-step ESTEP]
Note: All command line arguments are optional for w_fluxanl.
Command-Line Options
See the general command-line tool reference for more information on the general options.
Input/output options
These arguments allow the user to specify where to read input simulation result data and where to output calculated flux data.
Both input and output files are in HDF5 format:
-W, --west-data file
Read simulation result data from file *file*. (**Default:** The
*hdf5* file specified in the configuration file)
-o, --output file
Store this tool's output in *file*. (**Default:** The *hdf5* file
**pcpdist.h5**)
Iteration range options
Specify the range of iterations over which to construct the progress coordinate probability distribution:
--first-iter n_iter
Construct probability distribution starting with iteration *n_iter*
(**Default:** 1)
--last-iter n_iter
Construct probability distribution's time evolution up to (and
including) iteration *n_iter* (**Default:** Last completed
iteration)
Confidence interval and bootstrapping options
Specify alpha values of constructed confidence intervals:
-a alpha
Calculate a (1 - *alpha*) confidence interval for the mean flux
(**Default:** 0.05)
--autocorrel-alpha ACalpha
Identify autocorrelation of fluxes at *ACalpha* significance level.
Note: Specifying an *ACalpha* level that is too small may result in
failure to find autocorrelation in noisy flux signals (**Default:**
Same level as *alpha*)
-N n_sets, --nsets n_sets
Use *n_sets* samples for bootstrapping (**Default:** Chosen based
on *alpha*)
--evol
Calculate the time evolution of flux confidence intervals
(**Warning:** computationally expensive calculation)
--evol-step estep
(if ``'--evol'`` specified) Calculate the time evolution of flux
confidence intervals for every *estep* iterations (**Default:** 1)
Examples
Calculate the time evolution flux every 5 iterations:
$WEST_ROOT/bin/w_fluxanl --evol --evol-step 5
Calculate mean flux confidence intervals at 0.01 significance level and calculate autocorrelations at 0.05 significance:
$WEST_ROOT/bin/w_fluxanl --alpha 0.01 --autocorrel-alpha 0.05
Calculate the mean flux confidence intervals using a custom bootstrap sample size of 500:
$WEST_ROOT/bin/w_fluxanl --nsets 500
w_ipa
usage:
w_ipa [-h] [-r RCFILE] [--quiet] [--verbose] [--version] [--max-queue-length MAX_QUEUE_LENGTH]
[-W WEST_H5FILE] [--analysis-only] [--reanalyze] [--ignore-hash] [--debug] [--terminal]
[--serial | --parallel | --work-manager WORK_MANAGER] [--n-workers N_WORKERS]
[--zmq-mode MODE] [--zmq-comm-mode COMM_MODE] [--zmq-write-host-info INFO_FILE]
[--zmq-read-host-info INFO_FILE] [--zmq-upstream-rr-endpoint ENDPOINT]
[--zmq-upstream-ann-endpoint ENDPOINT] [--zmq-downstream-rr-endpoint ENDPOINT]
[--zmq-downstream-ann-endpoint ENDPOINT] [--zmq-master-heartbeat MASTER_HEARTBEAT]
[--zmq-worker-heartbeat WORKER_HEARTBEAT] [--zmq-timeout-factor FACTOR]
[--zmq-startup-timeout STARTUP_TIMEOUT] [--zmq-shutdown-timeout SHUTDOWN_TIMEOUT]
optional arguments:
-h, --help show this help message and exit
general options:
-r RCFILE, --rcfile RCFILE
                      use RCFILE as the WEST run-time configuration file (default: west.cfg)
--quiet               emit only essential information
--verbose             emit extra information
--version             show program's version number and exit
parallelization options:
--max-queue-length MAX_QUEUE_LENGTH
Maximum number of tasks that can be queued. Useful to limit RAM use for tasks that
have very large requests/responses. Default: no limit.
WEST input data options:
-W WEST_H5FILE, --west-data WEST_H5FILE
                      Take WEST data from WEST_H5FILE (default: read from the HDF5 file specified
                      in west.cfg).
runtime options:
--analysis-only, -ao Use this flag to run the analysis and return to the terminal.
--reanalyze, -ra Use this flag to delete the existing files and reanalyze.
--ignore-hash, -ih Ignore hash and don't regenerate files.
--debug, -d Debug output largely intended for development.
--terminal, -t Plot output in terminal.
parallelization options:
--serial run in serial mode
--parallel run in parallel mode (using processes)
--work-manager WORK_MANAGER
use the given work manager for parallel task distribution. Available work managers
are ('serial', 'threads', 'processes', 'zmq'); default is 'processes'
--n-workers N_WORKERS
Use up to N_WORKERS on this host, for work managers which support this option. Use
0 for a dedicated server. (Ignored by work managers which do not support this
option.)
options for ZeroMQ (“zmq”) work manager (master or node):
--zmq-mode MODE Operate as a master (server) or a node (workers/client). "server" is a deprecated
synonym for "master" and "client" is a deprecated synonym for "node".
--zmq-comm-mode COMM_MODE
Use the given communication mode -- TCP or IPC (Unix-domain) -- sockets for
communication within a node. IPC (the default) may be more efficient but is not
available on (exceptionally rare) systems without node-local storage (e.g. /tmp);
on such systems, TCP may be used instead.
--zmq-write-host-info INFO_FILE
Store hostname and port information needed to connect to this instance in
INFO_FILE. This allows the master and nodes assisting in coordinating the
communication of other nodes to choose ports randomly. Downstream nodes read this
file with --zmq-read-host-info and know where to connect.
--zmq-read-host-info INFO_FILE
Read hostname and port information needed to connect to the master (or other
coordinating node) from INFO_FILE. This allows the master and nodes assisting in
coordinating the communication of other nodes to choose ports randomly, writing
that information with --zmq-write-host-info for this instance to read.
--zmq-upstream-rr-endpoint ENDPOINT
ZeroMQ endpoint to which to send request/response (task and result) traffic toward
the master.
--zmq-upstream-ann-endpoint ENDPOINT
ZeroMQ endpoint on which to receive announcement (heartbeat and shutdown
notification) traffic from the master.
--zmq-downstream-rr-endpoint ENDPOINT
ZeroMQ endpoint on which to listen for request/response (task and result) traffic
from subsidiary workers.
--zmq-downstream-ann-endpoint ENDPOINT
ZeroMQ endpoint on which to send announcement (heartbeat and shutdown
notification) traffic toward workers.
--zmq-master-heartbeat MASTER_HEARTBEAT
Every MASTER_HEARTBEAT seconds, the master announces its presence to workers.
--zmq-worker-heartbeat WORKER_HEARTBEAT
Every WORKER_HEARTBEAT seconds, workers announce their presence to the master.
--zmq-timeout-factor FACTOR
Scaling factor for heartbeat timeouts. If the master doesn't hear from a worker in
WORKER_HEARTBEAT*FACTOR, the worker is assumed to have crashed. If a worker
doesn't hear from the master in MASTER_HEARTBEAT*FACTOR seconds, the master is
assumed to have crashed. Both cases result in shutdown.
--zmq-startup-timeout STARTUP_TIMEOUT
Amount of time (in seconds) to wait for communication between the master and at
least one worker. This may need to be changed on very large, heavily-loaded
computer systems that start all processes simultaneously.
--zmq-shutdown-timeout SHUTDOWN_TIMEOUT
Amount of time (in seconds) to wait for workers to shut down.
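As a sketch, two common invocations using the runtime flags documented above (the analysis scheme itself must already be defined in west.cfg):
# run the analysis defined in west.cfg, then return to the shell:
w_ipa -ao
# delete existing analysis files and recompute from scratch:
w_ipa -ra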
w_pdist
usage:
w_pdist [-h] [-r RCFILE] [--quiet | --verbose | --debug] [--version]
[--max-queue-length MAX_QUEUE_LENGTH] [-W WEST_H5FILE] [--first-iter N_ITER]
[--last-iter N_ITER] [-b BINEXPR] [-o OUTPUT] [-C] [--loose]
[--construct-dataset CONSTRUCT_DATASET | --dsspecs DSSPEC [DSSPEC ...]]
[--serial | --parallel | --work-manager WORK_MANAGER] [--n-workers N_WORKERS]
[--zmq-mode MODE] [--zmq-comm-mode COMM_MODE] [--zmq-write-host-info INFO_FILE]
[--zmq-read-host-info INFO_FILE] [--zmq-upstream-rr-endpoint ENDPOINT]
[--zmq-upstream-ann-endpoint ENDPOINT] [--zmq-downstream-rr-endpoint ENDPOINT]
[--zmq-downstream-ann-endpoint ENDPOINT] [--zmq-master-heartbeat MASTER_HEARTBEAT]
[--zmq-worker-heartbeat WORKER_HEARTBEAT] [--zmq-timeout-factor FACTOR]
[--zmq-startup-timeout STARTUP_TIMEOUT] [--zmq-shutdown-timeout SHUTDOWN_TIMEOUT]
Calculate time-resolved, multi-dimensional probability distributions of WE datasets.
Source data
Source data is provided either by a user-specified function (--construct-dataset) or a list of “data set specifications” (--dsspecs). If neither is provided, the progress coordinate dataset ‘pcoord’ is used.
To use a custom function to extract or calculate data whose probability distribution will be calculated, specify the function in standard Python MODULE.FUNCTION syntax as the argument to --construct-dataset. This function will be called as function(n_iter, iter_group), where n_iter is the iteration whose data are being considered and iter_group is the corresponding group in the main WEST HDF5 file (west.h5). The function must return data which can be indexed as [segment][timepoint][dimension].
To use a list of data set specifications, specify --dsspecs and then list the desired datasets one-by-one (space-separated in most shells). These data set specifications are formatted as NAME[,file=FILENAME,slice=SLICE], which will use the dataset called NAME in the HDF5 file FILENAME (defaulting to the main WEST HDF5 file west.h5), and slice it with the Python slice expression SLICE (as in [0:2] to select the first two elements of the first axis of the dataset). The slice option is most useful for selecting one column (or more) from a multi-column dataset, such as arises when using a progress coordinate of multiple dimensions.
Histogram binning
By default, histograms are constructed with 100 bins in each dimension. This can be overridden by specifying -b/--bins, which accepts a number of different kinds of arguments:
a single integer N
N uniformly spaced bins will be used in each dimension.
a sequence of integers N1,N2,... (comma-separated)
N1 uniformly spaced bins will be used for the first dimension, N2 for the
second, and so on.
a list of lists [[B11, B12, B13, ...], [B21, B22, B23, ...], ...]
The bin boundaries B11, B12, B13, ... will be used for the first dimension,
B21, B22, B23, ... for the second dimension, and so on. These bin
boundaries need not be uniformly spaced. These expressions will be
evaluated with Python's ``eval`` construct, with ``np`` available for
use [e.g. to specify bins using np.arange()].
The first two forms (integer, list of integers) will trigger a scan of all data in each dimension in order to determine the minimum and maximum values, which may be very expensive for large datasets. This can be avoided by explicitly providing bin boundaries using the list-of-lists form.
Note that these bins are NOT at all related to the bins used to drive WE sampling.
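For instance, hedged sketches of the three forms described above (the bin counts and boundaries are illustrative):
# 25 uniformly spaced bins in every dimension:
w_pdist -b 25
# 25 bins in the first dimension, 50 in the second:
w_pdist -b '[25,50]'
# explicit boundaries (avoids the expensive min/max scan):
w_pdist -b '[[0,1,2,4,inf],[-inf,0,inf]]'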
Output format
The output file produced (specified by -o/--output, defaulting to “pdist.h5”) may be fed to plothist to generate plots (or appropriately processed text or HDF5 files) from this data. In short, the following datasets are created:
``histograms``
Normalized histograms. The first axis corresponds to iteration, and
remaining axes correspond to dimensions of the input dataset.
``/binbounds_0``
Vector of bin boundaries for the first (index 0) dimension. Additional
datasets similarly named (/binbounds_1, /binbounds_2, ...) are created
for additional dimensions.
``/midpoints_0``
Vector of bin midpoints for the first (index 0) dimension. Additional
datasets similarly named are created for additional dimensions.
``n_iter``
Vector of iteration numbers corresponding to the stored histograms (i.e.
the first axis of the ``histograms`` dataset).
Subsequent processing
The output generated by this program (-o/--output, default “pdist.h5”) may be plotted by the plothist program. See plothist --help for more information.
Parallelization
This tool supports parallelized binning, including reading of input data. Parallel processing is the default. For simple cases (reading pre-computed input data, modest numbers of segments), serial processing (--serial) may be more efficient.
Command-line options
optional arguments:
-h, --help show this help message and exit
-b BINEXPR, --bins BINEXPR
Use BINEXPR for bins. This may be an integer, which will be used for each
dimension of the progress coordinate; a list of integers (formatted as
[n1,n2,...]) which will use n1 bins for the first dimension, n2 for the second
dimension, and so on; or a list of lists of boundaries (formatted as [[a1, a2,
...], [b1, b2, ...], ... ]), which will use [a1, a2, ...] as bin boundaries for
the first dimension, [b1, b2, ...] as bin boundaries for the second dimension,
and so on. (Default: 100 bins in each dimension.)
-o OUTPUT, --output OUTPUT
Store results in OUTPUT (default: pdist.h5).
-C, --compress Compress histograms. May make storage of higher-dimensional histograms more
tractable, at the (possible extreme) expense of increased analysis time.
(Default: no compression.)
--loose Ignore values that do not fall within bins. (Risky, as this can make buggy bin
boundaries appear as reasonable data. Only use if you are sure of your bin
boundary specification.)
general options:
-r RCFILE, --rcfile RCFILE
use RCFILE as the WEST run-time configuration file (default: west.cfg)
--quiet emit only essential information
--verbose emit extra information
--debug enable extra checks and emit copious information
--version show program's version number and exit
parallelization options:
--max-queue-length MAX_QUEUE_LENGTH
Maximum number of tasks that can be queued. Useful to limit RAM use for tasks
that have very large requests/responses. Default: no limit.
WEST input data options:
-W WEST_H5FILE, --west-data WEST_H5FILE
                      Take WEST data from WEST_H5FILE (default: read from the HDF5 file specified
                      in west.cfg).
iteration range:
--first-iter N_ITER Begin analysis at iteration N_ITER (default: 1).
--last-iter N_ITER Conclude analysis with N_ITER, inclusive (default: last completed iteration).
input dataset options:
--construct-dataset CONSTRUCT_DATASET
Use the given function (as in module.function) to extract source data. This
function will be called once per iteration as function(n_iter, iter_group) to
construct data for one iteration. Data returned must be indexable as
[seg_id][timepoint][dimension]
--dsspecs DSSPEC [DSSPEC ...]
Construct probability distribution from one or more DSSPECs.
parallelization options:
--serial run in serial mode
--parallel run in parallel mode (using processes)
--work-manager WORK_MANAGER
use the given work manager for parallel task distribution. Available work
managers are ('serial', 'threads', 'processes', 'zmq'); default is 'processes'
--n-workers N_WORKERS
Use up to N_WORKERS on this host, for work managers which support this option.
Use 0 for a dedicated server. (Ignored by work managers which do not support
this option.)
options for ZeroMQ (“zmq”) work manager (master or node):
--zmq-mode MODE     Operate as a master (server) or a node (workers/client). "server" is a
                    deprecated synonym for "master" and "client" is a deprecated synonym for
                    "node".
--zmq-comm-mode COMM_MODE
                    Use the given communication mode -- TCP or IPC (Unix-domain) -- sockets for
                    communication within a node. IPC (the default) may be more efficient but is not
                    available on (exceptionally rare) systems without node-local storage (e.g.
                    /tmp); on such systems, TCP may be used instead.
--zmq-write-host-info INFO_FILE
                    Store hostname and port information needed to connect to this instance in
                    INFO_FILE. This allows the master and nodes assisting in coordinating the
                    communication of other nodes to choose ports randomly. Downstream nodes read
                    this file with --zmq-read-host-info and know where to connect.
--zmq-read-host-info INFO_FILE
                    Read hostname and port information needed to connect to the master (or other
                    coordinating node) from INFO_FILE. This allows the master and nodes assisting
                    in coordinating the communication of other nodes to choose ports randomly,
                    writing that information with --zmq-write-host-info for this instance to read.
--zmq-upstream-rr-endpoint ENDPOINT
                    ZeroMQ endpoint to which to send request/response (task and result) traffic
                    toward the master.
--zmq-upstream-ann-endpoint ENDPOINT
                    ZeroMQ endpoint on which to receive announcement (heartbeat and shutdown
                    notification) traffic from the master.
--zmq-downstream-rr-endpoint ENDPOINT
                    ZeroMQ endpoint on which to listen for request/response (task and result)
                    traffic from subsidiary workers.
--zmq-downstream-ann-endpoint ENDPOINT
                    ZeroMQ endpoint on which to send announcement (heartbeat and shutdown
                    notification) traffic toward workers.
--zmq-master-heartbeat MASTER_HEARTBEAT
                    Every MASTER_HEARTBEAT seconds, the master announces its presence to workers.
--zmq-worker-heartbeat WORKER_HEARTBEAT
                    Every WORKER_HEARTBEAT seconds, workers announce their presence to the master.
--zmq-timeout-factor FACTOR
                    Scaling factor for heartbeat timeouts. If the master doesn't hear from a worker
                    in WORKER_HEARTBEAT*FACTOR seconds, the worker is assumed to have crashed. If a
                    worker doesn't hear from the master in MASTER_HEARTBEAT*FACTOR seconds, the
                    master is assumed to have crashed. Both cases result in shutdown.
--zmq-startup-timeout STARTUP_TIMEOUT
                    Amount of time (in seconds) to wait for communication between the master and at
                    least one worker. This may need to be changed on very large, heavily-loaded
                    computer systems that start all processes simultaneously.
--zmq-shutdown-timeout SHUTDOWN_TIMEOUT
                    Amount of time (in seconds) to wait for workers to shut down.
w_succ
usage:
w_succ [-h] [-r RCFILE] [--quiet | --verbose | --debug] [--version] [-A H5FILE] [-W WEST_H5FILE]
[-o OUTPUT_FILE]
List segments which successfully reach a target state.
optional arguments:
-h, --help show this help message and exit
-o OUTPUT_FILE, --output OUTPUT_FILE
Store output in OUTPUT_FILE (default: write to standard output).
general options:
-r RCFILE, --rcfile RCFILE
use RCFILE as the WEST run-time configuration file (default: west.cfg)
--quiet emit only essential information
--verbose emit extra information
--debug enable extra checks and emit copious information
--version show program's version number and exit
general analysis options:
-A H5FILE, --analysis-file H5FILE
Store intermediate and final results in H5FILE (default: analysis.h5).
WEST input data options:
-W WEST_H5FILE, --west-data WEST_H5FILE
Take WEST data from WEST_H5FILE (default: read from the HDF5 file specified in
west.cfg).
w_crawl
usage:
w_crawl [-h] [-r RCFILE] [--quiet | --verbose | --debug] [--version]
[--max-queue-length MAX_QUEUE_LENGTH] [-W WEST_H5FILE] [--first-iter N_ITER]
[--last-iter N_ITER] [-c CRAWLER_INSTANCE]
[--serial | --parallel | --work-manager WORK_MANAGER] [--n-workers N_WORKERS]
[--zmq-mode MODE] [--zmq-comm-mode COMM_MODE] [--zmq-write-host-info INFO_FILE]
[--zmq-read-host-info INFO_FILE] [--zmq-upstream-rr-endpoint ENDPOINT]
[--zmq-upstream-ann-endpoint ENDPOINT] [--zmq-downstream-rr-endpoint ENDPOINT]
[--zmq-downstream-ann-endpoint ENDPOINT] [--zmq-master-heartbeat MASTER_HEARTBEAT]
[--zmq-worker-heartbeat WORKER_HEARTBEAT] [--zmq-timeout-factor FACTOR]
[--zmq-startup-timeout STARTUP_TIMEOUT] [--zmq-shutdown-timeout SHUTDOWN_TIMEOUT]
task_callable
Crawl a weighted ensemble dataset, executing a function for each iteration. This can be used for postprocessing of trajectories, cleanup of datasets, or anything else that can be expressed as “do X for iteration N, then do something with the result”. Tasks are parallelized by iteration, and no guarantees are made about evaluation order.
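For concreteness, a minimal sketch of such a per-iteration task follows; the module name crawl_tasks is hypothetical, and the (n_iter, iter_group) call signature is an assumption modeled on the dataset-construction functions used by the other tools in this document.

    # crawl_tasks.py -- hypothetical module, run as: w_crawl crawl_tasks.average_pcoord
    def average_pcoord(n_iter, iter_group):
        # the return value must be picklable so the work manager can ship it back
        pcoord = iter_group['pcoord'][...]  # (num segments, pcoord_len, pcoord_ndim)
        return n_iter, pcoord.mean(axis=(0, 1))  # per-dimension mean for this iteration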
Command-line options
optional arguments:
-h, --help show this help message and exit
general options:
-r RCFILE, --rcfile RCFILE
use RCFILE as the WEST run-time configuration file (default: west.cfg)
--quiet emit only essential information
--verbose emit extra information
--debug enable extra checks and emit copious information
--version show program's version number and exit
parallelization options:
--max-queue-length MAX_QUEUE_LENGTH
Maximum number of tasks that can be queued. Useful to limit RAM use for tasks
that have very large requests/response. Default: no limit.
WEST input data options:
-W WEST_H5FILE, --west-data WEST_H5FILE
Take WEST data from WEST_H5FILE (default: read from the HDF5 file specified in
west.cfg).
iteration range:
--first-iter N_ITER Begin analysis at iteration N_ITER (default: 1).
--last-iter N_ITER Conclude analysis with N_ITER, inclusive (default: last completed iteration).
task options:
-c CRAWLER_INSTANCE, --crawler-instance CRAWLER_INSTANCE
Use CRAWLER_INSTANCE (specified as module.instance) as an instance of
WESTPACrawler to coordinate the calculation. Required only if initialization,
finalization, or task result processing is required.
task_callable Run TASK_CALLABLE (specified as module.function) on each iteration. Required.
parallelization options:
--serial run in serial mode
--parallel run in parallel mode (using processes)
--work-manager WORK_MANAGER
use the given work manager for parallel task distribution. Available work
managers are ('serial', 'threads', 'processes', 'zmq'); default is 'serial'
--n-workers N_WORKERS
Use up to N_WORKERS on this host, for work managers which support this option.
Use 0 for a dedicated server. (Ignored by work managers which do not support
this option.)
options for ZeroMQ (“zmq”) work manager (master or node):
--zmq-mode MODE Operate as a master (server) or a node (workers/client). "server" is a
deprecated synonym for "master" and "client" is a deprecated synonym for
"node".
--zmq-comm-mode COMM_MODE
Use the given communication mode -- TCP or IPC (Unix-domain) -- sockets for
communication within a node. IPC (the default) may be more efficient but is not
available on (exceptionally rare) systems without node-local storage (e.g.
/tmp); on such systems, TCP may be used instead.
--zmq-write-host-info INFO_FILE
Store hostname and port information needed to connect to this instance in
INFO_FILE. This allows the master and nodes assisting in coordinating the
communication of other nodes to choose ports randomly. Downstream nodes read
this file with --zmq-read-host-info and know how to connect.
--zmq-read-host-info INFO_FILE
Read hostname and port information needed to connect to the master (or other
coordinating node) from INFO_FILE. This allows the master and nodes assisting
in coordinating the communication of other nodes to choose ports randomly,
writing that information with --zmq-write-host-info for this instance to read.
--zmq-upstream-rr-endpoint ENDPOINT
ZeroMQ endpoint to which to send request/response (task and result) traffic
toward the master.
--zmq-upstream-ann-endpoint ENDPOINT
ZeroMQ endpoint on which to receive announcement (heartbeat and shutdown
notification) traffic from the master.
--zmq-downstream-rr-endpoint ENDPOINT
ZeroMQ endpoint on which to listen for request/response (task and result)
traffic from subsidiary workers.
--zmq-downstream-ann-endpoint ENDPOINT
ZeroMQ endpoint on which to send announcement (heartbeat and shutdown
notification) traffic toward workers.
--zmq-master-heartbeat MASTER_HEARTBEAT
Every MASTER_HEARTBEAT seconds, the master announces its presence to workers.
--zmq-worker-heartbeat WORKER_HEARTBEAT
Every WORKER_HEARTBEAT seconds, workers announce their presence to the master.
--zmq-timeout-factor FACTOR
Scaling factor for heartbeat timeouts. If the master doesn't hear from a worker
in WORKER_HEARTBEAT*FACTOR, the worker is assumed to have crashed. If a worker
doesn't hear from the master in MASTER_HEARTBEAT*FACTOR seconds, the master is
assumed to have crashed. Both cases result in shutdown.
--zmq-startup-timeout STARTUP_TIMEOUT
Amount of time (in seconds) to wait for communication between the master and at
least one worker. This may need to be changed on very large, heavily-loaded
computer systems that start all processes simultaneously.
--zmq-shutdown-timeout SHUTDOWN_TIMEOUT
Amount of time (in seconds) to wait for workers to shut down.
w_direct
usage:
w_direct [-h] [-r RCFILE] [--quiet | --verbose | --debug] [--version]
[--max-queue-length MAX_QUEUE_LENGTH]
[--serial | --parallel | --work-manager WORK_MANAGER] [--n-workers N_WORKERS]
[--zmq-mode MODE] [--zmq-comm-mode COMM_MODE] [--zmq-write-host-info INFO_FILE]
[--zmq-read-host-info INFO_FILE] [--zmq-upstream-rr-endpoint ENDPOINT]
[--zmq-upstream-ann-endpoint ENDPOINT] [--zmq-downstream-rr-endpoint ENDPOINT]
[--zmq-downstream-ann-endpoint ENDPOINT] [--zmq-master-heartbeat MASTER_HEARTBEAT]
[--zmq-worker-heartbeat WORKER_HEARTBEAT] [--zmq-timeout-factor FACTOR]
[--zmq-startup-timeout STARTUP_TIMEOUT] [--zmq-shutdown-timeout SHUTDOWN_TIMEOUT]
{help,init,average,kinetics,probs,all} ...
optional arguments:
-h, --help show this help message and exit
general options:
-r RCFILE, --rcfile RCFILE
use RCFILE as the WEST run-time configuration file (default: west.cfg)
--quiet emit only essential information
--verbose emit extra information
--debug enable extra checks and emit copious information
--version show program's version number and exit
parallelization options:
--max-queue-length MAX_QUEUE_LENGTH
Maximum number of tasks that can be queued. Useful to limit RAM use for tasks that
have very large requests/response. Default: no limit.
direct kinetics analysis schemes:
{help,init,average,kinetics,probs,all}
help print help for this command or individual subcommands
init calculate state-to-state kinetics by tracing trajectories
average Averages and returns fluxes, rates, and color/state populations.
kinetics Generates rate and flux values from a WESTPA simulation via tracing.
probs Calculates color and state probabilities via tracing.
all Runs the full suite, including the tracing of events.
parallelization options:
--serial run in serial mode
--parallel run in parallel mode (using processes)
--work-manager WORK_MANAGER
use the given work manager for parallel task distribution. Available work managers
are ('serial', 'threads', 'processes', 'zmq'); default is 'serial'
--n-workers N_WORKERS
Use up to N_WORKERS on this host, for work managers which support this option. Use
0 for a dedicated server. (Ignored by work managers which do not support this
option.)
options for ZeroMQ (“zmq”) work manager (master or node):
--zmq-mode MODE Operate as a master (server) or a node (workers/client). "server" is a deprecated
synonym for "master" and "client" is a deprecated synonym for "node".
--zmq-comm-mode COMM_MODE
Use the given communication mode -- TCP or IPC (Unix-domain) -- sockets for
communication within a node. IPC (the default) may be more efficient but is not
available on (exceptionally rare) systems without node-local storage (e.g. /tmp);
on such systems, TCP may be used instead.
--zmq-write-host-info INFO_FILE
Store hostname and port information needed to connect to this instance in
INFO_FILE. This allows the master and nodes assisting in coordinating the
communication of other nodes to choose ports randomly. Downstream nodes read this
file with --zmq-read-host-info and know how to connect.
--zmq-read-host-info INFO_FILE
Read hostname and port information needed to connect to the master (or other
coordinating node) from INFO_FILE. This allows the master and nodes assisting in
coordinating the communication of other nodes to choose ports randomly, writing
that information with --zmq-write-host-info for this instance to read.
--zmq-upstream-rr-endpoint ENDPOINT
ZeroMQ endpoint to which to send request/response (task and result) traffic toward
the master.
--zmq-upstream-ann-endpoint ENDPOINT
ZeroMQ endpoint on which to receive announcement (heartbeat and shutdown
notification) traffic from the master.
--zmq-downstream-rr-endpoint ENDPOINT
ZeroMQ endpoint on which to listen for request/response (task and result) traffic
from subsidiary workers.
--zmq-downstream-ann-endpoint ENDPOINT
ZeroMQ endpoint on which to send announcement (heartbeat and shutdown
notification) traffic toward workers.
--zmq-master-heartbeat MASTER_HEARTBEAT
Every MASTER_HEARTBEAT seconds, the master announces its presence to workers.
--zmq-worker-heartbeat WORKER_HEARTBEAT
Every WORKER_HEARTBEAT seconds, workers announce their presence to the master.
--zmq-timeout-factor FACTOR
Scaling factor for heartbeat timeouts. If the master doesn't hear from a worker in
WORKER_HEARTBEAT*FACTOR, the worker is assumed to have crashed. If a worker
doesn't hear from the master in MASTER_HEARTBEAT*FACTOR seconds, the master is
assumed to have crashed. Both cases result in shutdown.
--zmq-startup-timeout STARTUP_TIMEOUT
Amount of time (in seconds) to wait for communication between the master and at
least one worker. This may need to be changed on very large, heavily-loaded
computer systems that start all processes simultaneously.
--zmq-shutdown-timeout SHUTDOWN_TIMEOUT
Amount of time (in seconds) to wait for workers to shut down.
w_select
usage:
w_select [-h] [-r RCFILE] [--quiet | --verbose | --debug] [--version]
[--max-queue-length MAX_QUEUE_LENGTH] [-W WEST_H5FILE] [--first-iter N_ITER]
[--last-iter N_ITER] [-p MODULE.FUNCTION] [-v] [-a] [-o OUTPUT]
[--serial | --parallel | --work-manager WORK_MANAGER] [--n-workers N_WORKERS]
[--zmq-mode MODE] [--zmq-comm-mode COMM_MODE] [--zmq-write-host-info INFO_FILE]
[--zmq-read-host-info INFO_FILE] [--zmq-upstream-rr-endpoint ENDPOINT]
[--zmq-upstream-ann-endpoint ENDPOINT] [--zmq-downstream-rr-endpoint ENDPOINT]
[--zmq-downstream-ann-endpoint ENDPOINT] [--zmq-master-heartbeat MASTER_HEARTBEAT]
[--zmq-worker-heartbeat WORKER_HEARTBEAT] [--zmq-timeout-factor FACTOR]
[--zmq-startup-timeout STARTUP_TIMEOUT] [--zmq-shutdown-timeout SHUTDOWN_TIMEOUT]
Select dynamics segments matching various criteria. This requires a user-provided predicate function. By default, only matching segments are stored. If the -a/--include-ancestors option is given, then matching segments and their ancestors will be stored.
Predicate function
Segments are selected based on a predicate function, which must be callable as
predicate(n_iter, iter_group) and return a collection of segment IDs matching
the predicate in that iteration.
The predicate may be inverted by specifying the -v/--invert command-line argument.
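A minimal sketch of such a predicate follows; the module name my_predicates and the cutoff value are hypothetical, while the signature and return value follow the description above.

    # my_predicates.py -- hypothetical module, run as: w_select -p my_predicates.reached_cutoff
    import numpy as np

    def reached_cutoff(n_iter, iter_group):
        pcoord = iter_group['pcoord'][...]   # (num segments, pcoord_len, pcoord_ndim)
        final = pcoord[:, -1, 0]             # last timepoint of the first dimension
        return np.flatnonzero(final > 10.0)  # seg_ids whose final pcoord exceeds the cutoff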
Output format
The output file (-o/--output, by default “select.h5”) contains the following datasets:
``/n_iter`` [iteration]
*(Integer)* Iteration numbers for each entry in other datasets.
``/n_segs`` [iteration]
*(Integer)* Number of segment IDs matching the predicate (or inverted
predicate, if -v/--invert is specified) in the given iteration.
``/seg_ids`` [iteration][segment]
*(Integer)* Matching segments in each iteration. For each stored iteration,
only the first ``n_segs`` entries are valid. For example, the full list of
matching seg_ids in the first stored iteration is ``seg_ids[0][:n_segs[0]]``.
``/weights`` [iteration][segment]
*(Floating-point)* Weights for each matching segment in ``/seg_ids``.
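As a usage illustration, the following sketch (assuming the h5py package) reads the datasets described above back out of select.h5:

    import h5py

    with h5py.File('select.h5', 'r') as f:
        for i, n_iter in enumerate(f['n_iter'][:]):
            n = f['n_segs'][i]
            # only the first n entries of each row are valid
            for seg_id, weight in zip(f['seg_ids'][i, :n], f['weights'][i, :n]):
                print(n_iter, seg_id, weight)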
Command-line arguments
optional arguments:
-h, --help show this help message and exit
general options:
-r RCFILE, --rcfile RCFILE
use RCFILE as the WEST run-time configuration file (default: west.cfg)
--quiet emit only essential information
--verbose emit extra information
--debug enable extra checks and emit copious information
--version show program's version number and exit
parallelization options:
--max-queue-length MAX_QUEUE_LENGTH
Maximum number of tasks that can be queued. Useful to limit RAM use for tasks that
have very large requests/response. Default: no limit.
WEST input data options:
-W WEST_H5FILE, --west-data WEST_H5FILE
Take WEST data from WEST_H5FILE (default: read from the HDF5 file specified in
west.cfg).
iteration range:
--first-iter N_ITER Begin analysis at iteration N_ITER (default: 1).
--last-iter N_ITER Conclude analysis with N_ITER, inclusive (default: last completed iteration).
selection options:
-p MODULE.FUNCTION, --predicate-function MODULE.FUNCTION
Use the given predicate function to match segments. This function should take an
iteration number and the HDF5 group corresponding to that iteration and return a
sequence of seg_ids matching the predicate, as in ``match_predicate(n_iter,
iter_group)``.
-v, --invert Invert the match predicate.
-a, --include-ancestors
Include ancestors of matched segments in output.
output options:
-o OUTPUT, --output OUTPUT
Write output to OUTPUT (default: select.h5).
parallelization options:
--serial run in serial mode
--parallel run in parallel mode (using processes)
--work-manager WORK_MANAGER
use the given work manager for parallel task distribution. Available work managers
are ('serial', 'threads', 'processes', 'zmq'); default is 'serial'
--n-workers N_WORKERS
Use up to N_WORKERS on this host, for work managers which support this option. Use
0 for a dedicated server. (Ignored by work managers which do not support this
option.)
options for ZeroMQ (“zmq”) work manager (master or node):
--zmq-mode MODE Operate as a master (server) or a node (workers/client). "server" is a deprecated
synonym for "master" and "client" is a deprecated synonym for "node".
--zmq-comm-mode COMM_MODE
Use the given communication mode -- TCP or IPC (Unix-domain) -- sockets for
communication within a node. IPC (the default) may be more efficient but is not
available on (exceptionally rare) systems without node-local storage (e.g. /tmp);
on such systems, TCP may be used instead.
--zmq-write-host-info INFO_FILE
Store hostname and port information needed to connect to this instance in
INFO_FILE. This allows the master and nodes assisting in coordinating the
communication of other nodes to choose ports randomly. Downstream nodes read this
file with --zmq-read-host-info and know how to connect.
--zmq-read-host-info INFO_FILE
Read hostname and port information needed to connect to the master (or other
coordinating node) from INFO_FILE. This allows the master and nodes assisting in
coordinating the communication of other nodes to choose ports randomly, writing
that information with --zmq-write-host-info for this instance to read.
--zmq-upstream-rr-endpoint ENDPOINT
ZeroMQ endpoint to which to send request/response (task and result) traffic toward
the master.
--zmq-upstream-ann-endpoint ENDPOINT
ZeroMQ endpoint on which to receive announcement (heartbeat and shutdown
notification) traffic from the master.
--zmq-downstream-rr-endpoint ENDPOINT
ZeroMQ endpoint on which to listen for request/response (task and result) traffic
from subsidiary workers.
--zmq-downstream-ann-endpoint ENDPOINT
ZeroMQ endpoint on which to send announcement (heartbeat and shutdown
notification) traffic toward workers.
--zmq-master-heartbeat MASTER_HEARTBEAT
Every MASTER_HEARTBEAT seconds, the master announces its presence to workers.
--zmq-worker-heartbeat WORKER_HEARTBEAT
Every WORKER_HEARTBEAT seconds, workers announce their presence to the master.
--zmq-timeout-factor FACTOR
Scaling factor for heartbeat timeouts. If the master doesn't hear from a worker in
WORKER_HEARTBEAT*FACTOR, the worker is assumed to have crashed. If a worker
doesn't hear from the master in MASTER_HEARTBEAT*FACTOR seconds, the master is
assumed to have crashed. Both cases result in shutdown.
--zmq-startup-timeout STARTUP_TIMEOUT
Amount of time (in seconds) to wait for communication between the master and at
least one worker. This may need to be changed on very large, heavily-loaded
computer systems that start all processes simultaneously.
--zmq-shutdown-timeout SHUTDOWN_TIMEOUT
Amount of time (in seconds) to wait for workers to shut down.
w_states
usage:
w_states [-h] [-r RCFILE] [--quiet | --verbose | --debug] [--version]
[--show | --append | --replace] [--bstate-file BSTATE_FILE] [--bstate BSTATES]
[--tstate-file TSTATE_FILE] [--tstate TSTATES]
[--serial | --parallel | --work-manager WORK_MANAGER] [--n-workers N_WORKERS]
[--zmq-mode MODE] [--zmq-comm-mode COMM_MODE] [--zmq-write-host-info INFO_FILE]
[--zmq-read-host-info INFO_FILE] [--zmq-upstream-rr-endpoint ENDPOINT]
[--zmq-upstream-ann-endpoint ENDPOINT] [--zmq-downstream-rr-endpoint ENDPOINT]
[--zmq-downstream-ann-endpoint ENDPOINT] [--zmq-master-heartbeat MASTER_HEARTBEAT]
[--zmq-worker-heartbeat WORKER_HEARTBEAT] [--zmq-timeout-factor FACTOR]
[--zmq-startup-timeout STARTUP_TIMEOUT] [--zmq-shutdown-timeout SHUTDOWN_TIMEOUT]
Display or manipulate basis (initial) or target (recycling) states for a WEST simulation. By default, states are displayed (or dumped to files). If --replace is specified, all basis/target states are replaced for the next iteration. If --append is specified, the given target state(s) are appended to the list for the next iteration. Appending basis states is not permitted, as this would require renormalizing basis state probabilities in ways that may be error-prone. Instead, use w_states --show --bstate-file=bstates.txt and then edit the resulting bstates.txt file to include the new desired basis states, then use w_states --replace --bstate-file=bstates.txt to update the WEST HDF5 file appropriately.
optional arguments:
-h, --help show this help message and exit
--bstate-file BSTATE_FILE
Read (--append/--replace) or write (--show) basis state names, probabilities, and
data references from/to BSTATE_FILE.
--bstate BSTATES Add the given basis state (specified as a string 'label,probability[,auxref]') to
the list of basis states (after those specified in --bstate-file, if any). This
argument may be specified more than once, in which case the given states are
appended in the order they are given on the command line.
--tstate-file TSTATE_FILE
Read (--append/--replace) or write (--show) target state names and representative
progress coordinates from/to TSTATE_FILE
--tstate TSTATES Add the given target state (specified as a string 'label,pcoord0[,pcoord1[,...]]')
to the list of target states (after those specified in the file given by
--tstate-file, if any). This argument may be specified more than once, in which
case the given states are appended in the order they appear on the command line.
general options:
-r RCFILE, --rcfile RCFILE
use RCFILE as the WEST run-time configuration file (default: west.cfg)
--quiet emit only essential information
--verbose emit extra information
--debug enable extra checks and emit copious information
--version show program's version number and exit
modes of operation:
--show Display current basis/target states (or dump to files).
--append Append the given basis/target states to those currently in use.
--replace Replace current basis/target states with those specified.
parallelization options:
--serial run in serial mode
--parallel run in parallel mode (using processes)
--work-manager WORK_MANAGER
use the given work manager for parallel task distribution. Available work managers
are ('serial', 'threads', 'processes', 'zmq'); default is 'serial'
--n-workers N_WORKERS
Use up to N_WORKERS on this host, for work managers which support this option. Use
0 for a dedicated server. (Ignored by work managers which do not support this
option.)
options for ZeroMQ (“zmq”) work manager (master or node):
--zmq-mode MODE Operate as a master (server) or a node (workers/client). "server" is a deprecated
synonym for "master" and "client" is a deprecated synonym for "node".
--zmq-comm-mode COMM_MODE
Use the given communication mode -- TCP or IPC (Unix-domain) -- sockets for
communication within a node. IPC (the default) may be more efficient but is not
available on (exceptionally rare) systems without node-local storage (e.g. /tmp);
on such systems, TCP may be used instead.
--zmq-write-host-info INFO_FILE
Store hostname and port information needed to connect to this instance in
INFO_FILE. This allows the master and nodes assisting in coordinating the
communication of other nodes to choose ports randomly. Downstream nodes read this
file with --zmq-read-host-info and know how to connect.
--zmq-read-host-info INFO_FILE
Read hostname and port information needed to connect to the master (or other
coordinating node) from INFO_FILE. This allows the master and nodes assisting in
coordinating the communication of other nodes to choose ports randomly, writing
that information with --zmq-write-host-info for this instance to read.
--zmq-upstream-rr-endpoint ENDPOINT
ZeroMQ endpoint to which to send request/response (task and result) traffic toward
the master.
--zmq-upstream-ann-endpoint ENDPOINT
ZeroMQ endpoint on which to receive announcement (heartbeat and shutdown
notification) traffic from the master.
--zmq-downstream-rr-endpoint ENDPOINT
ZeroMQ endpoint on which to listen for request/response (task and result) traffic
from subsidiary workers.
--zmq-downstream-ann-endpoint ENDPOINT
ZeroMQ endpoint on which to send announcement (heartbeat and shutdown
notification) traffic toward workers.
--zmq-master-heartbeat MASTER_HEARTBEAT
Every MASTER_HEARTBEAT seconds, the master announces its presence to workers.
--zmq-worker-heartbeat WORKER_HEARTBEAT
Every WORKER_HEARTBEAT seconds, workers announce their presence to the master.
--zmq-timeout-factor FACTOR
Scaling factor for heartbeat timeouts. If the master doesn't hear from a worker in
WORKER_HEARTBEAT*FACTOR, the worker is assumed to have crashed. If a worker
doesn't hear from the master in MASTER_HEARTBEAT*FACTOR seconds, the master is
assumed to have crashed. Both cases result in shutdown.
--zmq-startup-timeout STARTUP_TIMEOUT
Amount of time (in seconds) to wait for communication between the master and at
least one worker. This may need to be changed on very large, heavily-loaded
computer systems that start all processes simultaneously.
--zmq-shutdown-timeout SHUTDOWN_TIMEOUT
Amount of time (in seconds) to wait for workers to shut down.
w_eddist
usage:
w_eddist [-h] [-r RCFILE] [--quiet | --verbose | --debug] [--version]
[--max-queue-length MAX_QUEUE_LENGTH] [-b BINEXPR] [-C] [--loose] --istate ISTATE
--fstate FSTATE [--first-iter ITER_START] [--last-iter ITER_STOP] [-k KINETICS]
[-o OUTPUT] [--serial | --parallel | --work-manager WORK_MANAGER]
[--n-workers N_WORKERS] [--zmq-mode MODE] [--zmq-comm-mode COMM_MODE]
[--zmq-write-host-info INFO_FILE] [--zmq-read-host-info INFO_FILE]
[--zmq-upstream-rr-endpoint ENDPOINT] [--zmq-upstream-ann-endpoint ENDPOINT]
[--zmq-downstream-rr-endpoint ENDPOINT] [--zmq-downstream-ann-endpoint ENDPOINT]
[--zmq-master-heartbeat MASTER_HEARTBEAT] [--zmq-worker-heartbeat WORKER_HEARTBEAT]
[--zmq-timeout-factor FACTOR] [--zmq-startup-timeout STARTUP_TIMEOUT]
[--zmq-shutdown-timeout SHUTDOWN_TIMEOUT]
Calculate time-resolved transition-event duration distribution from kinetics results
Source data
Source data is collected from the results of ‘w_kinetics trace’ (see w_kinetics trace --help for more information on generating this dataset).
Histogram binning
By default, histograms are constructed with 100 bins in each dimension. This can be overridden by specifying -b/--bins, which accepts a number of different kinds of arguments:
a single integer N
N uniformly spaced bins will be used in each dimension.
a sequence of integers N1,N2,... (comma-separated)
N1 uniformly spaced bins will be used for the first dimension, N2 for the
second, and so on.
a list of lists [[B11, B12, B13, ...], [B21, B22, B23, ...], ...]
The bin boundaries B11, B12, B13, ... will be used for the first dimension,
B21, B22, B23, ... for the second dimension, and so on. These bin
boundaries need not be uniformly spaced. These expressions will be
evaluated with Python's ``eval`` construct, with ``np`` available for
use [e.g. to specify bins using np.arange()].
The first two forms (integer, list of integers) will trigger a scan of all data in each dimension in order to determine the minimum and maximum values, which may be very expensive for large datasets. This can be avoided by explicitly providing bin boundaries using the list-of-lists form.
Note that these bins are NOT at all related to the bins used to drive WE sampling.
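To make the three forms concrete, here is a minimal sketch of how such expressions are written and how the list-of-lists form evaluates (with np in scope, as described above); the boundary values are hypothetical:

    import numpy as np

    # -b 100        -> 100 uniform bins in each dimension
    # -b [100,50]   -> 100 bins in the first dimension, 50 in the second
    # explicit boundaries, evaluated with np available:
    binexpr = "[[0.0, 1.0, 2.5, 5.0], list(np.arange(0.0, 10.5, 0.5))]"
    boundaries = eval(binexpr, {"np": np})
    print([len(b) - 1 for b in boundaries])  # bins per dimension: [3, 20]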
Output format
The output file produced (specified by -o/--output, defaulting to “eddist.h5”) may be fed to plothist to generate plots (or appropriately processed text or HDF5 files) from this data. In short, the following datasets are created:
``histograms``
Normalized histograms. The first axis corresponds to iteration, and
remaining axes correspond to dimensions of the input dataset.
``/binbounds_0``
Vector of bin boundaries for the first (index 0) dimension. Additional
datasets similarly named (/binbounds_1, /binbounds_2, ...) are created
for additional dimensions.
``/midpoints_0``
Vector of bin midpoints for the first (index 0) dimension. Additional
datasets similarly named are created for additional dimensions.
``n_iter``
Vector of iteration numbers corresponding to the stored histograms (i.e.
the first axis of the ``histograms`` dataset).
Subsequent processing
The output generated by this program (-o/--output, default “eddist.h5”) may be plotted by the plothist program. See plothist --help for more information.
Parallelization
This tool supports parallelized binning, including reading of input data. Parallel processing is the default. For simple cases (reading pre-computed input data, modest numbers of segments), serial processing (--serial) may be more efficient.
Command-line options
optional arguments:
-h, --help show this help message and exit
-b BINEXPR, --bins BINEXPR
Use BINEXPR for bins. This may be an integer, which will be used for each
dimension of the progress coordinate; a list of integers (formatted as
[n1,n2,...]) which will use n1 bins for the first dimension, n2 for the second
dimension, and so on; or a list of lists of boundaries (formatted as [[a1, a2,
...], [b1, b2, ...], ... ]), which will use [a1, a2, ...] as bin boundaries for
the first dimension, [b1, b2, ...] as bin boundaries for the second dimension,
and so on. (Default: 100 bins in each dimension.)
-C, --compress Compress histograms. May make storage of higher-dimensional histograms more
tractable, at the (possible extreme) expense of increased analysis time.
(Default: no compression.)
--loose Ignore values that do not fall within bins. (Risky, as this can make buggy bin
boundaries appear as reasonable data. Only use if you are sure of your bin
boundary specification.)
--istate ISTATE Initial state defining transition event
--fstate FSTATE Final state defining transition event
general options:
-r RCFILE, --rcfile RCFILE
use RCFILE as the WEST run-time configuration file (default: west.cfg)
--quiet emit only essential information
--verbose emit extra information
--debug enable extra checks and emit copious information
--version show program's version number and exit
parallelization options:
--max-queue-length MAX_QUEUE_LENGTH
Maximum number of tasks that can be queued. Useful to limit RAM use for tasks
that have very large requests/response. Default: no limit.
iteration range options:
--first-iter ITER_START
Iteration to begin analysis (default: 1)
--last-iter ITER_STOP
Iteration to end analysis
input/output options:
-k KINETICS, --kinetics KINETICS
Populations and transition rates (including evolution) are stored in KINETICS
(default: kintrace.h5).
-o OUTPUT, --output OUTPUT
Store results in OUTPUT (default: eddist.h5).
parallelization options:
--serial run in serial mode
--parallel run in parallel mode (using processes)
--work-manager WORK_MANAGER
use the given work manager for parallel task distribution. Available work
managers are ('serial', 'threads', 'processes', 'zmq'); default is 'processes'
--n-workers N_WORKERS
Use up to N_WORKERS on this host, for work managers which support this option.
Use 0 for a dedicated server. (Ignored by work managers which do not support
this option.)
options for ZeroMQ (“zmq”) work manager (master or node):
--zmq-mode MODE Operate as a master (server) or a node (workers/client). "server" is a
deprecated synonym for "master" and "client" is a deprecated synonym for
"node".
--zmq-comm-mode COMM_MODE
Use the given communication mode -- TCP or IPC (Unix-domain) -- sockets for
communication within a node. IPC (the default) may be more efficient but is not
available on (exceptionally rare) systems without node-local storage (e.g.
/tmp); on such systems, TCP may be used instead.
--zmq-write-host-info INFO_FILE
Store hostname and port information needed to connect to this instance in
INFO_FILE. This allows the master and nodes assisting in coordinating the
communication of other nodes to choose ports randomly. Downstream nodes read
this file with --zmq-read-host-info and know how to connect.
--zmq-read-host-info INFO_FILE
Read hostname and port information needed to connect to the master (or other
coordinating node) from INFO_FILE. This allows the master and nodes assisting
in coordinating the communication of other nodes to choose ports randomly,
writing that information with --zmq-write-host-info for this instance to read.
--zmq-upstream-rr-endpoint ENDPOINT
ZeroMQ endpoint to which to send request/response (task and result) traffic
toward the master.
--zmq-upstream-ann-endpoint ENDPOINT
ZeroMQ endpoint on which to receive announcement (heartbeat and shutdown
notification) traffic from the master.
--zmq-downstream-rr-endpoint ENDPOINT
ZeroMQ endpoint on which to listen for request/response (task and result)
traffic from subsidiary workers.
--zmq-downstream-ann-endpoint ENDPOINT
ZeroMQ endpoint on which to send announcement (heartbeat and shutdown
notification) traffic toward workers.
--zmq-master-heartbeat MASTER_HEARTBEAT
Every MASTER_HEARTBEAT seconds, the master announces its presence to workers.
--zmq-worker-heartbeat WORKER_HEARTBEAT
Every WORKER_HEARTBEAT seconds, workers announce their presence to the master.
--zmq-timeout-factor FACTOR
Scaling factor for heartbeat timeouts. If the master doesn't hear from a worker
in WORKER_HEARTBEAT*FACTOR, the worker is assumed to have crashed. If a worker
doesn't hear from the master in MASTER_HEARTBEAT*FACTOR seconds, the master is
assumed to have crashed. Both cases result in shutdown.
--zmq-startup-timeout STARTUP_TIMEOUT
Amount of time (in seconds) to wait for communication between the master and at
least one worker. This may need to be changed on very large, heavily-loaded
computer systems that start all processes simultaneously.
--zmq-shutdown-timeout SHUTDOWN_TIMEOUT
Amount of time (in seconds) to wait for workers to shut down.
w_ntop
usage:
w_ntop [-h] [-r RCFILE] [--quiet | --verbose | --debug] [--version] [-W WEST_H5FILE]
[--first-iter N_ITER] [--last-iter N_ITER] [-a ASSIGNMENTS] [-n COUNT] [-t TIMEPOINT]
[--highweight | --lowweight | --random] [-o OUTPUT]
Select walkers from bins. An assignment file mapping walkers to bins at each timepoint is required (see ``w_assign --help`` for further information on generating this file). By default, high-weight walkers are selected (hence the name w_ntop: select the N top-weighted walkers from each bin); however, minimum-weight walkers and randomly-selected walkers may be selected instead.
Output format
The output file (-o/--output, by default “ntop.h5”) contains the following datasets:
``/n_iter`` [iteration]
*(Integer)* Iteration numbers for each entry in other datasets.
``/n_segs`` [iteration][bin]
*(Integer)* Number of segments in each bin/state in the given iteration.
This will generally be the same as the number requested with
``--n/--count`` but may be smaller if the requested number of walkers
does not exist.
``/seg_ids`` [iteration][bin][segment]
*(Integer)* Matching segments in each iteration for each bin. For each stored
iteration and bin, only the first ``n_segs`` entries are valid. For example,
the full list of matching seg_ids in bin 0 in the first stored iteration is
``seg_ids[0][0][:n_segs[0][0]]``.
``/weights`` [iteration][bin][segment]
*(Floating-point)* Weights for each matching segment in ``/seg_ids``.
Command-line arguments
optional arguments:
-h, --help show this help message and exit
--highweight Select COUNT highest-weight walkers from each bin.
--lowweight Select COUNT lowest-weight walkers from each bin.
--random Select COUNT walkers randomly from each bin.
general options:
-r RCFILE, --rcfile RCFILE
use RCFILE as the WEST run-time configuration file (default: west.cfg)
--quiet emit only essential information
--verbose emit extra information
--debug enable extra checks and emit copious information
--version show program's version number and exit
WEST input data options:
-W WEST_H5FILE, --west-data WEST_H5FILE
Take WEST data from WEST_H5FILE (default: read from the HDF5 file specified in
west.cfg).
iteration range:
--first-iter N_ITER Begin analysis at iteration N_ITER (default: 1).
--last-iter N_ITER Conclude analysis with N_ITER, inclusive (default: last completed iteration).
input options:
-a ASSIGNMENTS, --assignments ASSIGNMENTS
Use assignments from the given ASSIGNMENTS file (default: assign.h5).
selection options:
-n COUNT, --count COUNT
Select COUNT walkers from each iteration for each bin (default: 1).
-t TIMEPOINT, --timepoint TIMEPOINT
Base selection on the given TIMEPOINT within each iteration. Default (-1)
corresponds to the last timepoint.
output options:
-o OUTPUT, --output OUTPUT
Write output to OUTPUT (default: ntop.h5).
plothist
plothist_instant
usage:
plothist instant [-h] [-o PLOT_OUTPUT] [--hdf5-output HDF5_OUTPUT] [--plot-contour]
[--title TITLE] [--linear | --energy | --zero-energy E | --log10]
[--range RANGE] [--postprocess-function POSTPROCESS_FUNCTION]
[--text-output TEXT_OUTPUT] [--iter N_ITER]
input [DIMENSION] [ADDTLDIM]
Plot a probability distribution for a single WE iteration. The probability distribution must have been previously extracted with w_pdist (or, at least, must be compatible with the output format of w_pdist; see w_pdist --help for more information).
optional arguments:
-h, --help show this help message and exit
input options:
input HDF5 file containing histogram data
DIMENSION Plot for the given DIMENSION, specified as INT[:[LB,UB]:LABEL], where INT is a
zero-based integer identifying the dimension in the histogram, LB and UB are
lower and upper bounds for plotting, and LABEL is the label for the plot axis.
(Default: dimension 0, full range.)
ADDTLDIM For instantaneous/average plots, plot along the given additional dimension,
producing a color map.
--iter N_ITER Plot distribution for iteration N_ITER (default: last completed iteration).
output options:
-o PLOT_OUTPUT, --output PLOT_OUTPUT, --plot-output PLOT_OUTPUT
Store plot as PLOT_OUTPUT. This may be set to an empty string (e.g. --plot-
output='') to suppress plotting entirely. The output format is determined by
filename extension (and thus defaults to PDF). Default: "hist.pdf".
--hdf5-output HDF5_OUTPUT
Store plot data in the HDF5 file HDF5_OUTPUT.
--plot-contour Determines whether or not to superimpose a contour plot over the heatmap for 2D
objects.
--text-output TEXT_OUTPUT
Store plot data in a text format at TEXT_OUTPUT. This option is only valid for
1-D histograms. (Default: no text output.)
plot options:
--title TITLE Include TITLE as the top-of-graph title
--linear Plot the histogram on a linear scale.
--energy Plot the histogram on an inverted natural log scale, corresponding to (free)
energy (default).
--zero-energy E Set the zero of energy to E, which may be a scalar, "min" or "max"
--log10 Plot the histogram on a base-10 log scale.
--range RANGE Plot histogram ordinates over the given RANGE, specified as "LB,UB", where LB
and UB are the lower and upper bounds, respectively. For 1-D plots, this is the
Y axis. For 2-D plots, this is the colorbar axis. (Default: full range.)
--postprocess-function POSTPROCESS_FUNCTION
Names a function (as in module.function) that will be called just prior to
saving the plot. The function will be called as ``postprocess(hist, midpoints,
binbounds)`` where ``hist`` is the histogram that was plotted, ``midpoints`` is
the bin midpoints for each dimension, and ``binbounds`` is the bin boundaries
for each dimension for 2-D plots, or None otherwise. The plot must be modified
in place using the pyplot stateful interface.
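For illustration, a minimal sketch of such a postprocess function follows; the module name my_plots and the annotations are hypothetical, while the call signature and the use of the pyplot stateful interface follow the description above.

    # my_plots.py -- hypothetical module, used as: --postprocess-function my_plots.annotate
    from matplotlib import pyplot as plt

    def annotate(hist, midpoints, binbounds):
        # modify the current figure in place via the pyplot stateful interface
        plt.axvline(x=5.0, color='k', linestyle='--')  # mark a feature of interest
        plt.xlabel('progress coordinate')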
plothist_average
usage:
plothist average [-h] [-o PLOT_OUTPUT] [--hdf5-output HDF5_OUTPUT] [--plot-contour]
[--title TITLE] [--linear | --energy | --zero-energy E | --log10]
[--range RANGE] [--postprocess-function POSTPROCESS_FUNCTION]
[--text-output TEXT_OUTPUT] [--first-iter N_ITER] [--last-iter N_ITER]
input [DIMENSION] [ADDTLDIM]
Plot a probability distribution averaged over multiple iterations. The probability distribution must have been previously extracted with w_pdist (or, at least, must be compatible with the output format of w_pdist; see w_pdist --help for more information).
optional arguments:
-h, --help show this help message and exit
input options:
input HDF5 file containing histogram data
DIMENSION Plot for the given DIMENSION, specified as INT[:[LB,UB]:LABEL], where INT is a
zero-based integer identifying the dimension in the histogram, LB and UB are
lower and upper bounds for plotting, and LABEL is the label for the plot axis.
(Default: dimension 0, full range.)
ADDTLDIM For instantaneous/average plots, plot along the given additional dimension,
producing a color map.
--first-iter N_ITER Begin averaging at iteration N_ITER (default: 1).
--last-iter N_ITER Conclude averaging with N_ITER, inclusive (default: last completed iteration).
output options:
-o PLOT_OUTPUT, --output PLOT_OUTPUT, --plot-output PLOT_OUTPUT
Store plot as PLOT_OUTPUT. This may be set to an empty string (e.g. --plot-
output='') to suppress plotting entirely. The output format is determined by
filename extension (and thus defaults to PDF). Default: "hist.pdf".
--hdf5-output HDF5_OUTPUT
Store plot data in the HDF5 file HDF5_OUTPUT.
--plot-contour Determines whether or not to superimpose a contour plot over the heatmap for 2D
objects.
--text-output TEXT_OUTPUT
Store plot data in a text format at TEXT_OUTPUT. This option is only valid for
1-D histograms. (Default: no text output.)
plot options:
--title TITLE Include TITLE as the top-of-graph title
--linear Plot the histogram on a linear scale.
--energy Plot the histogram on an inverted natural log scale, corresponding to (free)
energy (default).
--zero-energy E Set the zero of energy to E, which may be a scalar, "min" or "max"
--log10 Plot the histogram on a base-10 log scale.
--range RANGE Plot histogram ordinates over the given RANGE, specified as "LB,UB", where LB
and UB are the lower and upper bounds, respectively. For 1-D plots, this is the
Y axis. For 2-D plots, this is the colorbar axis. (Default: full range.)
--postprocess-function POSTPROCESS_FUNCTION
Names a function (as in module.function) that will be called just prior to
saving the plot. The function will be called as ``postprocess(hist, midpoints,
binbounds)`` where ``hist`` is the histogram that was plotted, ``midpoints`` is
the bin midpoints for each dimension, and ``binbounds`` is the bin boundaries
for each dimension for 2-D plots, or None otherwise. The plot must be modified
in place using the pyplot stateful interface.
plothist_evolution
usage:
plothist evolution [-h] [-o PLOT_OUTPUT] [--hdf5-output HDF5_OUTPUT] [--plot-contour]
[--title TITLE] [--linear | --energy | --zero-energy E | --log10]
[--range RANGE] [--postprocess-function POSTPROCESS_FUNCTION]
[--first-iter N_ITER] [--last-iter N_ITER] [--step-iter STEP]
input [DIMENSION]
Plot a probability distribution as it evolves over iterations. The probability distribution must have been previously extracted with w_pdist (or, at least, must be compatible with the output format of w_pdist; see w_pdist --help for more information).
optional arguments:
-h, --help show this help message and exit
input options:
input HDF5 file containing histogram data
DIMENSION Plot for the given DIMENSION, specified as INT[:[LB,UB]:LABEL], where INT is a
zero-based integer identifying the dimension in the histogram, LB and UB are
lower and upper bounds for plotting, and LABEL is the label for the plot axis.
(Default: dimension 0, full range.)
--first-iter N_ITER Begin analysis at iteration N_ITER (default: 1).
--last-iter N_ITER Conclude analysis with N_ITER, inclusive (default: last completed iteration).
--step-iter STEP Average in blocks of STEP iterations.
output options:
-o PLOT_OUTPUT, --output PLOT_OUTPUT, --plot-output PLOT_OUTPUT
Store plot as PLOT_OUTPUT. This may be set to an empty string (e.g. --plot-
output='') to suppress plotting entirely. The output format is determined by
filename extension (and thus defaults to PDF). Default: "hist.pdf".
--hdf5-output HDF5_OUTPUT
Store plot data in the HDF5 file HDF5_OUTPUT.
--plot-contour Determines whether or not to superimpose a contour plot over the heatmap for 2D
objects.
plot options:
--title TITLE Include TITLE as the top-of-graph title
--linear Plot the histogram on a linear scale.
--energy Plot the histogram on an inverted natural log scale, corresponding to (free)
energy (default).
--zero-energy E Set the zero of energy to E, which may be a scalar, "min" or "max"
--log10 Plot the histogram on a base-10 log scale.
--range RANGE Plot histogram ordinates over the given RANGE, specified as "LB,UB", where LB
and UB are the lower and upper bounds, respectively. For 1-D plots, this is the
Y axis. For 2-D plots, this is the colorbar axis. (Default: full range.)
--postprocess-function POSTPROCESS_FUNCTION
Names a function (as in module.function) that will be called just prior to
saving the plot. The function will be called as ``postprocess(hist, midpoints,
binbounds)`` where ``hist`` is the histogram that was plotted, ``midpoints`` is
the bin midpoints for each dimension, and ``binbounds`` is the bin boundaries
for each dimension for 2-D plots, or None otherwise. The plot must be modified
in place using the pyplot stateful interface.
usage:
plothist [-h] [-r RCFILE] [--quiet | --verbose | --debug] [--version]
{help,instant,average,evolution} ...
Plot probability density functions (histograms) generated by w_pdist or other programs conforming to the same output format. This program operates in one of three modes:
instant
Plot 1-D and 2-D histograms for an individual iteration. See
``plothist instant --help`` for more information.
average
Plot 1-D and 2-D histograms, averaged over several iterations. See
``plothist average --help`` for more information.
evolution
Plot the time evolution of 1-D histograms as waterfall (heat map) plots.
See ``plothist evolution --help`` for more information.
This program takes the output of w_pdist as input (see w_pdist --help for more information), and can generate any kind of graphical output that matplotlib supports.
Command-line options
optional arguments:
-h, --help show this help message and exit
general options:
-r RCFILE, --rcfile RCFILE
use RCFILE as the WEST run-time configuration file (default: west.cfg)
--quiet emit only essential information
--verbose emit extra information
--debug enable extra checks and emit copious information
--version show program's version number and exit
plotting modes:
{help,instant,average,evolution}
help print help for this command or individual subcommands
instant plot probability distribution for a single WE iteration
average plot average of a probability distribution over a WE simulation
evolution plot evolution of a probability distribution over the course of a WE simulation
ploterr
usage:
ploterrs [-h] [-r RCFILE] [--quiet | --verbose | --debug] [--version]
{help,d.kinetics,d.probs,rw.probs,rw.kinetics,generic} ...
Plots error ranges for weighted ensemble datasets.
Command-line options
optional arguments:
-h, --help show this help message and exit
general options:
-r RCFILE, --rcfile RCFILE
use RCFILE as the WEST run-time configuration file (default: west.cfg)
--quiet emit only essential information
--verbose emit extra information
--debug enable extra checks and emit copious information
--version show program's version number and exit
supported input formats:
{help,d.kinetics,d.probs,rw.probs,rw.kinetics,generic}
help print help for this command or individual subcommands
d.kinetics output of w_direct kinetics
d.probs output of w_direct probs
rw.probs output of w_reweight probs
rw.kinetics output of w_reweight kinetics
generic arbitrary HDF5 file and dataset
w_kinetics
WARNING: w_kinetics is being deprecated. Please use w_direct instead.
usage:
w_kinetics trace [-h] [-W WEST_H5FILE] [--first-iter N_ITER] [--last-iter N_ITER]
[--step-iter STEP] [-a ASSIGNMENTS] [-o OUTPUT]
Calculate state-to-state rates and transition event durations by tracing trajectories.
A bin assignment file (usually “assign.h5”) including trajectory labeling is required (see “w_assign --help” for information on generating this file).
The output of this command serves as input for the w_direct subcommands, which convert the flux data in the output file into average rates/fluxes/populations with confidence intervals.
Output format
The output file (-o/--output, by default “kintrace.h5”) contains the following datasets:
``/conditional_fluxes`` [iteration][state][state]
*(Floating-point)* Macrostate-to-macrostate fluxes. These are **not**
normalized by the population of the initial macrostate.
``/conditional_arrivals`` [iteration][stateA][stateB]
*(Integer)* Number of trajectories arriving at state *stateB* in a given
iteration, given that they departed from *stateA*.
``/total_fluxes`` [iteration][state]
*(Floating-point)* Total flux into a given macrostate.
``/arrivals`` [iteration][state]
*(Integer)* Number of trajectories arriving at a given state in a given
iteration, regardless of where they originated.
``/duration_count`` [iteration]
*(Integer)* The number of event durations recorded in each iteration.
``/durations`` [iteration][event duration]
*(Structured -- see below)* Event durations for transition events ending
during a given iteration. These are stored as follows:
istate
*(Integer)* Initial state of transition event.
fstate
*(Integer)* Final state of transition event.
duration
*(Floating-point)* Duration of transition, in units of tau.
weight
*(Floating-point)* Weight of trajectory at end of transition, **not**
normalized by initial state population.
Because state-to-state fluxes stored in this file are not normalized by initial macrostate population, they cannot be used as rates without further processing. The w_direct kinetics command is used to perform this normalization while taking statistical fluctuation and correlation into account. See w_direct kinetics --help for more information. Target fluxes (total flux into a given state) require no such normalization.
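As an illustration of the durations layout above, the following sketch (assuming h5py, and assuming that row i of the per-iteration datasets corresponds to the i-th analyzed iteration) computes a weighted mean duration for one hypothetical istate/fstate pair:

    import h5py
    import numpy as np

    with h5py.File('kintrace.h5', 'r') as f:
        counts = f['duration_count'][:]
        durs, wts = [], []
        for i, n in enumerate(counts):
            ev = f['durations'][i, :n]                       # only the first n entries are valid
            sel = (ev['istate'] == 0) & (ev['fstate'] == 1)  # hypothetical state pair 0 -> 1
            durs.append(ev['duration'][sel])
            wts.append(ev['weight'][sel])
        d, w = np.concatenate(durs), np.concatenate(wts)
        if d.size:
            print('weighted mean duration (tau):', np.average(d, weights=w))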
Command-line options
optional arguments:
-h, --help show this help message and exit
WEST input data options:
-W WEST_H5FILE, --west-data WEST_H5FILE
Take WEST data from WEST_H5FILE (default: read from the HDF5 file specified in
west.cfg).
iteration range:
--first-iter N_ITER Begin analysis at iteration N_ITER (default: 1).
--last-iter N_ITER Conclude analysis with N_ITER, inclusive (default: last completed iteration).
--step-iter STEP Analyze/report in blocks of STEP iterations.
input/output options:
-a ASSIGNMENTS, --assignments ASSIGNMENTS
Bin assignments and macrostate definitions are in ASSIGNMENTS (default:
assign.h5).
-o OUTPUT, --output OUTPUT
Store results in OUTPUT (default: kintrace.h5).
w_stateprobs
WARNING: w_stateprobs is being deprecated. Please use w_direct instead.
usage:
w_stateprobs trace [-h] [-W WEST_H5FILE] [--first-iter N_ITER] [--last-iter N_ITER]
[--step-iter STEP] [-a ASSIGNMENTS] [-o OUTPUT] [-k KINETICS]
[--disable-bootstrap] [--disable-correl] [--alpha ALPHA]
[--autocorrel-alpha ACALPHA] [--nsets NSETS] [-e {cumulative,blocked,none}]
[--window-frac WINDOW_FRAC] [--disable-averages]
Calculate average populations and associated errors in state populations from weighted ensemble data. Bin assignments, including macrostate definitions, are required. (See “w_assign --help” for more information.)
Output format
The output file (-o/--output, by default “stateprobs.h5”) contains the following datasets:
/avg_state_probs [state]
(Structured -- see below) Population of each state across entire
range specified.
/avg_color_probs [state]
(Structured -- see below) Population of each ensemble across entire
range specified.
If --evolution-mode is specified, then the following additional datasets are available:
/state_pop_evolution [window][state]
(Structured -- see below). State populations based on windows of
iterations of varying width. If --evolution-mode=cumulative, then
these windows all begin at the iteration specified with
--start-iter and grow in length by --step-iter for each successive
element. If --evolution-mode=blocked, then these windows are all of
width --step-iter (excluding the last, which may be shorter), the first
of which begins at iteration --start-iter.
/color_prob_evolution [window][state]
(Structured -- see below). Ensemble populations based on windows of
iterations of varying width. If --evolution-mode=cumulative, then
these windows all begin at the iteration specified with
--start-iter and grow in length by --step-iter for each successive
element. If --evolution-mode=blocked, then these windows are all of
width --step-iter (excluding the last, which may be shorter), the first
of which begins at iteration --start-iter.
The structure of these datasets is as follows:
iter_start
(Integer) Iteration at which the averaging window begins (inclusive).
iter_stop
(Integer) Iteration at which the averaging window ends (exclusive).
expected
(Floating-point) Expected (mean) value of the observable as evaluated within
this window, in units of inverse tau.
ci_lbound
(Floating-point) Lower bound of the confidence interval of the observable
within this window, in units of inverse tau.
ci_ubound
(Floating-point) Upper bound of the confidence interval of the observable
within this window, in units of inverse tau.
stderr
(Floating-point) The standard error of the mean of the observable
within this window, in units of inverse tau.
corr_len
(Integer) Correlation length of the observable within this window, in units
of tau.
Each of these datasets is also stamped with a number of attributes:
mcbs_alpha
(Floating-point) Alpha value of confidence intervals. (For example,
*alpha=0.05* corresponds to a 95% confidence interval.)
mcbs_nsets
(Integer) Number of bootstrap data sets used in generating confidence
intervals.
mcbs_acalpha
(Floating-point) Alpha value for determining correlation lengths.
Command-line options
optional arguments:
-h, --help show this help message and exit
WEST input data options:
-W WEST_H5FILE, --west-data WEST_H5FILE
Take WEST data from WEST_H5FILE (default: read from the HDF5 file specified in
west.cfg).
iteration range:
--first-iter N_ITER Begin analysis at iteration N_ITER (default: 1).
--last-iter N_ITER Conclude analysis with N_ITER, inclusive (default: last completed iteration).
--step-iter STEP Analyze/report in blocks of STEP iterations.
input/output options:
-a ASSIGNMENTS, --assignments ASSIGNMENTS
Bin assignments and macrostate definitions are in ASSIGNMENTS (default:
assign.h5).
-o OUTPUT, --output OUTPUT
Store results in OUTPUT (default: stateprobs.h5).
input/output options:
-k KINETICS, --kinetics KINETICS
Populations and transition rates are stored in KINETICS (default: assign.h5).
confidence interval calculation options:
--disable-bootstrap, -db
Disable the use of Monte Carlo Block Bootstrapping.
--disable-correl, -dc
Disable the correlation analysis.
--alpha ALPHA Calculate a (1-ALPHA) confidence interval (default: 0.05)
--autocorrel-alpha ACALPHA
Evaluate autocorrelation to (1-ACALPHA) significance. Note that too small an
ACALPHA will result in failure to detect autocorrelation in a noisy flux signal.
(Default: same as ALPHA.)
--nsets NSETS Use NSETS samples for bootstrapping (default: chosen based on ALPHA)
calculation options:
-e {cumulative,blocked,none}, --evolution-mode {cumulative,blocked,none}
How to calculate time evolution of rate estimates. ``cumulative`` evaluates rates
over windows starting with --start-iter and getting progressively wider to --stop-
iter by steps of --step-iter. ``blocked`` evaluates rates over windows of width
--step-iter, the first of which begins at --start-iter. ``none`` (the default)
disables calculation of the time evolution of rate estimates.
--window-frac WINDOW_FRAC
Fraction of iterations to use in each window when running in ``cumulative`` mode.
The (1 - frac) fraction of iterations will be discarded from the start of each
window.
misc options:
--disable-averages, -da
Do not print the averages to the console (averages are printed by default; this
flag suppresses them).
HDF5 File Schema
WESTPA stores all of its simulation data in the cross-platform, self-describing HDF5 file format. This file format can be read and written by a variety of languages and toolkits, including C/C++, Fortran, Python, Java, and Matlab so that analysis of weighted ensemble simulations is not tied to using the WESTPA framework. HDF5 files are organized like a filesystem, where arbitrarily-nested groups (i.e. directories) are used to organize datasets (i.e. files). The excellent HDFView program may be used to explore WEST data files.
The canonical file format reference for a given version of the WEST code is described in src/west/data_manager.py.
Overall structure
/
#ibstates/
index
naming
bstate_index
bstate_pcoord
istate_index
istate_pcoord
#tstates/
index
bin_topologies/
index
pickles
iterations/
iter_XXXXXXXX/
auxdata/
bin_target_counts
ibstates/
bstate_index
bstate_pcoord
istate_index
istate_pcoord
pcoord
seg_index
wtgraph
...
summary
The root group (/)
The root of the WEST HDF5 file contains the following entries (where a trailing “/” denotes a group):
| Name | Type | Description |
|---|---|---|
| ibstates/ | Group | Initial and basis states for this simulation |
| tstates/ | Group | Target (recycling) states for this simulation; may be empty |
| bin_topologies/ | Group | Data pertaining to the binning scheme used in each iteration |
| iterations/ | Group | Iteration data |
| summary | Dataset (1-dimensional, compound) | Summary data by iteration |
The iteration summary table (/summary)
| Field | Description |
|---|---|
| n_particles | the total number of walkers in this iteration |
| norm | total probability, for stability monitoring |
| min_bin_prob | smallest probability contained in a bin |
| max_bin_prob | largest probability contained in a bin |
| min_seg_prob | smallest probability carried by a walker |
| max_seg_prob | largest probability carried by a walker |
| cputime | total CPU time (in seconds) spent on propagation for this iteration |
| walltime | total wallclock time (in seconds) spent on this iteration |
| binhash | a hex string identifying the binning used in this iteration |
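The norm field is handy for monitoring probability conservation. A quick sketch with h5py (completed iterations should have norm very close to 1):

import h5py

# Read the 'norm' field of the compound /summary dataset.
with h5py.File('west.h5', 'r') as f:
    norm = f['summary']['norm']
print(norm.min(), norm.max())  # both should be very close to 1.0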
Per iteration data (/iterations/iter_XXXXXXXX)
Data for each iteration is stored in its own group, named according to the iteration number and zero-padded out to 8 digits, as in /iterations/iter_00000001 for iteration 1. This is done solely for convenience in dealing with the data in external utilities that sort output by group name lexicographically. The field width is configurable via the iter_prec entry in the data section of the WESTPA configuration file.
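For instance, the group name for a given iteration can be reconstructed in Python as follows (a minimal sketch; 8 is the default iter_prec):

# Build the per-iteration HDF5 group name.
iter_prec = 8   # configurable in the 'data' section of west.cfg
n_iter = 1
group_name = f'iterations/iter_{n_iter:0{iter_prec}d}'
print(group_name)  # -> iterations/iter_00000001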
The HDF5 group for each iteration contains the following elements:
| Name | Type | Description |
|---|---|---|
| auxdata/ | Group | All user-defined auxiliary data sets |
| bin_target_counts | Dataset (1-dimensional) | The per-bin target count for the iteration |
| ibstates/ | Group | Initial and basis state data for the iteration |
| pcoord | Dataset (3-dimensional) | Progress coordinate data for the iteration, stored as a (num segments, pcoord_len, pcoord_ndim) array |
| seg_index | Dataset (1-dimensional, compound) | Summary data for each segment |
| wtgraph | Dataset (1-dimensional) | Flattened parentage data for weight transfer (splits/merges), indexed via the wtg_offset and wtg_n_parents fields of seg_index |
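The pcoord array for an iteration is easy to inspect with h5py. A sketch (the file name and iteration number are placeholders):

import h5py

with h5py.File('west.h5', 'r') as f:
    pcoord = f['iterations/iter_00000001/pcoord'][...]
print(pcoord.shape)  # (num segments, pcoord_len, pcoord_ndim)
print(pcoord[0, 0])  # initial progress coordinate of the first segment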
The segment summary table (/iterations/iter_XXXXXXXX/seg_index)
| Field | Description |
|---|---|
| weight | Segment weight |
| parent_id | Index of the segment's parent in the previous iteration |
| wtg_n_parents | Number of parent segments contributing weight to this segment |
| wtg_offset | Offset of this segment's parent entries in the wtgraph dataset |
| cputime | Total CPU time required to run the segment |
| walltime | Total wallclock time required to run the segment |
| endpoint_type | Fate of the segment at the end of the iteration (e.g. continues, merged, or recycled) |
| status | Completion status of the segment (e.g. prepared, complete, failed) |
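Per-segment weights can be read directly from this table. A sketch:

import h5py

# 'weight' is a field of the compound seg_index dataset.
with h5py.File('west.h5', 'r') as f:
    weights = f['iterations/iter_00000001/seg_index']['weight']
print(weights.sum())  # compare with the 'norm' column of /summary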
Bin Topologies group (/bin_topologies)
Bin topologies used during a WE simulation are stored as a unique hash identifier and a serialized BinMapper object in Python pickle format. This group contains two datasets:

index
: Compound array containing the bin hash and the length of each mapper's pickle

pickles
: The pickled BinMapper objects for each unique mapper, stored in a (num unique mappers, max pickled size) array
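A stored mapper can be recovered with h5py and pickle. A sketch, assuming the compound index field holding the pickle length is named pickle_len (check your file's actual dtype with index.dtype):

import pickle
import h5py

with h5py.File('west.h5', 'r') as f:
    index = f['bin_topologies/index'][:]
    pickles = f['bin_topologies/pickles'][:]
# Slice each row to its stored pickle length before unpickling.
mapper = pickle.loads(pickles[0, :index[0]['pickle_len']].tobytes())
print(mapper)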
Checklist
Configuring a WESTPA Simulation
Files for dynamics propagation
Have you set up all of the files for propagating the dynamics (e.g. for GROMACS, the .top, .gro, .mdp, and .ndx files)?
System implementation (system.py)

- Is self.pcoord_len set to the number of data points that corresponds to the frequency with which the dynamics engine outputs the progress coordinate? Note: many MD engines (e.g. GROMACS) also output the initial point (i.e. time zero).
- Are the bins in the expected positions? You can easily view the positions of the bins using a Python interpreter, as in the sketch below.
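A minimal sketch for inspecting the bins, assuming your system.py defines a WESTSystem subclass named System that uses a RectilinearBinMapper (the module and class names are placeholders for your own):

from system import System  # placeholder: your own system.py

system = System()
system.initialize()
print(system.bin_mapper.boundaries)  # per-dimension bin boundaries (RectilinearBinMapper)
print(system.bin_mapper.nbins)       # total number of bins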
Initializing the simulation (init.sh)

- Is the directory structure for the trajectory output files consistent with the specifications in the master configuration file (west.cfg)?
- Are the basis states (bstates) and, if applicable, target states (tstates) specified correctly?
Calculating the progress coordinate for initial states (get_pcoord.sh)

- Ensure that the procedure to extract the progress coordinate works by manually checking it on one (or more) basis state files.
- If your initialization (init.sh) gives an error message indicating the “incorrect shape” of the progress coordinate, check that get_pcoord.sh is not writing to a single file. If it is, w_init will crash, since multiple threads will be simultaneously writing to the same file. To fix this issue, you can add $$ to the file name (e.g. change OUT=dist.xvg to OUT=dist_$$.xvg) in get_pcoord.sh.
Segment implementation (runseg.sh)

- Ensure that the progress coordinate is being calculated correctly. If necessary, manually run a single dynamics segment (τ) for a single trajectory walker (e.g. for GROMACS, run the .tpr file for a length of τ). Double-check that any analysis programs being run are given the correct input.
- Are you feeding the velocities and state information required for the thermostat and barostat from one dynamics segment to the next? In GROMACS, this information is stored in the .edr and .trr files.
Log of simulation progress (west.h5)

- Check that the first iteration has been initialized, i.e. typing:

  h5ls west.h5/iterations

  at the command line gives:

  iter_00000001 Group

- In addition, the progress coordinate should be initialized as well, i.e. the command:

  h5ls -d west.h5/iterations/iter_00000001/pcoord

  shows that the array is populated by zeros except for the first point of each segment, which holds the value calculated by get_pcoord.sh:

  pcoord Dataset {10, 21, 1} Data: (0,0,0) 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, (2,15,0) 0, 0, 0, 0, 0, 0, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 0, 0, 0, 0, 0, 0, 0, (5,8,0) 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 0, (8,2,0) 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
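The same checks can be made from Python with h5py. A sketch:

import h5py

with h5py.File('west.h5', 'r') as f:
    assert 'iterations/iter_00000001' in f
    pcoord = f['iterations/iter_00000001/pcoord'][...]
print(pcoord[:, 0, 0])  # initial values written by get_pcoord.sh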
Running a WESTPA simulation
If you encounter an issue while running the simulation
- Use the --debug option of w_run and save the output to a file. (Note that this will generate a very detailed log of the process; try searching for “ERROR” to find any errors and “iteration” to look at every iteration.)
- Use a program like HDFView, h5ls, or Python with the h5py library to open the west.h5 file and ensure that the progress coordinate is being passed around correctly.
- Use HDFView, h5ls, or Python with the h5py library to ensure that the number of trajectory walkers is correct.
Is your simulation failing while the progress coordinate is being calculated?
One of the most error-prone parts of an iteration is progress coordinate extraction. Programs that are not designed for quick execution have a lot of trouble during this step (VMD is a commonly encountered example). Probably the best way to deal with this issue is to hard-code a script to do the progress coordinate extraction. If you are running molecular dynamics simulations, multiple Python and C/C++ libraries exist that can read most MD output formats, and they usually come with convenience functions that can help you extract the progress coordinate. AMBER tools and GROMACS tools seem to work adequately for this purpose as well.
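As an example, a hard-coded extraction script using the MDAnalysis library might look like the following sketch (one option among several; the file names and atom selections are placeholders):

import MDAnalysis as mda
from MDAnalysis.analysis import distances

# Placeholders: your segment's topology/trajectory and atom selections.
u = mda.Universe('seg.gro', 'seg.xtc')
sel1 = u.select_atoms('resid 1 and name CA')
sel2 = u.select_atoms('resid 50 and name CA')

# One progress-coordinate value per frame; write these wherever your
# runseg.sh expects to find them.
for ts in u.trajectory:
    d = distances.distance_array(sel1.positions, sel2.positions)[0, 0]
    print(f'{d:.4f}')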
Is your progress coordinate what you think it is?
Once your simulation is running, it is well worth your time to ensure that the progress coordinate being reported is what you think it is. This can be done in a number of ways:
- Check the seg_log output. This captures the standard error/output from the terminal session that your segment ran in (assuming you are running the executable propagator) and can be useful for ensuring that everything is being done as you believe it should be (GROMACS tools such as g_dist, for instance, report here which groups have their distance calculated).
- Look at a structure! Do so in a program such as VMD or PyMOL, calculate your progress coordinate manually, and check it visually if feasible. Does it look correct, and does it seem to match what’s being reported in the .h5 file? This is well worth your time before the simulation has proceeded very far, and can save a significant amount of wallclock and computational time.
Analyzing a WESTPA simulation
If you are running the analysis on shared computing resources
Be sure to use the
--serial
flag (see the individual tool documentation). Otherwise, many of the included tools default to parallel mode (w_assign, for instance), which will create as many Python threads as there are CPU cores available.
Frequently Asked Questions (FAQ)
This page may be outdated; the most recent list of FAQs is available here:
Simulation
How can I cleanly shutdown a simulation (without corrupting the h5 file)?
It is generally safe to shut down a WESTPA simulation by simply canceling the job through your queue management system. However, to ensure data integrity in the h5 file, you should wait until the WESTPA log indicates that an iteration has begun or is underway; canceling a job too quickly after submission can corrupt the h5 file and should be avoided.
Storage of Large Files
During a normal WESTPA run, many small files are created, and it is convenient to tar these into a larger file (one tarball per iteration, for instance). It is generally best to do this ‘offline’. An important aspect to consider is that some disk systems, such as Lustre, will suffer impaired performance if very large files are created. On Stampede, for instance, any file larger than 200 GB must be ‘striped’ properly (such that its individual bits are spread across numerous disks).
Within the user guide for such systems, there is generally a section on how to handle large files. Some computers have special versions of tar that stripe appropriately; others, such as Stampede, do not. For those that do not, it may be necessary to contact the sysadmin and/or create a directory where you can place your tarball with a different stripe level than the default.
H5py Inflate() Failed error
While running or analyzing a simulation, you may run into an error such
as IOError: Can't write data (Inflate() failed)
. These errors may be
related to an open bug in H5py. However, the following tips may help you
to find a workaround.
WESTPA may present you with such an error when unable to read or write a
data set. In the case that a simulation gives this error when you
attempt to run it, it may be helpful to check if a data set may be read
or written to using an interactive Python session. Restarting the
simulation may require deleting and remaking the data set. Also, this
error may be related to compression and other storage options. Thus, it
may be helpful to disable compression and chunked storage. Note that
existing datasets will retain compression and other options given to
them at the time of their creation, so it may be necessary to truncate
an iteration (for example, using w_truncate
) in order for changes to
take effect.
This error may also occur during repeated opening (e.g., 1000s of times)
of an HDF5 data set. Thus, this error may occur while running analysis
scripts. In this case, it may be helpful to cache data sets in physical
memory (RAM) as numpy
arrays when they are read, so that the script
loads the dataset a minimal number of times.
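For instance (a sketch; the dataset path is a placeholder):

import h5py
import numpy as np

# Read the dataset once into RAM, then close the file; all further
# analysis uses the in-memory array instead of reopening the HDF5 file.
with h5py.File('west.h5', 'r') as f:
    pcoords = np.array(f['iterations/iter_00000001/pcoord'])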
Dynamics Packages
WESTPA was designed to work cleanly with any dynamics package available (using the executable propagator); however, many of the tips and tricks available on the web or the user manual for these packages make the (reasonable) assumption that you will be running a set of brute force trajectories. As such, some of their guidelines for handling periodic boundary conditions may not be applicable.
How can I restart a WESTPA simulation?
In general, restarting a WESTPA simulation will restart an incomplete iteration, retaining data from segments that have completed and re-running segments that were incomplete (or never started).
If the iteration data are corrupted, or you want to go back to a specific iteration and change something, you need to delete all the trajectory segments and other files related to that iteration and run w_truncate on it. This deletes WESTPA’s record of that iteration, including which segments have and have not run. Restarting your WESTPA simulation will then begin that iteration afresh.
GROMACS
Periodic Boundary Conditions
While many of the built-in tools (such as g_dist) now handle periodic boundary conditions cleanly with relatively little user interaction, others, such as g_rms, do not. If your simulation analysis protocol requires you to run such a tool, you must correct for the periodic boundary conditions before running it. While there are guidelines available here to help you correct for whatever conditions your system may have, there is an implicit assumption that you have a single long, continuous trajectory.
Within your executable propagator (usually runseg.sh), it will be necessary to run trjconv (typically two or three times, depending on your needs: once to remove the periodic boundary conditions, then to make molecules whole, then to remove any jumps). If no extra input is supplied (the -s flag in GROMACS 4.X), GROMACS uses the first frame of your segment trajectory as the reference state for removing jumps. If your segment’s parent ended the previous iteration having jumped across the box boundary, trjconv will erroneously assume this is the correct state and ‘correct’ any jump back across the boundary. This can result in unusually high RMSD values for one segment over one or more iterations, and can show up as discontinuities in the probability distribution. It is important to note that a lack of discontinuities does not imply a lack of imaging problems.
To fix this, simply pass in the last frame of the imaged parent trajectory and use that as the reference structure for trjconv. This will ensure that trjconv is aware if your segment has crossed the barrier at time 0 and will make the appropriate corrections.
Development
I’m trying to profile a parallel script using the --profile option of bin/west. I get a PicklingError. What gives?
When executing a script using --profile, the following error may crop up:
PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed
The cProfile module used by the --profile option modifies function definitions such that they are no longer pickleable, meaning that they cannot be passed through the work manager to other processes. If you absolutely must profile a parallel script, use the threads work manager.