w_assign
w_assign uses simulation output to assign walkers to user-specified bins and macrostates. These assignments are required by subsequent analysis tools, namely w_kinetics and w_kinavg.
w_assign supports parallelization (see the general work manager options for more on command-line options to specify a work manager).
Overview
Usage:
w_assign [-h] [-r RCFILE] [--quiet | --verbose | --debug] [--version]
[-W WEST_H5FILE] [-o OUTPUT]
[--bins-from-system | --bins-from-expr BINS_FROM_EXPR | --bins-from-function BINS_FROM_FUNCTION]
[-p MODULE.FUNCTION]
[--states STATEDEF [STATEDEF ...] | --states-from-file STATEFILE | --states-from-function STATEFUNC]
[--wm-work-manager WORK_MANAGER] [--wm-n-workers N_WORKERS]
[--wm-zmq-mode MODE] [--wm-zmq-info INFO_FILE]
[--wm-zmq-task-endpoint TASK_ENDPOINT]
[--wm-zmq-result-endpoint RESULT_ENDPOINT]
[--wm-zmq-announce-endpoint ANNOUNCE_ENDPOINT]
[--wm-zmq-listen-endpoint ANNOUNCE_ENDPOINT]
[--wm-zmq-heartbeat-interval INTERVAL]
[--wm-zmq-task-timeout TIMEOUT]
[--wm-zmq-client-comm-mode MODE]
Command-Line Options
See the general command-line tool reference for more information on the general options.
Input/output Options
-W, --west-data /path/to/file
Read simulation result data from file *file*. (**Default:** The
*hdf5* file specified in the configuration file, by default
**west.h5**)
-o, --output /path/to/file
Write assignment results to file *outfile*. (**Default:** *hdf5*
file **assign.h5**)
Binning Options
Specify how bins are to be assigned to the dataset:
--bins-from-system
Use binning scheme specified by the system driver; system driver can be
found in the west configuration file, by default named **west.cfg**
(**Default binning**)
--bins-from-expr bin_expr
Use the binning scheme specified by *bin_expr*, which takes the form of a
Python list of lists, where each inner list gives the bin boundaries for one
dimension of the progress coordinate. For example, "[[0,1,2,4,inf],[-inf,0,inf]]"
specifies bin boundaries for a two-dimensional progress coordinate. Note that
this option accepts the special symbol 'inf' for floating-point infinity.
--bins-from-function bin_func
Bins are constructed by calling an external function *bin_func*.
*bin_func* should be formatted as '[PATH:]module.function', where the
function 'function' in module 'module' will be used.
Macrostate Options
You can optionally specify how user-defined macrostates are assigned. Note
that macrostates must be assigned in order to use the subsequent analysis
tools w_kinetics and w_kinavg:
--states statedef [statedef ...]
Specify a macrostate for a single bin as *statedef*, formatted as a
coordinate tuple identifying the bin to which the macrostate is assigned.
For instance, '[1.0, 2.0]' assigns a macrostate to the bin that contains
the (two-dimensional) progress coordinate (1.0, 2.0). A macrostate label
can optionally be specified, for instance: 'bound:[1.0, 2.0]' assigns the
macrostate named 'bound' to the bin containing the given coordinates.
Multiple assignments can be specified with this option, but only one
macrostate per bin is possible; if you wish to assign multiple bins to a
single macrostate, use the *--states-from-file* option.
--states-from-file statefile
Read macrostate assignments from *yaml* file *statefile*. This
option allows you to assign multiple bins to a single macrostate.
The following example shows the contents of a *statefile*
specifying two macrostates, bound and unbound, over multiple bins with
a two-dimensional progress coordinate:
    ---
    states:
      - label: unbound
        coords:
          - [9.0, 1.0]
          - [9.0, 2.0]
      - label: bound
        coords:
          - [0.1, 0.0]
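States may also be generated programmatically via the --states-from-function option (see the module documentation below for the full contract). As a minimal sketch (the module and function names are hypothetical), a function reproducing the two states above would be called with the bin mapper and return a list of dictionaries:

    # my_states.py -- invoke with: --states-from-function my_states.gen_states
    def gen_states(mapper):
        # The bin mapper is passed in but need not be used; each dict must
        # provide a "coords" vector of coordinate tuples and may provide a "label".
        return [
            {'label': 'unbound', 'coords': [[9.0, 1.0], [9.0, 2.0]]},
            {'label': 'bound', 'coords': [[0.1, 0.0]]},
        ]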
Specifying Progress Coordinate
By default, progress coordinate information for each iteration is taken from the pcoord dataset in the specified input file (by default, west.h5). Optionally, you can specify a function to construct the progress coordinate for each iteration; this may be useful to consolidate data from several sources or otherwise preprocess the progress coordinate data:
--construct-pcoord module.function, -p module.function
Use the function *module.function* to construct the progress
coordinate for each iteration. This will be called once per
iteration as *function(n_iter, iter_group)* and should return an
array indexable as [seg_id][timepoint][dimension]. The
**default** function returns the 'pcoord' dataset for that iteration
(i.e., it simply executes return iter_group['pcoord'][...]).
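As an illustrative sketch, a construction function that appends a hypothetical auxiliary dataset (auxdata/rmsd) as an extra progress-coordinate dimension might look like:

    # my_pcoord.py -- invoke with: -p my_pcoord.build_pcoord
    import numpy as np

    def build_pcoord(n_iter, iter_group):
        # Called once per iteration; must return an array indexable as
        # [seg_id][timepoint][dimension].
        pcoord = iter_group['pcoord'][...]
        aux = iter_group['auxdata/rmsd'][...]  # hypothetical auxiliary dataset
        return np.concatenate([pcoord, aux[:, :, np.newaxis]], axis=2)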
Examples
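For instance, using the default binning from the system driver in west.cfg and defining two single-bin macrostates (coordinates match the YAML example above), an invocation might look like:

    w_assign --states 'unbound:[9.0, 1.0]' 'bound:[0.1, 0.0]' -o assign.h5

This reads west.h5 (the default input), assigns every segment at each timepoint to a bin and a last-visited macrostate, and writes the results to assign.h5. The resulting populations can then be summed as described in the output-format notes below; a minimal post-processing sketch with h5py:

    import h5py

    with h5py.File('assign.h5', 'r') as f:
        pops = f['labeled_populations'][...]  # [iteration][state][bin]
        # Sum over states; drop the extra entry for walkers outside the bin space:
        per_bin = pops.sum(axis=1)[:, :-1]
        # Sum over bins; drop the extra entry for trajectories initiated outside any macrostate:
        per_state = pops.sum(axis=2)[:, :-1]
        print(per_bin.shape, per_state.shape)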
westpa.cli.tools.w_assign module
- westpa.cli.tools.w_assign.seg_id_dtype
alias of int64
- westpa.cli.tools.w_assign.weight_dtype
alias of float64
- westpa.cli.tools.w_assign.index_dtype
alias of uint16
- westpa.cli.tools.w_assign.assign_and_label(nsegs_lb, nsegs_ub, parent_ids, assign, nstates, state_map, last_labels, pcoords, subsample)
Assign trajectories to bins and last-visited macrostates for each timepoint.
- westpa.cli.tools.w_assign.accumulate_labeled_populations(weights, bin_assignments, label_assignments, labeled_bin_pops)
For a set of segments in one iteration, calculate the average population in each bin, with separation by last-visited macrostate.
- class westpa.cli.tools.w_assign.WESTParallelTool(wm_env=None)
Bases: WESTTool
Base class for command-line tools parallelized with wwmgr. This automatically adds and processes wwmgr command-line arguments and creates a work manager at self.work_manager.
- make_parser_and_process(prog=None, usage=None, description=None, epilog=None, args=None)
A convenience function to create a parser, call add_all_args(), and then call process_all_args(). The argument namespace is returned.
- add_args(parser)
Add arguments specific to this tool to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this tool and deal with them appropriately (setting instance variables, etc)
- go()
Perform the analysis associated with this tool.
- main()
A convenience function to make a parser, parse and process arguments, then run self.go() in the master process.
- class westpa.cli.tools.w_assign.WESTDataReader
Bases: WESTToolComponent
Tool for reading data from WEST-related HDF5 files. Coordinates finding the main HDF5 file from west.cfg or command line arguments, caching of certain kinds of data (eventually), and retrieving auxiliary data sets from various places.
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- open(mode='r')
- close()
- property weight_dsspec
- property parent_id_dsspec
- class westpa.cli.tools.w_assign.WESTDSSynthesizer(default_dsname=None, h5filename=None)
Bases: WESTToolComponent
Tool for synthesizing a dataset for analysis from other datasets. This may be done using a custom function, or a list of “data set specifications”. It is anticipated that if several source datasets are required, then a tool will have multiple instances of this class.
- group_name = 'input dataset options'
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- class westpa.cli.tools.w_assign.BinMappingComponent
Bases: WESTToolComponent
Component for obtaining a bin mapper from one of several places based on command-line arguments. Such locations include an HDF5 file that contains pickled mappers (including the primary WEST HDF5 file), the system object, an external function, or (in the common case of rectilinear bins) a list of lists of bin boundaries.
Some configuration is necessary prior to calling process_args() if loading a mapper from HDF5. Specifically, either set_we_h5file_info() or set_other_h5file_info() must be called to describe where to find the appropriate mapper. In the case of set_we_h5file_info(), the mapper used for WE at the end of a given iteration will be loaded. In the case of set_other_h5file_info(), an arbitrary group and hash value are specified; the mapper corresponding to that hash in the given group will be returned.
In the absence of arguments, the mapper contained in an existing HDF5 file is preferred; if that is not available, the mapper from the system driver is used.
This component adds the following arguments to argument parsers:
- --bins-from-system
Obtain bins from the system driver
- --bins-from-expr=EXPR
Construct rectilinear bins by parsing EXPR and calling RectilinearBinMapper() with the result. EXPR must therefore be a list of lists.
- --bins-from-function=[PATH:]MODULE.FUNC
Call an external function FUNC in module MODULE (optionally adding PATH to the search path when loading MODULE) which, when called, returns a fully-constructed bin mapper.
- --bins-from-file
Load bin definitions from a YAML configuration file.
- --bins-from-h5file
Load bins from the file being considered; this is intended to mean the master WEST HDF5 file or results of other binning calculations, as appropriate.
- add_args(parser, description='binning options', suppress=[])
Add arguments specific to this component to the given argparse parser.
- add_target_count_args(parser, description='bin target count options')
Add options to the given parser corresponding to target counts.
- process_args(args)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- set_we_h5file_info(n_iter=None, data_manager=None, required=False)
Set up to load a bin mapper from the master WEST HDF5 file. The mapper is actually loaded from the file when self.load_bin_mapper() is called, if and only if command line arguments direct this. If *required* is true, then a mapper must be available at iteration *n_iter*, or else an exception will be raised.
- set_other_h5file_info(topology_group, hashval)
Set up to load a bin mapper from (any) open HDF5 file, where bin topologies are stored in *topology_group* (an h5py Group object) and the desired mapper has hash value *hashval*. The mapper itself is loaded when self.load_bin_mapper() is called.
- class westpa.cli.tools.w_assign.ProgressIndicatorComponent
Bases: WESTToolComponent
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- class westpa.cli.tools.w_assign.WESTPAH5File(*args, **kwargs)
Bases: File
Generalized input/output for WESTPA simulation (or analysis) data.
Create a new file object.
See the h5py user guide for a detailed explanation of the options.
- name
Name of the file on disk, or file-like object. Note: for files created with the ‘core’ driver, HDF5 still requires this be non-empty.
- mode
r        Read-only, file must exist (default)
r+       Read/write, file must exist
w        Create file, truncate if exists
w- or x  Create file, fail if exists
a        Read/write if exists, create otherwise
- driver
Name of the driver to use. Legal values are None (default, recommended), ‘core’, ‘sec2’, ‘direct’, ‘stdio’, ‘mpio’, ‘ros3’.
- libver
Library version bounds. Supported values: ‘earliest’, ‘v108’, ‘v110’, ‘v112’ and ‘latest’.
- userblock_size
Desired size of user block. Only allowed when creating a new file (mode w, w- or x).
- swmr
Open the file in SWMR read mode. Only used when mode = ‘r’.
- rdcc_nbytes
Total size of the dataset chunk cache in bytes. The default size is 1024**2 (1 MiB) per dataset. Applies to all datasets unless individually changed.
- rdcc_w0
The chunk preemption policy for all datasets. This must be between 0 and 1 inclusive and indicates the weighting according to which chunks which have been fully read or written are penalized when determining which chunks to flush from cache. A value of 0 means fully read or written chunks are treated no differently than other chunks (the preemption is strictly LRU) while a value of 1 means fully read or written chunks are always preempted before other chunks. If your application only reads or writes data once, this can be safely set to 1. Otherwise, this should be set lower depending on how often you re-read or re-write the same data. The default value is 0.75. Applies to all datasets unless individually changed.
- rdcc_nslots
The number of chunk slots in the raw data chunk cache for this file. Increasing this value reduces the number of cache collisions, but slightly increases the memory used. Due to the hashing strategy, this value should ideally be a prime number. As a rule of thumb, this value should be at least 10 times the number of chunks that can fit in rdcc_nbytes bytes. For maximum performance, this value should be set approximately 100 times that number of chunks. The default value is 521. Applies to all datasets unless individually changed.
- track_order
Track dataset/group/attribute creation order under root group if True. If None use global default h5.get_config().track_order.
- fs_strategy
The file space handling strategy to be used. Only allowed when creating a new file (mode w, w- or x). Defined as:
"fsm"        FSM, Aggregators, VFD
"page"       Paged FSM, VFD
"aggregate"  Aggregators, VFD
"none"       VFD
If None, use HDF5 defaults.
- fs_page_size
File space page size in bytes. Only used when fs_strategy=”page”. If None use the HDF5 default (4096 bytes).
- fs_persist
A boolean value to indicate whether free space should be persistent or not. Only allowed when creating a new file. The default value is False.
- fs_threshold
The smallest free-space section size that the free space manager will track. Only allowed when creating a new file. The default value is 1.
- page_buf_size
Page buffer size in bytes. Only allowed for HDF5 files created with fs_strategy="page". Must be a power of two and greater than or equal to the file space page size when creating the file. It is not used by default.
- min_meta_keep
Minimum percentage of metadata to keep in the page buffer before allowing pages containing metadata to be evicted. Applicable only if page_buf_size is set. Default value is zero.
- min_raw_keep
Minimum percentage of raw data to keep in the page buffer before allowing pages containing raw data to be evicted. Applicable only if page_buf_size is set. Default value is zero.
- locking
The file locking behavior. Defined as:
False (or “false”) – Disable file locking
True (or “true”) – Enable file locking
“best-effort” – Enable file locking but ignore some errors
None – Use HDF5 defaults
Warning
The HDF5_USE_FILE_LOCKING environment variable can override this parameter.
Only available with HDF5 >= 1.12.1 or 1.10.x >= 1.10.7.
- alignment_threshold
Together with *alignment_interval*, this property ensures that any file object greater than or equal in size to the alignment threshold (in bytes) will be aligned on an address which is a multiple of the alignment interval.
- alignment_interval
This property should be used in conjunction with *alignment_threshold*. See the description above. For more details, see https://portal.hdfgroup.org/display/HDF5/H5P_SET_ALIGNMENT
- meta_block_size
Set the current minimum size, in bytes, of new metadata block allocations. See https://portal.hdfgroup.org/display/HDF5/H5P_SET_META_BLOCK_SIZE
- Additional keywords
Passed on to the selected file driver.
- default_iter_prec = 8
- replace_dataset(*args, **kwargs)
- iter_object_name(n_iter, prefix='', suffix='')
Return a properly-formatted per-iteration name for iteration *n_iter*. (This is used in create/require/get_iter_group, but may also be useful for naming datasets on a per-iteration basis.)
- create_iter_group(n_iter, group=None)
Create a per-iteration data storage group for iteration number *n_iter* in the group *group* (which is '/iterations' by default).
- require_iter_group(n_iter, group=None)
Ensure that a per-iteration data storage group for iteration number *n_iter* is available in the group *group* (which is '/iterations' by default).
- get_iter_group(n_iter, group=None)
Get the per-iteration data group for iteration number *n_iter* from within the group *group* ('/iterations' by default).
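As an illustrative sketch of these per-iteration helpers (assuming a west.h5 file produced by an existing WESTPA run with at least ten iterations):

    from westpa.cli.tools.w_assign import WESTPAH5File

    # WESTPAH5File subclasses h5py.File, so it supports the usual context manager.
    with WESTPAH5File('west.h5', 'r') as f:
        iter_group = f.get_iter_group(10)  # looked up under '/iterations' by default
        print(iter_group['pcoord'].shape)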
- westpa.cli.tools.w_assign.get_object(object_name, path=None)
Attempt to load the given object, using additional path information if given.
- westpa.cli.tools.w_assign.parse_pcoord_value(pc_str)
- class westpa.cli.tools.w_assign.WAssign
Bases: WESTParallelTool
- prog = 'w_assign'
- description = 'Assign walkers to bins, producing a file (by default named "assign.h5")\nwhich can be used in subsequent analysis.\n\nFor consistency in subsequent analysis operations, the entire dataset\nmust be assigned, even if only a subset of the data will be used. This\nensures that analyses that rely on tracing trajectories always know the\noriginating bin of each trajectory.\n\n\n-----------------------------------------------------------------------------\nSource data\n-----------------------------------------------------------------------------\n\nSource data is provided either by a user-specified function\n(--construct-dataset) or a list of "data set specifications" (--dsspecs).\nIf neither is provided, the progress coordinate dataset \'\'pcoord\'\' is used.\n\nTo use a custom function to extract or calculate data whose probability\ndistribution will be calculated, specify the function in standard Python\nMODULE.FUNCTION syntax as the argument to --construct-dataset. This function\nwill be called as function(n_iter,iter_group), where n_iter is the iteration\nwhose data are being considered and iter_group is the corresponding group\nin the main WEST HDF5 file (west.h5). The function must return data which can\nbe indexed as [segment][timepoint][dimension].\n\nTo use a list of data set specifications, specify --dsspecs and then list the\ndesired datasets one-by-one (space-separated in most shells). These data set\nspecifications are formatted as NAME[,file=FILENAME,slice=SLICE], which will\nuse the dataset called NAME in the HDF5 file FILENAME (defaulting to the main\nWEST HDF5 file west.h5), and slice it with the Python slice expression SLICE\n(as in [0:2] to select the first two elements of the first axis of the\ndataset). The ``slice`` option is most useful for selecting one column (or\nmore) from a multi-column dataset, such as arises when using a progress\ncoordinate of multiple dimensions.\n\n\n-----------------------------------------------------------------------------\nSpecifying macrostates\n-----------------------------------------------------------------------------\n\nOptionally, kinetic macrostates may be defined in terms of sets of bins.\nEach trajectory will be labeled with the kinetic macrostate it was most\nrecently in at each timepoint, for use in subsequent kinetic analysis.\nThis is required for all kinetics analysis (w_kintrace and w_kinmat).\n\nThere are three ways to specify macrostates:\n\n 1. States corresponding to single bins may be identified on the command\n line using the --states option, which takes multiple arguments, one for\n each state (separated by spaces in most shells). Each state is specified\n as a coordinate tuple, with an optional label prepended, as in\n ``bound:1.0`` or ``unbound:(2.5,2.5)``. Unlabeled states are named\n ``stateN``, where N is the (zero-based) position in the list of states\n supplied to --states.\n\n 2. States corresponding to multiple bins may use a YAML input file specified\n with --states-from-file. This file defines a list of states, each with a\n name and a list of coordinate tuples; bins containing these coordinates\n will be mapped to the containing state. 
For instance, the following\n file::\n\n ---\n states:\n - label: unbound\n coords:\n - [9.0, 1.0]\n - [9.0, 2.0]\n - label: bound\n coords:\n - [0.1, 0.0]\n\n produces two macrostates: the first state is called "unbound" and\n consists of bins containing the (2-dimensional) progress coordinate\n values (9.0, 1.0) and (9.0, 2.0); the second state is called "bound"\n and consists of the single bin containing the point (0.1, 0.0).\n\n 3. Arbitrary state definitions may be supplied by a user-defined function,\n specified as --states-from-function=MODULE.FUNCTION. This function is\n called with the bin mapper as an argument (``function(mapper)``) and must\n return a list of dictionaries, one per state. Each dictionary must contain\n a vector of coordinate tuples with key "coords"; the bins into which each\n of these tuples falls define the state. An optional name for the state\n (with key "label") may also be provided.\n\n\n-----------------------------------------------------------------------------\nOutput format\n-----------------------------------------------------------------------------\n\nThe output file (-o/--output, by default "assign.h5") contains the following\nattributes datasets:\n\n ``nbins`` attribute\n *(Integer)* Number of valid bins. Bin assignments range from 0 to\n *nbins*-1, inclusive.\n\n ``nstates`` attribute\n *(Integer)* Number of valid macrostates (may be zero if no such states are\n specified). Trajectory ensemble assignments range from 0 to *nstates*-1,\n inclusive, when states are defined.\n\n ``/assignments`` [iteration][segment][timepoint]\n *(Integer)* Per-segment and -timepoint assignments (bin indices).\n\n ``/npts`` [iteration]\n *(Integer)* Number of timepoints in each iteration.\n\n ``/nsegs`` [iteration]\n *(Integer)* Number of segments in each iteration.\n\n ``/labeled_populations`` [iterations][state][bin]\n *(Floating-point)* Per-iteration and -timepoint bin populations, labeled\n by most recently visited macrostate. The last state entry (*nstates-1*)\n corresponds to trajectories initiated outside of a defined macrostate.\n\n ``/bin_labels`` [bin]\n *(String)* Text labels of bins.\n\nWhen macrostate assignments are given, the following additional datasets are\npresent:\n\n ``/trajlabels`` [iteration][segment][timepoint]\n *(Integer)* Per-segment and -timepoint trajectory labels, indicating the\n macrostate which each trajectory last visited.\n\n ``/state_labels`` [state]\n *(String)* Labels of states.\n\n ``/state_map`` [bin]\n *(Integer)* Mapping of bin index to the macrostate containing that bin.\n An entry will contain *nbins+1* if that bin does not fall into a\n macrostate.\n\nDatasets indexed by state and bin contain one more entry than the number of\nvalid states or bins. For *N* bins, axes indexed by bin are of size *N+1*, and\nentry *N* (0-based indexing) corresponds to a walker outside of the defined bin\nspace (which will cause most mappers to raise an error). 
More importantly, for\n*M* states (including the case *M=0* where no states are specified), axes\nindexed by state are of size *M+1* and entry *M* refers to trajectories\ninitiated in a region not corresponding to a defined macrostate.\n\nThus, ``labeled_populations[:,:,:].sum(axis=1)[:,:-1]`` gives overall per-bin\npopulations, for all defined bins and\n``labeled_populations[:,:,:].sum(axis=2)[:,:-1]`` gives overall\nper-trajectory-ensemble populations for all defined states.\n\n\n-----------------------------------------------------------------------------\nParallelization\n-----------------------------------------------------------------------------\n\nThis tool supports parallelized binning, including reading/calculating input\ndata.\n\n\n-----------------------------------------------------------------------------\nCommand-line options\n-----------------------------------------------------------------------------\n'
- add_args(parser)
Add arguments specific to this tool to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this tool and deal with them appropriately (setting instance variables, etc)
- parse_cmdline_states(state_strings)
- load_config_from_west(scheme)
- load_state_file(state_filename)
- states_from_dict(ystates)
- load_states_from_function(statefunc)
- assign_iteration(n_iter, nstates, nbins, state_map, last_labels)
Method to encapsulate the segment slicing (into n_worker slices) and parallel job submission. Submits job(s), waits on completion, and splices the results back together. Returns: assignments, trajlabels, pops for this iteration.
- go()
Perform the analysis associated with this tool.
- westpa.cli.tools.w_assign.entry_point()