w_pdist
w_pdist constructs the progress coordinate probability distribution and tracks its evolution over a user-specified range of simulation iterations. It supports progress coordinates of dimensionality ≥ 1. The resulting distribution can be viewed with the plothist tool.
Overview
Usage:
w_pdist [-h] [-r RCFILE] [--quiet | --verbose | --debug] [--version]
[-W WEST_H5FILE] [--first-iter N_ITER] [--last-iter N_ITER]
[-b BINEXPR] [-o OUTPUT]
[--construct-dataset CONSTRUCT_DATASET | --dsspecs DSSPEC [DSSPEC ...]]
[--serial | --parallel | --work-manager WORK_MANAGER]
[--n-workers N_WORKERS] [--zmq-mode MODE]
[--zmq-info INFO_FILE] [--zmq-task-endpoint TASK_ENDPOINT]
[--zmq-result-endpoint RESULT_ENDPOINT]
[--zmq-announce-endpoint ANNOUNCE_ENDPOINT]
[--zmq-listen-endpoint ANNOUNCE_ENDPOINT]
[--zmq-heartbeat-interval INTERVAL]
[--zmq-task-timeout TIMEOUT] [--zmq-client-comm-mode MODE]
Note: This tool supports parallelization, which may be more efficient for especially large datasets.
Command-Line Options
See the general command-line tool reference for more information on the general options.
Input/output options
These arguments allow the user to specify where to read input simulation result data and where to output the calculated progress coordinate probability distribution.
Both input and output files are in HDF5 format:
-W, --WEST_H5FILE file
Read simulation result data from *file*. (**Default:** The HDF5 file
specified in the configuration file, by default **west.h5**)
-o, --output file
Store this tool's output in *file*. (**Default:** The HDF5 file
**pdist.h5**)
Iteration range options
Specify the range of iterations over which to construct the progress coordinate probability distribution:
--first-iter n_iter
Construct probability distribution starting with iteration *n_iter*
(**Default:** 1)
--last-iter n_iter
Construct probability distribution's time evolution up to (and
including) iteration *n_iter* (**Default:** Last completed
iteration)
Probability distribution binning options
Specify the number of bins to use when constructing the progress coordinate probability distribution. If the progress coordinate is multidimensional, a different binning scheme can be used for each dimension:
-b binexpr
*binexpr* specifies the number and formatting of the bins. Its
format can be as follows:
1. an integer, in which case all distributions have that many
equal sized bins
2. a python-style list of integers, of length corresponding to
the number of dimensions of the progress coordinate, in which
case each progress coordinate's probability distribution has the
corresponding number of bins
3. a python-style list of lists of scalars, where the list at
each index specifies explicit bin boundaries for the
corresponding dimension of the progress coordinate's
probability distribution.
(**Default:** 100 bins for all progress coordinates)
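The three *binexpr* forms can be illustrated in Python. This is a sketch of how such expressions can be evaluated, not the tool's actual parsing code; per the tool's description, bin expressions are evaluated as Python expressions with numpy available as ``np``:

```python
import numpy as np

# Sketch of the three -b/--bins forms. The strings below are what would be
# passed on the command line; eval() with np in scope is used here purely
# for illustration of how such expressions resolve.

# 1. a single integer: 100 equal-sized bins in every dimension
binexpr_int = "100"

# 2. a list of integers: 50 bins for dimension 0, 20 for dimension 1
binexpr_list = "[50, 20]"

# 3. a list of lists of boundaries: explicit edges per dimension
binexpr_edges = "[[0.0, 1.0, 2.0, 5.0], list(np.arange(0.0, 10.1, 0.5))]"

for expr in (binexpr_int, binexpr_list, binexpr_edges):
    print(eval(expr, {"np": np}))
```

Note that explicit boundaries (form 3) avoid the potentially expensive scan of all data needed to determine the range for forms 1 and 2.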
Examples
Assuming simulation results are stored in west.h5 (which is specified in the configuration file named west.cfg), for a simulation with a 1-dimensional progress coordinate:
Calculate a probability distribution histogram using all default options (output file: pdist.h5; histogram binning: 100 equal-sized bins; distribution spans the full range of progress coordinate values reached; work is parallelized over all available local cores using the 'processes' work manager):
w_pdist
Same as above, except using the serial work manager (which may be more efficient for smaller datasets):
w_pdist --serial
westpa.cli.tools.w_pdist module
- class westpa.cli.tools.w_pdist.WESTParallelTool(wm_env=None)
Bases:
WESTTool
Base class for command-line tools parallelized with wwmgr. This automatically adds and processes wwmgr command-line arguments and creates a work manager at self.work_manager.
- add_args(parser)
Add arguments specific to this tool to the given argparse parser.
- go()
Perform the analysis associated with this tool.
- main()
A convenience function to make a parser, parse and process arguments, then run self.go() in the master process.
- make_parser_and_process(prog=None, usage=None, description=None, epilog=None, args=None)
A convenience function to create a parser, call add_all_args(), and then call process_all_args(). The argument namespace is returned.
- process_args(args)
Take argparse-processed arguments associated with this tool and deal with them appropriately (setting instance variables, etc)
- class westpa.cli.tools.w_pdist.WESTDataReader
Bases:
WESTToolComponent
Tool for reading data from WEST-related HDF5 files. Coordinates finding the main HDF5 file from west.cfg or command line arguments, caching of certain kinds of data (eventually), and retrieving auxiliary data sets from various places.
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- close()
- open(mode='r')
- property parent_id_dsspec
- process_args(args)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- property weight_dsspec
- class westpa.cli.tools.w_pdist.WESTDSSynthesizer(default_dsname=None, h5filename=None)
Bases:
WESTToolComponent
Tool for synthesizing a dataset for analysis from other datasets. This may be done using a custom function, or a list of “data set specifications”. It is anticipated that if several source datasets are required, then a tool will have multiple instances of this class.
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- group_name = 'input dataset options'
- process_args(args)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- class westpa.cli.tools.w_pdist.WESTWDSSynthesizer(default_dsname=None, h5filename=None)
Bases:
WESTToolComponent
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- group_name = 'weight dataset options'
- process_args(args)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- class westpa.cli.tools.w_pdist.IterRangeSelection(data_manager=None)
Bases:
WESTToolComponent
Select and record limits on iterations used in analysis and/or reporting. This class provides both the user-facing command-line options and parsing, and the application-side API for recording limits in HDF5.
HDF5 datasets calculated based on a restricted set of iterations should be tagged with the following attributes:
first_iter
The first iteration included in the calculation.
last_iter
One past the last iteration included in the calculation.
iter_step
Blocking or sampling period for iterations included in the calculation.
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- check_data_iter_range_equal(h5object, iter_start=None, iter_stop=None)
Check that the given HDF5 object contains (as denoted by its iter_start/iter_stop attributes) data exactly for the iteration range specified.
- check_data_iter_range_least(h5object, iter_start=None, iter_stop=None)
Check that the given HDF5 object contains (as denoted by its iter_start/iter_stop attributes) data at least for the iteration range specified.
- check_data_iter_step_conformant(h5object, iter_step=None)
Check that the given HDF5 object contains per-iteration data at an iteration stride suitable for extracting data with the given stride (in other words, the given iter_step is a multiple of the stride with which data was recorded).
- check_data_iter_step_equal(h5object, iter_step=None)
Check that the given HDF5 object contains per-iteration data at an iteration stride the same as that specified.
- iter_block_iter()
Return an iterable of (block_start, block_end) over the blocks of iterations selected by --first-iter/--last-iter/--step-iter.
- iter_range(iter_start=None, iter_stop=None, iter_step=None, dtype=None)
Return a sequence for the given iteration numbers and stride, filling in missing values from those stored on self. The smallest data type capable of holding iter_stop is returned unless otherwise specified using the dtype argument.
- n_iter_blocks()
Return the number of blocks of iterations (as returned by iter_block_iter) selected by --first-iter/--last-iter/--step-iter.
- process_args(args, override_iter_start=None, override_iter_stop=None, default_iter_step=1)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- record_data_iter_range(h5object, iter_start=None, iter_stop=None)
Store attributes iter_start and iter_stop on the given HDF5 object (group/dataset).
- record_data_iter_step(h5object, iter_step=None)
Store attribute iter_step on the given HDF5 object (group/dataset).
- slice_per_iter_data(dataset, iter_start=None, iter_stop=None, iter_step=None, axis=0)
Return the subset of the given dataset corresponding to the given iteration range and stride. Unless otherwise specified, the first dimension of the dataset is the one sliced.
- class westpa.cli.tools.w_pdist.ProgressIndicatorComponent
Bases:
WESTToolComponent
- add_args(parser)
Add arguments specific to this component to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)
- westpa.cli.tools.w_pdist.histnd(values, binbounds, weights=1.0, out=None, binbound_check=True, ignore_out_of_range=False)
Generate an N-dimensional PDF (or contribution to a PDF) from the given values.
binbounds is a list of arrays of boundary values, with one entry for each dimension (values must have as many columns as there are entries in binbounds). weights, if provided, specifies the weight each value contributes to the histogram; this may be a scalar (for equal weights for all values) or a vector of the same length as values (for unequal weights). If binbound_check is True, then the boundaries are checked for strict positive monotonicity; set to False to shave a few microseconds if you know your bin boundaries to be monotonically increasing.
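The weighted accumulation histnd performs can be illustrated with numpy's histogramdd as a stand-in (histnd itself lives in the compiled fasthist module, so this is an analogy, not the actual implementation):

```python
import numpy as np

# values has one column per dimension; binbounds is one array of edges per
# dimension; weights may be a scalar or a per-value vector.
values = np.array([[0.1, 0.2],
                   [0.4, 0.9],
                   [0.6, 0.3]])
binbounds = [np.array([0.0, 0.5, 1.0]),   # edges for dimension 0
             np.array([0.0, 0.5, 1.0])]   # edges for dimension 1
weights = np.array([0.5, 0.25, 0.25])     # one weight per value row

hist, _ = np.histogramdd(values, bins=binbounds, weights=weights)
print(hist)
# rows index dim-0 bins, columns index dim-1 bins; entries sum to the
# total weight (1.0 here)
```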
- westpa.cli.tools.w_pdist.normhistnd(hist, binbounds)
Normalize the N-dimensional histogram hist with corresponding bin boundaries binbounds. Modifies hist in place and returns the normalization factor used.
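The normalization can be sketched as follows; this mirrors, rather than reuses, the library routine, assuming normalization means the histogram integrates to 1 over the binned domain:

```python
import numpy as np

def normalize_nd(hist, binbounds):
    # per-dimension bin widths
    widths = [np.diff(b) for b in binbounds]
    # outer product of per-dimension widths gives each bin's volume
    volume = widths[0]
    for w in widths[1:]:
        volume = np.multiply.outer(volume, w)
    norm = (hist * volume).sum()
    hist /= norm          # modifies hist in place, like normhistnd
    return norm           # the normalization factor used

hist = np.array([[4.0, 2.0], [2.0, 0.0]])
binbounds = [np.array([0.0, 0.5, 1.0]), np.array([0.0, 0.5, 1.0])]
factor = normalize_nd(hist, binbounds)
# all bins have volume 0.25, so the integral is now (hist * 0.25).sum() == 1
```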
- westpa.cli.tools.w_pdist.isiterable(x)
- class westpa.cli.tools.w_pdist.WPDist
Bases:
WESTParallelTool
- prog = 'w_pdist'
- description = 'Calculate time-resolved, multi-dimensional probability distributions of WE\ndatasets.\n\n\n-----------------------------------------------------------------------------\nSource data\n-----------------------------------------------------------------------------\n\nSource data is provided either by a user-specified function\n(--construct-dataset) or a list of "data set specifications" (--dsspecs).\nIf neither is provided, the progress coordinate dataset \'\'pcoord\'\' is used.\n\nTo use a custom function to extract or calculate data whose probability\ndistribution will be calculated, specify the function in standard Python\nMODULE.FUNCTION syntax as the argument to --construct-dataset. This function\nwill be called as function(n_iter,iter_group), where n_iter is the iteration\nwhose data are being considered and iter_group is the corresponding group\nin the main WEST HDF5 file (west.h5). The function must return data which can\nbe indexed as [segment][timepoint][dimension].\n\nTo use a list of data set specifications, specify --dsspecs and then list the\ndesired datasets one-by-one (space-separated in most shells). These data set\nspecifications are formatted as NAME[,file=FILENAME,slice=SLICE], which will\nuse the dataset called NAME in the HDF5 file FILENAME (defaulting to the main\nWEST HDF5 file west.h5), and slice it with the Python slice expression SLICE\n(as in [0:2] to select the first two elements of the first axis of the\ndataset). The ``slice`` option is most useful for selecting one column (or\nmore) from a multi-column dataset, such as arises when using a progress\ncoordinate of multiple dimensions.\n\n\n-----------------------------------------------------------------------------\nHistogram binning\n-----------------------------------------------------------------------------\n\nBy default, histograms are constructed with 100 bins in each dimension. 
This\ncan be overridden by specifying -b/--bins, which accepts a number of different\nkinds of arguments:\n\n a single integer N\n N uniformly spaced bins will be used in each dimension.\n\n a sequence of integers N1,N2,... (comma-separated)\n N1 uniformly spaced bins will be used for the first dimension, N2 for the\n second, and so on.\n\n a list of lists [[B11, B12, B13, ...], [B21, B22, B23, ...], ...]\n The bin boundaries B11, B12, B13, ... will be used for the first dimension,\n B21, B22, B23, ... for the second dimension, and so on. These bin\n boundaries need not be uniformly spaced. These expressions will be\n evaluated with Python\'s ``eval`` construct, with ``np`` available for\n use [e.g. to specify bins using np.arange()].\n\nThe first two forms (integer, list of integers) will trigger a scan of all\ndata in each dimension in order to determine the minimum and maximum values,\nwhich may be very expensive for large datasets. This can be avoided by\nexplicitly providing bin boundaries using the list-of-lists form.\n\nNote that these bins are *NOT* at all related to the bins used to drive WE\nsampling.\n\n\n-----------------------------------------------------------------------------\nOutput format\n-----------------------------------------------------------------------------\n\nThe output file produced (specified by -o/--output, defaulting to "pdist.h5")\nmay be fed to plothist to generate plots (or appropriately processed text or\nHDF5 files) from this data. In short, the following datasets are created:\n\n ``histograms``\n Normalized histograms. The first axis corresponds to iteration, and\n remaining axes correspond to dimensions of the input dataset.\n\n ``/binbounds_0``\n Vector of bin boundaries for the first (index 0) dimension. Additional\n datasets similarly named (/binbounds_1, /binbounds_2, ...) are created\n for additional dimensions.\n\n ``/midpoints_0``\n Vector of bin midpoints for the first (index 0) dimension. 
Additional\n datasets similarly named are created for additional dimensions.\n\n ``n_iter``\n Vector of iteration numbers corresponding to the stored histograms (i.e.\n the first axis of the ``histograms`` dataset).\n\n\n-----------------------------------------------------------------------------\nSubsequent processing\n-----------------------------------------------------------------------------\n\nThe output generated by this program (-o/--output, default "pdist.h5") may be\nplotted by the ``plothist`` program. See ``plothist --help`` for more\ninformation.\n\n\n-----------------------------------------------------------------------------\nParallelization\n-----------------------------------------------------------------------------\n\nThis tool supports parallelized binning, including reading of input data.\nParallel processing is the default. For simple cases (reading pre-computed\ninput data, modest numbers of segments), serial processing (--serial) may be\nmore efficient.\n\n\n-----------------------------------------------------------------------------\nCommand-line options\n-----------------------------------------------------------------------------\n\n'
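The --construct-dataset calling convention described above can be sketched as a small module. The dataset path 'auxdata/rmsd' is a hypothetical example; substitute whatever auxiliary data your simulation records:

```python
import numpy as np

def load_rmsd(n_iter, iter_group):
    # Called as function(n_iter, iter_group); must return data indexable
    # as [segment][timepoint][dimension]. 'auxdata/rmsd' is hypothetical.
    data = iter_group['auxdata/rmsd'][...]
    # ensure a trailing dimension axis even for 1-D observables
    if data.ndim == 2:
        data = data[:, :, np.newaxis]
    return data
```

Saved in a module on the Python path (say, mymod.py), this function would be selected with ``w_pdist --construct-dataset mymod.load_rmsd``.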
- add_args(parser)
Add arguments specific to this tool to the given argparse parser.
- process_args(args)
Take argparse-processed arguments associated with this tool and deal with them appropriately (setting instance variables, etc)
- go()
Perform the analysis associated with this tool.
- static parse_binspec(binspec)
- construct_bins(bins)
Construct bins according to bins, which may be:
- A scalar integer (for that number of bins in each dimension)
- A sequence of integers (specifying the number of bins for each dimension)
- A sequence of sequences of bin boundaries (specifying boundaries for each dimension)
Sets self.binbounds to a list of arrays of bin boundaries appropriate for passing to fasthist.histnd, along with self.midpoints to the midpoints of the bins.
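The relationship between the bin boundaries and midpoints that construct_bins sets up can be sketched as:

```python
import numpy as np

# self.binbounds holds one edge array per dimension; self.midpoints holds
# the center of each bin those edges define.
binbounds = [np.array([0.0, 1.0, 2.0, 5.0])]
midpoints = [(b[:-1] + b[1:]) / 2.0 for b in binbounds]
print(midpoints[0])  # centers of the three bins: 0.5, 1.5, 3.5
```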
- scan_data_shape()
- scan_data_range()
Scan input data for range in each dimension. The number of dimensions is determined from the shape of the progress coordinate as of self.iter_start.
- construct_histogram()
Construct a histogram using bins previously constructed with construct_bins(). The time series of histogram values is stored in histograms. Each histogram in the time series is normalized.
- westpa.cli.tools.w_pdist.entry_point()