w_pdist

w_pdist calculates the evolution of the progress coordinate probability distribution over a user-specified range of simulation iterations. Progress coordinates with dimensionality ≥ 1 are supported.

The resulting distribution can be viewed with the plothist tool.

Overview

Usage:

w_pdist [-h] [-r RCFILE] [--quiet | --verbose | --debug] [--version]
                       [-W WEST_H5FILE] [--first-iter N_ITER] [--last-iter N_ITER]
                       [-b BINEXPR] [-o OUTPUT]
                       [--construct-dataset CONSTRUCT_DATASET | --dsspecs DSSPEC [DSSPEC ...]]
                       [--serial | --parallel | --work-manager WORK_MANAGER]
                       [--n-workers N_WORKERS] [--zmq-mode MODE]
                       [--zmq-info INFO_FILE] [--zmq-task-endpoint TASK_ENDPOINT]
                       [--zmq-result-endpoint RESULT_ENDPOINT]
                       [--zmq-announce-endpoint ANNOUNCE_ENDPOINT]
                       [--zmq-listen-endpoint ANNOUNCE_ENDPOINT]
                       [--zmq-heartbeat-interval INTERVAL]
                       [--zmq-task-timeout TIMEOUT] [--zmq-client-comm-mode MODE]

Note: This tool supports parallelization, which may be more efficient for especially large datasets.

Command-Line Options

See the general command-line tool reference for more information on the general options.

Input/output options

These arguments allow the user to specify where to read input simulation result data and where to output calculated progress coordinate probability distribution data.

Both input and output files are in HDF5 format:

-W, --WEST_H5FILE file
  Read simulation result data from *file*. (**Default:** The HDF5 file
  specified in the configuration file *west.cfg*, which is *west.h5* by
  default)

-o, --output file
  Store this tool's output in *file*. (**Default:** The HDF5 file
  **pdist.h5**)
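
For example, to read simulation data from an explicitly named file and write the distribution to a custom output file (file names here are illustrative):

w_pdist -W west.h5 -o my_pdist.h5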

Iteration range options

Specify the range of iterations over which to construct the progress coordinate probability distribution:

--first-iter n_iter
  Construct probability distribution starting with iteration *n_iter*
  (**Default:** 1)

--last-iter n_iter
  Construct probability distribution's time evolution up to (and
  including) iteration *n_iter* (**Default:** Last completed
  iteration)
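
For example, to restrict the calculation to iterations 10 through 100 (iteration numbers here are illustrative):

w_pdist --first-iter 10 --last-iter 100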

Probability distribution binning options

Specify the number of bins to use when constructing the progress coordinate probability distribution. For a multidimensional progress coordinate, a different binning scheme may be used for each dimension:

-b binexpr
  *binexpr* specifies the number and placement of the bins. It may take
  one of the following forms (see the example invocations below):

      1. an integer, in which case all distributions use that many
      equal-sized bins
      2. a python-style list of integers, of length corresponding to
      the number of dimensions of the progress coordinate, in which
      case each dimension's probability distribution uses the
      corresponding number of bins
      3. a python-style list of lists of scalars, where the list at
      each index gives the explicit bin boundaries for the
      corresponding dimension of the progress coordinate's probability
      distribution

  (**Default:** 100 bins for all progress coordinates)
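
For instance, with a two-dimensional progress coordinate, any of the following invocations is valid (the values are illustrative; quote list expressions so the shell does not interpret the brackets):

w_pdist -b 50
w_pdist -b '[50,30]'
w_pdist -b '[[0,0.5,1,1.5,2],[0,1,2,3]]'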

Examples

Assuming simulation results are stored in west.h5 (which is specified in the configuration file named west.cfg), for a simulation with a 1-dimensional progress coordinate:

Calculate a probability distribution histogram using all default options (output file: pdist.h5; histogram binning: 100 equal-sized bins per dimension, spanning the smallest to the largest progress coordinate value reached; work is parallelized over all available local cores using the 'processes' work manager):

w_pdist

Same as above, except using the serial work manager (which may be more efficient for smaller datasets):

w_pdist --serial

westpa.cli.tools.w_pdist module

class westpa.cli.tools.w_pdist.WESTParallelTool(wm_env=None)

Bases: WESTTool

Base class for command-line tools parallelized with wwmgr. This automatically adds and processes wwmgr command-line arguments and creates a work manager at self.work_manager.

make_parser_and_process(prog=None, usage=None, description=None, epilog=None, args=None)

A convenience function to create a parser, call add_all_args(), and then call process_all_args(). The argument namespace is returned.

add_args(parser)

Add arguments specific to this tool to the given argparse parser.

process_args(args)

Take argparse-processed arguments associated with this tool and deal with them appropriately (setting instance variables, etc)

go()

Perform the analysis associated with this tool.

main()

A convenience function to make a parser, parse and process arguments, then run self.go() in the master process.

class westpa.cli.tools.w_pdist.WESTDataReader

Bases: WESTToolComponent

Tool for reading data from WEST-related HDF5 files. Coordinates finding the main HDF5 file from west.cfg or command line arguments, caching of certain kinds of data (eventually), and retrieving auxiliary data sets from various places.

add_args(parser)

Add arguments specific to this component to the given argparse parser.

process_args(args)

Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)

open(mode='r')
close()
property weight_dsspec
property parent_id_dsspec
class westpa.cli.tools.w_pdist.WESTDSSynthesizer(default_dsname=None, h5filename=None)

Bases: WESTToolComponent

Tool for synthesizing a dataset for analysis from other datasets. This may be done using a custom function, or a list of “data set specifications”. It is anticipated that if several source datasets are required, then a tool will have multiple instances of this class.

group_name = 'input dataset options'
add_args(parser)

Add arguments specific to this component to the given argparse parser.

process_args(args)

Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)

class westpa.cli.tools.w_pdist.WESTWDSSynthesizer(default_dsname=None, h5filename=None)

Bases: WESTToolComponent

group_name = 'weight dataset options'
add_args(parser)

Add arguments specific to this component to the given argparse parser.

process_args(args)

Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)

class westpa.cli.tools.w_pdist.IterRangeSelection(data_manager=None)

Bases: WESTToolComponent

Select and record limits on iterations used in analysis and/or reporting. This class provides both the user-facing command-line options and parsing, and the application-side API for recording limits in HDF5.

HDF5 datasets calculated based on a restricted set of iterations should be tagged with the following attributes:

first_iter

The first iteration included in the calculation.

last_iter

One past the last iteration included in the calculation.

iter_step

Blocking or sampling period for iterations included in the calculation.

add_args(parser)

Add arguments specific to this component to the given argparse parser.

process_args(args, override_iter_start=None, override_iter_stop=None, default_iter_step=1)

Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)

iter_block_iter()

Return an iterable of (block_start, block_end) over the blocks of iterations selected by --first-iter/--last-iter/--step-iter.

n_iter_blocks()

Return the number of blocks of iterations (as returned by iter_block_iter) selected by --first-iter/--last-iter/--step-iter.

record_data_iter_range(h5object, iter_start=None, iter_stop=None)

Store attributes iter_start and iter_stop on the given HDF5 object (group/dataset)

record_data_iter_step(h5object, iter_step=None)

Store attribute iter_step on the given HDF5 object (group/dataset).

check_data_iter_range_least(h5object, iter_start=None, iter_stop=None)

Check that the given HDF5 object contains (as denoted by its iter_start/iter_stop attributes) data at least for the iteration range specified.

check_data_iter_range_equal(h5object, iter_start=None, iter_stop=None)

Check that the given HDF5 object contains (as denoted by its iter_start/iter_stop attributes) data exactly for the iteration range specified.

check_data_iter_step_conformant(h5object, iter_step=None)

Check that the given HDF5 object contains per-iteration data at an iteration stride suitable for extracting data with the given stride (in other words, the given iter_step is a multiple of the stride with which data was recorded).

check_data_iter_step_equal(h5object, iter_step=None)

Check that the given HDF5 object contains per-iteration data at an iteration stride the same as that specified.

slice_per_iter_data(dataset, iter_start=None, iter_stop=None, iter_step=None, axis=0)

Return the subset of the given dataset corresponding to the given iteration range and stride. Unless otherwise specified, the first dimension of the dataset is the one sliced.

iter_range(iter_start=None, iter_stop=None, iter_step=None, dtype=None)

Return a sequence for the given iteration numbers and stride, filling in missing values from those stored on self. The smallest data type capable of holding iter_stop is returned unless otherwise specified using the dtype argument.
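
A minimal usage sketch of tagging a derived dataset with the iteration-range attributes described above; the file and dataset names are placeholders, and in a full tool add_args()/process_args() would normally populate the limits from --first-iter/--last-iter rather than passing them explicitly:

  import h5py
  from westpa.cli.tools.w_pdist import IterRangeSelection

  iter_range = IterRangeSelection()
  with h5py.File('analysis.h5', 'a') as h5out:
      # hypothetical derived dataset covering iterations 1..100
      dset = h5out.create_dataset('my_observable', shape=(100,), dtype='f8')
      # store iter_start/iter_stop attributes (stop is one past the last iteration)
      iter_range.record_data_iter_range(dset, iter_start=1, iter_stop=101)
      iter_range.record_data_iter_step(dset, iter_step=1)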

class westpa.cli.tools.w_pdist.ProgressIndicatorComponent

Bases: WESTToolComponent

add_args(parser)

Add arguments specific to this component to the given argparse parser.

process_args(args)

Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)

westpa.cli.tools.w_pdist.histnd(values, binbounds, weights=1.0, out=None, binbound_check=True, ignore_out_of_range=False)

Generate an N-dimensional PDF (or a contribution to a PDF) from the given values. binbounds is a list of arrays of boundary values, with one entry for each dimension; values must have as many columns as there are entries in binbounds. weights, if provided, specifies the weight each value contributes to the histogram; this may be a scalar (equal weight for all values) or a vector of the same length as values (unequal weights). If binbound_check is True, the boundaries are checked for strictly monotonic increase; set it to False to shave a few microseconds if you know your bin boundaries are monotonically increasing.

westpa.cli.tools.w_pdist.normhistnd(hist, binbounds)

Normalize the N-dimensional histogram hist with corresponding bin boundaries binbounds. Modifies hist in place and returns the normalization factor used.
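
A minimal sketch combining the two helpers above; the sample data, weights, and bin edges are illustrative:

  import numpy as np
  from westpa.cli.tools.w_pdist import histnd, normhistnd

  # 1000 two-dimensional samples with equal weights summing to 1
  values = np.random.rand(1000, 2)
  weights = np.full(1000, 1.0 / 1000)

  # one array of bin boundaries per dimension of ``values``
  binbounds = [np.linspace(0.0, 1.0, 11), np.linspace(0.0, 1.0, 21)]

  hist = histnd(values, binbounds, weights=weights)
  norm = normhistnd(hist, binbounds)  # normalizes ``hist`` in place, returns the factor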

westpa.cli.tools.w_pdist.isiterable(x)
class westpa.cli.tools.w_pdist.WPDist

Bases: WESTParallelTool

prog = 'w_pdist'
description = 'Calculate time-resolved, multi-dimensional probability distributions of WE\ndatasets.\n\n\n-----------------------------------------------------------------------------\nSource data\n-----------------------------------------------------------------------------\n\nSource data is provided either by a user-specified function\n(--construct-dataset) or a list of "data set specifications" (--dsspecs).\nIf neither is provided, the progress coordinate dataset \'\'pcoord\'\' is used.\n\nTo use a custom function to extract or calculate data whose probability\ndistribution will be calculated, specify the function in standard Python\nMODULE.FUNCTION syntax as the argument to --construct-dataset. This function\nwill be called as function(n_iter,iter_group), where n_iter is the iteration\nwhose data are being considered and iter_group is the corresponding group\nin the main WEST HDF5 file (west.h5). The function must return data which can\nbe indexed as [segment][timepoint][dimension].\n\nTo use a list of data set specifications, specify --dsspecs and then list the\ndesired datasets one-by-one (space-separated in most shells). These data set\nspecifications are formatted as NAME[,file=FILENAME,slice=SLICE], which will\nuse the dataset called NAME in the HDF5 file FILENAME (defaulting to the main\nWEST HDF5 file west.h5), and slice it with the Python slice expression SLICE\n(as in [0:2] to select the first two elements of the first axis of the\ndataset). The ``slice`` option is most useful for selecting one column (or\nmore) from a multi-column dataset, such as arises when using a progress\ncoordinate of multiple dimensions.\n\n\n-----------------------------------------------------------------------------\nHistogram binning\n-----------------------------------------------------------------------------\n\nBy default, histograms are constructed with 100 bins in each dimension. This\ncan be overridden by specifying -b/--bins, which accepts a number of different\nkinds of arguments:\n\n  a single integer N\n    N uniformly spaced bins will be used in each dimension.\n\n  a sequence of integers N1,N2,... (comma-separated)\n    N1 uniformly spaced bins will be used for the first dimension, N2 for the\n    second, and so on.\n\n  a list of lists [[B11, B12, B13, ...], [B21, B22, B23, ...], ...]\n    The bin boundaries B11, B12, B13, ... will be used for the first dimension,\n    B21, B22, B23, ... for the second dimension, and so on. These bin\n    boundaries need not be uniformly spaced. These expressions will be\n    evaluated with Python\'s ``eval`` construct, with ``np`` available for\n    use [e.g. to specify bins using np.arange()].\n\nThe first two forms (integer, list of integers) will trigger a scan of all\ndata in each dimension in order to determine the minimum and maximum values,\nwhich may be very expensive for large datasets. This can be avoided by\nexplicitly providing bin boundaries using the list-of-lists form.\n\nNote that these bins are *NOT* at all related to the bins used to drive WE\nsampling.\n\n\n-----------------------------------------------------------------------------\nOutput format\n-----------------------------------------------------------------------------\n\nThe output file produced (specified by -o/--output, defaulting to "pdist.h5")\nmay be fed to plothist to generate plots (or appropriately processed text or\nHDF5 files) from this data. In short, the following datasets are created:\n\n  ``histograms``\n    Normalized histograms. 
The first axis corresponds to iteration, and\n    remaining axes correspond to dimensions of the input dataset.\n\n  ``/binbounds_0``\n    Vector of bin boundaries for the first (index 0) dimension. Additional\n    datasets similarly named (/binbounds_1, /binbounds_2, ...) are created\n    for additional dimensions.\n\n  ``/midpoints_0``\n    Vector of bin midpoints for the first (index 0) dimension. Additional\n    datasets similarly named are created for additional dimensions.\n\n  ``n_iter``\n    Vector of iteration numbers corresponding to the stored histograms (i.e.\n    the first axis of the ``histograms`` dataset).\n\n\n-----------------------------------------------------------------------------\nSubsequent processing\n-----------------------------------------------------------------------------\n\nThe output generated by this program (-o/--output, default "pdist.h5") may be\nplotted by the ``plothist`` program. See ``plothist --help`` for more\ninformation.\n\n\n-----------------------------------------------------------------------------\nParallelization\n-----------------------------------------------------------------------------\n\nThis tool supports parallelized binning, including reading of input data.\nParallel processing is the default. For simple cases (reading pre-computed\ninput data, modest numbers of segments), serial processing (--serial) may be\nmore efficient.\n\n\n-----------------------------------------------------------------------------\nCommand-line options\n-----------------------------------------------------------------------------\n\n'
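
As a sketch of the --construct-dataset hook described above (the module and function names here are hypothetical), a function importable from the Python path may return any array indexable as [segment][timepoint][dimension]:

  # my_dsets.py -- hypothetical module; used as:
  #   w_pdist --construct-dataset my_dsets.first_pcoord_dim
  def first_pcoord_dim(n_iter, iter_group):
      # iter_group is the HDF5 group for iteration n_iter in the main WEST
      # HDF5 file; return only the first progress coordinate dimension while
      # keeping the [segment][timepoint][dimension] layout w_pdist expects
      return iter_group['pcoord'][:, :, :1]
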
add_args(parser)

Add arguments specific to this tool to the given argparse parser.

process_args(args)

Take argparse-processed arguments associated with this tool and deal with them appropriately (setting instance variables, etc)

go()

Perform the analysis associated with this tool.

static parse_binspec(binspec)
construct_bins(bins)

Construct bins according to bins, which may be:

  1. A scalar integer (for that number of bins in each dimension)

  2. A sequence of integers (specifying number of bins for each dimension)

  3. A sequence of sequences of bin boundaries (specifying boundaries for each dimension)

Sets self.binbounds to a list of arrays of bin boundaries suitable for passing to fasthist.histnd, and self.midpoints to the corresponding bin midpoints.

scan_data_shape()
scan_data_range()

Scan input data for range in each dimension. The number of dimensions is determined from the shape of the progress coordinate as of self.iter_start.

construct_histogram()

Construct a histogram using bins previously constructed with construct_bins(). The time series of histogram values is stored in histograms. Each histogram in the time series is normalized.

westpa.cli.tools.w_pdist.entry_point()
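
A minimal sketch of invoking the tool from Python rather than the shell, assuming entry_point() parses sys.argv the same way the w_pdist console command does (the arguments shown are illustrative):

  import sys
  from westpa.cli.tools.w_pdist import entry_point

  sys.argv = ['w_pdist', '--serial', '-b', '50', '-o', 'pdist.h5']
  entry_point()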