w_eddist

usage:

w_eddist [-h] [-r RCFILE] [--quiet | --verbose | --debug] [--version]
               [--max-queue-length MAX_QUEUE_LENGTH] [-b BINEXPR] [-C] [--loose] --istate ISTATE
               --fstate FSTATE [--first-iter ITER_START] [--last-iter ITER_STOP] [-k KINETICS]
               [-o OUTPUT] [--serial | --parallel | --work-manager WORK_MANAGER]
               [--n-workers N_WORKERS] [--zmq-mode MODE] [--zmq-comm-mode COMM_MODE]
               [--zmq-write-host-info INFO_FILE] [--zmq-read-host-info INFO_FILE]
               [--zmq-upstream-rr-endpoint ENDPOINT] [--zmq-upstream-ann-endpoint ENDPOINT]
               [--zmq-downstream-rr-endpoint ENDPOINT] [--zmq-downstream-ann-endpoint ENDPOINT]
               [--zmq-master-heartbeat MASTER_HEARTBEAT] [--zmq-worker-heartbeat WORKER_HEARTBEAT]
               [--zmq-timeout-factor FACTOR] [--zmq-startup-timeout STARTUP_TIMEOUT]
               [--zmq-shutdown-timeout SHUTDOWN_TIMEOUT]

Calculate the time-resolved transition-event duration distribution from kinetics results.

Source data

Source data is collected from the results of 'w_kinetics trace' (see w_kinetics trace --help for more information on generating this dataset).

Histogram binning

By default, histograms are constructed with 100 bins in each dimension. This can be overridden by specifying -b/--bins, which accepts several kinds of arguments:

a single integer N
  N uniformly spaced bins will be used in each dimension.

a sequence of integers N1,N2,... (comma-separated)
  N1 uniformly spaced bins will be used for the first dimension, N2 for the
  second, and so on.

a list of lists [[B11, B12, B13, ...], [B21, B22, B23, ...], ...]
  The bin boundaries B11, B12, B13, ... will be used for the first dimension,
  B21, B22, B23, ... for the second dimension, and so on. These bin
  boundaries need not be uniformly spaced. These expressions will be
  evaluated with Python's ``eval`` construct, with ``np`` available for
  use [e.g. to specify bins using np.arange()].

The first two forms (integer, list of integers) will trigger a scan of all data in each dimension in order to determine the minimum and maximum values, which may be very expensive for large datasets. This can be avoided by explicitly providing bin boundaries using the list-of-lists form.
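As a rough illustration of the three forms above, the sketch below evaluates bin expressions with Python's eval and np in scope, as the text describes. The helper name evaluate_binexpr is hypothetical and is not part of w_eddist; the tool's own parsing may differ in details.

```python
import numpy as np


def evaluate_binexpr(binexpr):
    """Hypothetical helper: evaluate a -b/--bins style expression.

    Follows the description above (Python eval with np available);
    it is not the parser that w_eddist itself uses.
    """
    # Only np is exposed to the expression. eval still executes code,
    # so it should only be given input you trust.
    return eval(binexpr, {"__builtins__": {}, "np": np})


single = evaluate_binexpr("50")            # N bins in every dimension
per_dim = evaluate_binexpr("[50, 20]")     # N1 bins, then N2 bins
explicit = evaluate_binexpr("[np.arange(0.0, 10.5, 0.5)]")  # boundaries

print(single, per_dim, len(explicit[0]))
```

Only the last form carries its own boundaries, which is why it avoids the scan of the data range mentioned above.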

Note that these bins are NOT at all related to the bins used to drive WE sampling.

Output format

The output file produced (specified by -o/--output, defaulting to "eddist.h5") may be fed to plothist to generate plots (or appropriately processed text or HDF5 files) from this data. In short, the following datasets are created:

``histograms``
  Normalized histograms. The first axis corresponds to iteration, and
  remaining axes correspond to dimensions of the input dataset.

``/binbounds_0``
  Vector of bin boundaries for the first (index 0) dimension. Additional
  datasets similarly named (/binbounds_1, /binbounds_2, ...) are created
  for additional dimensions.

``/midpoints_0``
  Vector of bin midpoints for the first (index 0) dimension. Additional
  datasets similarly named are created for additional dimensions.

``n_iter``
  Vector of iteration numbers corresponding to the stored histograms (i.e.
  the first axis of the ``histograms`` dataset).
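The layout above could be inspected along the following lines. This is a minimal sketch that assumes a one-dimensional duration histogram and uses only the dataset names listed above; the function name summarize_eddist and the synthetic values are made up for illustration.

```python
import numpy as np


def summarize_eddist(h5):
    """Summarize data laid out as described above.

    `h5` may be an open h5py.File or any mapping from dataset name to
    array. Assumes a one-dimensional histogram per iteration.
    """
    hists = np.asarray(h5["histograms"])      # shape (n_iterations, n_bins)
    n_iter = np.asarray(h5["n_iter"])         # iteration number per row
    binbounds = np.asarray(h5["binbounds_0"])
    midpoints = np.asarray(h5["midpoints_0"])
    return {
        "first_iter": int(n_iter[0]),
        "last_iter": int(n_iter[-1]),
        "n_bins": len(binbounds) - 1,
        # Midpoint of the most populated bin in the final iteration.
        "peak_of_last_hist": float(midpoints[np.argmax(hists[-1])]),
    }


# Synthetic stand-in for a real output file (values are invented).
fake = {
    "histograms": np.array([[0.5, 1.5, 0.0], [0.0, 0.5, 1.5]]),
    "n_iter": np.array([1, 2]),
    "binbounds_0": np.array([0.0, 0.5, 1.0, 1.5]),
    "midpoints_0": np.array([0.25, 0.75, 1.25]),
}
summary = summarize_eddist(fake)
print(summary)
```

Assuming h5py is installed, the same function could be applied to a real file with `with h5py.File("eddist.h5", "r") as f: summarize_eddist(f)`.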

Subsequent processing

The output generated by this program (-o/--output, default "eddist.h5") may be plotted by the plothist program. See plothist --help for more information.

Parallelization

This tool supports parallelized binning, including reading of input data. Parallel processing is the default. For simple cases (reading pre-computed input data, modest numbers of segments), serial processing (--serial) may be more efficient.

Command-line options

optional arguments:

-h, --help            show this help message and exit
-b BINEXPR, --bins BINEXPR
                      Use BINEXPR for bins. This may be an integer, which will be used for each
                      dimension of the progress coordinate; a list of integers (formatted as
                      [n1,n2,...]) which will use n1 bins for the first dimension, n2 for the second
                      dimension, and so on; or a list of lists of boundaries (formatted as [[a1, a2,
                      ...], [b1, b2, ...], ... ]), which will use [a1, a2, ...] as bin boundaries for
                      the first dimension, [b1, b2, ...] as bin boundaries for the second dimension,
                      and so on. (Default: 100 bins in each dimension.)
-C, --compress        Compress histograms. May make storage of higher-dimensional histograms more
                      tractable, at the (possibly extreme) expense of increased analysis time.
                      (Default: no compression.)
--loose               Ignore values that do not fall within bins. (Risky, as this can make buggy bin
                      boundaries appear as reasonable data. Only use if you are sure of your bin
                      boundary specification.)
--istate ISTATE       Initial state defining transition event
--fstate FSTATE       Final state defining transition event

general options:

-r RCFILE, --rcfile RCFILE
                      use RCFILE as the WEST run-time configuration file (default: west.cfg)
--quiet               emit only essential information
--verbose             emit extra information
--debug               enable extra checks and emit copious information
--version             show program's version number and exit

parallelization options:

--max-queue-length MAX_QUEUE_LENGTH: Maximum number of tasks that can be queued. Useful for limiting RAM use for tasks that have very large requests/responses. Default: no limit.

iteration range options:

--first-iter ITER_START
                      Iteration to begin analysis (default: 1)
--last-iter ITER_STOP
                      Iteration to end analysis

input/output options:

-k KINETICS, --kinetics KINETICS
                      Populations and transition rates (including evolution) are stored in KINETICS
                      (default: kintrace.h5).
-o OUTPUT, --output OUTPUT
                      Store results in OUTPUT (default: eddist.h5).

parallelization options:

--serial: run in serial mode
--parallel: run in parallel mode (using processes)
--work-manager WORK_MANAGER: use the given work manager for parallel task distribution. Available work managers are (‘serial’, ‘threads’, ‘processes’, ‘zmq’); default is ‘processes’
--n-workers N_WORKERS: Use up to N_WORKERS on this host, for work managers which support this option. Use 0 for a dedicated server. (Ignored by work managers which do not support this option.)

options for ZeroMQ (“zmq”) work manager (master or node):

--zmq-mode MODE       Operate as a master (server) or a node (workers/client). "server" is a
                      deprecated synonym for "master" and "client" is a deprecated synonym for
                      "node".
--zmq-comm-mode COMM_MODE
                      Use the given communication mode -- TCP or IPC (Unix-domain) -- sockets for
                      communication within a node. IPC (the default) may be more efficient but is not
                      available on (exceptionally rare) systems without node-local storage (e.g.
                      /tmp); on such systems, TCP may be used instead.
--zmq-write-host-info INFO_FILE
                      Store hostname and port information needed to connect to this instance in
                      INFO_FILE. This allows the master and nodes assisting in coordinating the
                      communication of other nodes to choose ports randomly. Downstream nodes read
                      this file with --zmq-read-host-info and know where and how to connect.
--zmq-read-host-info INFO_FILE
                      Read hostname and port information needed to connect to the master (or other
                      coordinating node) from INFO_FILE. This allows the master and nodes assisting
                      in coordinating the communication of other nodes to choose ports randomly,
                      writing that information with --zmq-write-host-info for this instance to read.
--zmq-upstream-rr-endpoint ENDPOINT
                      ZeroMQ endpoint to which to send request/response (task and result) traffic
                      toward the master.
--zmq-upstream-ann-endpoint ENDPOINT
                      ZeroMQ endpoint on which to receive announcement (heartbeat and shutdown
                      notification) traffic from the master.
--zmq-downstream-rr-endpoint ENDPOINT
                      ZeroMQ endpoint on which to listen for request/response (task and result)
                      traffic from subsidiary workers.
--zmq-downstream-ann-endpoint ENDPOINT
                      ZeroMQ endpoint on which to send announcement (heartbeat and shutdown
                      notification) traffic toward workers.
--zmq-master-heartbeat MASTER_HEARTBEAT
                      Every MASTER_HEARTBEAT seconds, the master announces its presence to workers.
--zmq-worker-heartbeat WORKER_HEARTBEAT
                      Every WORKER_HEARTBEAT seconds, workers announce their presence to the master.
--zmq-timeout-factor FACTOR
                      Scaling factor for heartbeat timeouts. If the master doesn't hear from a worker
                      in WORKER_HEARTBEAT*FACTOR, the worker is assumed to have crashed. If a worker
                      doesn't hear from the master in MASTER_HEARTBEAT*FACTOR seconds, the master is
                      assumed to have crashed. Both cases result in shutdown.
--zmq-startup-timeout STARTUP_TIMEOUT
                      Amount of time (in seconds) to wait for communication between the master and at
                      least one worker. This may need to be changed on very large, heavily-loaded
                      computer systems that start all processes simultaneously.
--zmq-shutdown-timeout SHUTDOWN_TIMEOUT
                      Amount of time (in seconds) to wait for workers to shut down.
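The arithmetic in the --zmq-timeout-factor description above can be written out as follows. The function name and the numeric values are hypothetical, chosen only to show the calculation; they are not the tool's defaults.

```python
def heartbeat_timeouts(master_heartbeat, worker_heartbeat, factor):
    """Return (worker_timeout, master_timeout) in seconds.

    worker_timeout: silence from a worker after which the master
        assumes that worker has crashed.
    master_timeout: silence from the master after which a worker
        assumes the master has crashed.
    """
    return worker_heartbeat * factor, master_heartbeat * factor


# Hypothetical settings, for illustration only.
worker_timeout, master_timeout = heartbeat_timeouts(
    master_heartbeat=10.0, worker_heartbeat=30.0, factor=4.0
)
print(worker_timeout, master_timeout)
```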

westpa.cli.tools.w_eddist module

class westpa.cli.tools.w_eddist.WESTParallelTool(wm_env=None)

Bases: WESTTool

Base class for command-line tools parallelized with wwmgr. This automatically adds and processes wwmgr command-line arguments and creates a work manager at self.work_manager.

make_parser_and_process(prog=None, usage=None, description=None, epilog=None, args=None): A convenience function to create a parser, call add_all_args(), and then call process_all_args(). The argument namespace is returned.

add_args(parser): Add arguments specific to this tool to the given argparse parser.

process_args(args): Take argparse-processed arguments associated with this tool and deal with them appropriately (setting instance variables, etc)

go(): Perform the analysis associated with this tool.

main(): A convenience function to make a parser, parse and process arguments, then run self.go() in the master process.

class westpa.cli.tools.w_eddist.ProgressIndicatorComponent

Bases: WESTToolComponent

add_args(parser): Add arguments specific to this component to the given argparse parser.

process_args(args): Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc)

westpa.cli.tools.w_eddist.histnd(values, binbounds, weights=1.0, out=None, binbound_check=True, ignore_out_of_range=False): Generate an N-dimensional PDF (or contribution to a PDF) from the given values. binbounds is a list of arrays of boundary values, with one entry for each dimension (values must have as many columns as there are entries in binbounds). weights, if provided, specifies the weight each value contributes to the histogram; this may be a scalar (equal weights for all values) or a vector of the same length as values (unequal weights). If binbound_check is True, the boundaries are checked to be strictly increasing; set it to False to save a few microseconds if you know your bin boundaries are monotonically increasing.
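To make the inputs concrete, the sketch below builds a weighted two-dimensional histogram with NumPy's np.histogramdd. This is a stand-in for illustration, not histnd itself; its handling of bin edges and out-of-range values may differ from histnd's, and the sample values are invented.

```python
import numpy as np

# Four sample points in two dimensions, one row per point.
values = np.array([[0.1, 0.2], [0.4, 0.9], [0.6, 0.3], [0.9, 0.8]])
# One weight per point (a vector of the same length as values).
weights = np.array([0.1, 0.2, 0.3, 0.4])
# One boundary array per dimension, matching the columns of values.
binbounds = [np.array([0.0, 0.5, 1.0]), np.array([0.0, 0.5, 1.0])]

# The analogue of binbound_check: boundaries must strictly increase.
for bounds in binbounds:
    if not np.all(np.diff(bounds) > 0):
        raise ValueError("bin boundaries must be strictly increasing")

hist, _ = np.histogramdd(values, bins=binbounds, weights=weights)
print(hist)
```

Each point falls into its own bin here, so every entry of hist equals the weight of a single point.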

westpa.cli.tools.w_eddist.normhistnd(hist, binbounds): Normalize the N-dimensional histogram hist with corresponding bin boundaries binbounds. Modifies hist in place and returns the normalization factor used.
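One plausible reading of this normalization is that the histogram is scaled so that it integrates to 1 over the bin volumes. The sketch below implements that convention under that assumption; normalize_in_place is a hypothetical name, and the library routine may define its returned factor differently.

```python
import numpy as np


def normalize_in_place(hist, binbounds):
    """Scale hist in place so it integrates to 1 over the bin volumes.

    Returns the factor that hist was multiplied by. A sketch of one
    normalization convention, not the library routine.
    """
    # Bin volumes: outer product of the bin widths of every dimension.
    volumes = np.array(1.0)
    for bounds in binbounds:
        volumes = np.multiply.outer(volumes, np.diff(bounds))

    integral = (hist * volumes).sum()
    if integral == 0:
        raise ValueError("cannot normalize an empty histogram")
    factor = 1.0 / integral
    hist *= factor  # modifies the caller's array
    return factor


hist = np.ones((2, 2))
binbounds = [np.array([0.0, 1.0, 3.0]), np.array([0.0, 2.0, 4.0])]
factor = normalize_in_place(hist, binbounds)
print(factor, hist)
```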

class westpa.cli.tools.w_eddist.DurationDataset(dataset, mask, iter_start=1)

Bases: object

A facade for the ‘dsspec’ dataclass that incorporates the mask into the get_iter_data method.

get_iter_data(n_iter)

westpa.cli.tools.w_eddist.isiterable(x)

class westpa.cli.tools.w_eddist.WEDDist

Bases: WESTParallelTool

prog = 'w_eddist'

description = (the help text reproduced at the top of this page, from "Calculate time-resolved transition-event duration distribution from kinetics results" through the "Command-line options" heading)

add_args(parser): Add arguments specific to this tool to the given argparse parser.

process_args(args): Take argparse-processed arguments associated with this tool and deal with them appropriately (setting instance variables, etc)

go(): Perform the analysis associated with this tool.

static parse_binspec(binspec)

construct_bins(bins)

Construct bins according to bins, which may be:

A scalar integer (for that number of bins in each dimension)

A sequence of integers (specifying number of bins for each dimension)

A sequence of sequences of bin boundaries (specifying boundaries for each dimension)

Sets self.binbounds to a list of arrays of bin boundaries appropriate for passing to fasthist.histnd, along with self.midpoints to the midpoints of the bins.
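The three accepted forms could be turned into boundaries and midpoints roughly as follows. This is a sketch under stated assumptions, not the method's actual code: build_bins and its data_range argument are hypothetical, and uniform bins are assumed to span the observed minimum and maximum in each dimension.

```python
import numpy as np


def build_bins(bins, data_range):
    """Hypothetical stand-in for the behaviour described above.

    bins: an int, a sequence of ints, or a sequence of boundary
        sequences.
    data_range: list of (min, max) per dimension, used only when bins
        gives counts rather than explicit boundaries.
    Returns (binbounds, midpoints), each a list of arrays.
    """
    ndim = len(data_range)
    if np.isscalar(bins):
        # A single integer applies to every dimension.
        bins = [int(bins)] * ndim

    binbounds = []
    for spec, (lo, hi) in zip(bins, data_range):
        if np.isscalar(spec):
            # n uniformly spaced bins need n + 1 boundaries.
            binbounds.append(np.linspace(lo, hi, int(spec) + 1))
        else:
            # Explicit boundaries are used as given.
            binbounds.append(np.asarray(spec, dtype=float))

    midpoints = [(b[:-1] + b[1:]) / 2.0 for b in binbounds]
    return binbounds, midpoints


uniform_bounds, uniform_mids = build_bins(4, [(0.0, 2.0)])
explicit_bounds, explicit_mids = build_bins([[0.0, 1.0, 4.0]], [(0.0, 0.0)])
print(uniform_bounds[0], uniform_mids[0], explicit_mids[0])
```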

scan_data_shape()

scan_data_range(): Scan input data for range in each dimension. The number of dimensions is determined from the shape of the progress coordinate as of self.iter_start.

construct_histogram(): Construct a histogram using bins previously constructed with construct_bins(). The time series of histogram values is stored in histograms. Each histogram in the time series is normalized.

westpa.cli.tools.w_eddist.entry_point()