w_crawl

usage:

w_crawl [-h] [-r RCFILE] [--quiet | --verbose | --debug] [--version]
              [--max-queue-length MAX_QUEUE_LENGTH] [-W WEST_H5FILE] [--first-iter N_ITER]
              [--last-iter N_ITER] [-c CRAWLER_INSTANCE]
              [--serial | --parallel | --work-manager WORK_MANAGER] [--n-workers N_WORKERS]
              [--zmq-mode MODE] [--zmq-comm-mode COMM_MODE] [--zmq-write-host-info INFO_FILE]
              [--zmq-read-host-info INFO_FILE] [--zmq-upstream-rr-endpoint ENDPOINT]
              [--zmq-upstream-ann-endpoint ENDPOINT] [--zmq-downstream-rr-endpoint ENDPOINT]
              [--zmq-downstream-ann-endpoint ENDPOINT] [--zmq-master-heartbeat MASTER_HEARTBEAT]
              [--zmq-worker-heartbeat WORKER_HEARTBEAT] [--zmq-timeout-factor FACTOR]
              [--zmq-startup-timeout STARTUP_TIMEOUT] [--zmq-shutdown-timeout SHUTDOWN_TIMEOUT]
              task_callable

Crawl a weighted ensemble dataset, executing a function for each iteration. This can be used for postprocessing of trajectories, cleanup of datasets, or anything else that can be expressed as “do X for iteration N, then do something with the result”. Tasks are parallelized by iteration, and no guarantees are made about evaluation order.

Command-line options

optional arguments:

-h, --help            show this help message and exit

general options:

-r RCFILE, --rcfile RCFILE
                      use RCFILE as the WEST run-time configuration file (default: west.cfg)
--quiet               emit only essential information
--verbose             emit extra information
--debug               enable extra checks and emit copious information
--version             show program's version number and exit

parallelization options:

--max-queue-length MAX_QUEUE_LENGTH
                      Maximum number of tasks that can be queued. Useful to limit RAM use for tasks
                      that have very large requests/responses. Default: no limit.

WEST input data options:

-W WEST_H5FILE, --west-data WEST_H5FILE
                      Take WEST data from WEST_H5FILE (default: read from the HDF5 file specified in
                      west.cfg).

iteration range:

--first-iter N_ITER   Begin analysis at iteration N_ITER (default: 1).
--last-iter N_ITER    Conclude analysis with N_ITER, inclusive (default: last completed iteration).

task options:

-c CRAWLER_INSTANCE, --crawler-instance CRAWLER_INSTANCE
                      Use CRAWLER_INSTANCE (specified as module.instance) as an instance of
                      WESTPACrawler to coordinate the calculation. Required only if initialization,
                      finalization, or task result processing is needed.
task_callable         Run TASK_CALLABLE (specified as module.function) on each iteration. Required.
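
For example, the task callable and (optional) crawler instance can live together in a small user module, invoked as ``w_crawl my_crawl.calculate -c my_crawl.crawler``. The sketch below is illustrative, not part of the w_crawl API: the module name ``my_crawl``, the function ``calculate``, and the result handling are assumptions, and the per-iteration callable is assumed to receive the iteration number and its HDF5 iteration group, as in WESTPA's examples.

```python
# my_crawl.py -- a hypothetical user module for w_crawl (all names illustrative).
try:
    from westpa.cli.tools.w_crawl import WESTPACrawler
except ImportError:
    # Stub so this sketch runs without WESTPA installed; the real base class
    # provides the same three-method interface documented below.
    class WESTPACrawler:
        def initialize(self, iter_start, iter_stop): pass
        def finalize(self): pass
        def process_iter_result(self, n_iter, result): pass

def calculate(n_iter, iter_group):
    """Hypothetical per-iteration task: return something cheap to pickle."""
    # In practice this would read trajectory or auxiliary data for n_iter.
    return {'n_iter': n_iter}

class ResultCollector(WESTPACrawler):
    def initialize(self, iter_start, iter_stop):
        self.results = {}

    def process_iter_result(self, n_iter, result):
        # Called on the master for each completed task; tasks are
        # parallelized by iteration, so arrival order is not guaranteed.
        self.results[n_iter] = result

    def finalize(self):
        pass  # e.g. write self.results to disk

crawler = ResultCollector()  # referenced on the command line as my_crawl.crawler
```

Because results arrive in no guaranteed order, the crawler keys them by iteration number rather than appending to a list.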

parallelization options:

--serial              run in serial mode
--parallel            run in parallel mode (using processes)
--work-manager WORK_MANAGER
                      use the given work manager for parallel task distribution. Available work
                      managers are ('serial', 'threads', 'processes', 'zmq'); default is 'serial'
--n-workers N_WORKERS
                      Use up to N_WORKERS on this host, for work managers which support this option.
                      Use 0 for a dedicated server. (Ignored by work managers which do not support
                      this option.)

options for ZeroMQ (“zmq”) work manager (master or node):

--zmq-mode MODE       Operate as a master (server) or a node (workers/client). "server" is a
                      deprecated synonym for "master" and "client" is a deprecated synonym for
                      "node".
--zmq-comm-mode COMM_MODE
                      Use the given communication mode -- TCP or IPC (Unix-domain sockets) -- for
                      communication within a node. IPC (the default) may be more efficient but is not
                      available on (exceptionally rare) systems without node-local storage (e.g.
                      /tmp); on such systems, TCP may be used instead.
--zmq-write-host-info INFO_FILE
                      Store hostname and port information needed to connect to this instance in
                      INFO_FILE. This allows the master (and any node coordinating communication for
                      other nodes) to choose ports randomly. Downstream nodes read this file with
                      --zmq-read-host-info to learn how to connect.
--zmq-read-host-info INFO_FILE
                      Read hostname and port information needed to connect to the master (or other
                      coordinating node) from INFO_FILE. The coordinating instance chooses its ports
                      randomly and records them with --zmq-write-host-info for this instance to read.
--zmq-upstream-rr-endpoint ENDPOINT
                      ZeroMQ endpoint to which to send request/response (task and result) traffic
                      toward the master.
--zmq-upstream-ann-endpoint ENDPOINT
                      ZeroMQ endpoint on which to receive announcement (heartbeat and shutdown
                      notification) traffic from the master.
--zmq-downstream-rr-endpoint ENDPOINT
                      ZeroMQ endpoint on which to listen for request/response (task and result)
                      traffic from subsidiary workers.
--zmq-downstream-ann-endpoint ENDPOINT
                      ZeroMQ endpoint on which to send announcement (heartbeat and shutdown
                      notification) traffic toward workers.
--zmq-master-heartbeat MASTER_HEARTBEAT
                      Every MASTER_HEARTBEAT seconds, the master announces its presence to workers.
--zmq-worker-heartbeat WORKER_HEARTBEAT
                      Every WORKER_HEARTBEAT seconds, workers announce their presence to the master.
--zmq-timeout-factor FACTOR
                      Scaling factor for heartbeat timeouts. If the master doesn't hear from a worker
                      in WORKER_HEARTBEAT*FACTOR, the worker is assumed to have crashed. If a worker
                      doesn't hear from the master in MASTER_HEARTBEAT*FACTOR seconds, the master is
                      assumed to have crashed. Both cases result in shutdown.
--zmq-startup-timeout STARTUP_TIMEOUT
                      Amount of time (in seconds) to wait for communication between the master and at
                      least one worker. This may need to be changed on very large, heavily-loaded
                      computer systems that start all processes simultaneously.
--zmq-shutdown-timeout SHUTDOWN_TIMEOUT
                      Amount of time (in seconds) to wait for workers to shut down.

westpa.cli.tools.w_crawl module

class westpa.cli.tools.w_crawl.WESTParallelTool(wm_env=None)

Bases: WESTTool

Base class for command-line tools parallelized with wwmgr. This automatically adds and processes wwmgr command-line arguments and creates a work manager at self.work_manager.

make_parser_and_process(prog=None, usage=None, description=None, epilog=None, args=None)

A convenience function to create a parser, call add_all_args(), and then call process_all_args(). The argument namespace is returned.

add_args(parser)

Add arguments specific to this tool to the given argparse parser.

process_args(args)

Take argparse-processed arguments associated with this tool and deal with them appropriately (setting instance variables, etc.).

go()

Perform the analysis associated with this tool.

main()

A convenience function to make a parser, parse and process arguments, then run self.go() in the master process.

class westpa.cli.tools.w_crawl.WESTDataReader

Bases: WESTToolComponent

Tool for reading data from WEST-related HDF5 files. Coordinates finding the main HDF5 file from west.cfg or command line arguments, caching of certain kinds of data (eventually), and retrieving auxiliary data sets from various places.

add_args(parser)

Add arguments specific to this component to the given argparse parser.

process_args(args)

Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc.).

open(mode='r')
close()
property weight_dsspec
property parent_id_dsspec
class westpa.cli.tools.w_crawl.IterRangeSelection(data_manager=None)

Bases: WESTToolComponent

Select and record limits on iterations used in analysis and/or reporting. This class provides both the user-facing command-line options and parsing, and the application-side API for recording limits in HDF5.

HDF5 datasets calculated based on a restricted set of iterations should be tagged with the following attributes:

first_iter

The first iteration included in the calculation.

last_iter

One past the last iteration included in the calculation.

iter_step

Blocking or sampling period for iterations included in the calculation.
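
The ``last_iter`` convention is half-open, mirroring Python slicing: it names one past the final included iteration. A minimal sketch of the arithmetic, using a plain dict in place of an h5py ``.attrs`` mapping so it runs without HDF5:

```python
# Tagging-convention sketch: a plain dict stands in for an h5py
# group/dataset's .attrs mapping. Data covering iterations 1..100:
attrs = {}
attrs['first_iter'] = 1    # first iteration included
attrs['last_iter'] = 101   # one PAST the last iteration included (half-open)
attrs['iter_step'] = 1     # blocking/sampling stride

# Number of iterations covered follows from the half-open range:
n_iters = (attrs['last_iter'] - attrs['first_iter']) // attrs['iter_step']
```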

add_args(parser)

Add arguments specific to this component to the given argparse parser.

process_args(args, override_iter_start=None, override_iter_stop=None, default_iter_step=1)

Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc.).

iter_block_iter()

Return an iterable of (block_start, block_end) over the blocks of iterations selected by --first-iter/--last-iter/--step-iter.

n_iter_blocks()

Return the number of blocks of iterations (as returned by iter_block_iter) selected by --first-iter/--last-iter/--step-iter.

record_data_iter_range(h5object, iter_start=None, iter_stop=None)

Store attributes iter_start and iter_stop on the given HDF5 object (group/dataset)

record_data_iter_step(h5object, iter_step=None)

Store attribute iter_step on the given HDF5 object (group/dataset).

check_data_iter_range_least(h5object, iter_start=None, iter_stop=None)

Check that the given HDF5 object contains (as denoted by its iter_start/iter_stop attributes) data at least for the iteration range specified.

check_data_iter_range_equal(h5object, iter_start=None, iter_stop=None)

Check that the given HDF5 object contains (as denoted by its iter_start/iter_stop attributes) data exactly for the iteration range specified.

check_data_iter_step_conformant(h5object, iter_step=None)

Check that the given HDF5 object contains per-iteration data at an iteration stride suitable for extracting data with the given stride (in other words, the given iter_step is a multiple of the stride with which data was recorded).

check_data_iter_step_equal(h5object, iter_step=None)

Check that the given HDF5 object contains per-iteration data at an iteration stride the same as that specified.

slice_per_iter_data(dataset, iter_start=None, iter_stop=None, iter_step=None, axis=0)

Return the subset of the given dataset corresponding to the given iteration range and stride. Unless otherwise specified, the first dimension of the dataset is the one sliced.

iter_range(iter_start=None, iter_stop=None, iter_step=None, dtype=None)

Return a sequence for the given iteration numbers and stride, filling in missing values from those stored on self. The smallest data type capable of holding iter_stop is returned unless otherwise specified using the dtype argument.
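
The slicing methods above map an iteration range onto array indices. A sketch of that arithmetic (not the WESTPA implementation, which also fills defaults from the component's recorded limits and supports an ``axis`` argument), assuming index 0 along the sliced axis corresponds to the stored ``first_iter``:

```python
# Hypothetical helper illustrating per-iteration slicing.
# `data` is any sequence whose index 0 corresponds to stored_first_iter.
def slice_per_iter(data, stored_first_iter, iter_start, iter_stop, iter_step=1):
    start = iter_start - stored_first_iter
    stop = iter_stop - stored_first_iter   # half-open, like last_iter above
    return data[start:stop:iter_step]

# Data recorded for iterations 1..10 (stored_first_iter = 1):
per_iter = [10 * n for n in range(1, 11)]
subset = slice_per_iter(per_iter, 1, 3, 8)   # iterations 3..7
```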

class westpa.cli.tools.w_crawl.ProgressIndicatorComponent

Bases: WESTToolComponent

add_args(parser)

Add arguments specific to this component to the given argparse parser.

process_args(args)

Take argparse-processed arguments associated with this component and deal with them appropriately (setting instance variables, etc.).

westpa.cli.tools.w_crawl.get_object(object_name, path=None)

Attempt to load the given object, using additional path information if given.
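
Conceptually, resolving a dotted name like ``module.function`` amounts to importing the module and fetching the attribute. A sketch of that behavior using ``importlib`` (an illustration, not the actual WESTPA implementation, which may treat the path argument differently):

```python
import importlib
import sys

def load_dotted(object_name, path=None):
    """Sketch: resolve 'module.attr', optionally searching extra directories."""
    modname, attrname = object_name.rsplit('.', 1)
    if path:
        sys.path[:0] = list(path)  # prepend user-supplied search directories
    module = importlib.import_module(modname)
    return getattr(module, attrname)
```

This is how ``task_callable`` and ``--crawler-instance`` arguments are turned into live Python objects: the string names an importable module plus an attribute within it.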

class westpa.cli.tools.w_crawl.WESTPACrawler

Bases: object

Base class for general crawling execution. This class only exists on the master.

initialize(iter_start, iter_stop)

Initialize this crawling process.

finalize()

Finalize this crawling process.

process_iter_result(n_iter, result)

Process the result of a per-iteration task.

class westpa.cli.tools.w_crawl.WCrawl

Bases: WESTParallelTool

prog = 'w_crawl'
description = 'Crawl a weighted ensemble dataset, executing a function for each iteration.\nThis can be used for postprocessing of trajectories, cleanup of datasets,\nor anything else that can be expressed as "do X for iteration N, then do\nsomething with the result". Tasks are parallelized by iteration, and\nno guarantees are made about evaluation order.\n\n\n-----------------------------------------------------------------------------\nCommand-line options\n-----------------------------------------------------------------------------\n\n'
add_args(parser)

Add arguments specific to this tool to the given argparse parser.

process_args(args)

Take argparse-processed arguments associated with this tool and deal with them appropriately (setting instance variables, etc.).

go()

Perform the analysis associated with this tool.

westpa.cli.tools.w_crawl.entry_point()