westpa.analysis package

This subpackage provides an API to facilitate the analysis of WESTPA simulation data. Its core abstraction is the Run class. A Run instance provides a read-only view of a WEST HDF5 (“west.h5”) file.

API reference: https://westpa.readthedocs.io/en/latest/documentation/analysis/

How To

Open a run:

>>> from westpa.analysis import Run
>>> run = Run.open('west.h5')
>>> run
<WESTPA Run with 500 iterations at 0x7fcaf8f0d5b0>

Iterate over iterations and walkers:

>>> for iteration in run:
...     for walker in iteration:
...         pass
...

Access a particular iteration:

>>> iteration = run.iteration(10)
>>> iteration
Iteration(10, <WESTPA Run with 500 iterations at 0x7fcaf8f0d5b0>)

Access a particular walker:

>>> walker = iteration.walker(4)
>>> walker
Walker(4, Iteration(10, <WESTPA Run with 500 iterations at 0x7fcaf8f0d5b0>))

Get the weight and progress coordinate values of a walker:

>>> walker.weight
9.876543209876543e-06
>>> walker.pcoords
array([[3.1283207],
       [3.073721 ],
       [2.959221 ],
       [2.6756208],
       [2.7888207]], dtype=float32)

Get the parent and children of a walker:

>>> walker.parent
Walker(2, Iteration(9, <WESTPA Run with 500 iterations at 0x7fcaf8f0d5b0>))
>>> for child in walker.children:
...     print(child)
...
Walker(0, Iteration(11, <WESTPA Run with 500 iterations at 0x7fcaf8f0d5b0>))
Walker(1, Iteration(11, <WESTPA Run with 500 iterations at 0x7fcaf8f0d5b0>))
Walker(2, Iteration(11, <WESTPA Run with 500 iterations at 0x7fcaf8f0d5b0>))
Walker(3, Iteration(11, <WESTPA Run with 500 iterations at 0x7fcaf8f0d5b0>))
Walker(4, Iteration(11, <WESTPA Run with 500 iterations at 0x7fcaf8f0d5b0>))
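In other words, the children of a walker are the walkers of the next iteration that name it as their parent. The plain-Python sketch below illustrates only that relationship. The children_of helper and the parent-index list are hypothetical stand-ins, not part of the westpa.analysis API.

```python
from typing import List, Sequence


def children_of(index: int, next_parent_indices: Sequence[int]) -> List[int]:
    """Return the indices of next-iteration walkers whose parent is `index`.

    `next_parent_indices[j]` is assumed to hold the index (in the current
    iteration) of the parent of walker j in the next iteration.
    """
    return [j for j, parent in enumerate(next_parent_indices) if parent == index]


# Hypothetical parent indices for the walkers of iteration 11.
parent_indices_iter11 = [4, 4, 4, 4, 4, 0, 1, 1]
print(children_of(4, parent_indices_iter11))  # [0, 1, 2, 3, 4]
```

A walker that does not appear as a parent in the next iteration (for example, one that was merged away) has no children.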

Trace the ancestry of a walker:

>>> trace = walker.trace()
>>> trace
Trace(Walker(4, Iteration(10, <WESTPA Run with 500 iterations at 0x7fcaf8f0d5b0>)))
>>> for walker in trace:
...     print(walker)
...
Walker(1, Iteration(1, <WESTPA Run with 500 iterations at 0x7fcaf8f0d5b0>))
Walker(4, Iteration(2, <WESTPA Run with 500 iterations at 0x7fcaf8f0d5b0>))
Walker(5, Iteration(3, <WESTPA Run with 500 iterations at 0x7fcaf8f0d5b0>))
Walker(6, Iteration(4, <WESTPA Run with 500 iterations at 0x7fcaf8f0d5b0>))
Walker(9, Iteration(5, <WESTPA Run with 500 iterations at 0x7fcaf8f0d5b0>))
Walker(8, Iteration(6, <WESTPA Run with 500 iterations at 0x7fcaf8f0d5b0>))
Walker(8, Iteration(7, <WESTPA Run with 500 iterations at 0x7fcaf8f0d5b0>))
Walker(13, Iteration(8, <WESTPA Run with 500 iterations at 0x7fcaf8f0d5b0>))
Walker(2, Iteration(9, <WESTPA Run with 500 iterations at 0x7fcaf8f0d5b0>))
Walker(4, Iteration(10, <WESTPA Run with 500 iterations at 0x7fcaf8f0d5b0>))
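You can reproduce the ancestry shown above by following parent links backward until you reach a walker whose parent is an initial state. The minimal sketch below does this in plain Python. It assumes a hypothetical lookup table of parent indices and uses a negative parent index to mark an initial state. This is only an illustration of the idea, not how Trace is implemented.

```python
from typing import Dict, List, Tuple

# Hypothetical lookup: (iteration number, walker index) -> parent walker index.
# In this sketch, a negative parent index means "parent is an initial state".
ParentTable = Dict[Tuple[int, int], int]


def trace_back(table: ParentTable, iteration: int, index: int) -> List[Tuple[int, int]]:
    """Follow parent links from a terminal walker back to its initial walker.

    Returns (iteration, index) pairs ordered from oldest to newest, which is
    the order in which the trace above is iterated.
    """
    lineage = [(iteration, index)]
    parent = table[(iteration, index)]
    while parent >= 0:
        iteration -= 1
        lineage.append((iteration, parent))
        parent = table[(iteration, parent)]
    lineage.reverse()
    return lineage


# Parent links reproducing the trace printed above (-1 marks an initial state).
table = {
    (10, 4): 2, (9, 2): 13, (8, 13): 8, (7, 8): 8, (6, 8): 9,
    (5, 9): 6, (4, 6): 5, (3, 5): 4, (2, 4): 1, (1, 1): -1,
}
print(trace_back(table, 10, 4))
```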

Close a run (and its underlying HDF5 file):

>>> run.close()
>>> run
<Closed WESTPA Run at 0x7fcaf8f0d5b0>
>>> run.h5file
<Closed HDF5 file>

Retrieving Trajectories

Built-in Reader

MD trajectory data stored in the same way as in the Basic NaCl tutorial can be retrieved using the built-in BasicMDTrajectory reader with its default settings:

>>> from westpa.analysis import BasicMDTrajectory
>>> trajectory = BasicMDTrajectory()

Here, trajectory is a callable object that takes either a Walker or a Trace instance as input and returns an MDTraj Trajectory:

>>> traj = trajectory(walker)
>>> traj
<mdtraj.Trajectory with 5 frames, 33001 atoms, 6625 residues, and unitcells at 0x7fcae484ad00>
>>> traj = trajectory(trace)
>>> traj
<mdtraj.Trajectory with 41 frames, 33001 atoms, 6625 residues, and unitcells at 0x7fcae487c790>
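These frame counts are consistent with the include_initpoint mechanism. A walker's initial point coincides with its parent's final point, so when the segments along a trace are joined, only the first segment needs to keep its initial point. With 10 walkers of 5 snapshots each, that gives 5 + 9 × 4 = 41 frames. The sketch below reproduces this arithmetic, using plain lists of made-up labels in place of MDTraj trajectories.

```python
from typing import List, Sequence


def join_trace(segments: Sequence[List[str]]) -> List[str]:
    """Join the segments of a trace, keeping each shared point only once.

    Each segment's first element is assumed to repeat the previous segment's
    last element, so it is dropped for every segment except the first.
    """
    joined = list(segments[0])
    for segment in segments[1:]:
        joined.extend(segment[1:])
    return joined


# Ten hypothetical walkers with 5 snapshots each; snapshot 0 of walker k
# is the same point as snapshot 4 of walker k - 1.
segments = [[f"t{4 * k + i}" for i in range(5)] for k in range(10)]
frames = join_trace(segments)
print(len(frames))  # 41
```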

Minor variations of the “basic” trajectory storage protocol (e.g., use of different file formats) can be handled by changing the parameters of the BasicMDTrajectory reader. For example, suppose that instead of storing the coordinate and topology data for trajectory segments in separate files (“seg.dcd” and “bstate.pdb”), we store them together in an MDTraj HDF5 trajectory file (“seg.h5”). This change can be accommodated by explicitly setting the traj_ext and top parameters of the trajectory reader:

>>> trajectory = BasicMDTrajectory(traj_ext='.h5', top=None)

Trajectories saved with the HDF5 framework can be read with the HDF5MDTrajectory reader instead.

Custom Readers

For users requiring greater flexibility, custom trajectory readers can be implemented using the westpa.analysis.Trajectory class. Implementing a custom reader requires two ingredients:

  1. A function for retrieving individual trajectory segments. The function must take a Walker instance as its first argument and return a sequence (e.g., a list, NumPy array, or MDTraj Trajectory) representing the trajectory of the walker. Moreover, it must accept a Boolean keyword argument include_initpoint, which specifies whether the returned trajectory includes its initial point.

  2. A function for concatenating trajectory segments. A default implementation is provided by the concatenate() function in the westpa.analysis.trajectories module.
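Ingredient 1 can be illustrated with a function that returns a segment from an in-memory store and honors include_initpoint. This is a hypothetical sketch. The SEGMENTS store is invented, and the function assumes that a walker exposes index and iteration.number attributes, mirroring the Walker(index, iteration) and Iteration(number, run) constructor arguments. Small stand-in classes replace the real Walker and Iteration so that the example runs without a WEST HDF5 file.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical storage: (iteration number, walker index) -> snapshots,
# where element 0 is the walker's initial point.
SEGMENTS: Dict[Tuple[int, int], List[float]] = {
    (1, 0): [0.0, 0.5, 1.0],
    (2, 0): [1.0, 1.5, 2.0],
}


def get_segment(walker, include_initpoint: bool = True) -> List[float]:
    """Return the trajectory segment of a Walker-like object."""
    snapshots = SEGMENTS[(walker.iteration.number, walker.index)]
    # Drop the initial point when the caller asks for it to be excluded.
    return list(snapshots) if include_initpoint else list(snapshots[1:])


# Minimal stand-ins for Iteration and Walker, for demonstration only.
@dataclass
class FakeIteration:
    number: int


@dataclass
class FakeWalker:
    index: int
    iteration: FakeIteration


w = FakeWalker(0, FakeIteration(2))
print(get_segment(w))                           # [1.0, 1.5, 2.0]
print(get_segment(w, include_initpoint=False))  # [1.5, 2.0]
```

With westpa installed, a function like this would then be wrapped as Trajectory(get_segment). The default concatenate() is used unless fconcat is given.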

westpa.analysis.core module

class westpa.analysis.core.Run(h5filename='west.h5')

A read-only view of a WESTPA simulation run.

Parameters:

h5filename (str or file-like object, default 'west.h5') – Pathname or stream of a main WESTPA HDF5 data file.

classmethod open(h5filename='west.h5')

Alternate constructor.

Parameters:

h5filename (str or file-like object, default 'west.h5') – Pathname or stream of a main WESTPA HDF5 data file.

close()

Close the Run instance by closing the underlying WESTPA HDF5 file.

property closed

Whether the Run instance is closed.

Type:

bool

property summary

Summary data by iteration.

Type:

pd.DataFrame

property num_iterations

Number of completed iterations.

Type:

int

property iterations

Sequence of iterations.

Type:

Sequence[Iteration]

property num_walkers

Total number of walkers.

Type:

int

property num_segments

Total number of trajectory segments (alias of num_walkers).

Type:

int

property walkers

All walkers in the run.

Type:

Iterable[Walker]

property recycled_walkers

Walkers that stopped in the sink.

Type:

Iterable[Walker]

property initial_walkers

Walkers whose parents are initial states.

Type:

Iterable[Walker]

iteration(number)

Return a specific iteration.

Parameters:

number (int) – Iteration number (1-based).

Returns:

The iteration indexed by number.

Return type:

Iteration

class westpa.analysis.core.Iteration(number, run)

An iteration of a WESTPA simulation.

Parameters:
  • number (int) – Iteration number (1-based).

  • run (Run) – Simulation run to which the iteration belongs.

property h5group

HDF5 group containing the iteration data.

Type:

h5py.Group

property prev

Previous iteration.

Type:

Iteration

property next

Next iteration.

Type:

Iteration

property summary

Iteration summary.

Type:

pd.DataFrame

property segment_summaries

Segment summary data for the iteration.

Type:

pd.DataFrame

property pcoords

Progress coordinate snapshots of each walker.

Type:

3D ndarray

property weights

Statistical weight of each walker.

Type:

1D ndarray

property bin_target_counts

Target count for each bin.

Type:

1D ndarray, dtype=uint64

property bin_mapper

Bin mapper used in the iteration.

Type:

BinMapper

property num_bins

Number of bins.

Type:

int

property bins

Bins defined by the iteration's bin mapper.

Type:

Iterable[Bin]

property num_walkers

Number of walkers in the iteration.

Type:

int

property num_segments

Number of trajectory segments (alias of num_walkers).

Type:

int

property walkers

Walkers in the iteration.

Type:

Iterable[Walker]

property recycled_walkers

Walkers that stopped in the sink.

Type:

Iterable[Walker]

property initial_walkers

Walkers whose parents are initial states.

Type:

Iterable[Walker]

property auxiliary_data

Auxiliary data stored for the iteration.

Type:

h5py.Group or None

property basis_state_summaries

Basis state summary data.

Type:

pd.DataFrame

property basis_state_pcoords

Progress coordinates of each basis state.

Type:

2D ndarray

property basis_states

Basis states in use for the iteration.

Type:

list[BasisState]

property has_target_states

Whether target (sink) states are defined for this iteration.

Type:

bool

property target_state_summaries

Target state summary data.

Type:

pd.DataFrame or None

property target_state_pcoords

Progress coordinates of each target state.

Type:

2D ndarray or None

property target_states

Target states in use for the iteration.

Type:

list[TargetState]

property sink

Union of bins serving as the recycling sink.

Type:

BinUnion or None

bin(index)

Return the bin with the given index.

Parameters:

index (int) – Bin index (0-based).

Returns:

The bin indexed by index.

Return type:

Bin

walker(index)

Return the walker with the given index.

Parameters:

index (int) – Walker index (0-based).

Returns:

The walker indexed by index.

Return type:

Walker

basis_state(index)

Return the basis state with the given index.

Parameters:

index (int) – Basis state index (0-based).

Returns:

The basis state indexed by index.

Return type:

BasisState

target_state(index)

Return the target state with the given index.

Parameters:

index (int) – Target state index (0-based).

Returns:

The target state indexed by index.

Return type:

TargetState

class westpa.analysis.core.Walker(index, iteration)

A walker in an iteration of a WESTPA simulation.

Parameters:
  • index (int) – Walker index (0-based).

  • iteration (Iteration) – Iteration to which the walker belongs.

property run

Run to which the walker belongs.

Type:

Run

property weight

Statistical weight of the walker.

Type:

float64

property pcoords

Progress coordinate snapshots.

Type:

2D ndarray

property num_snapshots

Number of snapshots.

Type:

int

property segment_summary

Segment summary data.

Type:

pd.Series

property parent

The parent of the walker.

Type:

Walker or InitialState

property children

The children of the walker.

Type:

Iterable[Walker]

property recycled

True if the walker stopped in the sink, False otherwise.

Type:

bool

property initial

True if the parent of the walker is an initial state, False otherwise.

Type:

bool

property auxiliary_data

Auxiliary data for the walker.

Type:

dict

trace(**kwargs)

Return the trace (ancestral line) of the walker.

For full documentation, see Trace.

Returns:

The trace of the walker.

Return type:

Trace

class westpa.analysis.core.BinUnion(indices, mapper)

A (disjoint) union of bins defined by a common bin mapper.

Parameters:
  • indices (iterable of int) – The indices of the bins comprising the union.

  • mapper (BinMapper) – The bin mapper defining the bins.

union(*others)

Return the union of the bin union and all others.

Parameters:

*others (BinUnion) – Other BinUnion instances, consisting of bins defined by the same underlying bin mapper.

Returns:

The union of self and others.

Return type:

BinUnion

intersection(*others)

Return the intersection of the bin union and all others.

Parameters:

*others (BinUnion) – Other BinUnion instances, consisting of bins defined by the same underlying bin mapper.

Returns:

The intersection of self and others.

Return type:

BinUnion
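A BinUnion is a disjoint union of bins from a single mapper, so its union() and intersection() methods behave like set operations on bin indices. The class below is a hypothetical minimal model of that behavior, not westpa's implementation. Comparing mappers by identity is an assumption made for the sketch.

```python
from typing import FrozenSet, Iterable, Tuple


class IndexUnion:
    """Hypothetical model of a union of bins that share one bin mapper."""

    def __init__(self, indices: Iterable[int], mapper: object) -> None:
        self.indices: FrozenSet[int] = frozenset(indices)
        self.mapper = mapper

    def _check(self, others: Tuple["IndexUnion", ...]) -> None:
        # Combining unions only makes sense for bins from the same mapper.
        if any(other.mapper is not self.mapper for other in others):
            raise ValueError("all unions must share the same bin mapper")

    def union(self, *others: "IndexUnion") -> "IndexUnion":
        self._check(others)
        return IndexUnion(self.indices.union(*(o.indices for o in others)), self.mapper)

    def intersection(self, *others: "IndexUnion") -> "IndexUnion":
        self._check(others)
        return IndexUnion(self.indices.intersection(*(o.indices for o in others)), self.mapper)


mapper = object()  # placeholder for a shared BinMapper
a = IndexUnion([0, 1, 2], mapper)
b = IndexUnion([2, 3], mapper)
print(sorted(a.union(b).indices))         # [0, 1, 2, 3]
print(sorted(a.intersection(b).indices))  # [2]
```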

class westpa.analysis.core.Bin(index, mapper)

A bin defined by a bin mapper.

Parameters:
  • index (int) – The index of the bin.

  • mapper (BinMapper) – The bin mapper defining the bin.

class westpa.analysis.core.Trace(walker, source=None, max_length=None)

A trace of a walker’s ancestry.

Parameters:
  • walker (Walker) – The terminal walker.

  • source (Bin, BinUnion, or collections.abc.Container, optional) – A source (macro)state, specified as a container object whose __contains__() method is the indicator function for the corresponding subset of progress coordinate space. The trace is stopped upon encountering a walker that stopped in source.

  • max_length (int, optional) – The maximum number of walkers in the trace.
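Because source only needs a __contains__() method that acts as an indicator function, a simple region of progress coordinate space can be written as a small class. The interval below is a hypothetical example. It assumes the object being tested is a single progress coordinate point (a sequence of coordinate values) and checks only its first dimension.

```python
from typing import Sequence


class PcoordInterval:
    """Hypothetical source state: lo <= x < hi on the first pcoord dimension."""

    def __init__(self, lo: float, hi: float) -> None:
        self.lo = lo
        self.hi = hi

    def __contains__(self, point: Sequence[float]) -> bool:
        # Indicator function: True if the point lies in the half-open interval.
        return self.lo <= point[0] < self.hi


source = PcoordInterval(0.0, 2.6)
print([2.5] in source, [3.1] in source)  # True False
```

Such an object could presumably be passed as walker.trace(source=PcoordInterval(0.0, 2.6)). The description above does not say exactly which snapshot of each walker is tested against the container, so check Trace's behavior before relying on how boundary cases are handled.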

westpa.analysis.trajectories module

class westpa.analysis.trajectories.Trajectory(fget=None, *, fconcat=None)

A callable that returns the trajectory of a walker or trace.

Parameters:
  • fget (callable) – Function for retrieving a single trajectory segment. Must take a Walker instance as its first argument and accept a boolean keyword argument include_initpoint. The function should return a sequence (e.g., a list or ndarray) representing the trajectory of the walker. If include_initpoint is True, the trajectory segment should include its initial point. Otherwise, the trajectory segment should exclude its initial point.

  • fconcat (callable, optional) – Function for concatenating trajectory segments. Must take a sequence of trajectory segments as input and return their concatenation. The default concatenation function is concatenate().

property segment_collector

Segment retrieval manager.

Type:

SegmentCollector

property fget

Function for getting trajectory segments.

Type:

callable

property fconcat

Function for concatenating trajectory segments.

Type:

callable

class westpa.analysis.trajectories.SegmentCollector(trajectory, use_threads=False, max_workers=None, show_progress=False)

An object that manages the retrieval of trajectory segments.

Parameters:
  • trajectory (Trajectory) – The trajectory to which the segment collector is attached.

  • use_threads (bool, default False) – Whether to use a pool of threads to retrieve trajectory segments asynchronously. Setting this parameter to True may be useful when segment retrieval is an I/O-bound task.

  • max_workers (int, optional) – Maximum number of threads to use. The default value is specified in the ThreadPoolExecutor documentation.

  • show_progress (bool, default False) – Whether to show a progress bar when retrieving multiple segments.

get_segments(walkers, initpoint_mask=None, **kwargs)

Retrieve the trajectories of multiple walkers.

Parameters:
  • walkers (sequence of Walker) – The walkers for which to retrieve trajectories.

  • initpoint_mask (sequence of bool, optional) – A Boolean mask indicating whether each trajectory segment should include (True) or exclude (False) its initial point. Default is all True.

Returns:

The trajectory of each walker.

Return type:

list of sequences
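The behavior described for get_segments() can be approximated with concurrent.futures. Each walker is passed to fget together with its mask entry, optionally through a ThreadPoolExecutor. This is a sketch under assumptions, not SegmentCollector's actual code. collect_segments and fake_fget are hypothetical names, and plain integers stand in for walkers.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Sequence


def collect_segments(
    fget: Callable[..., list],
    walkers: Sequence,
    initpoint_mask: Optional[Sequence[bool]] = None,
    use_threads: bool = False,
    max_workers: Optional[int] = None,
) -> List[list]:
    """Retrieve one segment per walker, in the same order as `walkers`."""
    if initpoint_mask is None:
        initpoint_mask = [True] * len(walkers)  # default: keep every initial point

    def fetch(walker, include):
        return fget(walker, include_initpoint=include)

    if not use_threads:
        return [fetch(w, m) for w, m in zip(walkers, initpoint_mask)]
    # Executor.map yields results in input order, even if tasks finish out of order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, walkers, initpoint_mask))


def fake_fget(walker, include_initpoint=True):
    segment = [walker * 10 + i for i in range(3)]  # hypothetical 3-snapshot segment
    return segment if include_initpoint else segment[1:]


mask = [True, False, False]  # as when assembling a trace
print(collect_segments(fake_fget, [0, 1, 2], mask, use_threads=True))
# [[0, 1, 2], [11, 12], [21, 22]]
```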

class westpa.analysis.trajectories.BasicMDTrajectory(top='bstate.pdb', traj_ext='.dcd', state_ext='.xml', sim_root='.')

Trajectory reader for MD trajectories stored as in the Basic Tutorial.

Parameters:
  • top (str or mdtraj.Topology, default 'bstate.pdb')

  • traj_ext (str, default '.dcd')

  • state_ext (str, default '.xml')

  • sim_root (str, default '.')

class westpa.analysis.trajectories.HDF5MDTrajectory

Trajectory reader for MD trajectories stored by the HDF5 framework.

westpa.analysis.trajectories.concatenate(segments)

Return the concatenation of a sequence of trajectory segments.

Parameters:

segments (sequence of sequences) – A sequence of trajectory segments.

Returns:

The concatenation of segments.

Return type:

sequence

westpa.analysis.statistics module

westpa.analysis.statistics.time_average(observable, iterations)

Compute the time average of an observable.

Parameters:
  • observable (Callable[[Walker], ArrayLike]) – Function that takes a walker as input and returns a number or a fixed-size array of numbers.

  • iterations (Sequence[Iteration]) – Sequence of iterations over which to compute the average.

Returns:

The time average of observable over iterations.

Return type:

ArrayLike
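One natural reading of "time average" for weighted ensemble data is a two-step calculation. First, compute the weight-averaged value of the observable over the walkers of each iteration. Then take the plain mean of those values across iterations. The sketch below implements that reading for scalar observables only. It is an assumption about the semantics, not a copy of time_average(), and it uses lists of (walker, weight) pairs in place of Iteration objects. Normalizing by total weight has no effect when the weights in each iteration sum to 1.

```python
from typing import Callable, Sequence, Tuple


def weighted_average(values: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of one iteration, normalized by its total weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def time_average_sketch(
    observable: Callable[[object], float],
    iterations: Sequence[Sequence[Tuple[object, float]]],
) -> float:
    """Hypothetical: mean over iterations of per-iteration weighted averages."""
    per_iteration = [
        weighted_average([observable(w) for w, _ in it], [wt for _, wt in it])
        for it in iterations
    ]
    return sum(per_iteration) / len(per_iteration)


# Two made-up iterations; here each "walker" is simply its observable value.
iterations = [[(1.0, 0.5), (3.0, 0.5)], [(0.0, 0.25), (4.0, 0.75)]]
print(time_average_sketch(lambda x: x, iterations))  # 2.5
```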