westpa.core modules

westpa.core module

westpa.core.data_manager module

HDF5 data manager for WEST.

Original HDF5 implementation: Joseph W. Kaus

Current implementation: Matthew C. Zwier

WEST exclusively uses the cross-platform, self-describing file format HDF5 for data storage. This ensures that data is stored efficiently and portably in a manner that is relatively straightforward for other analysis tools (perhaps written in C/C++/Fortran) to access.

The data is laid out in HDF5 as follows:
  • summary – overall summary data for the simulation

  • /iterations/ – data for individual iterations, one group per iteration under /iterations
    • iter_00000001/ – data for iteration 1
      • seg_index – overall information about segments in the iteration, including weight

      • pcoord – progress coordinate data organized as [seg_id][time][dimension]

      • wtg_parents – data used to reconstruct the split/merge history of trajectories

      • recycling – flux and event count for recycled particles, on a per-target-state basis

      • auxdata/ – auxiliary datasets (data stored on the ‘data’ field of Segment objects)

The file root object has an integer attribute ‘west_file_format_version’ which can be used to determine how to access data even as the file format (i.e. organization of data within HDF5 file) evolves.
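As a small illustration of the layout above, the path of a per-iteration group can be built by zero-padding the iteration number. This is only a sketch, not WESTPA's own code: the helper name iter_group_path is hypothetical, and the eight-digit width is inferred from the iter_00000001 example shown in the layout.

```python
def iter_group_path(n_iter, prec=8, absolute=True):
    """Build a group path such as '/iterations/iter_00000001'.

    Hypothetical helper; the padding width (prec) is an assumption
    taken from the 'iter_00000001' example in the layout description.
    """
    if n_iter < 0:
        raise ValueError("iteration numbers must be non-negative")
    name = "iter_{:0{prec}d}".format(n_iter, prec=prec)
    return "/iterations/" + name if absolute else name


print(iter_group_path(1))                    # absolute path for iteration 1
print(iter_group_path(42, absolute=False))   # bare group name for iteration 42
```

A path built this way could then be used to index an open HDF5 file object, for example to reach the pcoord or seg_index datasets of one iteration.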

Version history:
Version 9
  • Basis states are now saved as iter_segid instead of just segid as a pointer label.

  • Initial states are also saved in the iteration 0 file, with a negative sign.

Version 8
  • Added external links to trajectory files in iterations/iter_* groups, if the HDF5 framework was used.

  • Added an iter group for the iteration 0 to store conformations of basis states.

Version 7
  • Removed bin_assignments, bin_populations, and bin_rates from iteration group.

  • Added new_segments subgroup to iteration group

Version 6
  • ???

Version 5
  • moved iter_* groups into a top-level iterations/ group,

  • added in-HDF5 storage for basis states, target states, and generated states

class westpa.core.data_manager.attrgetter(attr, /, *attrs)

Bases: object

Return a callable object that fetches the given attribute(s) from its operand. After f = attrgetter(‘name’), the call f(r) returns r.name. After g = attrgetter(‘name’, ‘date’), the call g(r) returns (r.name, r.date). After h = attrgetter(‘name.first’, ‘name.last’), the call h(r) returns (r.name.first, r.name.last).
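The three forms described above can be tried directly with the standard library's operator.attrgetter, which is the object re-exported here. The Record and Name types below are hypothetical stand-ins used only for illustration.

```python
from collections import namedtuple
from operator import attrgetter

# Hypothetical record types, used only to have attributes to fetch.
Name = namedtuple("Name", ["first", "last"])
Record = namedtuple("Record", ["name", "date"])

r = Record(name=Name("first_value", "last_value"), date="date_value")

f = attrgetter("name")                      # single attribute
g = attrgetter("name", "date")              # several attributes -> tuple
h = attrgetter("name.first", "name.last")   # dotted names are followed

assert f(r) is r.name
assert g(r) == (r.name, r.date)
assert h(r) == (r.name.first, r.name.last)

# A common use: a sort key that avoids writing a lambda.
records = [Record(Name("b", "y"), 2), Record(Name("a", "z"), 1)]
by_date = sorted(records, key=attrgetter("date"))
```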

westpa.core.data_manager.relpath(path, start=None)

Return a relative version of a path

westpa.core.data_manager.dirname(p)

Returns the directory component of a pathname

class westpa.core.data_manager.Segment(n_iter=None, seg_id=None, weight=None, endpoint_type=None, parent_id=None, wtg_parent_ids=None, pcoord=None, status=None, walltime=None, cputime=None, data=None)

Bases: object

A class wrapping segment data that must be passed through the work manager or data manager. Most fields are self-explanatory. One item worth noting is that a negative parent ID means that the segment starts from the initial state with ID -(segment.parent_id+1)
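The negative-parent-ID convention can be written out as a pair of small functions. This is a sketch of the arithmetic stated above, with hypothetical function names; it does not use the Segment class itself.

```python
def initial_state_id_from_parent(parent_id):
    """Return the initial state ID encoded in a negative parent ID.

    Returns None for non-negative parent IDs, which refer to a parent
    segment rather than an initial state.
    """
    if parent_id < 0:
        return -(parent_id + 1)
    return None


def parent_id_for_initial_state(state_id):
    """Inverse mapping: encode an initial state ID as a negative parent ID."""
    return -(state_id + 1)


for parent_id in (-1, -4, 5):
    print(parent_id, initial_state_id_from_parent(parent_id))
```

The offset of one lets initial state 0 be represented, since -0 would be indistinguishable from segment 0.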

SEG_ENDPOINT_CONTINUES = 1
SEG_ENDPOINT_MERGED = 2
SEG_ENDPOINT_RECYCLED = 3
SEG_ENDPOINT_UNSET = 0
SEG_INITPOINT_CONTINUES = 1
SEG_INITPOINT_NEWTRAJ = 2
SEG_INITPOINT_UNSET = 0
SEG_STATUS_COMPLETE = 2
SEG_STATUS_FAILED = 3
SEG_STATUS_PREPARED = 1
SEG_STATUS_UNSET = 0
endpoint_type_names = {0: 'SEG_ENDPOINT_UNSET', 1: 'SEG_ENDPOINT_CONTINUES', 2: 'SEG_ENDPOINT_MERGED', 3: 'SEG_ENDPOINT_RECYCLED'}
property endpoint_type_text
endpoint_types = {'SEG_ENDPOINT_CONTINUES': 1, 'SEG_ENDPOINT_MERGED': 2, 'SEG_ENDPOINT_RECYCLED': 3, 'SEG_ENDPOINT_UNSET': 0}
static final_pcoord(segment)

Return the final progress coordinate point of this segment.

static initial_pcoord(segment)

Return the initial progress coordinate point of this segment.

property initial_state_id
property initpoint_type
initpoint_type_names = {0: 'SEG_INITPOINT_UNSET', 1: 'SEG_INITPOINT_CONTINUES', 2: 'SEG_INITPOINT_NEWTRAJ'}
initpoint_types = {'SEG_INITPOINT_CONTINUES': 1, 'SEG_INITPOINT_NEWTRAJ': 2, 'SEG_INITPOINT_UNSET': 0}
status_names = {0: 'SEG_STATUS_UNSET', 1: 'SEG_STATUS_PREPARED', 2: 'SEG_STATUS_COMPLETE', 3: 'SEG_STATUS_FAILED'}
property status_text
statuses = {'SEG_STATUS_COMPLETE': 2, 'SEG_STATUS_FAILED': 3, 'SEG_STATUS_PREPARED': 1, 'SEG_STATUS_UNSET': 0}
class westpa.core.data_manager.BasisState(label, probability, pcoord=None, auxref=None, state_id=None)

Bases: object

Describes a basis (micro)state. These basis states are used to generate initial states for new trajectories, either at the beginning of the simulation (i.e. at w_init) or due to recycling.

Variables:
  • state_id – Integer identifier of this state, usually set by the data manager.

  • label – A descriptive label for this microstate (may be empty)

  • probability – Probability of this state to be selected when creating a new trajectory.

  • pcoord – The representative progress coordinate of this state.

  • auxref – A user-provided (string) reference for locating data associated with this state (usually a filesystem path).

as_numpy_record()

Return the data for this state as a numpy record array.

classmethod states_from_file(statefile)

Read a file defining basis states. Each line defines a state, and contains a label, the probability, and optionally a data reference, separated by whitespace, as in:

unbound    1.0

or:

unbound_0    0.6        state0.pdb
unbound_1    0.4        state1.pdb
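To make the file format concrete, the following sketch parses such a file into plain tuples. It is not the implementation of states_from_file(); the function name parse_basis_states is hypothetical, blank lines are skipped as an assumption, and the result is a list of (label, probability, auxref) tuples rather than BasisState objects.

```python
import io


def parse_basis_states(fileobj):
    """Parse whitespace-separated basis state lines (illustrative only).

    Each non-blank line holds a label, a probability, and optionally a
    data reference. Returns a list of (label, probability, auxref) tuples.
    """
    states = []
    for lineno, line in enumerate(fileobj, start=1):
        fields = line.split()
        if not fields:
            continue  # assumption: blank lines are ignored
        if len(fields) < 2:
            raise ValueError("line {}: expected a label and a probability".format(lineno))
        label = fields[0]
        probability = float(fields[1])
        auxref = fields[2] if len(fields) > 2 else None
        states.append((label, probability, auxref))
    return states


example = io.StringIO(
    "unbound_0    0.6        state0.pdb\n"
    "unbound_1    0.4        state1.pdb\n"
)
for state in parse_basis_states(example):
    print(state)
```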
classmethod states_to_file(states, fileobj)

Write a file defining basis states, which may then be read by states_from_file().

class westpa.core.data_manager.TargetState(label, pcoord, state_id=None)

Bases: object

Describes a target state.

Variables:
  • state_id – Integer identifier of this state, usually set by the data manager.

  • label – A descriptive label for this microstate (may be empty)

  • pcoord – The representative progress coordinate of this state.

classmethod states_from_file(statefile, dtype)

Read a file defining target states. Each line defines a state, and contains a label followed by a representative progress coordinate value, separated by whitespace, as in:

bound     0.02

for a single target and one-dimensional progress coordinates or:

bound    2.7    0.0
drift    100    50.0

for two targets and a two-dimensional progress coordinate.
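A comparable sketch for the target state format is shown below. Again this is not the library's implementation: parse_target_states is a hypothetical name, every coordinate is converted with float() instead of honoring the dtype argument, and plain tuples are returned in place of TargetState objects.

```python
import io


def parse_target_states(fileobj):
    """Parse lines of the form '<label> <pcoord_0> [<pcoord_1> ...]'.

    Returns a list of (label, pcoord) tuples, where pcoord is a list of
    floats with one entry per progress coordinate dimension.
    """
    states = []
    for lineno, line in enumerate(fileobj, start=1):
        fields = line.split()
        if not fields:
            continue  # assumption: blank lines are ignored
        if len(fields) < 2:
            raise ValueError("line {}: expected a label and at least one value".format(lineno))
        states.append((fields[0], [float(value) for value in fields[1:]]))
    return states


example = io.StringIO("bound    2.7    0.0\ndrift    100    50.0\n")
for label, pcoord in parse_target_states(example):
    print(label, pcoord)
```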

classmethod states_to_file(states, fileobj)

Write a file defining target states, which may then be read by states_from_file().

class westpa.core.data_manager.InitialState(state_id, basis_state_id, iter_created, iter_used=None, istate_type=None, istate_status=None, pcoord=None, basis_state=None, basis_auxref=None)

Bases: object

Describes an initial state for a new trajectory. These are generally constructed by appropriate modification of a basis state.

Variables:
  • state_id – Integer identifier of this state, usually set by the data manager.

  • basis_state_id – Identifier of the basis state from which this state was generated, or None.

  • basis_state – The BasisState from which this state was generated, or None.

  • iter_created – Iteration in which this state was generated (0 for simulation initialization).

  • iter_used – Iteration in which this state was used to initiate a trajectory (None for unused).

  • istate_type – Integer describing the type of this initial state (ISTATE_TYPE_BASIS for direct use of a basis state, ISTATE_TYPE_GENERATED for a state generated from a basis state, ISTATE_TYPE_RESTART for a state corresponding to the endpoint of a segment in another simulation, or ISTATE_TYPE_START for a state generated from a start state).

  • istate_status – Integer describing whether this initial state has been properly prepared.

  • pcoord – The representative progress coordinate of this state.

ISTATE_STATUS_FAILED = 2
ISTATE_STATUS_PENDING = 0
ISTATE_STATUS_PREPARED = 1
ISTATE_TYPE_BASIS = 1
ISTATE_TYPE_GENERATED = 2
ISTATE_TYPE_RESTART = 3
ISTATE_TYPE_START = 4
ISTATE_TYPE_UNSET = 0
ISTATE_UNUSED = 0
as_numpy_record()
istate_status_names = {0: 'ISTATE_STATUS_PENDING', 1: 'ISTATE_STATUS_PREPARED', 2: 'ISTATE_STATUS_FAILED'}
istate_statuses = {'ISTATE_STATUS_FAILED': 2, 'ISTATE_STATUS_PENDING': 0, 'ISTATE_STATUS_PREPARED': 1}
istate_type_names = {0: 'ISTATE_TYPE_UNSET', 1: 'ISTATE_TYPE_BASIS', 2: 'ISTATE_TYPE_GENERATED', 3: 'ISTATE_TYPE_RESTART', 4: 'ISTATE_TYPE_START'}
istate_types = {'ISTATE_TYPE_BASIS': 1, 'ISTATE_TYPE_GENERATED': 2, 'ISTATE_TYPE_RESTART': 3, 'ISTATE_TYPE_START': 4, 'ISTATE_TYPE_UNSET': 0}
class westpa.core.data_manager.NewWeightEntry(source_type, weight, prev_seg_id=None, prev_init_pcoord=None, prev_final_pcoord=None, new_init_pcoord=None, target_state_id=None, initial_state_id=None)

Bases: object

NW_SOURCE_RECYCLED = 0
class westpa.core.data_manager.ExecutablePropagator(rc=None)

Bases: WESTPropagator

ENV_BSTATE_DATA_REF = 'WEST_BSTATE_DATA_REF'
ENV_BSTATE_ID = 'WEST_BSTATE_ID'
ENV_CURRENT_ITER = 'WEST_CURRENT_ITER'
ENV_CURRENT_SEG_DATA_REF = 'WEST_CURRENT_SEG_DATA_REF'
ENV_CURRENT_SEG_ID = 'WEST_CURRENT_SEG_ID'
ENV_CURRENT_SEG_INITPOINT = 'WEST_CURRENT_SEG_INITPOINT_TYPE'
ENV_ISTATE_DATA_REF = 'WEST_ISTATE_DATA_REF'
ENV_ISTATE_ID = 'WEST_ISTATE_ID'
ENV_PARENT_DATA_REF = 'WEST_PARENT_DATA_REF'
ENV_PARENT_SEG_ID = 'WEST_PARENT_ID'
ENV_RAND128 = 'WEST_RAND128'
ENV_RAND16 = 'WEST_RAND16'
ENV_RAND32 = 'WEST_RAND32'
ENV_RAND64 = 'WEST_RAND64'
ENV_RANDFLOAT = 'WEST_RANDFLOAT'
ENV_STRUCT_DATA_REF = 'WEST_STRUCT_DATA_REF'
exec_child(executable, environ=None, stdin=None, stdout=None, stderr=None, cwd=None)

Execute a child process with the environment set from the current environment, the values of self.addtl_child_environ, the random numbers returned by self.random_val_env_vars, and the given environ (applied in that order). stdin/stdout/stderr are optionally redirected.

This function waits on the child process to finish, then returns (rc, rusage), where rc is the child’s return code and rusage is the resource usage tuple from os.wait4()
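The layering order of the child environment described above can be sketched with plain dictionaries. This is an illustration of the stated precedence only; build_child_environ is a hypothetical helper and does not launch a process or call os.wait4().

```python
import os


def build_child_environ(addtl_child_environ, random_vars, environ=None, base=None):
    """Merge environment sources in the documented order (sketch).

    Later sources override earlier ones:
    current environment -> additional child environment ->
    random-number variables -> explicitly given environ.
    """
    child_env = dict(os.environ if base is None else base)
    child_env.update(addtl_child_environ)
    child_env.update(random_vars)
    child_env.update(environ or {})
    return child_env


merged = build_child_environ(
    addtl_child_environ={"A": "addtl", "B": "addtl"},
    random_vars={"B": "rand", "C": "rand"},
    environ={"C": "given"},
    base={"A": "base", "D": "base"},  # stand-in for os.environ
)
print(merged)
```

A dictionary built this way is suitable as the env argument of subprocess.Popen().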

exec_child_from_child_info(child_info, template_args, environ)
exec_for_basis_state(child_info, basis_state, addtl_env=None)

Execute a child process with environment and template expansion from the given basis state

exec_for_initial_state(child_info, initial_state, addtl_env=None)

Execute a child process with environment and template expansion from the given initial state.

exec_for_iteration(child_info, n_iter, addtl_env=None)

Execute a child process with environment and template expansion from the given iteration number.

exec_for_segment(child_info, segment, addtl_env=None)

Execute a child process with environment and template expansion from the given segment.

finalize_iteration(n_iter, segments)

Perform any necessary post-iteration cleanup. This is run by the work manager.

gen_istate(basis_state, initial_state)

Generate a new initial state from the given basis state.

get_pcoord(state)

Get the progress coordinate of the given basis or initial state.

static makepath(template, template_args=None, expanduser=True, expandvars=True, abspath=False, realpath=False)
prepare_file_system(segment, environ)
prepare_iteration(n_iter, segments)

Perform any necessary per-iteration preparation. This is run by the work manager.

propagate(segments)

Propagate one or more segments, including any necessary per-iteration setup and teardown for this propagator.

random_val_env_vars()

Return a set of environment variables containing random seeds. These are returned as a dictionary, suitable for use in os.environ.update() or as the env argument to subprocess.Popen(). Every child process executed by exec_child() gets these.
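One way such a dictionary could be produced is sketched below. The variable names come from the ENV_RAND* constants listed above; how the values are generated and formatted (decimal strings from Python's random module) is an assumption made for illustration, not a description of WESTPA's implementation.

```python
import random


def random_val_env_vars_sketch(rng=None):
    """Return a dict of random-seed environment variables (illustrative).

    Integer variables hold an N-bit unsigned value as a decimal string;
    WEST_RANDFLOAT holds a float in [0, 1). Environment values must be
    strings, so everything is converted before being returned.
    """
    rng = rng or random.Random()
    env = {"WEST_RAND{}".format(bits): str(rng.getrandbits(bits)) for bits in (16, 32, 64, 128)}
    env["WEST_RANDFLOAT"] = repr(rng.random())
    return env


env = random_val_env_vars_sketch(random.Random(0))
for name in sorted(env):
    print(name, env[name])
```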

retrieve_dataset_return(state, return_files, del_return_files, single_point)

Retrieve returned data from the temporary locations directed by the environment variables. state is a Segment, BasisState, or InitialState object that the return data is associated with. return_files is a dict where the keys are the dataset names and the values are the paths to the temporary files that contain the returned data. del_return_files is a dict where the keys are the names of datasets to be deleted (if the corresponding value is set to True) once the data is retrieved.

setup_dataset_return(segment=None, subset_keys=None)

Set up temporary files and environment variables that point to them for segment runners to return data. segment is the Segment object that the return data is associated with. subset_keys specifies the names of a subset of data to be returned.

template_args_for_segment(segment)
update_args_env_basis_state(template_args, environ, basis_state)
update_args_env_initial_state(template_args, environ, initial_state)
update_args_env_iter(template_args, environ, n_iter)
update_args_env_segment(template_args, environ, segment)
westpa.core.data_manager.makepath(template, template_args=None, expanduser=True, expandvars=True, abspath=False, realpath=False)
class westpa.core.data_manager.flushing_lock(lock, fileobj)

Bases: object

class westpa.core.data_manager.expiring_flushing_lock(lock, flush_method, nextsync)

Bases: object

westpa.core.data_manager.seg_id_dtype

alias of int64

westpa.core.data_manager.n_iter_dtype

alias of uint32

westpa.core.data_manager.weight_dtype

alias of float64

westpa.core.data_manager.utime_dtype

alias of float64

westpa.core.data_manager.seg_status_dtype

alias of uint8

westpa.core.data_manager.seg_initpoint_dtype

alias of uint8

westpa.core.data_manager.seg_endpoint_dtype

alias of uint8

westpa.core.data_manager.istate_type_dtype

alias of uint8

westpa.core.data_manager.istate_status_dtype

alias of uint8

westpa.core.data_manager.nw_source_dtype

alias of uint8

class westpa.core.data_manager.WESTDataManager(rc=None)

Bases: object

Data manager for assisting the reading and writing of WEST data from/to HDF5 files.

default_iter_prec = 8
default_we_h5filename = 'west.h5'
default_we_h5file_driver = None
default_flush_period = 60
default_aux_compression_threshold = 1048576
binning_hchunksize = 4096
table_scan_chunksize = 1024
flushing_lock()
expiring_flushing_lock()
process_config()
property system
property closed
iter_group_name(n_iter, absolute=True)
require_iter_group(n_iter)

Get the group associated with n_iter, creating it if necessary.

del_iter_group(n_iter)
get_iter_group(n_iter)
get_seg_index(n_iter)
property current_iteration
open_backing(mode=None)

Open the (already-created) HDF5 file named in self.west_h5filename.

prepare_backing()

Create new HDF5 file

close_backing()
flush_backing()
save_target_states(tstates, n_iter=None)

Save the given target states in the HDF5 file; they will be used for the next iteration to be propagated. A complete set is required, even if nominally appending to an existing set, which simplifies the mapping of IDs to the table.

find_tstate_group(n_iter)
find_ibstate_group(n_iter)
get_target_states(n_iter)

Return a list of TargetState objects representing the target (sink) states that are in use for iteration n_iter. Future iterations are assumed to continue from the most recent set of states.

create_ibstate_group(basis_states, n_iter=None)

Create the group used to store basis states and initial states (whose definitions are always coupled). This group is hard-linked into all iteration groups that use these basis and initial states.

create_ibstate_iter_h5file(basis_states)

Create the per-iteration HDF5 file for the basis states (i.e., iteration 0). This special treatment is needed so that the analysis tools can access basis states more easily.

update_iter_h5file(n_iter, segments)

Write out the per-iteration HDF5 file with given segments and add an external link to it in the main HDF5 file (west.h5) if the link is not present.

get_basis_states(n_iter=None)

Return a list of BasisState objects representing the basis states that are in use for iteration n_iter.

create_initial_states(n_states, n_iter=None)

Create storage for n_states initial states associated with iteration n_iter, and return bare InitialState objects with only state_id set.

update_initial_states(initial_states, n_iter=None)

Save the given initial states in the HDF5 file

get_initial_states(n_iter=None)
get_segment_initial_states(segments, n_iter=None)

Retrieve all initial states referenced by the given segments.

get_unused_initial_states(n_states=None, n_iter=None)

Retrieve any prepared but unused initial states applicable to the given iteration. Up to n_states states are returned; if n_states is None, then all unused states are returned.

prepare_iteration(n_iter, segments)

Prepare for a new iteration by creating space to store the new iteration’s data. The number of segments, their IDs, and their lineage must be determined and included in the set of segments passed in.

Update the per-iteration hard links pointing to the tables of target and initial/basis states for the given iteration. These links are not used by this class, but are remarkably convenient for third-party analysis tools and hdfview.

get_iter_summary(n_iter=None)
update_iter_summary(summary, n_iter=None)
del_iter_summary(min_iter)
update_segments(n_iter, segments)

Update segment information in the HDF5 file; all prior information for each segment is overwritten, except for parent and weight transfer information.

get_segments(n_iter=None, seg_ids=None, load_pcoords=True)

Return the given (or all) segments from a given iteration.

If the optional parameter load_auxdata is true, then all auxiliary datasets available are loaded and mapped onto the data dictionary of each segment. If load_auxdata is None, then use the default self.auto_load_auxdata, which can be set by the option load_auxdata in the [data] section of west.cfg. This essentially requires as much RAM as there is per-iteration auxiliary data, so this behavior is not on by default.

prepare_segment_restarts(segments, basis_states=None, initial_states=None)

Prepare the necessary folder and files, given the data stored in the parent per-iteration HDF5 file, for propagating the simulation. basis_states and initial_states should be provided if the segments are newly created.

get_all_parent_ids(n_iter)
get_parent_ids(n_iter, seg_ids=None)

Return a sequence of the parent IDs of the given seg_ids.

get_weights(n_iter, seg_ids)

Return the weights associated with the given seg_ids

get_child_ids(n_iter, seg_id)

Return the seg_ids of segments that have the given segment as a parent.
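The lookup amounts to scanning a table of parent IDs for matches. The sketch below assumes, purely for illustration, a list parent_ids that is indexed by child seg_id and holds each child's parent ID; the function name and the data are hypothetical, and no HDF5 access is involved.

```python
def child_ids(parent_ids, seg_id):
    """Return the indices (child seg_ids) whose parent ID equals seg_id."""
    return [child for child, parent in enumerate(parent_ids) if parent == seg_id]


# Hypothetical parent table: children 0 and 1 descend from segment 0,
# children 2 and 4 from segment 1, and child 3 starts from an initial
# state (negative parent ID).
parent_ids = [0, 0, 1, -1, 1]

print(child_ids(parent_ids, 0))
print(child_ids(parent_ids, 1))
print(child_ids(parent_ids, 2))
```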

get_children(segment)

Return all segments which have the given segment as a parent

prepare_run()
finalize_run()
save_new_weight_data(n_iter, new_weights)

Save a set of NewWeightEntry objects to HDF5. Note that this should be called for the iteration in which the weights appear in their new locations (e.g. for recycled walkers, the iteration following recycling).

get_new_weight_data(n_iter)
find_bin_mapper(hashval)

Check to see if the given hash value is in the binning table. Returns the index in the bin data tables if found, or raises KeyError if not.

get_bin_mapper(hashval)

Look up the given hash value in the binning table, unpickling and returning the corresponding bin mapper if available, or raising KeyError if not.

save_bin_mapper(hashval, pickle_data)

Store the given mapper in the table of saved mappers. If the mapper cannot be stored, PickleError will be raised. Returns the index in the bin data tables where the mapper is stored.
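The find/get/save contract described by these three methods can be mimicked with an in-memory table. This is a sketch under stated assumptions: the class MapperTableSketch is hypothetical, the hash is computed here as a SHA-256 digest of the pickled bytes, returning the existing index when a hash is already stored is a choice made for this example, and a plain dict stands in for a real bin mapper.

```python
import hashlib
import pickle


class MapperTableSketch:
    """In-memory stand-in for a table of pickled bin mappers keyed by hash."""

    def __init__(self):
        self._hashes = []
        self._pickles = []

    def find_bin_mapper(self, hashval):
        """Return the table index for hashval, or raise KeyError."""
        try:
            return self._hashes.index(hashval)
        except ValueError:
            raise KeyError(hashval) from None

    def get_bin_mapper(self, hashval):
        """Unpickle and return the mapper stored under hashval."""
        return pickle.loads(self._pickles[self.find_bin_mapper(hashval)])

    def save_bin_mapper(self, hashval, pickle_data):
        """Store pickle_data under hashval and return its table index."""
        try:
            return self.find_bin_mapper(hashval)  # assumption: reuse existing row
        except KeyError:
            self._hashes.append(hashval)
            self._pickles.append(pickle_data)
            return len(self._hashes) - 1


mapper = {"boundaries": [0.0, 1.0, 2.0]}  # hypothetical stand-in for a bin mapper
pickle_data = pickle.dumps(mapper)
hashval = hashlib.sha256(pickle_data).hexdigest()  # assumption: hash of the pickle

table = MapperTableSketch()
index = table.save_bin_mapper(hashval, pickle_data)
restored = table.get_bin_mapper(hashval)
```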

save_iter_binning(n_iter, hashval, pickled_mapper, target_counts)

Save information about the binning used to generate segments for iteration n_iter.

westpa.core.data_manager.normalize_dataset_options(dsopts, path_prefix='', n_iter=0)
westpa.core.data_manager.create_dataset_from_dsopts(group, dsopts, shape=None, dtype=None, data=None, autocompress_threshold=None, n_iter=None)
westpa.core.data_manager.require_dataset_from_dsopts(group, dsopts, shape=None, dtype=None, data=None, autocompress_threshold=None, n_iter=None)
westpa.core.data_manager.calc_chunksize(shape, dtype, max_chunksize=262144)

Calculate a chunk size for HDF5 data, anticipating that access will slice along lower dimensions sooner than higher dimensions.
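One heuristic consistent with this description is to halve the leading dimensions first until the chunk fits under the byte limit, leaving trailing dimensions intact for as long as possible. The sketch below illustrates that idea with the standard library only; it takes an item size in bytes instead of a dtype, uses a hypothetical function name, and should not be read as the exact algorithm used by calc_chunksize().

```python
from functools import reduce
from operator import mul


def calc_chunksize_sketch(shape, itemsize, max_chunksize=262144):
    """Shrink a chunk shape until it occupies at most max_chunksize bytes.

    Dimensions are halved from the first axis onward, so the trailing
    axes keep their full extent unless the leading axes alone cannot
    bring the chunk under the limit.
    """
    chunk_shape = list(shape)
    for idim in range(len(chunk_shape)):
        nbytes = reduce(mul, chunk_shape, 1) * itemsize
        while chunk_shape[idim] > 1 and nbytes > max_chunksize:
            chunk_shape[idim] >>= 1  # halve this axis
            nbytes = reduce(mul, chunk_shape, 1) * itemsize
        if nbytes <= max_chunksize:
            break
    return tuple(chunk_shape)


# 1000 segments x 100 time points x 2 dimensions of 8-byte floats
print(calc_chunksize_sketch((1000, 100, 2), itemsize=8))
```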

westpa.core.extloader module

westpa.core.extloader.load_module(module_name, path=None)

Load and return the given module, recursively loading containing packages as necessary.

westpa.core.extloader.get_object(object_name, path=None)

Attempt to load the given object, using additional path information if given.
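The general technique of resolving a dotted name to an object can be sketched with importlib. This is not the extloader implementation: get_object_sketch is a hypothetical name, the optional path argument is omitted, and the name is assumed to have the form package.module.attribute.

```python
import importlib


def get_object_sketch(object_name):
    """Import the module part of a dotted name and return the named attribute."""
    module_name, _, attr_name = object_name.rpartition(".")
    if not module_name:
        raise ValueError("expected a dotted name such as 'module.attribute'")
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


sqrt = get_object_sketch("math.sqrt")
join = get_object_sketch("os.path.join")
print(sqrt(9.0))
```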

westpa.core.h5io module

Miscellaneous routines to help with HDF5 input and output of WEST-related data.

exception westpa.core.h5io.NaturalNameWarning

Bases: Warning

Issued when a non-pythonic name is given for a node.

This is not an error and may even be very useful in certain contexts, but one should be aware that such nodes cannot be accessed using natural naming (instead, getattr() must be used explicitly).

class westpa.core.h5io.Trajectory(xyz, topology, time=None, unitcell_lengths=None, unitcell_angles=None)

Bases: object

Container object for a molecular dynamics trajectory

A Trajectory represents a collection of one or more molecular structures, generally (but not necessarily) from a molecular dynamics trajectory. The Trajectory stores a number of fields describing the system through time, including the cartesian coordinates of each atom (xyz), the topology of the molecular system (topology), and information about the unitcell if appropriate (unitcell_vectors, unitcell_lengths, unitcell_angles).

A Trajectory should generally be constructed by loading a file from disk. Trajectories can be loaded from (and saved to) the PDB, XTC, TRR, DCD, NetCDF or MDTraj HDF5 formats.

Trajectory supports fancy indexing, so you can extract one or more frames from a Trajectory as a separate trajectory. For example, to form a trajectory with every other frame, you can slice with traj[::2].

Trajectory uses the nanometer, degree & picosecond unit system.

Examples

>>> # loading a trajectory
>>> import mdtraj as md
>>> md.load('trajectory.xtc', top='native.pdb')
<mdtraj.Trajectory with 1000 frames, 22 atoms at 0x1058a73d0>
>>> # slicing a trajectory
>>> t = md.load('trajectory.h5')
>>> print(t)
<mdtraj.Trajectory with 100 frames, 22 atoms>
>>> print(t[::2])
<mdtraj.Trajectory with 50 frames, 22 atoms>
>>> # calculating the average distance between two atoms
>>> import mdtraj as md
>>> import numpy as np
>>> t = md.load('trajectory.h5')
>>> np.mean(np.sqrt(np.sum((t.xyz[:, 0, :] - t.xyz[:, 21, :])**2, axis=1)))

See also

mdtraj.load

High-level function that loads files and returns an md.Trajectory

n_frames
Type:

int

n_atoms
Type:

int

n_residues
Type:

int

time
Type:

np.ndarray, shape=(n_frames,)

timestep
Type:

float

topology
Type:

md.Topology

top
Type:

md.Topology

xyz
Type:

np.ndarray, shape=(n_frames, n_atoms, 3)

unitcell_vectors
Type:

{np.ndarray, shape=(n_frames, 3, 3), None}

unitcell_lengths
Type:

{np.ndarray, shape=(n_frames, 3), None}

unitcell_angles
Type:

{np.ndarray, shape=(n_frames, 3), None}

atom_slice(atom_indices, inplace=False)

Create a new trajectory from a subset of atoms

Parameters:
  • atom_indices (array-like, dtype=int, shape=(n_atoms)) – List of indices of atoms to retain in the new trajectory.

  • inplace (bool, default=False) – If True, the operation is done inplace, modifying self. Otherwise, a copy is returned with the sliced atoms, and self is not modified.

Returns:

traj – The return value is either self, or the new trajectory, depending on the value of inplace.

Return type:

md.Trajectory

See also

stack

stack multiple trajectories along the atom axis

center_coordinates(mass_weighted=False)

Center each trajectory frame at the origin (0,0,0).

This method acts inplace on the trajectory. The centering can be either uniformly weighted (mass_weighted=False) or weighted by the mass of each atom (mass_weighted=True).

Parameters:

mass_weighted (bool, optional (default = False)) – If True, weight atoms by mass when removing COM.

Return type:

self

image_molecules(inplace=False, anchor_molecules=None, other_molecules=None, sorted_bonds=None, make_whole=True)

Recenter and apply periodic boundary conditions to the molecules in each frame of the trajectory.

This method is useful for visualizing a trajectory in which molecules were not wrapped to the periodic unit cell, or in which the macromolecules are not centered with respect to the solvent. It tries to be intelligent in deciding what molecules to center, so you can simply call it and trust that it will “do the right thing”.

Parameters:
  • inplace (bool, default=False) – If False, a new Trajectory is created and returned. If True, this Trajectory is modified directly.

  • anchor_molecules (list of atom sets, optional, default=None) – Molecules that should be treated as “anchors”. These molecules will be centered in the box and put near each other. If not specified, anchor molecules are guessed using a heuristic.

  • other_molecules (list of atom sets, optional, default=None) – Molecules that are not anchors. If not specified, these will be molecules other than the anchor molecules

  • sorted_bonds (array of shape (n_bonds, 2)) – Pairs of atom indices that define bonds, in sorted order. If not specified, these will be determined from the trajectory’s topology. Only relevant if make_whole is True.

  • make_whole (bool) – Whether to make molecules whole.

Returns:

traj – The return value is either self or the new trajectory, depending on the value of inplace.

Return type:

md.Trajectory

See also

Topology.guess_anchor_molecules

join(other, check_topology=True, discard_overlapping_frames=False)

Join two trajectories together along the time/frame axis.

This method joins trajectories along the time axis, giving a new trajectory of length equal to the sum of the lengths of self and other. It can also be called by using self + other

Parameters:
  • other (Trajectory or list of Trajectory) – One or more trajectories to join with this one. These trajectories are appended to the end of this trajectory.

  • check_topology (bool) – Ensure that the topology of self and other are identical before joining them. If false, the resulting trajectory will have the topology of self.

  • discard_overlapping_frames (bool, optional) – If True, compare coordinates at trajectory edges to discard overlapping frames. Default: False.

See also

stack

join two trajectories along the atom axis

static load(filenames, **kwargs)

Load a trajectory from disk

Parameters:
  • filenames ({path-like, [path-like]}) – Either a path or list of paths

  • **kwargs – As requested by the various load functions; which ones apply depends on the file extension.

make_molecules_whole(inplace=False, sorted_bonds=None)

Only make molecules whole

Parameters:
  • inplace (bool) – If False, a new Trajectory is created and returned. If True, this Trajectory is modified directly.

  • sorted_bonds (array of shape (n_bonds, 2)) – Pairs of atom indices that define bonds, in sorted order. If not specified, these will be determined from the trajectory’s topology.

See also

image_molecules

property n_atoms

Number of atoms in the trajectory

Returns:

n_atoms – The number of atoms in the trajectory

Return type:

int

property n_chains

Number of chains in the trajectory

Returns:

n_chains – The number of chains in the trajectory’s topology

Return type:

int

property n_frames

Number of frames in the trajectory

Returns:

n_frames – The number of frames in the trajectory

Return type:

int

property n_residues

Number of residues (amino acids) in the trajectory

Returns:

n_residues – The number of residues in the trajectory’s topology

Return type:

int

openmm_boxes(frame)

OpenMM-compatible box vectors of a single frame.

Examples

>>> t = md.load('trajectory.h5')
>>> context.setPeriodicBoxVectors(*t.openmm_boxes(0))
Parameters:

frame (int) – Return box for this single frame.

Returns:

box – The periodic box vectors for this frame, formatted for input to OpenMM.

Return type:

tuple

openmm_positions(frame)

OpenMM-compatible positions of a single frame.

Examples

>>> t = md.load('trajectory.h5')
>>> context.setPositions(t.openmm_positions(0))
Parameters:

frame (int) – The index of the trajectory frame that you wish to extract.

Returns:

positions – The cartesian coordinates of the specified trajectory frame, formatted for input to OpenMM.

Return type:

list

remove_solvent(exclude=None, inplace=False)

Create a new trajectory without solvent atoms

Parameters:
  • exclude (array-like, dtype=str, shape=(n_solvent_types)) – List of solvent residue names to retain in the new trajectory.

  • inplace (bool, default=False) – The return value is either self, or the new trajectory, depending on the value of inplace.

Returns:

traj – The return value is either self, or the new trajectory, depending on the value of inplace.

Return type:

md.Trajectory

restrict_atoms(**kwargs)

DEPRECATED: restrict_atoms was replaced by atom_slice and will be removed in 2.0

Retain only a subset of the atoms in a trajectory

Deletes atoms not in atom_indices, and re-indexes those that remain

Parameters:
  • atom_indices (array-like, dtype=int, shape=(n_atoms)) – List of atom indices to keep.

  • inplace (bool, default=True) – If True, the operation is done inplace, modifying self. Otherwise, a copy is returned with the restricted atoms, and self is not modified.

Returns:

traj – The return value is either self, or the new trajectory, depending on the value of inplace.

Return type:

md.Trajectory

save(filename, **kwargs)

Save trajectory to disk, in a format determined by the filename extension

Parameters:
  • filename (path-like) – filesystem path in which to save the trajectory. The extension will be parsed and will control the format.

  • lossy (bool) – For .h5 or .lh5, whether or not to use compression.

  • no_models (bool) – For .pdb. TODO: Document this?

  • force_overwrite (bool) – If filename already exists, overwrite it.

save_amberrst7(filename, force_overwrite=True)

Save trajectory in AMBER ASCII restart format

Parameters:
  • filename (path-like) – filesystem path in which to save the restart

  • force_overwrite (bool, default=True) – Overwrite anything that exists at filename, if it’s already there

Notes

Amber restart files can only store a single frame. If only one frame exists, “filename” will be written. Otherwise, “filename.#” will be written, where # is a zero-padded number from 1 to the total number of frames in the trajectory
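The naming rule in the note can be sketched as follows. The padding width is not stated in the text, so this example assumes it equals the number of digits in the total frame count; restart_filenames is a hypothetical helper that only builds names and writes nothing to disk.

```python
def restart_filenames(filename, n_frames):
    """List the file names a multi-frame restart save would produce (sketch).

    A single frame uses the name unchanged; otherwise each frame gets a
    zero-padded, 1-based numeric suffix. The padding width is an
    assumption: the number of digits in n_frames.
    """
    if n_frames < 1:
        raise ValueError("a trajectory must contain at least one frame")
    if n_frames == 1:
        return [filename]
    width = len(str(n_frames))
    return ["{}.{:0{w}d}".format(filename, i, w=width) for i in range(1, n_frames + 1)]


print(restart_filenames("out.rst7", 1))
print(restart_filenames("out.rst7", 12)[:3])
```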

save_dcd(filename, force_overwrite=True)

Save trajectory to CHARMM/NAMD DCD format

Parameters:
  • filename (path-like) – filesystem path in which to save the trajectory

  • force_overwrite (bool, default=True) – Overwrite anything that exists at filename, if it’s already there

save_dtr(filename, force_overwrite=True)

Save trajectory to DESMOND DTR format

Parameters:
  • filename (path-like) – filesystem path in which to save the trajectory

  • force_overwrite (bool, default=True) – Overwrite anything that exists at filename, if it’s already there

save_gro(filename, force_overwrite=True, precision=3)

Save trajectory in Gromacs .gro format

Parameters:
  • filename (path-like) – Path to save the trajectory

  • force_overwrite (bool, default=True) – Overwrite anything that exists at that filename if it exists

  • precision (int, default=3) – The number of decimal places to use for coordinates in GRO file

save_gsd(filename, force_overwrite=True)

Save trajectory to HOOMD GSD format

Parameters:
  • filename (path-like) – filesystem path in which to save the trajectory

  • force_overwrite (bool, default=True) – Overwrite anything that exists at filename, if it’s already there

save_hdf5(filename, mode='w', force_overwrite=True)

Save trajectory to MDTraj HDF5 format

Parameters:
  • filename (path-like) – filesystem path in which to save the trajectory

  • force_overwrite (bool, default=True) – Overwrite anything that exists at filename, if it’s already there

  • mode (str, default='w') – The mode in which to save the file. ‘w’ will overwrite any existing file, ‘a’ will append to an existing file.

save_lammpstrj(filename, force_overwrite=True)

Save trajectory to LAMMPS custom dump format

Parameters:
  • filename (path-like) – filesystem path in which to save the trajectory

  • force_overwrite (bool, default=True) – Overwrite anything that exists at filename, if it’s already there

save_lh5(filename, force_overwrite=True)

Save trajectory in deprecated MSMBuilder2 LH5 (lossy HDF5) format.

Parameters:
  • filename (path-like) – filesystem path in which to save the trajectory

  • force_overwrite (bool, default=True) – Overwrite anything that exists at filename, if it’s already there

save_mdcrd(filename, force_overwrite=True)

Save trajectory to AMBER mdcrd format

Parameters:
  • filename (path-like) – filesystem path in which to save the trajectory

  • force_overwrite (bool, default=True) – Overwrite anything that exists at filename, if it’s already there

save_netcdf(filename, force_overwrite=True)

Save trajectory in AMBER NetCDF format

Parameters:
  • filename (path-like) – filesystem path in which to save the trajectory

  • force_overwrite (bool, default=True) – Overwrite anything that exists at filename, if it’s already there

save_netcdfrst(filename, force_overwrite=True)

Save trajectory in AMBER NetCDF restart format

Parameters:
  • filename (path-like) – filesystem path in which to save the restart

  • force_overwrite (bool, default=True) – Overwrite anything that exists at filename, if it’s already there

Notes

NetCDF restart files can only store a single frame. If only one frame exists, “filename” will be written. Otherwise, “filename.#” will be written, where # is a zero-padded number from 1 to the total number of frames in the trajectory

save_pdb(filename, force_overwrite=True, bfactors=None, ter=True, header=True)

Save trajectory to RCSB PDB format

Parameters:
  • filename (path-like) – filesystem path in which to save the trajectory

  • force_overwrite (bool, default=True) – Overwrite anything that exists at filename, if it’s already there

  • bfactors (array_like, default=None, shape=(n_frames, n_atoms) or (n_atoms,)) – Save bfactors with pdb file. If the array is two dimensional it should contain a bfactor for each atom in each frame of the trajectory. Otherwise, the same bfactor will be saved in each frame.

  • ter (bool, default=True) – Include TER lines in pdb to indicate end of a chain of residues. This is useful if you need to keep atom numbers consistent.

  • header (bool, default=True) – Include header in pdb. Useful if you want the extra output, but it can sometimes prevent other programs from running smoothly.

save_trr(filename, force_overwrite=True)

Save trajectory to Gromacs TRR format

Notes

Only the xyz coordinates and the time are saved; the velocities and forces in the trr will be zeros.

Parameters:
  • filename (path-like) – filesystem path in which to save the trajectory

  • force_overwrite (bool, default=True) – Overwrite anything that exists at filename, if it’s already there

save_xtc(filename, force_overwrite=True)

Save trajectory to Gromacs XTC format

Parameters:
  • filename (path-like) – filesystem path in which to save the trajectory

  • force_overwrite (bool, default=True) – Overwrite anything that exists at filename, if it’s already there

save_xyz(filename, force_overwrite=True)

Save trajectory to .xyz format.

Parameters:
  • filename (path-like) – filesystem path in which to save the trajectory

  • force_overwrite (bool, default=True) – Overwrite anything that exists at filename, if it’s already there

slice(key, copy=True)

Slice trajectory, by extracting one or more frames into a separate object

This method can also be called using index bracket notation, i.e. traj[1] == traj.slice(1)

Parameters:
  • key ({int, np.ndarray, slice}) – The slice to take. Can be either an int, a list of ints, or a slice object.

  • copy (bool, default=True) – Copy the arrays after slicing. If you set this to False, then modifying a slice also modifies the original array, since both point to the same data.

smooth(width, order=3, atom_indices=None, inplace=False)

Smooth a trajectory using a zero-delay Butterworth filter. Please note that for optimal results the trajectory should be properly aligned prior to smoothing (see md.Trajectory.superpose).

Parameters:
  • width (int) – This acts very similarly to the window size in a moving average smoother. In this implementation, the frequency of the low-pass filter is taken to be two over this width, so it’s like “half the period” of the sinusoid where the filter starts to kick in. Must be an integer greater than one.

  • order (int, optional, default=3) – The order of the filter. A small odd number is recommended. Higher order filters cut off more quickly, but have worse numerical properties.

  • atom_indices (array-like, dtype=int, shape=(n_atoms), default=None) – List of indices of atoms to retain in the new trajectory. Default is set to None, which applies smoothing to all atoms.

  • inplace (bool, default=False) – The return value is either self, or the new trajectory, depending on the value of inplace.

Returns:

traj – The return value is either self, or the new smoothed trajectory, depending on the value of inplace.

Return type:

md.Trajectory

stack(other, keep_resSeq=True)

Stack two trajectories along the atom axis

This method joins trajectories along the atom axis, giving a new trajectory with a number of atoms equal to the sum of the number of atoms in self and other.

Notes

The resulting trajectory will have the unitcell and time information of the left operand.

Examples

>>> t1 = md.load('traj1.h5')
>>> t2 = md.load('traj2.h5')
>>> # even when t2 contains no unitcell information
>>> t2.unitcell_vectors = None
>>> stacked = t1.stack(t2)
>>> # the stacked trajectory inherits the unitcell information
>>> # from the first trajectory
>>> np.all(stacked.unitcell_vectors == t1.unitcell_vectors)
True
Parameters:
  • other (Trajectory) – The other trajectory to join

  • keep_resSeq (bool, optional, default=True) – see `mdtraj.core.topology.Topology.join` method documentation

See also

join

join two trajectories along the time/frame axis.

superpose(reference, frame=0, atom_indices=None, ref_atom_indices=None, parallel=True)

Superpose each conformation in this trajectory upon a reference

Parameters:
  • reference (md.Trajectory) – Align self to a particular frame in reference

  • frame (int) – The index of the conformation in reference to align to.

  • atom_indices (array_like, or None) – The indices of the atoms to superpose. If not supplied, all atoms will be used.

  • ref_atom_indices (array_like, or None) – Use these atoms on the reference structure. If not supplied, the same atom indices will be used for this trajectory and the reference one.

  • parallel (bool) – Use OpenMP to run the superposition in parallel over multiple cores

Return type:

self

property time

The simulation time corresponding to each frame, in picoseconds

Returns:

time – The simulation time corresponding to each frame, in picoseconds

Return type:

np.ndarray, shape=(n_frames,)

property timestep

Timestep between frames, in picoseconds

Returns:

timestep – The timestep between frames, in picoseconds.

Return type:

float

property top

Alias for self.topology, describing the organization of atoms into residues, bonds, etc

Returns:

topology – The topology object, describing the organization of atoms into residues, bonds, etc

Return type:

md.Topology

property topology

Topology of the system, describing the organization of atoms into residues, bonds, etc

Returns:

topology – The topology object, describing the organization of atoms into residues, bonds, etc

Return type:

md.Topology

property unitcell_angles

Angles that define the shape of the unit cell in each frame.

Returns:

angles – The angles between the three unitcell vectors in each frame, alpha, beta, and gamma. alpha gives the angle between vectors b and c, beta gives the angle between vectors c and a, and gamma gives the angle between vectors a and b. The angles are in degrees.

Return type:

np.ndarray, shape=(n_frames, 3)

property unitcell_lengths

Lengths that define the shape of the unit cell in each frame.

Returns:

lengths – Lengths of the unit cell in each frame, in nanometers, or None if the Trajectory contains no unitcell information.

Return type:

{np.ndarray, shape=(n_frames, 3), None}

property unitcell_vectors

The vectors that define the shape of the unit cell in each frame

Returns:

vectors – Vectors defining the shape of the unit cell in each frame. The semantics of this array are that the shape of the unit cell in frame i is given by the three vectors value[i, 0, :], value[i, 1, :], and value[i, 2, :].

Return type:

np.ndarray, shape=(n_frames, 3, 3)

property unitcell_volumes

Volumes of unit cell for each frame.

Returns:

volumes – Volumes of the unit cell in each frame, in nanometers^3, or None if the Trajectory contains no unitcell information.

Return type:

{np.ndarray, shape=(n_frames), None}
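As a worked illustration of how the unit cell lengths, angles, and volume relate, the sketch below applies the standard triclinic cell volume formula to one frame. The function name unitcell_volume is hypothetical, and this is not necessarily how the property is computed internally (it could equally be derived from unitcell_vectors).

```python
import math


def unitcell_volume(a, b, c, alpha, beta, gamma):
    """Volume of a triclinic cell from lengths (nm) and angles (degrees)."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    # V = abc * sqrt(1 - cos^2(alpha) - cos^2(beta) - cos^2(gamma)
    #                + 2 cos(alpha) cos(beta) cos(gamma))
    return a * b * c * math.sqrt(1 - ca**2 - cb**2 - cg**2 + 2 * ca * cb * cg)


# An orthorhombic 2 x 3 x 4 nm box has a volume of about 24 nm^3.
print(unitcell_volume(2.0, 3.0, 4.0, 90.0, 90.0, 90.0))
```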

property xyz

Cartesian coordinates of each atom in each simulation frame

Returns:

xyz – A three-dimensional numpy array, with the cartesian coordinates of each atom in each frame.

Return type:

np.ndarray, shape=(n_frames, n_atoms, 3)

westpa.core.h5io.join_traj(trajs, check_topology=True, discard_overlapping_frames=False)

Concatenate multiple trajectories into one long trajectory

Parameters:
  • trajs (iterable of trajectories) – Combine these into one trajectory

  • check_topology (bool) – Make sure topologies match before joining

  • discard_overlapping_frames (bool) – Check for overlapping frames and discard them

westpa.core.h5io.in_units_of(quantity, units_in, units_out, inplace=False)

Convert a numerical quantity between unit systems.

Parameters:
  • quantity ({number, np.ndarray, openmm.unit.Quantity}) – quantity can either be a unitted quantity – i.e. instance of openmm.unit.Quantity, or just a bare number or numpy array

  • units_in (str) – If you supply a quantity that’s not an openmm.unit.Quantity, you should tell me what units it is in. If you don’t, I’m just going to echo you back your quantity without doing any unit checking.

  • units_out (str) – A string description of the units you want out. This should look like “nanometers/picosecond” or “nanometers**3” or whatever

  • inplace (bool) – Attempt to do the transformation inplace, by mutating the quantity argument and avoiding a copy. This is only possible if quantity is a writable numpy array.

Returns:

rquantity – The resulting quantity, in the new unit system. If the function was called with inplace=True and quantity was a writable numpy array, rquantity will alias the same memory as the input quantity, which will have been changed inplace. Otherwise, if a copy was required, rquantity will point to new memory.

Return type:

{number, np.ndarray}

Examples

>>> in_units_of(1, 'meter**2/second', 'nanometers**2/picosecond')
1000000.0
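The result in the example above can be checked by hand. The sketch below does the same conversion with explicit powers of ten; the function name is hypothetical and the code does not call in_units_of itself.

```python
# Exact conversion factors expressed as powers of ten.
NM_PER_METER = 10**9
PS_PER_SECOND = 10**12


def m2_per_s_to_nm2_per_ps(value):
    """Convert meter**2/second to nanometers**2/picosecond by hand."""
    # Area scales with the square of the length factor; time sits in the denominator.
    return value * NM_PER_METER**2 / PS_PER_SECOND


print(m2_per_s_to_nm2_per_ps(1))  # 1000000.0
```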
westpa.core.h5io.import_(module)

Import a module, and issue a nice message to stderr if the module isn’t installed.

Currently, this function will print nice error messages for networkx, tables, netCDF4, and openmm.unit, which are optional MDTraj dependencies.

Parameters:

module (str) – The module you’d like to import, as a string

Returns:

module – The module object

Return type:

{module, object}

Examples

>>> # the following two lines are equivalent. the difference is that the
>>> # second will check for an ImportError and print you a very nice
>>> # user-facing message about what's wrong (where you can install the
>>> # module from, etc) if the import fails
>>> import tables
>>> tables = import_('tables')
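The behavior described above can be approximated with the standard library. This is a minimal sketch, not the actual implementation: the name import_or_explain and the wording of the message are assumptions.

```python
import importlib
import sys


def import_or_explain(module):
    """Import `module` by name, printing a hint to stderr if it is missing.

    Hypothetical stand-in for illustration; the real messages differ.
    """
    try:
        return importlib.import_module(module)
    except ImportError:
        print(
            f"The '{module}' package is required for this feature but is not installed.",
            file=sys.stderr,
        )
        raise  # keep the original ImportError visible to the caller


json_module = import_or_explain("json")
```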
westpa.core.h5io.ensure_type(val, dtype, ndim, name, length=None, can_be_none=False, shape=None, warn_on_cast=True, add_newaxis_on_deficient_ndim=False)

Typecheck the size, shape and dtype of a numpy array, with optional casting.

Parameters:
  • val ({np.ndarray, None}) – The array to check

  • dtype ({np.dtype, str}) – The dtype you’d like the array to have

  • ndim (int) – The number of dimensions you’d like the array to have

  • name (str) – name of the array. This is used when throwing exceptions, so that we can describe to the user which array is messed up.

  • length (int, optional) – How long should the array be?

  • can_be_none (bool) – Is val == None acceptable?

  • shape (tuple, optional) – What should the shape of the array be? If the provided tuple has Nones in it, those will be semantically interpreted as matching any length in that dimension. So, for example, using the shape spec (None, None, 3) will ensure that the last dimension is of length three without constraining the first two dimensions

  • warn_on_cast (bool, default=True) – Raise a warning when the dtypes don’t match and a cast is done.

  • add_newaxis_on_deficient_ndim (bool, default=False) – Add a new axis to the beginning of the array if the number of dimensions is deficient by one compared to your specification. For instance, if you’re trying to get out an array of ndim == 3, but the user provides an array of shape == (10, 10), a new axis will be created with length 1 in front, so that the return value is of shape (1, 10, 10).

Notes

The returned value will always be C-contiguous.

Returns:

typechecked_val – If val=None and can_be_none=True, then this will return None. Otherwise, it will return val (or a copy of val). If the dtype wasn’t right, it’ll be cast to the right dtype. If the array was not C-contiguous, it’ll be copied as well.

Return type:

np.ndarray, None
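The wildcard behavior of the shape parameter can be illustrated without NumPy. The helper shape_matches below is hypothetical and covers only the shape comparison, not the dtype checks or casting that ensure_type also performs.

```python
def shape_matches(actual, spec):
    """Return True if `actual` satisfies `spec`; None in `spec` matches any length."""
    if len(actual) != len(spec):
        return False
    return all(want is None or got == want for got, want in zip(actual, spec))


# The last dimension must have length 3; the first two are unconstrained.
print(shape_matches((10, 22, 3), (None, None, 3)))
print(shape_matches((10, 22, 4), (None, None, 3)))
```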

class westpa.core.h5io.HDF5TrajectoryFile(filename, mode='r', force_overwrite=True, compression='zlib')

Bases: object

Interface for reading and writing to a MDTraj HDF5 molecular dynamics trajectory file, whose format is described here.

This is a file-like object that supports both reading and writing, depending on the mode flag. It implements the context manager protocol, so you can also use it with the Python ‘with’ statement.

The format is extremely flexible and high performance. It can hold a wide variety of information about a trajectory, including fields like the temperature and energies. Because it’s built on the fantastic HDF5 library, it’s easily extensible too.

Parameters:
  • filename (path-like) – Path to the file to open

  • mode ({'r', 'w'}) – Mode in which to open the file. ‘r’ is for reading and ‘w’ is for writing

  • force_overwrite (bool) – In mode=’w’, how do you want to behave if a file by the name of filename already exists? if force_overwrite=True, it will be overwritten.

  • compression ({'zlib', None}) – Apply compression to the file? This will save space, and does not cost too many cpu cycles, so it’s recommended.

root
title
application
topology
randomState
forcefield
reference
constraints

See also

mdtraj.load_hdf5

High-level wrapper that returns a md.Trajectory

property application

Suite of programs that created the file

close()

Close the HDF5 file handle

property constraints

Constraints applied to the bond lengths

Returns:

constraints – A one-dimensional array with an (int, int, float) record type, giving the indices of the two atoms involved in each constraint and the distance of the constraint. If no constraint information is in the file, the return value is None.

Return type:

{None, np.array, dtype=[(‘atom1’, ‘<i4’), (‘atom2’, ‘<i4’), (‘distance’, ‘<f4’)]}

distance_unit = 'nanometers'
flush()

Write all buffered data to the disk file.

property forcefield

Description of the Hamiltonian used. A short, human-readable string, like AMBER99sbildn.

property randomState

State of the creator’s internal random number generator at the start of the simulation

read(n_frames=None, stride=None, atom_indices=None)

Read one or more frames of data from the file

Parameters:
  • n_frames ({int, None}) – The number of frames to read. If not supplied, all of the remaining frames will be read.

  • stride ({int, None}) – By default all of the frames will be read, but you can pass this flag to read a subset of the data by grabbing only every stride-th frame from disk.

  • atom_indices ({int, None}) – By default all of the atoms will be read, but you can pass this flag to read only a subset of the atoms for the coordinates and velocities fields. Note that you will have to carefully manage the indices and the offsets, since the i-th atom in the topology will not necessarily correspond to the i-th atom in your subset.

Notes

If you’d like more flexible access to the data, that is available by using the pytables group directly, which is accessible via the root property on this class.

Returns:

frames – The returned namedtuple will have the fields “coordinates”, “time”, “cell_lengths”, “cell_angles”, “velocities”, “kineticEnergy”, “potentialEnergy”, “temperature” and “alchemicalLambda”. Each of the fields in the returned namedtuple will either be a numpy array or None, depending on whether that data was saved in the trajectory. All of the data is in units of “nanometers”, “picoseconds”, “kelvin”, “degrees” and “kilojoules_per_mole”.

Return type:

namedtuple

read_as_traj(n_frames=None, stride=None, atom_indices=None)

Read a trajectory from the HDF5 file

Parameters:
  • n_frames ({int, None}) – The number of frames to read. If not supplied, all of the remaining frames will be read.

  • stride ({int, None}) – By default all of the frames will be read, but you can pass this flag to read a subset of the data by grabbing only every stride-th frame from disk.

  • atom_indices ({int, None}) – By default all of the atoms will be read, but you can pass this flag to read only a subset of the atoms for the coordinates and velocities fields. Note that you will have to carefully manage the indices and the offsets, since the i-th atom in the topology will not necessarily correspond to the i-th atom in your subset.

Returns:

trajectory – A trajectory object containing the loaded portion of the file.

Return type:

Trajectory

property reference

A published reference that documents the program or parameters used to generate the data

property root

Direct access to the root group of the underlying Tables HDF5 file handle.

This can be used for random or specific access to the underlying arrays on disk

seek(offset, whence=0)

Move to a new file position

Parameters:
  • offset (int) – A number of frames.

  • whence ({0, 1, 2}) – 0: offset from the start of the file; offset should be >= 0. 1: move relative to the current position; offset may be positive or negative. 2: move relative to the end of the file; offset should be <= 0. Seeking beyond the end of the file is not supported.
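The three whence modes can be sketched as plain arithmetic on frame indices. The function resolve_seek is hypothetical, and the bounds check and exception type are assumptions rather than the documented behavior of seek.

```python
def resolve_seek(position, n_frames, offset, whence=0):
    """Return the frame index a seek would land on (illustrative sketch only)."""
    if whence == 0 and offset >= 0:
        target = offset                 # absolute, counted from the first frame
    elif whence == 1:
        target = position + offset      # relative to the current frame
    elif whence == 2 and offset <= 0:
        target = n_frames + offset      # counted back from the end of the file
    else:
        raise ValueError("invalid combination of offset and whence")
    # Assumption: positions outside [0, n_frames] are rejected.
    if not 0 <= target <= n_frames:
        raise ValueError("cannot seek outside the file")
    return target


# From frame 5 of a 10-frame file: rewind, step back two, go to the last frame.
print(resolve_seek(5, 10, 0, 0), resolve_seek(5, 10, -2, 1), resolve_seek(5, 10, -1, 2))
```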

tell()

Current file position

Returns:

offset – The current frame in the file.

Return type:

int

property title

User-defined title for the data represented in the file

property topology

Get the topology out from the file

Returns:

topology – A topology object

Return type:

mdtraj.Topology

write(coordinates, time=None, cell_lengths=None, cell_angles=None, velocities=None, kineticEnergy=None, potentialEnergy=None, temperature=None, alchemicalLambda=None)

Write one or more frames of data to the file

This method saves data that is associated with one or more simulation frames. Note that all of the arguments can either be raw numpy arrays or unitted arrays (with openmm.unit.Quantity). If the arrays are unitted, a unit conversion will be automatically done from the supplied units into the proper units for saving on disk. You won’t have to worry about it.

Furthermore, if you wish to save a single frame of simulation data, you can do so naturally, for instance by supplying a 2d array for the coordinates and a single float for the time. This “shape deficiency” will be recognized, and handled appropriately.

Parameters:
  • coordinates (np.ndarray, shape=(n_frames, n_atoms, 3)) – The cartesian coordinates of the atoms to write. By convention, the lengths should be in units of nanometers.

  • time (np.ndarray, shape=(n_frames,), optional) – You may optionally specify the simulation time, in picoseconds corresponding to each frame.

  • cell_lengths (np.ndarray, shape=(n_frames, 3), dtype=float32, optional) – You may optionally specify the unitcell lengths. The length of the periodic box in each frame, in each direction, a, b, c. By convention the lengths should be in units of nanometers.

  • cell_angles (np.ndarray, shape=(n_frames, 3), dtype=float32, optional) – You may optionally specify the unitcell angles in each frame. Organized analogously to cell_lengths. Gives the alpha, beta and gamma angles respectively. By convention, the angles should be in units of degrees.

  • velocities (np.ndarray, shape=(n_frames, n_atoms, 3), optional) – You may optionally specify the cartesian components of the velocity for each atom in each frame. By convention, the velocities should be in units of nanometers / picosecond.

  • kineticEnergy (np.ndarray, shape=(n_frames,), optional) – You may optionally specify the kinetic energy in each frame. By convention the kinetic energies should be in units of kilojoules per mole.

  • potentialEnergy (np.ndarray, shape=(n_frames,), optional) – You may optionally specify the potential energy in each frame. By convention the potential energies should be in units of kilojoules per mole.

  • temperature (np.ndarray, shape=(n_frames,), optional) – You may optionally specify the temperature in each frame. By convention the temperatures should be in units of kelvin.

  • alchemicalLambda (np.ndarray, shape=(n_frames,), optional) – You may optionally specify the alchemical lambda in each frame. These have no units, but are generally between zero and one.

class westpa.core.h5io.Frames(coordinates, time, cell_lengths, cell_angles, velocities, kineticEnergy, potentialEnergy, temperature, alchemicalLambda)

Bases: tuple

Create new instance of Frames(coordinates, time, cell_lengths, cell_angles, velocities, kineticEnergy, potentialEnergy, temperature, alchemicalLambda)

alchemicalLambda

Alias for field number 8

cell_angles

Alias for field number 3

cell_lengths

Alias for field number 2

coordinates

Alias for field number 0

kineticEnergy

Alias for field number 5

potentialEnergy

Alias for field number 6

temperature

Alias for field number 7

time

Alias for field number 1

velocities

Alias for field number 4
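Because Frames is a named tuple, each field can be reached either by name or by the position listed above. The sketch below rebuilds an equivalent named tuple locally with the standard library purely to show the correspondence; it does not import the class from westpa.core.h5io.

```python
from collections import namedtuple

# Local re-creation of the field layout, for illustration only.
Frames = namedtuple(
    "Frames",
    [
        "coordinates", "time", "cell_lengths", "cell_angles", "velocities",
        "kineticEnergy", "potentialEnergy", "temperature", "alchemicalLambda",
    ],
)

# Fields that were not stored in the file are None.
frames = Frames(*[None] * 9)._replace(time=[0.0, 1.0])

print(frames.time is frames[1])            # name and index refer to the same field
print(Frames._fields.index("velocities"))  # position of the velocities field
```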

class westpa.core.h5io.WESTTrajectory(coordinates, topology=None, time=None, iter_labels=None, seg_labels=None, pcoords=None, parent_ids=None, unitcell_lengths=None, unitcell_angles=None)

Bases: Trajectory

A subclass of mdtraj.Trajectory that contains the trajectory of atom coordinates with pointers denoting the iteration number and segment index of each frame.

iter_label_values()
property iter_labels

Iteration index corresponding to each frame

Returns:

iter_labels – The iteration index corresponding to each frame

Return type:

np.ndarray, shape=(n_frames,)

join(other, check_topology=True, discard_overlapping_frames=False)

Join two Trajectory objects. This overrides mdtraj.Trajectory.join so that it also handles WESTPA pointers. Please see mdtraj.Trajectory.join’s documentation for more details.

property label_values
property parent_ids
property pcoords
seg_label_values(iteration=None)
property seg_labels

Segment index corresponding to each frame

Returns:

seg_labels – The segment index corresponding to each frame

Return type:

np.ndarray, shape=(n_frames,)

slice(key, copy=True)

Slice the Trajectory. This overrides mdtraj.Trajectory.slice so that it also handles WESTPA pointers. Please see mdtraj.Trajectory.slice’s documentation for more details.

westpa.core.h5io.resolve_filepath(path, constructor=<class 'h5py._hl.files.File'>, cargs=None, ckwargs=None, **addtlkwargs)

Use a combined filesystem and HDF5 path to open an HDF5 file and return the appropriate object. Returns (h5file, h5object). The file is opened using constructor(filename, *cargs, **ckwargs).

westpa.core.h5io.calc_chunksize(shape, dtype, max_chunksize=262144)

Calculate a chunk size for HDF5 data, anticipating that access will slice along lower dimensions sooner than higher dimensions.
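One plausible way to realize such a strategy is to shrink the leading axes first until a chunk fits within max_chunksize bytes. The sketch below is an assumption about the approach, not WESTPA's actual algorithm; the name chunk_shape_sketch is hypothetical, and it takes an item size in bytes instead of a dtype to stay free of dependencies.

```python
from math import prod


def chunk_shape_sketch(shape, itemsize, max_chunksize=262144):
    """Shrink leading axes first until one chunk fits in max_chunksize bytes."""
    chunk = list(shape)
    for idim in range(len(chunk)):
        # Halve this axis until the chunk fits or the axis is down to one element.
        while chunk[idim] > 1 and prod(chunk) * itemsize > max_chunksize:
            chunk[idim] = (chunk[idim] + 1) // 2
        if prod(chunk) * itemsize <= max_chunksize:
            break
    return tuple(chunk)


# 1000 x 100 x 3 four-byte values is about 1.2 MB, so the first axis is reduced.
print(chunk_shape_sketch((1000, 100, 3), itemsize=4))
```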

westpa.core.h5io.tostr(b)

Convert a nonstandard string object b to str, handling the case where b is bytes.

westpa.core.h5io.is_within_directory(directory, target)
westpa.core.h5io.safe_extract(tar, path='.', members=None, *, numeric_owner=False)
westpa.core.h5io.create_hdf5_group(parent_group, groupname, replace=False, creating_program=None)

Create (or delete and recreate) an HDF5 group named groupname within the enclosing Group (object) parent_group. If replace is True, then the group is replaced if present; if False, then an error is raised if the group is present. After the group is created, HDF5 attributes are set using stamp_creator_data.

westpa.core.h5io.stamp_creator_data(h5group, creating_program=None)

Mark the following on the HDF5 group h5group:

creation_program:

The name of the program that created the group

creation_user:

The username of the user who created the group

creation_hostname:

The hostname of the machine on which the group was created

creation_time:

The date and time at which the group was created, in the current locale.

creation_unix_time:

The Unix time (seconds from the epoch, UTC) at which the group was created.

This is meant to facilitate tracking the flow of data, but should not be considered a secure paper trail (after all, anyone with write access to the HDF5 file can modify these attributes).
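The attributes listed above can be gathered with the standard library alone. The sketch below writes them into a plain dict standing in for an HDF5 attribute collection; the function name, the "unknown" fallback, and the time format are assumptions, not the actual implementation.

```python
import getpass
import socket
import time


def stamp_creator_sketch(attrs, creating_program=None):
    """Fill a dict-like `attrs` with the creator fields described above."""
    try:
        user = getpass.getuser()
    except Exception:
        user = "unknown"  # assumption: fall back when no username can be resolved
    now = time.time()
    attrs["creation_program"] = creating_program
    attrs["creation_user"] = user
    attrs["creation_hostname"] = socket.gethostname()
    # Locale-dependent date and time string, plus the raw Unix timestamp.
    attrs["creation_time"] = time.strftime("%c", time.localtime(now))
    attrs["creation_unix_time"] = now
    return attrs


attrs = stamp_creator_sketch({}, creating_program="w_example")
print(sorted(attrs))
```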

westpa.core.h5io.get_creator_data(h5group)

Read back creator data as written by stamp_creator_data, returning a dictionary with keys as described for stamp_creator_data. Missing fields are denoted with None. The creation_time field is returned as a string.

westpa.core.h5io.load_west(filename)

Load WESTPA trajectory files from disk.

Parameters:

filename (str) – String filename of HDF Trajectory file.

westpa.core.h5io.stamp_iter_range(h5object, start_iter, stop_iter)

Mark that the HDF5 object h5object (dataset or group) contains data from iterations start_iter <= n_iter < stop_iter.

westpa.core.h5io.get_iter_range(h5object)

Read back iteration range data written by stamp_iter_range

westpa.core.h5io.stamp_iter_step(h5group, iter_step)

Mark that the HDF5 object h5group (dataset or group) contains data with an iteration step (stride) of iter_step.

westpa.core.h5io.get_iter_step(h5group)

Read back iteration step (stride) written by stamp_iter_step

westpa.core.h5io.check_iter_range_least(h5object, iter_start, iter_stop)

Return True if the iteration range [iter_start, iter_stop) is the same as or entirely contained within the iteration range stored on h5object.

westpa.core.h5io.check_iter_range_equal(h5object, iter_start, iter_stop)

Return True if the iteration range [iter_start, iter_stop) is the same as the iteration range stored on h5object.
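Both checks reduce to comparisons on half-open ranges [start, stop). The sketch below works on plain tuples in place of an HDF5 object; the function names are hypothetical.

```python
def range_within(stored, requested):
    """True if `requested` equals or lies entirely inside `stored` (half-open ranges)."""
    stored_start, stored_stop = stored
    req_start, req_stop = requested
    return stored_start <= req_start and req_stop <= stored_stop


def range_equal(stored, requested):
    """True if both half-open ranges have identical endpoints."""
    return tuple(stored) == tuple(requested)


# Data stamped for iterations 1 <= n_iter < 101.
print(range_within((1, 101), (10, 20)), range_equal((1, 101), (10, 20)))
```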

westpa.core.h5io.get_iteration_entry(h5object, n_iter)

Create a slice for data corresponding to iteration n_iter in h5object.

westpa.core.h5io.get_iteration_slice(h5object, iter_start, iter_stop=None, iter_stride=None)

Create a slice for data corresponding to iterations [iter_start, iter_stop), with stride iter_stride, in the given h5object.
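The mapping from iteration numbers to array indices can be sketched as an offset from the first stored iteration. This sketch assumes one entry per iteration along the first axis and a stored step of 1; the function name and the explicit stored_start/stored_stop arguments are hypothetical stand-ins for the range read back from the HDF5 object.

```python
def iteration_slice_sketch(stored_start, stored_stop, iter_start, iter_stop=None, iter_stride=None):
    """Translate an iteration range into a slice over the stored entries."""
    if iter_stop is None:
        iter_stop = iter_start + 1  # a single iteration
    if iter_stride is None:
        iter_stride = 1
    if iter_start < stored_start or iter_stop > stored_stop:
        raise IndexError("requested iterations are outside the stored range")
    return slice(iter_start - stored_start, iter_stop - stored_start, iter_stride)


# A stand-in dataset holding one value per iteration for iterations 10..19.
data = list(range(10, 20))
print(data[iteration_slice_sketch(10, 20, 12, 15)])
```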

westpa.core.h5io.label_axes(h5object, labels, units=None)

Stamp the given HDF5 object with axis labels. This stores the axis labels in an array of strings in an attribute called axis_labels on the given object. units if provided is a corresponding list of units.

class westpa.core.h5io.WESTPAH5File(*args, **kwargs)

Bases: File

Generalized input/output for WESTPA simulation (or analysis) data.

Create a new file object.

See the h5py user guide for a detailed explanation of the options.

name

Name of the file on disk, or file-like object. Note: for files created with the ‘core’ driver, HDF5 still requires this be non-empty.

mode

  • r – Read-only, file must exist (default)

  • r+ – Read/write, file must exist

  • w – Create file, truncate if exists

  • w- or x – Create file, fail if exists

  • a – Read/write if exists, create otherwise

driver

Name of the driver to use. Legal values are None (default, recommended), ‘core’, ‘sec2’, ‘direct’, ‘stdio’, ‘mpio’, ‘ros3’.

libver

Library version bounds. Supported values: ‘earliest’, ‘v108’, ‘v110’, ‘v112’ and ‘latest’.

userblock_size

Desired size of user block. Only allowed when creating a new file (mode w, w- or x).

swmr

Open the file in SWMR read mode. Only used when mode = ‘r’.

rdcc_nbytes

Total size of the dataset chunk cache in bytes. The default size is 1024**2 (1 MiB) per dataset. Applies to all datasets unless individually changed.

rdcc_w0

The chunk preemption policy for all datasets. This must be between 0 and 1 inclusive and indicates the weighting according to which chunks which have been fully read or written are penalized when determining which chunks to flush from cache. A value of 0 means fully read or written chunks are treated no differently than other chunks (the preemption is strictly LRU) while a value of 1 means fully read or written chunks are always preempted before other chunks. If your application only reads or writes data once, this can be safely set to 1. Otherwise, this should be set lower depending on how often you re-read or re-write the same data. The default value is 0.75. Applies to all datasets unless individually changed.

rdcc_nslots

The number of chunk slots in the raw data chunk cache for this file. Increasing this value reduces the number of cache collisions, but slightly increases the memory used. Due to the hashing strategy, this value should ideally be a prime number. As a rule of thumb, this value should be at least 10 times the number of chunks that can fit in rdcc_nbytes bytes. For maximum performance, this value should be set approximately 100 times that number of chunks. The default value is 521. Applies to all datasets unless individually changed.

track_order

Track dataset/group/attribute creation order under root group if True. If None use global default h5.get_config().track_order.

fs_strategy

The file space handling strategy to be used. Only allowed when creating a new file (mode w, w- or x). Defined as:

  • “fsm” – FSM, Aggregators, VFD

  • “page” – Paged FSM, VFD

  • “aggregate” – Aggregators, VFD

  • “none” – VFD

If None use HDF5 defaults.

fs_page_size

File space page size in bytes. Only used when fs_strategy=”page”. If None use the HDF5 default (4096 bytes).

fs_persist

A boolean value to indicate whether free space should be persistent or not. Only allowed when creating a new file. The default value is False.

fs_threshold

The smallest free-space section size that the free space manager will track. Only allowed when creating a new file. The default value is 1.

page_buf_size

Page buffer size in bytes. Only allowed for HDF5 files created with fs_strategy=”page”. Must be a power of two and greater than or equal to the file space page size used when creating the file. It is not used by default.

min_meta_keep

Minimum percentage of metadata to keep in the page buffer before allowing pages containing metadata to be evicted. Applicable only if page_buf_size is set. Default value is zero.

min_raw_keep

Minimum percentage of raw data to keep in the page buffer before allowing pages containing raw data to be evicted. Applicable only if page_buf_size is set. Default value is zero.

locking

The file locking behavior. Defined as:

  • False (or “false”) – Disable file locking

  • True (or “true”) – Enable file locking

  • “best-effort” – Enable file locking but ignore some errors

  • None – Use HDF5 defaults

Warning

The HDF5_USE_FILE_LOCKING environment variable can override this parameter.

Only available with HDF5 >= 1.12.1 or 1.10.x >= 1.10.7.

alignment_threshold

Together with alignment_interval, this property ensures that any file object greater than or equal in size to the alignment threshold (in bytes) will be aligned on an address which is a multiple of alignment interval.

alignment_interval

This property should be used in conjunction with alignment_threshold. See the description above. For more details, see https://portal.hdfgroup.org/display/HDF5/H5P_SET_ALIGNMENT
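
The arithmetic behind the two alignment properties can be sketched as follows. This is only an illustration of the rule stated above, not HDF5's allocator; `aligned_address` is a hypothetical helper and the sizes are made-up values.

```python
def aligned_address(address, size, alignment_threshold, alignment_interval):
    """Hypothetical helper: round `address` up to the next multiple of
    `alignment_interval` when the object is at least `alignment_threshold` bytes."""
    if size < alignment_threshold:
        return address  # objects below the threshold are not aligned
    remainder = address % alignment_interval
    if remainder == 0:
        return address  # already on a multiple of the interval
    return address + (alignment_interval - remainder)


# A 512-byte object stays where it is; an 8192-byte object is moved up.
small = aligned_address(1000, size=512, alignment_threshold=4096, alignment_interval=4096)
large = aligned_address(5000, size=8192, alignment_threshold=4096, alignment_interval=4096)
```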

meta_block_size

Set the current minimum size, in bytes, of new metadata block allocations. See https://portal.hdfgroup.org/display/HDF5/H5P_SET_META_BLOCK_SIZE

Additional keywords

Passed on to the selected file driver.

default_iter_prec = 8
replace_dataset(*args, **kwargs)
iter_object_name(n_iter, prefix='', suffix='')

Return a properly-formatted per-iteration name for iteration n_iter. (This is used in create/require/get_iter_group, but may also be useful for naming datasets on a per-iteration basis.)
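
As an illustration of the naming scheme, here is a minimal sketch that assumes names follow the iter_00000001 pattern shown in the file layout and the default precision of 8 digits. `iter_object_name_sketch` is a hypothetical stand-in, not the WESTPA method itself.

```python
def iter_object_name_sketch(n_iter, prefix='', suffix='', prec=8):
    """Hypothetical stand-in: zero-pad n_iter to `prec` digits and wrap it
    with an optional prefix and suffix."""
    return f"{prefix}iter_{n_iter:0{prec}d}{suffix}"


name = iter_object_name_sketch(1)
group_path = "/iterations/" + name
custom = iter_object_name_sketch(42, prefix="avg_", suffix="_flux")
```

The same helper style is convenient for naming per-iteration datasets, which is the secondary use mentioned above.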

create_iter_group(n_iter, group=None)

Create a per-iteration data storage group for iteration number n_iter in the group group (which is ‘/iterations’ by default).

require_iter_group(n_iter, group=None)

Ensure that a per-iteration data storage group for iteration number n_iter is available in the group group (which is ‘/iterations’ by default).

get_iter_group(n_iter, group=None)

Get the per-iteration data group for iteration number n_iter from within the group group (‘/iterations’ by default).

class westpa.core.h5io.WESTIterationFile(file, mode='r', force_overwrite=True, compression='zlib', link=None)

Bases: HDF5TrajectoryFile

read(frame_indices=None, atom_indices=None)

Read one or more frames of data from the file

Parameters:
  • n_frames ({int, None}) – The number of frames to read. If not supplied, all of the remaining frames will be read.

  • stride ({int, None}) – By default all of the frames will be read, but you can pass this flag to read a subset of the data by grabbing only every stride-th frame from disk.

  • atom_indices ({int, None}) – By default all of the atoms will be read, but you can pass this flag to read only a subset of the atoms for the coordinates and velocities fields. Note that you will have to carefully manage the indices and the offsets, since the i-th atom in the topology will not necessarily correspond to the i-th atom in your subset.

Notes

If you’d like more flexible access to the data, that is available by using the pytables group directly, which is accessible via the root property on this class.

Returns:

frames – The returned namedtuple will have the fields “coordinates”, “time”, “cell_lengths”, “cell_angles”, “velocities”, “kineticEnergy”, “potentialEnergy”, “temperature” and “alchemicalLambda”. Each of the fields in the returned namedtuple will either be a numpy array or None, depending on whether that data was saved in the trajectory. All of the data is in units of “nanometers”, “picoseconds”, “kelvin”, “degrees” and “kilojoules_per_mole”.

Return type:

namedtuple

has_topology()
has_pointer()
has_restart(segment)
write_data(where, name, data)
read_data(where, name)
read_as_traj(iteration=None, segment=None, atom_indices=None)

Read a trajectory from the HDF5 file

Parameters:
  • n_frames ({int, None}) – The number of frames to read. If not supplied, all of the remaining frames will be read.

  • stride ({int, None}) – By default all of the frames will be read, but you can pass this flag to read a subset of the data by grabbing only every stride-th frame from disk.

  • atom_indices ({int, None}) – By default all of the atoms will be read, but you can pass this flag to read only a subset of the atoms for the coordinates and velocities fields. Note that you will have to carefully manage the indices and the offsets, since the i-th atom in the topology will not necessarily correspond to the i-th atom in your subset.

Returns:

trajectory – A trajectory object containing the loaded portion of the file.

Return type:

Trajectory

read_restart(segment)
write_segment(segment, pop=False)
class westpa.core.h5io.DSSpec

Bases: object

Generalized WE dataset access

get_iter_data(n_iter, seg_slice=(slice(None, None, None),))
get_segment_data(n_iter, seg_id)
class westpa.core.h5io.FileLinkedDSSpec(h5file_or_name)

Bases: DSSpec

Provide facilities for accessing WESTPA HDF5 files, including auto-opening and the ability to pickle references to such files for transmission (through, e.g., the work manager), provided that the HDF5 file can be accessed by the same path on both the sender and receiver.

property h5file

Lazily open HDF5 file. This is required because allowing an open HDF5 file to cross a fork() boundary generally corrupts the internal state of the HDF5 library.
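
The lazy-open pattern can be sketched with the standard library alone. The example below uses an ordinary binary file in place of an HDF5 file, and `LazyFileRef` is a hypothetical class written for illustration; it is not the WESTPA implementation. Only the path is pickled, so the receiver opens its own handle on first access.

```python
import os
import pickle
import tempfile


class LazyFileRef:
    """Hypothetical stand-in for a file-linked spec: keeps only the path and
    opens the file on first access."""

    def __init__(self, path):
        self.path = path
        self._handle = None  # nothing is opened at construction time

    @property
    def handle(self):
        # Open on first use, then reuse the same handle.
        if self._handle is None:
            self._handle = open(self.path, "rb")
        return self._handle

    def __getstate__(self):
        # Pickle only the path; an open handle never leaves this process.
        return {"path": self.path}

    def __setstate__(self, state):
        self.path = state["path"]
        self._handle = None


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"payload")

ref = LazyFileRef(tmp.name)
first_read = ref.handle.read()            # the file is opened here
clone = pickle.loads(pickle.dumps(ref))   # the clone starts without a handle
clone_read = clone.handle.read()          # and opens the same path itself

ref.handle.close()
clone.handle.close()
os.unlink(tmp.name)
```

As the class description notes, this only works when sender and receiver can reach the file by the same path.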

class westpa.core.h5io.SingleDSSpec(h5file_or_name, dsname, alias=None, slice=None)

Bases: FileLinkedDSSpec

classmethod from_string(dsspec_string, default_h5file)
class westpa.core.h5io.SingleIterDSSpec(h5file_or_name, dsname, alias=None, slice=None)

Bases: SingleDSSpec

get_iter_data(n_iter, seg_slice=(slice(None, None, None),))
class westpa.core.h5io.SingleSegmentDSSpec(h5file_or_name, dsname, alias=None, slice=None)

Bases: SingleDSSpec

get_iter_data(n_iter, seg_slice=(slice(None, None, None),))
get_segment_data(n_iter, seg_id)
class westpa.core.h5io.FnDSSpec(h5file_or_name, fn)

Bases: FileLinkedDSSpec

get_iter_data(n_iter, seg_slice=(slice(None, None, None),))
class westpa.core.h5io.MultiDSSpec(dsspecs)

Bases: DSSpec

get_iter_data(n_iter, seg_slice=(slice(None, None, None),))
class westpa.core.h5io.IterBlockedDataset(dataset_or_array, attrs=None)

Bases: object

classmethod empty_like(blocked_dataset)
cache_data(max_size=None)

Cache this dataset in RAM. If max_size is given, then only cache if the entire dataset fits in max_size bytes. If max_size is the string ‘available’, then only cache if the entire dataset fits in available RAM, as defined by the psutil module.

drop_cache()
iter_entry(n_iter)
iter_slice(start=None, stop=None)

westpa.core.progress module

westpa.core.progress.linregress(x, y=None, alternative='two-sided')

Calculate a linear least-squares regression for two sets of measurements.

Parameters:
  • x (array_like) –

    Two sets of measurements. Both arrays should have the same length N. If only x is given (and y=None), then it must be a two-dimensional array where one dimension has length 2. The two sets of measurements are then found by splitting the array along the length-2 dimension. In the case where y=None and x is a 2xN array, linregress(x) is equivalent to linregress(x[0], x[1]).

    Deprecated since version 1.14.0: Inference of the two sets of measurements from a single argument x is deprecated and will result in an error in SciPy 1.16.0; the sets must be specified separately as x and y.

  • y (array_like) –

    Two sets of measurements. Both arrays should have the same length N. If only x is given (and y=None), then it must be a two-dimensional array where one dimension has length 2. The two sets of measurements are then found by splitting the array along the length-2 dimension. In the case where y=None and x is a 2xN array, linregress(x) is equivalent to linregress(x[0], x[1]).

    Deprecated since version 1.14.0: Inference of the two sets of measurements from a single argument x is deprecated and will result in an error in SciPy 1.16.0; the sets must be specified separately as x and y.

  • alternative ({'two-sided', 'less', 'greater'}, optional) –

    Defines the alternative hypothesis. Default is ‘two-sided’. The following options are available:

    • ’two-sided’: the slope of the regression line is nonzero

    • ’less’: the slope of the regression line is less than zero

    • ’greater’: the slope of the regression line is greater than zero

    Added in version 1.7.0.

Returns:

result – The return value is an object with the following attributes:

  • slope (float) – Slope of the regression line.

  • intercept (float) – Intercept of the regression line.

  • rvalue (float) – The Pearson correlation coefficient. The square of rvalue is equal to the coefficient of determination.

  • pvalue (float) – The p-value for a hypothesis test whose null hypothesis is that the slope is zero, using Wald Test with t-distribution of the test statistic. See alternative above for alternative hypotheses.

  • stderr (float) – Standard error of the estimated slope (gradient), under the assumption of residual normality.

  • intercept_stderr (float) – Standard error of the estimated intercept, under the assumption of residual normality.

Return type:

LinregressResult instance

See also

scipy.optimize.curve_fit

Use non-linear least squares to fit a function to data.

scipy.optimize.leastsq

Minimize the sum of squares of a set of equations.

Notes

For compatibility with older versions of SciPy, the return value acts like a namedtuple of length 5, with fields slope, intercept, rvalue, pvalue and stderr, so one can continue to write:

slope, intercept, r, p, se = linregress(x, y)

With that style, however, the standard error of the intercept is not available. To have access to all the computed values, including the standard error of the intercept, use the return value as an object with attributes, e.g.:

result = linregress(x, y)
print(result.intercept, result.intercept_stderr)

Examples

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> from scipy import stats
>>> rng = np.random.default_rng()

Generate some data:

>>> x = rng.random(10)
>>> y = 1.6*x + rng.random(10)

Perform the linear regression:

>>> res = stats.linregress(x, y)

Coefficient of determination (R-squared):

>>> print(f"R-squared: {res.rvalue**2:.6f}")
R-squared: 0.717533

Plot the data along with the fitted line:

>>> plt.plot(x, y, 'o', label='original data')
>>> plt.plot(x, res.intercept + res.slope*x, 'r', label='fitted line')
>>> plt.legend()
>>> plt.show()

Calculate 95% confidence interval on slope and intercept:

>>> # Two-sided inverse Student's t-distribution
>>> # p - probability, df - degrees of freedom
>>> from scipy.stats import t
>>> tinv = lambda p, df: abs(t.ppf(p/2, df))
>>> ts = tinv(0.05, len(x)-2)
>>> print(f"slope (95%): {res.slope:.6f} +/- {ts*res.stderr:.6f}")
slope (95%): 1.453392 +/- 0.743465
>>> print(f"intercept (95%): {res.intercept:.6f}"
...       f" +/- {ts*res.intercept_stderr:.6f}")
intercept (95%): 0.616950 +/- 0.544475
westpa.core.progress.nop()
class westpa.core.progress.ProgressIndicator(stream=None, interval=1)

Bases: object

draw_fancy()
draw_simple()
draw()
clear()
property operation
property extent
property progress
new_operation(operation, extent=None, progress=0)
start()
stop()

westpa.core.segment module

class westpa.core.segment.Segment(n_iter=None, seg_id=None, weight=None, endpoint_type=None, parent_id=None, wtg_parent_ids=None, pcoord=None, status=None, walltime=None, cputime=None, data=None)

Bases: object

A class wrapping segment data that must be passed through the work manager or data manager. Most fields are self-explanatory. One item worth noting is that a negative parent ID means that the segment starts from the initial state with ID -(segment.parent_id+1).
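
The negative-parent convention can be worked through with a short sketch. Both helper functions are hypothetical names introduced here for illustration; they only apply the formula stated above.

```python
def initial_state_id_from_parent(parent_id):
    """Hypothetical helper: decode a negative parent_id into an initial state ID."""
    if parent_id >= 0:
        return None  # non-negative: the parent is a segment, not an initial state
    return -(parent_id + 1)


def parent_id_from_initial_state(istate_id):
    """Inverse mapping: encode an initial state ID as a negative parent_id."""
    return -(istate_id + 1)


decoded = [initial_state_id_from_parent(p) for p in (-1, -2, -10, 3)]
encoded = [parent_id_from_initial_state(i) for i in (0, 1, 9)]
```

The offset of one is what lets initial state 0 be represented, since a parent ID of -0 would be indistinguishable from segment 0.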

SEG_STATUS_UNSET = 0
SEG_STATUS_PREPARED = 1
SEG_STATUS_COMPLETE = 2
SEG_STATUS_FAILED = 3
SEG_INITPOINT_UNSET = 0
SEG_INITPOINT_CONTINUES = 1
SEG_INITPOINT_NEWTRAJ = 2
SEG_ENDPOINT_UNSET = 0
SEG_ENDPOINT_CONTINUES = 1
SEG_ENDPOINT_MERGED = 2
SEG_ENDPOINT_RECYCLED = 3
statuses = {'SEG_STATUS_COMPLETE': 2, 'SEG_STATUS_FAILED': 3, 'SEG_STATUS_PREPARED': 1, 'SEG_STATUS_UNSET': 0}
initpoint_types = {'SEG_INITPOINT_CONTINUES': 1, 'SEG_INITPOINT_NEWTRAJ': 2, 'SEG_INITPOINT_UNSET': 0}
endpoint_types = {'SEG_ENDPOINT_CONTINUES': 1, 'SEG_ENDPOINT_MERGED': 2, 'SEG_ENDPOINT_RECYCLED': 3, 'SEG_ENDPOINT_UNSET': 0}
status_names = {0: 'SEG_STATUS_UNSET', 1: 'SEG_STATUS_PREPARED', 2: 'SEG_STATUS_COMPLETE', 3: 'SEG_STATUS_FAILED'}
initpoint_type_names = {0: 'SEG_INITPOINT_UNSET', 1: 'SEG_INITPOINT_CONTINUES', 2: 'SEG_INITPOINT_NEWTRAJ'}
endpoint_type_names = {0: 'SEG_ENDPOINT_UNSET', 1: 'SEG_ENDPOINT_CONTINUES', 2: 'SEG_ENDPOINT_MERGED', 3: 'SEG_ENDPOINT_RECYCLED'}
static initial_pcoord(segment)

Return the initial progress coordinate point of this segment.

static final_pcoord(segment)

Return the final progress coordinate point of this segment.

property initpoint_type
property initial_state_id
property status_text
property endpoint_type_text

westpa.core.sim_manager module

class westpa.core.sim_manager.timedelta

Bases: object

Difference between two datetime values.

timedelta(days=0, seconds=0, microseconds=0, milliseconds=0, minutes=0, hours=0, weeks=0)

All arguments are optional and default to 0. Arguments may be integers or floats, and may be positive or negative.

days

Number of days.

max = datetime.timedelta(days=999999999, seconds=86399, microseconds=999999)
microseconds

Number of microseconds (>= 0 and less than 1 second).

min = datetime.timedelta(days=-999999999)
resolution = datetime.timedelta(microseconds=1)
seconds

Number of seconds (>= 0 and less than 1 day).

total_seconds()

Total seconds in the duration.

exception westpa.core.sim_manager.PickleError

Bases: Exception

class westpa.core.sim_manager.zip_longest(*iterables, fillvalue=None)

Bases: object

Return a zip_longest object whose .__next__() method returns a tuple where the i-th element comes from the i-th iterable argument. The .__next__() method continues until the longest iterable in the argument sequence is exhausted and then it raises StopIteration. When the shorter iterables are exhausted, the fillvalue is substituted in their place. The fillvalue defaults to None or can be specified by a keyword argument.
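
A short illustration of the fill behaviour, using made-up lists of unequal length (the names `seg_ids` and `weights` are purely illustrative):

```python
from itertools import zip_longest

seg_ids = [0, 1, 2]
weights = [0.5, 0.25]  # one entry short

# The shorter iterable is padded with fillvalue once it is exhausted.
pairs = list(zip_longest(seg_ids, weights, fillvalue=0.0))
# Without fillvalue, the padding value is None.
pairs_default = list(zip_longest(seg_ids, weights))
```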

class westpa.core.sim_manager.Counter(iterable=None, /, **kwds)

Bases: dict

Dict subclass for counting hashable items. Sometimes called a bag or multiset. Elements are stored as dictionary keys and their counts are stored as dictionary values.

>>> c = Counter('abcdeabcdabcaba')  # count elements from a string
>>> c.most_common(3)                # three most common elements
[('a', 5), ('b', 4), ('c', 3)]
>>> sorted(c)                       # list all unique elements
['a', 'b', 'c', 'd', 'e']
>>> ''.join(sorted(c.elements()))   # list elements with repetitions
'aaaaabbbbcccdde'
>>> sum(c.values())                 # total of all counts
15
>>> c['a']                          # count of letter 'a'
5
>>> for elem in 'shazam':           # update counts from an iterable
...     c[elem] += 1                # by adding 1 to each element's count
>>> c['a']                          # now there are seven 'a'
7
>>> del c['b']                      # remove all 'b'
>>> c['b']                          # now there are zero 'b'
0
>>> d = Counter('simsalabim')       # make another counter
>>> c.update(d)                     # add in the second counter
>>> c['a']                          # now there are nine 'a'
9
>>> c.clear()                       # empty the counter
>>> c
Counter()

Note: If a count is set to zero or reduced to zero, it will remain in the counter until the entry is deleted or the counter is cleared:

>>> c = Counter('aaabbc')
>>> c['b'] -= 2                     # reduce the count of 'b' by two
>>> c.most_common()                 # 'b' is still in, but its count is zero
[('a', 3), ('c', 1), ('b', 0)]

Create a new, empty Counter object. And if given, count elements from an input iterable. Or, initialize the count from another mapping of elements to their counts.

>>> c = Counter()                           # a new, empty counter
>>> c = Counter('gallahad')                 # a new counter from an iterable
>>> c = Counter({'a': 4, 'b': 2})           # a new counter from a mapping
>>> c = Counter(a=4, b=2)                   # a new counter from keyword args
copy()

Return a shallow copy.

elements()

Iterator over elements repeating each as many times as its count.

>>> c = Counter('ABCABC')
>>> sorted(c.elements())
['A', 'A', 'B', 'B', 'C', 'C']

Knuth’s example for prime factors of 1836: 2**2 * 3**3 * 17**1

>>> import math
>>> prime_factors = Counter({2: 2, 3: 3, 17: 1})
>>> math.prod(prime_factors.elements())
1836

Note, if an element’s count has been set to zero or is a negative number, elements() will ignore it.

classmethod fromkeys(iterable, v=None)

Create a new dictionary with keys from iterable and values set to value.

most_common(n=None)

List the n most common elements and their counts from the most common to the least. If n is None, then list all element counts.

>>> Counter('abracadabra').most_common(3)
[('a', 5), ('b', 2), ('r', 2)]
subtract(iterable=None, /, **kwds)

Like dict.update() but subtracts counts instead of replacing them. Counts can be reduced below zero. Both the inputs and outputs are allowed to contain zero and negative counts.

Source can be an iterable, a dictionary, or another Counter instance.

>>> c = Counter('which')
>>> c.subtract('witch')             # subtract elements from another iterable
>>> c.subtract(Counter('watch'))    # subtract elements from another counter
>>> c['h']                          # 2 in which, minus 1 in witch, minus 1 in watch
0
>>> c['w']                          # 1 in which, minus 1 in witch, minus 1 in watch
-1
total()

Sum of the counts

update(iterable=None, /, **kwds)

Like dict.update() but add counts instead of replacing them.

Source can be an iterable, a dictionary, or another Counter instance.

>>> c = Counter('which')
>>> c.update('witch')           # add elements from another iterable
>>> d = Counter('watch')
>>> c.update(d)                 # add elements from another counter
>>> c['h']                      # four 'h' in which, witch, and watch
4
class westpa.core.sim_manager.Generator(bit_generator)

Bases: object

Container for the BitGenerators.

Generator exposes a number of methods for generating random numbers drawn from a variety of probability distributions. In addition to the distribution-specific arguments, each method takes a keyword argument size that defaults to None. If size is None, then a single value is generated and returned. If size is an integer, then a 1-D array filled with generated values is returned. If size is a tuple, then an array with that shape is filled and returned.

The function numpy.random.default_rng() will instantiate a Generator with numpy’s default BitGenerator.
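
The three forms of the size argument described above can be seen in a small sketch. The seed is arbitrary and is passed only to make the run reproducible.

```python
import numpy as np

rng = np.random.default_rng(12345)  # arbitrary seed

single = rng.standard_normal()            # size=None: a single scalar value
vector = rng.standard_normal(size=4)      # size=int: 1-D array of length 4
grid = rng.standard_normal(size=(2, 3))   # size=tuple: array with that shape
```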

No Compatibility Guarantee

Generator does not provide a version compatibility guarantee. In particular, as better algorithms evolve the bit stream may change.

Parameters:

bit_generator (BitGenerator) – BitGenerator to use as the core generator.

Notes

The Python stdlib module random contains a pseudo-random number generator with a number of methods that are similar to the ones available in Generator. It uses Mersenne Twister, and this bit generator can be accessed using MT19937. Generator, besides being NumPy-aware, has the advantage that it provides a much larger number of probability distributions to choose from.

Examples

>>> from numpy.random import Generator, PCG64
>>> rng = Generator(PCG64())
>>> rng.standard_normal()
-0.203  # random

See also

default_rng

Recommended constructor for Generator.

beta(a, b, size=None)

Draw samples from a Beta distribution.

The Beta distribution is a special case of the Dirichlet distribution, and is related to the Gamma distribution. It has the probability distribution function

\[f(x; a,b) = \frac{1}{B(\alpha, \beta)} x^{\alpha - 1} (1 - x)^{\beta - 1},\]

where the normalization, B, is the beta function,

\[B(\alpha, \beta) = \int_0^1 t^{\alpha - 1} (1 - t)^{\beta - 1} dt.\]

It is often seen in Bayesian inference and order statistics.

Parameters:
  • a (float or array_like of floats) – Alpha, positive (>0).

  • b (float or array_like of floats) – Beta, positive (>0).

  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if a and b are both scalars. Otherwise, np.broadcast(a, b).size samples are drawn.

Returns:

out – Drawn samples from the parameterized beta distribution.

Return type:

ndarray or scalar

Examples

The beta distribution has mean a/(a+b). If a == b and both are > 1, the distribution is symmetric with mean 0.5.

>>> rng = np.random.default_rng()
>>> a, b, size = 2.0, 2.0, 10000
>>> sample = rng.beta(a=a, b=b, size=size)
>>> np.mean(sample)
0.5047328775385895  # may vary

Otherwise the distribution is skewed left or right according to whether a or b is greater. The distribution is mirror symmetric. See for example:

>>> a, b, size = 2, 7, 10000
>>> sample_left = rng.beta(a=a, b=b, size=size)
>>> sample_right = rng.beta(a=b, b=a, size=size)
>>> m_left, m_right = np.mean(sample_left), np.mean(sample_right)
>>> print(m_left, m_right)
0.2238596793678923 0.7774613834041182  # may vary
>>> print(m_left - a/(a+b))
0.001637457145670096  # may vary
>>> print(m_right - b/(a+b))
-0.0003163943736596009  # may vary

Display the histogram of the two samples:

>>> import matplotlib.pyplot as plt
>>> plt.hist([sample_left, sample_right],
...          50, density=True, histtype='bar')
>>> plt.show()

binomial(n, p, size=None)

Draw samples from a binomial distribution.

Samples are drawn from a binomial distribution with specified parameters, n trials and p probability of success, where n is an integer >= 0 and p is in the interval [0,1]. (n may be input as a float, but it is truncated to an integer in use.)

Parameters:
  • n (int or array_like of ints) – Parameter of the distribution, >= 0. Floats are also accepted, but they will be truncated to integers.

  • p (float or array_like of floats) – Parameter of the distribution, >= 0 and <=1.

  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if n and p are both scalars. Otherwise, np.broadcast(n, p).size samples are drawn.

Returns:

out – Drawn samples from the parameterized binomial distribution, where each sample is equal to the number of successes over the n trials.

Return type:

ndarray or scalar

See also

scipy.stats.binom

probability density function, distribution or cumulative density function, etc.

Notes

The probability mass function (PMF) for the binomial distribution is

\[P(N) = \binom{n}{N}p^N(1-p)^{n-N},\]

where \(n\) is the number of trials, \(p\) is the probability of success, and \(N\) is the number of successes.

When estimating the standard error of a proportion in a population by using a random sample, the normal distribution works well unless the product p*n <=5, where p = population proportion estimate, and n = number of samples, in which case the binomial distribution is used instead. For example, a sample of 15 people shows 4 who are left handed, and 11 who are right handed. Then p = 4/15 = 27%. 0.27*15 = 4, so the binomial distribution should be used in this case.

Examples

Draw samples from the distribution:

>>> rng = np.random.default_rng()
>>> n, p, size = 10, .5, 10000
>>> s = rng.binomial(n, p, 10000)

Assume a company drills 9 wild-cat oil exploration wells, each with an estimated probability of success of p=0.1. All nine wells fail. What is the probability of that happening?

Over size = 20,000 trials the probability of this happening is on average:

>>> n, p, size = 9, 0.1, 20000
>>> np.sum(rng.binomial(n=n, p=p, size=size) == 0)/size
0.39015  # may vary
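
The simulated estimate can be checked against the exact value from the probability mass function given in the Notes. The sketch below uses only the standard library; `binomial_pmf` is a helper name introduced here for illustration.

```python
from math import comb


def binomial_pmf(N, n, p):
    """P(N successes in n trials), following the PMF in the Notes."""
    return comb(n, N) * p**N * (1 - p)**(n - N)


# Probability that all nine wells fail: N = 0 successes, which is 0.9**9.
p_all_fail = binomial_pmf(0, n=9, p=0.1)
# The PMF sums to one over all possible success counts.
total = sum(binomial_pmf(N, 9, 0.1) for N in range(10))
```

The exact value is about 0.3874, which the simulated frequency approaches as the number of trials grows.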

The following can be used to visualize a sample with n=100, p=0.4 and the corresponding probability density function:

>>> import matplotlib.pyplot as plt
>>> from scipy.stats import binom
>>> n, p, size = 100, 0.4, 10000
>>> sample = rng.binomial(n, p, size=size)
>>> count, bins, _ = plt.hist(sample, 30, density=True)
>>> x = np.arange(n)
>>> y = binom.pmf(x, n, p)
>>> plt.plot(x, y, linewidth=2, color='r')
bit_generator

Gets the bit generator instance used by the generator

Returns:

bit_generator – The bit generator instance used by the generator

Return type:

BitGenerator

bytes(length)

Return random bytes.

Parameters:

length (int) – Number of random bytes.

Returns:

out – String of length length.

Return type:

bytes

Notes

This function generates random bytes from a discrete uniform distribution. The generated bytes are independent from the CPU’s native endianness.

Examples

>>> rng = np.random.default_rng()
>>> rng.bytes(10)
b'\xfeC\x9b\x86\x17\xf2\xa1\xafcp'  # random
chisquare(df, size=None)

Draw samples from a chi-square distribution.

When df independent random variables, each with standard normal distributions (mean 0, variance 1), are squared and summed, the resulting distribution is chi-square (see Notes). This distribution is often used in hypothesis testing.

Parameters:
  • df (float or array_like of floats) – Number of degrees of freedom, must be > 0.

  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if df is a scalar. Otherwise, np.array(df).size samples are drawn.

Returns:

out – Drawn samples from the parameterized chi-square distribution.

Return type:

ndarray or scalar

Raises:

ValueError – When df <= 0 or when an inappropriate size (e.g. size=-1) is given.

Notes

The variable obtained by summing the squares of df independent, standard normally distributed random variables:

\[Q = \sum_{i=1}^{\mathtt{df}} X^2_i\]

is chi-square distributed, denoted

\[Q \sim \chi^2_k.\]

The probability density function of the chi-squared distribution is

\[p(x) = \frac{(1/2)^{k/2}}{\Gamma(k/2)} x^{k/2 - 1} e^{-x/2},\]

where \(\Gamma\) is the gamma function,

\[\Gamma(x) = \int_0^{\infty} t^{x - 1} e^{-t} dt.\]
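
The sum-of-squares construction can be checked numerically. The sketch below builds Q from standard normal draws and compares its sample mean with direct chisquare draws; it relies on the standard fact that a chi-square variable with df degrees of freedom has mean df. The seed and sample sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2024)  # arbitrary seed
df, n_samples = 5, 100_000

normals = rng.standard_normal(size=(n_samples, df))
q = (normals**2).sum(axis=1)                 # Q = sum of X_i^2 over df terms
direct = rng.chisquare(df, size=n_samples)   # drawn from the distribution directly

mean_q, mean_direct = q.mean(), direct.mean()  # both should be close to df
```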

Examples

>>> rng = np.random.default_rng()
>>> rng.chisquare(2,4)
array([ 1.89920014,  9.00867716,  3.13710533,  5.62318272]) # random

The distribution of a chi-square random variable with 20 degrees of freedom looks as follows:

>>> import matplotlib.pyplot as plt
>>> import scipy.stats as stats
>>> s = rng.chisquare(20, 10000)
>>> count, bins, _ = plt.hist(s, 30, density=True)
>>> x = np.linspace(0, 60, 1000)
>>> plt.plot(x, stats.chi2.pdf(x, df=20))
>>> plt.xlim([0, 60])
>>> plt.show()
choice(a, size=None, replace=True, p=None, axis=0, shuffle=True)

Generates a random sample from a given array

Parameters:
  • a ({array_like, int}) – If an ndarray, a random sample is generated from its elements. If an int, the random sample is generated from np.arange(a).

  • size ({int, tuple[int]}, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn from the 1-d a. If a has more than one dimension, the size shape will be inserted into the axis dimension, so the output ndim will be a.ndim - 1 + len(size). Default is None, in which case a single value is returned.

  • replace (bool, optional) – Whether the sample is with or without replacement. Default is True, meaning that a value of a can be selected multiple times.

  • p (1-D array_like, optional) – The probabilities associated with each entry in a. If not given, the sample assumes a uniform distribution over all entries in a.

  • axis (int, optional) – The axis along which the selection is performed. The default, 0, selects by row.

  • shuffle (bool, optional) – Whether the sample is shuffled when sampling without replacement. Default is True, False provides a speedup.

Returns:

samples – The generated random samples

Return type:

single item or ndarray

Raises:

ValueError – If a is an int and less than zero, if p is not 1-dimensional, if a is array-like with a size 0, if p is not a vector of probabilities, if a and p have different lengths, or if replace=False and the sample size is greater than the population size.

Notes

Setting user-specified probabilities through p uses a more general but less efficient sampler than the default. The general sampler produces a different sample than the optimized sampler even if each element of p is 1 / len(a).

p must sum to 1 when cast to float64. To ensure this, you may wish to normalize using p = p / np.sum(p, dtype=float).

When passing a as an integer type and size is not specified, the return type is a native Python int.
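
A minimal sketch of the normalization step mentioned above, using hypothetical unnormalized weights and an arbitrary seed:

```python
import numpy as np

rng = np.random.default_rng(7)  # arbitrary seed
weights = np.array([3.0, 1.0, 0.0, 6.0])     # hypothetical unnormalized weights
p = weights / np.sum(weights, dtype=float)   # now sums to 1 as float64

draws = rng.choice(len(weights), size=1000, p=p)
counts = np.bincount(draws, minlength=len(weights))
```

An entry with probability zero, such as index 2 here, is never selected.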

Examples

Generate a uniform random sample from np.arange(5) of size 3:

>>> rng = np.random.default_rng()
>>> rng.choice(5, 3)
array([0, 3, 4]) # random
>>> #This is equivalent to rng.integers(0,5,3)

Generate a non-uniform random sample from np.arange(5) of size 3:

>>> rng.choice(5, 3, p=[0.1, 0, 0.3, 0.6, 0])
array([3, 3, 0]) # random

Generate a uniform random sample from np.arange(5) of size 3 without replacement:

>>> rng.choice(5, 3, replace=False)
array([3,1,0]) # random
>>> #This is equivalent to rng.permutation(np.arange(5))[:3]

Generate a uniform random sample from a 2-D array along the first axis (the default), without replacement:

>>> rng.choice([[0, 1, 2], [3, 4, 5], [6, 7, 8]], 2, replace=False)
array([[3, 4, 5], # random
       [0, 1, 2]])

Generate a non-uniform random sample from np.arange(5) of size 3 without replacement:

>>> rng.choice(5, 3, replace=False, p=[0.1, 0, 0.3, 0.6, 0])
array([2, 3, 0]) # random

Any of the above can be repeated with an arbitrary array-like instead of just integers. For instance:

>>> aa_milne_arr = ['pooh', 'rabbit', 'piglet', 'Christopher']
>>> rng.choice(aa_milne_arr, 5, p=[0.5, 0.1, 0.1, 0.3])
array(['pooh', 'pooh', 'pooh', 'Christopher', 'piglet'], # random
      dtype='<U11')
dirichlet(alpha, size=None)

Draw samples from the Dirichlet distribution.

Draw size samples of dimension k from a Dirichlet distribution. A Dirichlet-distributed random variable can be seen as a multivariate generalization of a Beta distribution. The Dirichlet distribution is a conjugate prior of a multinomial distribution in Bayesian inference.

Parameters:
  • alpha (sequence of floats, length k) – Parameter of the distribution (length k for sample of length k).

  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n), then m * n * k samples are drawn. Default is None, in which case a vector of length k is returned.

Returns:

samples – The drawn samples, of shape (size, k).

Return type:

ndarray

Raises:

ValueError – If any value in alpha is less than zero

Notes

The Dirichlet distribution is a distribution over vectors \(x\) that fulfil the conditions \(x_i>0\) and \(\sum_{i=1}^k x_i = 1\).

The probability density function \(p\) of a Dirichlet-distributed random vector \(X\) is proportional to

\[p(x) \propto \prod_{i=1}^{k}{x^{\alpha_i-1}_i},\]

where \(\alpha\) is a vector containing the positive concentration parameters.

The method uses the following property for computation: let \(Y\) be a random vector which has components that follow a standard gamma distribution, then \(X = \frac{1}{\sum_{i=1}^k{Y_i}} Y\) is Dirichlet-distributed.
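
The gamma-normalization property can be reproduced by hand. The sketch below assumes each component Y_i is drawn from a standard gamma distribution with shape alpha_i, then normalizes each row; it illustrates the property only and is not NumPy's internal code. The seed and sample count are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed
alpha = np.array([10.0, 5.0, 3.0])

# Each column i holds draws of Y_i ~ Gamma(shape=alpha_i, scale=1).
y = rng.standard_gamma(alpha, size=(20000, alpha.size))
x = y / y.sum(axis=1, keepdims=True)   # X = Y / sum(Y), one vector per row

expected_mean = alpha / alpha.sum()    # mean of a Dirichlet(alpha) vector
sample_mean = x.mean(axis=0)
```

Every row of x sums to one, and the sample mean approaches alpha / sum(alpha).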

Examples

Taking an example cited in Wikipedia, this distribution can be used if one wanted to cut strings (each of initial length 1.0) into K pieces with different lengths, where each piece had, on average, a designated average length, but allowing some variation in the relative sizes of the pieces.

>>> rng = np.random.default_rng()
>>> s = rng.dirichlet((10, 5, 3), 20).transpose()
>>> import matplotlib.pyplot as plt
>>> plt.barh(range(20), s[0])
>>> plt.barh(range(20), s[1], left=s[0], color='g')
>>> plt.barh(range(20), s[2], left=s[0]+s[1], color='r')
>>> plt.title("Lengths of Strings")
exponential(scale=1.0, size=None)

Draw samples from an exponential distribution.

Its probability density function is

\[f(x; \frac{1}{\beta}) = \frac{1}{\beta} \exp(-\frac{x}{\beta}),\]

for x > 0 and 0 elsewhere. \(\beta\) is the scale parameter, which is the inverse of the rate parameter \(\lambda = 1/\beta\). The rate parameter is an alternative, widely used parameterization of the exponential distribution [3].

The exponential distribution is a continuous analogue of the geometric distribution. It describes many common situations, such as the size of raindrops measured over many rainstorms [1], or the time between page requests to Wikipedia [2].

Parameters:
  • scale (float or array_like of floats) – The scale parameter, \(\beta = 1/\lambda\). Must be non-negative.

  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if scale is a scalar. Otherwise, np.array(scale).size samples are drawn.

Returns:

out – Drawn samples from the parameterized exponential distribution.

Return type:

ndarray or scalar

Examples

Assume a company has 10000 customer support agents and the time between customer calls is exponentially distributed and that the average time between customer calls is 4 minutes.

>>> scale, size = 4, 10000
>>> rng = np.random.default_rng()
>>> time_between_calls = rng.exponential(scale=scale, size=size)

What is the probability that a customer will call in the next 4 to 5 minutes?

>>> x = ((time_between_calls < 5).sum())/size
>>> y = ((time_between_calls < 4).sum())/size
>>> x - y
0.08  # may vary
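
For comparison, the same probability can be computed exactly from the cumulative distribution function \(F(x) = 1 - \exp(-x/\beta)\). This is a small worked sketch; the helper name exponential_cdf is hypothetical and not part of NumPy.

```python
import math

scale = 4.0  # mean time between calls, in minutes


def exponential_cdf(x, scale):
    """CDF of the exponential distribution, 1 - exp(-x / scale), for x >= 0."""
    return 1.0 - math.exp(-x / scale)


# P(4 < T < 5) = F(5) - F(4) = exp(-1) - exp(-1.25)
p_between_4_and_5 = exponential_cdf(5.0, scale) - exponential_cdf(4.0, scale)
print(round(p_between_4_and_5, 4))  # 0.0814
```

The sampled estimate above should scatter around this exact value.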

The corresponding distribution can be visualized as follows:

>>> import matplotlib.pyplot as plt
>>> scale, size = 4, 10000
>>> rng = np.random.default_rng()
>>> sample = rng.exponential(scale=scale, size=size)
>>> count, bins, _ = plt.hist(sample, 30, density=True)
>>> plt.plot(bins, scale**(-1)*np.exp(-scale**-1*bins), linewidth=2, color='r')
>>> plt.show()

References

f(dfnum, dfden, size=None)

Draw samples from an F distribution.

Samples are drawn from an F distribution with specified parameters, dfnum (degrees of freedom in numerator) and dfden (degrees of freedom in denominator), where both parameters must be greater than zero.

The random variate of the F distribution (also known as the Fisher distribution) is a continuous probability distribution that arises in ANOVA tests, and is the ratio of two chi-square variates.
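
The ratio construction can be illustrated with a short sketch. More precisely, an F variate is the ratio of two independent chi-square variates, each divided by its degrees of freedom. The seed, parameter values, and sample count below are arbitrary, and this is not a description of how NumPy generates F variates internally.

```python
import numpy as np

rng = np.random.default_rng(2024)  # arbitrary seed
dfnum, dfden, n = 5.0, 40.0, 200_000

# Ratio of two independent chi-square variates, each divided by its
# degrees of freedom.
ratio = (rng.chisquare(dfnum, n) / dfnum) / (rng.chisquare(dfden, n) / dfden)
direct = rng.f(dfnum, dfden, n)

# Both samples should have similar quantiles.
q = [0.25, 0.5, 0.75, 0.95]
ratio_q = np.quantile(ratio, q)
direct_q = np.quantile(direct, q)
print(ratio_q)
print(direct_q)
```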

Parameters:
  • dfnum (float or array_like of floats) – Degrees of freedom in numerator, must be > 0.

  • dfden (float or array_like of floats) – Degrees of freedom in denominator, must be > 0.

  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if dfnum and dfden are both scalars. Otherwise, np.broadcast(dfnum, dfden).size samples are drawn.

Returns:

out – Drawn samples from the parameterized Fisher distribution.

Return type:

ndarray or scalar

See also

scipy.stats.f

probability density function, distribution or cumulative distribution function, etc.

Notes

The F statistic is used to compare in-group variances to between-group variances. Calculating the distribution depends on the sampling, and so it is a function of the respective degrees of freedom in the problem. The variable dfnum is the number of groups minus one, the between-groups degrees of freedom, while dfden is the within-groups degrees of freedom, the sum of the number of samples in each group minus the number of groups.

References

Examples

An example from Glantz[1], pp 47-40:

Consider two groups: children of diabetics (25 people) and children of people without diabetes (25 controls). Fasting blood glucose was measured; the case group had a mean value of 86.1 and the controls had a mean value of 82.2. Standard deviations were 2.09 and 2.49, respectively. Are these data consistent with the null hypothesis that the parents’ diabetic status does not affect their children’s blood glucose levels? Calculating the F statistic from the data gives a value of 36.01.

Draw samples from the distribution:

>>> dfnum = 1. # between group degrees of freedom
>>> dfden = 48. # within groups degrees of freedom
>>> rng = np.random.default_rng()
>>> s = rng.f(dfnum, dfden, 1000)

The lower bound for the top 1% of the samples is:

>>> np.sort(s)[-10]
7.61988120985 # random

So there is about a 1% chance that the F statistic will exceed 7.62. The measured value is 36, so the null hypothesis is rejected at the 1% level.

The corresponding probability density function for dfnum = 20 and dfden = 20 is:

>>> import matplotlib.pyplot as plt
>>> from scipy import stats
>>> dfnum, dfden, size = 20, 20, 10000
>>> s = rng.f(dfnum=dfnum, dfden=dfden, size=size)
>>> density, bins, _ = plt.hist(s, 30, density=True)
>>> x = np.linspace(0, 5, 1000)
>>> plt.plot(x, stats.f.pdf(x, dfnum, dfden))
>>> plt.xlim([0, 5])
>>> plt.show()
gamma(shape, scale=1.0, size=None)

Draw samples from a Gamma distribution.

Samples are drawn from a Gamma distribution with specified parameters, shape (sometimes designated “k”) and scale (sometimes designated “theta”), where both parameters are > 0.

Parameters:
  • shape (float or array_like of floats) – The shape of the gamma distribution. Must be non-negative.

  • scale (float or array_like of floats, optional) – The scale of the gamma distribution. Must be non-negative. Default is equal to 1.

  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if shape and scale are both scalars. Otherwise, np.broadcast(shape, scale).size samples are drawn.

Returns:

out – Drawn samples from the parameterized gamma distribution.

Return type:

ndarray or scalar

See also

scipy.stats.gamma

probability density function, distribution or cumulative distribution function, etc.

Notes

The probability density for the Gamma distribution is

\[p(x) = x^{k-1}\frac{e^{-x/\theta}}{\theta^k\Gamma(k)},\]

where \(k\) is the shape and \(\theta\) the scale, and \(\Gamma\) is the Gamma function.

The Gamma distribution is often used to model the times to failure of electronic components, and arises naturally in processes for which the waiting times between Poisson distributed events are relevant.
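
The connection to Poisson waiting times can be sketched numerically: the waiting time until the k-th event is the sum of k independent exponential gaps, which follows a Gamma distribution with shape k and the same scale. The seed and parameter values are arbitrary, and the mean \(k\theta\) and variance \(k\theta^2\) used in the comments are standard results rather than statements from this page.

```python
import numpy as np

rng = np.random.default_rng(7)  # arbitrary seed
k, theta, n = 3, 2.0, 100_000

# Waiting time until the k-th event: sum of k independent exponential gaps.
waits = rng.exponential(scale=theta, size=(n, k)).sum(axis=1)
direct = rng.gamma(shape=k, scale=theta, size=n)

# Gamma(k, theta) has mean k * theta and variance k * theta**2.
print(waits.mean(), direct.mean())  # both close to 6.0
print(waits.var(), direct.var())    # both close to 12.0
```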

Examples

Draw samples from the distribution:

>>> shape, scale = 2., 2.  # mean=4, std=2*sqrt(2)
>>> rng = np.random.default_rng()
>>> s = rng.gamma(shape, scale, 1000)

Display the histogram of the samples, along with the probability density function:

>>> import matplotlib.pyplot as plt
>>> import scipy.special as sps
>>> count, bins, _ = plt.hist(s, 50, density=True)
>>> y = bins**(shape-1)*(np.exp(-bins/scale) /
...                      (sps.gamma(shape)*scale**shape))
>>> plt.plot(bins, y, linewidth=2, color='r')
>>> plt.show()
geometric(p, size=None)

Draw samples from the geometric distribution.

Bernoulli trials are experiments with one of two outcomes: success or failure (an example of such an experiment is flipping a coin). The geometric distribution models the number of trials that must be run in order to achieve success. It is therefore supported on the positive integers, k = 1, 2, ....

The probability mass function of the geometric distribution is

\[f(k) = (1 - p)^{k - 1} p\]

where p is the probability of success of an individual trial.
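
As a quick check of the formula above, the sketch below evaluates the probability mass function directly and compares it with empirical frequencies. The helper name geometric_pmf, the seed, and the sample size are illustrative assumptions.

```python
import numpy as np


def geometric_pmf(k, p):
    """P(first success occurs on trial k), for k = 1, 2, ..."""
    return (1.0 - p) ** (k - 1) * p


p = 0.35
k = np.arange(1, 200)
pmf = geometric_pmf(k, p)
print(pmf.sum())  # very close to 1; the tail beyond k = 199 is negligible

rng = np.random.default_rng(0)  # arbitrary seed
sample = rng.geometric(p, size=100_000)

# Empirical frequencies of k = 1, 2, 3 next to the PMF values.
empirical = np.array([(sample == i).mean() for i in (1, 2, 3)])
print(empirical, pmf[:3])
```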

Parameters:
  • p (float or array_like of floats) – The probability of success of an individual trial.

  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if p is a scalar. Otherwise, np.array(p).size samples are drawn.

Returns:

out – Drawn samples from the parameterized geometric distribution.

Return type:

ndarray or scalar

Examples

Draw 10,000 values from the geometric distribution, with the probability of an individual success equal to p = 0.35:

>>> p, size = 0.35, 10000
>>> rng = np.random.default_rng()
>>> sample = rng.geometric(p=p, size=size)

What proportion of trials succeeded after a single run?

>>> (sample == 1).sum()/size
0.34889999999999999  # may vary

The geometric distribution with p=0.35 looks as follows:

>>> import matplotlib.pyplot as plt
>>> count, bins, _ = plt.hist(sample, bins=30, density=True)
>>> plt.plot(bins, (1-p)**(bins-1)*p)
>>> plt.xlim([0, 25])
>>> plt.show()
gumbel(loc=0.0, scale=1.0, size=None)

Draw samples from a Gumbel distribution.

Draw samples from a Gumbel distribution with specified location and scale. For more information on the Gumbel distribution, see Notes and References below.

Parameters:
  • loc (float or array_like of floats, optional) – The location of the mode of the distribution. Default is 0.

  • scale (float or array_like of floats, optional) – The scale parameter of the distribution. Default is 1. Must be non-negative.

  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if loc and scale are both scalars. Otherwise, np.broadcast(loc, scale).size samples are drawn.

Returns:

out – Drawn samples from the parameterized Gumbel distribution.

Return type:

ndarray or scalar

See also

scipy.stats.gumbel_l, scipy.stats.gumbel_r, scipy.stats.genextreme, weibull

Notes

The Gumbel (or Smallest Extreme Value (SEV) or the Smallest Extreme Value Type I) distribution is one of a class of Generalized Extreme Value (GEV) distributions used in modeling extreme value problems. The Gumbel is a special case of the Extreme Value Type I distribution for maximums from distributions with “exponential-like” tails.

The probability density for the Gumbel distribution is

\[p(x) = \frac{e^{-(x - \mu)/ \beta}}{\beta} e^{ -e^{-(x - \mu)/ \beta}},\]

where \(\mu\) is the mode, a location parameter, and \(\beta\) is the scale parameter.

The Gumbel (named for German mathematician Emil Julius Gumbel) was used very early in the hydrology literature, for modeling the occurrence of flood events. It is also used for modeling maximum wind speed and rainfall rates. It is a “fat-tailed” distribution - the probability of an event in the tail of the distribution is larger than if one used a Gaussian, hence the surprisingly frequent occurrence of 100-year floods. Floods were initially modeled as a Gaussian process, which underestimated the frequency of extreme events.

It is one of a class of extreme value distributions, the Generalized Extreme Value (GEV) distributions, which also includes the Weibull and Frechet.

The function has a mean of \(\mu + 0.57721\beta\) and a variance of \(\frac{\pi^2}{6}\beta^2\).
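
The stated mean and variance can be compared against sample moments. This is a rough numerical sketch with an arbitrary seed and arbitrary parameter values; np.euler_gamma is NumPy's constant for the Euler-Mascheroni constant, approximately 0.57721.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed
mu, beta = 0.5, 2.0
s = rng.gumbel(mu, beta, 200_000)

expected_mean = mu + np.euler_gamma * beta
expected_var = (np.pi ** 2 / 6) * beta ** 2

print(s.mean(), expected_mean)
print(s.var(), expected_var)
```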

References

Examples

Draw samples from the distribution:

>>> rng = np.random.default_rng()
>>> mu, beta = 0, 0.1 # location and scale
>>> s = rng.gumbel(mu, beta, 1000)

Display the histogram of the samples, along with the probability density function:

>>> import matplotlib.pyplot as plt
>>> count, bins, _ = plt.hist(s, 30, density=True)
>>> plt.plot(bins, (1/beta)*np.exp(-(bins - mu)/beta)
...          * np.exp( -np.exp( -(bins - mu) /beta) ),
...          linewidth=2, color='r')
>>> plt.show()

Show how an extreme value distribution can arise from a Gaussian process and compare to a Gaussian:

>>> means = []
>>> maxima = []
>>> for i in range(0,1000) :
...    a = rng.normal(mu, beta, 1000)
...    means.append(a.mean())
...    maxima.append(a.max())
>>> count, bins, _ = plt.hist(maxima, 30, density=True)
>>> beta = np.std(maxima) * np.sqrt(6) / np.pi
>>> mu = np.mean(maxima) - 0.57721*beta
>>> plt.plot(bins, (1/beta)*np.exp(-(bins - mu)/beta)
...          * np.exp(-np.exp(-(bins - mu)/beta)),
...          linewidth=2, color='r')
>>> plt.plot(bins, 1/(beta * np.sqrt(2 * np.pi))
...          * np.exp(-(bins - mu)**2 / (2 * beta**2)),
...          linewidth=2, color='g')
>>> plt.show()
hypergeometric(ngood, nbad, nsample, size=None)

Draw samples from a Hypergeometric distribution.

Samples are drawn from a hypergeometric distribution with specified parameters, ngood (ways to make a good selection), nbad (ways to make a bad selection), and nsample (number of items sampled, which is less than or equal to the sum ngood + nbad).

Parameters:
  • ngood (int or array_like of ints) – Number of ways to make a good selection. Must be nonnegative and less than 10**9.

  • nbad (int or array_like of ints) – Number of ways to make a bad selection. Must be nonnegative and less than 10**9.

  • nsample (int or array_like of ints) – Number of items sampled. Must be nonnegative and less than or equal to ngood + nbad.

  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if ngood, nbad, and nsample are all scalars. Otherwise, np.broadcast(ngood, nbad, nsample).size samples are drawn.

Returns:

out – Drawn samples from the parameterized hypergeometric distribution. Each sample is the number of good items within a randomly selected subset of size nsample taken from a set of ngood good items and nbad bad items.

Return type:

ndarray or scalar

See also

multivariate_hypergeometric

Draw samples from the multivariate hypergeometric distribution.

scipy.stats.hypergeom

probability density function, distribution or cumulative distribution function, etc.

Notes

The probability mass function (PMF) for the Hypergeometric distribution is

\[P(x) = \frac{\binom{g}{x}\binom{b}{n-x}}{\binom{g+b}{n}},\]

where \(0 \le x \le n\) and \(n-b \le x \le g\)

for P(x) the probability of x good results in the drawn sample, g = ngood, b = nbad, and n = nsample.

Consider an urn with black and white marbles in it, ngood of them are black and nbad are white. If you draw nsample marbles without replacement, then the hypergeometric distribution describes the distribution of black marbles in the drawn sample.

Note that this distribution is very similar to the binomial distribution, except that in this case, samples are drawn without replacement, whereas in the Binomial case samples are drawn with replacement (or the sample space is infinite). As the sample space becomes large, this distribution approaches the binomial.

The arguments ngood and nbad each must be less than 10**9. For extremely large arguments, the algorithm that is used to compute the samples [4] breaks down because of loss of precision in floating point calculations. For such large values, if nsample is not also large, the distribution can be approximated with the binomial distribution, binomial(n=nsample, p=ngood/(ngood + nbad)).

References

Examples

Draw samples from the distribution:

>>> rng = np.random.default_rng()
>>> # number of good, number of bad, and number of samples
>>> ngood, nbad, nsamp = 100, 2, 10
>>> s = rng.hypergeometric(ngood, nbad, nsamp, 1000)
>>> from matplotlib.pyplot import hist
>>> hist(s)
>>> # note that it is very unlikely to grab both bad items

Suppose you have an urn with 15 white and 15 black marbles. If you pull 15 marbles at random, how likely is it that 12 or more of them are one color?

>>> s = rng.hypergeometric(15, 15, 15, 100000)
>>> sum(s>=12)/100000. + sum(s<=3)/100000.
#   answer = 0.003 ... pretty unlikely!
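
The simulated answer can be checked against the probability mass function given in the Notes. The sketch below evaluates it exactly with integer binomial coefficients; the helper name hypergeom_pmf is hypothetical.

```python
from math import comb


def hypergeom_pmf(x, g, b, n):
    """P(x good items in a sample of n drawn from g good and b bad items)."""
    return comb(g, x) * comb(b, n - x) / comb(g + b, n)


g = b = n = 15

# "12 or more of one color" means 12..15 of the first color,
# or 0..3 of the first color (i.e. 12..15 of the second).
p_extreme = (sum(hypergeom_pmf(x, g, b, n) for x in range(0, 4))
             + sum(hypergeom_pmf(x, g, b, n) for x in range(12, 16)))
print(round(p_extreme, 4))  # 0.0028
```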
integers(low, high=None, size=None, dtype=np.int64, endpoint=False)

Return random integers from low (inclusive) to high (exclusive), or if endpoint=True, low (inclusive) to high (inclusive). Replaces RandomState.randint (with endpoint=False) and RandomState.random_integers (with endpoint=True).

Return random integers from the “discrete uniform” distribution of the specified dtype. If high is None (the default), then results are from 0 to low.

Parameters:
  • low (int or array-like of ints) – Lowest (signed) integers to be drawn from the distribution (unless high=None, in which case this parameter is 0 and this value is used for high).

  • high (int or array-like of ints, optional) – If provided, one above the largest (signed) integer to be drawn from the distribution (see above for behavior if high=None). If array-like, must contain integer values.

  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.

  • dtype (dtype, optional) – Desired dtype of the result. Byteorder must be native. The default value is np.int64.

  • endpoint (bool, optional) – If true, sample from the interval [low, high] instead of the default [low, high). Defaults to False.

Returns:

out – size-shaped array of random integers from the appropriate distribution, or a single such random int if size is not provided.

Return type:

int or ndarray of ints

Notes

When using broadcasting with uint64 dtypes, the maximum value (2**64) cannot be represented as a standard integer type. The high array (or low if high is None) must have object dtype, e.g., array([2**64]).

Examples

>>> rng = np.random.default_rng()
>>> rng.integers(2, size=10)
array([1, 0, 0, 0, 1, 1, 0, 0, 1, 0])  # random
>>> rng.integers(1, size=10)
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

Generate a 2 x 4 array of ints between 0 and 4, inclusive:

>>> rng.integers(5, size=(2, 4))
array([[4, 0, 2, 1],
       [3, 2, 2, 0]])  # random

Generate a 1 x 3 array with 3 different upper bounds

>>> rng.integers(1, [3, 5, 10])
array([2, 2, 9])  # random

Generate a 1 by 3 array with 3 different lower bounds

>>> rng.integers([1, 5, 7], 10)
array([9, 8, 7])  # random

Generate a 2 by 4 array using broadcasting with dtype of uint8

>>> rng.integers([1, 3, 5, 7], [[10], [20]], dtype=np.uint8)
array([[ 8,  6,  9,  7],
       [ 1, 16,  9, 12]], dtype=uint8)  # random
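
The effect of the endpoint parameter is easiest to see with a six-sided die. A minimal sketch with an arbitrary seed and sample size:

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed

# Default (endpoint=False): the upper bound is excluded,
# so a six-sided die needs high=7.
half_open = rng.integers(1, 7, size=10_000)

# endpoint=True: the upper bound is included, so high=6 is enough.
closed = rng.integers(1, 6, size=10_000, endpoint=True)

print(half_open.min(), half_open.max())
print(closed.min(), closed.max())
```

Both calls draw from the same set of values, 1 through 6.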

laplace(loc=0.0, scale=1.0, size=None)

Draw samples from the Laplace or double exponential distribution with specified location (or mean) and scale (decay).

The Laplace distribution is similar to the Gaussian/normal distribution, but is sharper at the peak and has fatter tails. It represents the difference between two independent, identically distributed exponential random variables.
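
The relationship to exponential variables can be sketched as follows: subtracting two independent exponential samples with the same scale yields values distributed like a Laplace sample centered at 0 with that scale. The seed and parameter values are arbitrary, and the variance \(2\lambda^2\) used for comparison is a standard result rather than a statement from this page.

```python
import numpy as np

rng = np.random.default_rng(3)  # arbitrary seed
scale, n = 1.5, 200_000

# Difference of two independent, identically distributed exponentials.
diff = rng.exponential(scale, n) - rng.exponential(scale, n)
direct = rng.laplace(0.0, scale, n)

# A Laplace distribution with scale lambda has variance 2 * lambda**2.
expected_var = 2 * scale ** 2
print(diff.var(), direct.var(), expected_var)
```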

Parameters:
  • loc (float or array_like of floats, optional) – The position, \(\mu\), of the distribution peak. Default is 0.

  • scale (float or array_like of floats, optional) – \(\lambda\), the exponential decay. Default is 1. Must be non-negative.

  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if loc and scale are both scalars. Otherwise, np.broadcast(loc, scale).size samples are drawn.

Returns:

out – Drawn samples from the parameterized Laplace distribution.

Return type:

ndarray or scalar

Notes

It has the probability density function

\[f(x; \mu, \lambda) = \frac{1}{2\lambda} \exp\left(-\frac{|x - \mu|}{\lambda}\right).\]

The first law of Laplace, from 1774, states that the frequency of an error can be expressed as an exponential function of the absolute magnitude of the error, which leads to the Laplace distribution. For many problems in economics and health sciences, this distribution seems to model the data better than the standard Gaussian distribution.

Examples

Draw samples from the distribution:

>>> loc, scale = 0., 1.
>>> rng = np.random.default_rng()
>>> s = rng.laplace(loc, scale, 1000)

Display the histogram of the samples, along with the probability density function:

>>> import matplotlib.pyplot as plt
>>> count, bins, _ = plt.hist(s, 30, density=True)
>>> x = np.arange(-8., 8., .01)
>>> pdf = np.exp(-abs(x-loc)/scale)/(2.*scale)
>>> plt.plot(x, pdf)

Plot Gaussian for comparison:

>>> g = (1/(scale * np.sqrt(2 * np.pi)) *
...      np.exp(-(x - loc)**2 / (2 * scale**2)))
>>> plt.plot(x,g)
logistic(loc=0.0, scale=1.0, size=None)

Draw samples from a logistic distribution.

Samples are drawn from a logistic distribution with specified parameters, loc (location or mean, also median), and scale (>0).

Parameters:
  • loc (float or array_like of floats, optional) – Parameter of the distribution. Default is 0.

  • scale (float or array_like of floats, optional) – Parameter of the distribution. Must be non-negative. Default is 1.

  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if loc and scale are both scalars. Otherwise, np.broadcast(loc, scale).size samples are drawn.

Returns:

out – Drawn samples from the parameterized logistic distribution.

Return type:

ndarray or scalar

See also

scipy.stats.logistic

probability density function, distribution or cumulative distribution function, etc.

Notes

The probability density for the Logistic distribution is

\[P(x) = \frac{e^{-(x-\mu)/s}}{s(1+e^{-(x-\mu)/s})^2},\]

where \(\mu\) = location and \(s\) = scale.
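
Integrating this density gives the cumulative distribution function \(F(x) = 1/(1+e^{-(x-\mu)/s})\), whose inverse is \(\mu + s\ln(u/(1-u))\). The sketch below uses that inverse to generate logistic variates from uniform ones and compares sample quantiles with the exact ones. Inverse-CDF sampling is used here only for illustration and is not claimed to be NumPy's method; the seed and parameter values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(11)  # arbitrary seed
loc, scale, n = 10.0, 1.0, 200_000

u = rng.random(n)
u = u[u > 0.0]  # guard against log(0); rng.random() samples from [0, 1)

# Inverse-CDF transform of uniform variates.
inverse_cdf = loc + scale * np.log(u / (1.0 - u))
direct = rng.logistic(loc, scale, u.size)

q = np.array([0.05, 0.25, 0.5, 0.75, 0.95])
exact_q = loc + scale * np.log(q / (1.0 - q))
print(np.quantile(inverse_cdf, q))
print(np.quantile(direct, q))
print(exact_q)
```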

The Logistic distribution is used in extreme value problems, where it can act as a mixture of Gumbel distributions, in epidemiology, and by the World Chess Federation (FIDE), where it is used in the Elo rating system, assuming the performance of each player is a logistically distributed random variable.

Examples

Draw samples from the distribution:

>>> loc, scale = 10, 1
>>> rng = np.random.default_rng()
>>> s = rng.logistic(loc, scale, 10000)
>>> import matplotlib.pyplot as plt
>>> count, bins, _ = plt.hist(s, bins=50, label='Sampled data')

# plot sampled data against the exact distribution

>>> def logistic(x, loc, scale):
...     return np.exp((loc-x)/scale)/(scale*(1+np.exp((loc-x)/scale))**2)
>>> logistic_values  = logistic(bins, loc, scale)
>>> bin_spacing = np.mean(np.diff(bins))
>>> plt.plot(bins, logistic_values  * bin_spacing * s.size, label='Logistic PDF')
>>> plt.legend()
>>> plt.show()
lognormal(mean=0.0, sigma=1.0, size=None)

Draw samples from a log-normal distribution.

Draw samples from a log-normal distribution with specified mean, standard deviation, and array shape. Note that the mean and standard deviation are not the values for the distribution itself, but of the underlying normal distribution it is derived from.

Parameters:
  • mean (float or array_like of floats, optional) – Mean value of the underlying normal distribution. Default is 0.

  • sigma (float or array_like of floats, optional) – Standard deviation of the underlying normal distribution. Must be non-negative. Default is 1.

  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if mean and sigma are both scalars. Otherwise, np.broadcast(mean, sigma).size samples are drawn.

Returns:

out – Drawn samples from the parameterized log-normal distribution.

Return type:

ndarray or scalar

See also

scipy.stats.lognorm

probability density function, distribution, cumulative distribution function, etc.

Notes

A variable x has a log-normal distribution if log(x) is normally distributed. The probability density function for the log-normal distribution is:

\[p(x) = \frac{1}{\sigma x \sqrt{2\pi}} e^{\left(-\frac{(\ln(x)-\mu)^2}{2\sigma^2}\right)}\]

where \(\mu\) is the mean and \(\sigma\) is the standard deviation of the normally distributed logarithm of the variable. A log-normal distribution results if a random variable is the product of a large number of independent, identically-distributed variables in the same way that a normal distribution results if the variable is the sum of a large number of independent, identically-distributed variables.
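
Because mean and sigma describe the underlying normal distribution, the logarithm of the samples should recover them, while the mean of the samples themselves is \(\exp(\mu + \sigma^2/2)\), a standard result not stated on this page. A minimal sketch with an arbitrary seed and parameters:

```python
import numpy as np

rng = np.random.default_rng(5)  # arbitrary seed
mu, sigma = 3.0, 0.5
s = rng.lognormal(mu, sigma, 200_000)

# The logarithm of the samples is normally distributed with (mu, sigma).
log_s = np.log(s)
print(log_s.mean(), log_s.std())  # close to 3.0 and 0.5

# The mean of the log-normal distribution itself differs from exp(mu).
expected_mean = np.exp(mu + sigma ** 2 / 2)
print(s.mean(), expected_mean)
```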

Examples

Draw samples from the distribution:

>>> rng = np.random.default_rng()
>>> mu, sigma = 3., 1. # mean and standard deviation
>>> s = rng.lognormal(mu, sigma, 1000)

Display the histogram of the samples, along with the probability density function:

>>> import matplotlib.pyplot as plt
>>> count, bins, _ = plt.hist(s, 100, density=True, align='mid')
>>> x = np.linspace(min(bins), max(bins), 10000)
>>> pdf = (np.exp(-(np.log(x) - mu)**2 / (2 * sigma**2))
...        / (x * sigma * np.sqrt(2 * np.pi)))
>>> plt.plot(x, pdf, linewidth=2, color='r')
>>> plt.axis('tight')
>>> plt.show()

Demonstrate that the product of many random samples drawn from a normal distribution is fit well by a log-normal probability density function.

>>> # Generate a thousand samples: each is the product of 100 random
>>> # values, drawn from a normal distribution.
>>> b = []
>>> for i in range(1000):
...    a = 10. + rng.standard_normal(100)
...    b.append(np.prod(a))
>>> b = np.array(b) / np.min(b) # scale values to be positive
>>> count, bins, _ = plt.hist(b, 100, density=True, align='mid')
>>> sigma = np.std(np.log(b))
>>> mu = np.mean(np.log(b))
>>> x = np.linspace(min(bins), max(bins), 10000)
>>> pdf = (np.exp(-(np.log(x) - mu)**2 / (2 * sigma**2))
...        / (x * sigma * np.sqrt(2 * np.pi)))
>>> plt.plot(x, pdf, color='r', linewidth=2)
>>> plt.show()
logseries(p, size=None)

Draw samples from a logarithmic series distribution.

Samples are drawn from a log series distribution with specified shape parameter, 0 <= p < 1.

Parameters:
  • p (float or array_like of floats) – Shape parameter for the distribution. Must be in the range [0, 1).

  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if p is a scalar. Otherwise, np.array(p).size samples are drawn.

Returns:

out – Drawn samples from the parameterized logarithmic series distribution.

Return type:

ndarray or scalar

See also

scipy.stats.logser

probability density function, distribution or cumulative distribution function, etc.

Notes

The probability mass function for the Log Series distribution is

\[P(k) = \frac{-p^k}{k \ln(1-p)},\]

where p = probability.

The log series distribution is frequently used to represent species richness and occurrence, first proposed by Fisher, Corbet, and Williams in 1943 [2]. It may also be used to model the numbers of occupants seen in cars [3].
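
The probability mass function above can be verified directly: it should sum to 1 over k = 1, 2, ..., and its mean works out to \(-p/((1-p)\ln(1-p))\), which follows from summing the geometric series. The helper name logseries_pmf, the truncation at k = 499, and the seed are illustrative assumptions.

```python
import numpy as np


def logseries_pmf(k, p):
    """P(k) = -p**k / (k * ln(1 - p)), for k = 1, 2, ..."""
    return -p ** k / (k * np.log(1.0 - p))


p = 0.6
k = np.arange(1, 500)  # the tail beyond this range is negligible for p = 0.6
pmf = logseries_pmf(k, p)

total = pmf.sum()
mean_from_pmf = (k * pmf).sum()
closed_form_mean = -p / ((1.0 - p) * np.log(1.0 - p))
print(total, mean_from_pmf, closed_form_mean)

rng = np.random.default_rng(8)  # arbitrary seed
sample_mean = rng.logseries(p, 100_000).mean()
print(sample_mean)
```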

References

Examples

Draw samples from the distribution:

>>> a = .6
>>> rng = np.random.default_rng()
>>> s = rng.logseries(a, 10000)
>>> import matplotlib.pyplot as plt
>>> bins = np.arange(-.5, max(s) + .5 )
>>> count, bins, _ = plt.hist(s, bins=bins, label='Sample count')

# plot against distribution

>>> def logseries(k, p):
...     return -p**k/(k*np.log(1-p))
>>> centres = np.arange(1, max(s) + 1)
>>> plt.plot(centres, logseries(centres, a) * s.size, 'r', label='logseries PMF')
>>> plt.legend()
>>> plt.show()
multinomial(n, pvals, size=None)

Draw samples from a multinomial distribution.

The multinomial distribution is a multivariate generalization of the binomial distribution. Take an experiment with one of p possible outcomes. An example of such an experiment is throwing a die, where the outcome can be 1 through 6. Each sample drawn from the distribution represents n such experiments. Its values, X = [X_0, X_1, ..., X_{p-1}], are counts, where X_i is the number of times the outcome was i.

Parameters:
  • n (int or array-like of ints) – Number of experiments.

  • pvals (array-like of floats) – Probabilities of each of the p different outcomes with shape (k0, k1, ..., kn, p). Each element pvals[i,j,...,:] must sum to 1 (however, the last element is always assumed to account for the remaining probability, as long as sum(pvals[..., :-1], axis=-1) <= 1.0). Must have at least 1 dimension where pvals.shape[-1] > 0.

  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn each with p elements. Default is None, where the output size is determined by the broadcast shape of n and all but the final dimension of pvals, which is denoted as b=(b0, b1, ..., bq). If size is not None, then it must be compatible with the broadcast shape b. Specifically, size must have q or more elements and size[-(q-j):] must equal bj.

Returns:

out – The drawn samples, of shape size, if provided. When size is provided, the output shape is size + (p,). If not specified, the shape is determined by the broadcast shape of n and pvals, (b0, b1, ..., bq), augmented with the dimension of the multinomial, p, so that the output shape is (b0, b1, ..., bq, p).

Each entry out[i,j,...,:] is a p-dimensional value drawn from the distribution.

Return type:

ndarray

Examples

Throw a die 20 times:

>>> rng = np.random.default_rng()
>>> rng.multinomial(20, [1/6.]*6, size=1)
array([[4, 1, 7, 5, 2, 1]])  # random

It landed 4 times on 1, once on 2, etc.

Now, throw the die 20 times, and 20 times again:

>>> rng.multinomial(20, [1/6.]*6, size=2)
array([[3, 4, 3, 3, 4, 3],
       [2, 4, 3, 4, 0, 7]])  # random

For the first run, we threw 3 times 1, 4 times 2, etc. For the second, we threw 2 times 1, 4 times 2, etc.

Now, do one experiment throwing the die 10 times, and 10 times again, and another throwing the die 20 times, and 20 times again:

>>> rng.multinomial([[10], [20]], [1/6.]*6, size=(2, 2))
array([[[2, 4, 0, 1, 2, 1],
        [1, 3, 0, 3, 1, 2]],
       [[1, 4, 4, 4, 4, 3],
        [3, 3, 2, 5, 5, 2]]])  # random

The first array shows the outcomes of throwing the die 10 times, and the second shows the outcomes from throwing the die 20 times.

A loaded die is more likely to land on number 6:

>>> rng.multinomial(100, [1/7.]*5 + [2/7.])
array([11, 16, 14, 17, 16, 26])  # random

Simulate 10 throws of a 4-sided die and 20 throws of a 6-sided die

>>> rng.multinomial([10, 20],[[1/4]*4 + [0]*2, [1/6]*6])
array([[2, 1, 4, 3, 0, 0],
       [3, 3, 3, 6, 1, 4]], dtype=int64)  # random

Generate categorical random variates from two categories where the first has 3 outcomes and the second has 2.

>>> rng.multinomial(1, [[.1, .5, .4 ], [.3, .7, .0]])
array([[0, 0, 1],
       [0, 1, 0]], dtype=int64)  # random

argmax(axis=-1) is then used to return the categories.

>>> pvals = [[.1, .5, .4 ], [.3, .7, .0]]
>>> rvs = rng.multinomial(1, pvals, size=(4,2))
>>> rvs.argmax(axis=-1)
array([[0, 1],
       [2, 0],
       [2, 1],
       [2, 0]], dtype=int64)  # random

The same output dimension can be produced using broadcasting.

>>> rvs = rng.multinomial([[1]] * 4, pvals)
>>> rvs.argmax(axis=-1)
array([[0, 1],
       [2, 0],
       [2, 1],
       [2, 0]], dtype=int64)  # random

The probability inputs should be normalized. As an implementation detail, the value of the last entry is ignored and assumed to take up any leftover probability mass, but this should not be relied on. A biased coin which has twice as much weight on one side as on the other should be sampled like so:

>>> rng.multinomial(100, [1.0 / 3, 2.0 / 3])  # RIGHT
array([38, 62])  # random

not like:

>>> rng.multinomial(100, [1.0, 2.0])  # WRONG
Traceback (most recent call last):
ValueError: pvals < 0, pvals > 1 or pvals contains NaNs
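
When only relative weights are known, one simple approach is to normalize them before the call. A minimal sketch, with arbitrary seed and weights:

```python
import numpy as np

rng = np.random.default_rng(9)  # arbitrary seed

weights = np.array([1.0, 2.0])   # unnormalized relative weights
pvals = weights / weights.sum()  # normalized to [1/3, 2/3]

# 1000 independent experiments of 100 coin flips each.
counts = rng.multinomial(100, pvals, size=1000)
print(counts.shape)            # (1000, 2)
print(counts[:, 1].mean())     # close to 100 * 2/3
```

Every row of counts sums to 100, the number of flips per experiment.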
multivariate_hypergeometric(colors, nsample, size=None, method='marginals')

Generate variates from a multivariate hypergeometric distribution.

The multivariate hypergeometric distribution is a generalization of the hypergeometric distribution.

Choose nsample items at random without replacement from a collection with N distinct types. N is the length of colors, and the values in colors are the number of occurrences of that type in the collection. The total number of items in the collection is sum(colors). Each random variate generated by this function is a vector of length N holding the counts of the different types that occurred in the nsample items.

The name colors comes from a common description of the distribution: it is the probability distribution of the number of marbles of each color selected without replacement from an urn containing marbles of different colors; colors[i] is the number of marbles in the urn with color i.

Parameters:
  • colors (sequence of integers) – The number of each type of item in the collection from which a sample is drawn. The values in colors must be nonnegative. To avoid loss of precision in the algorithm, sum(colors) must be less than 10**9 when method is “marginals”.

  • nsample (int) – The number of items selected. nsample must not be greater than sum(colors).

  • size (int or tuple of ints, optional) – The number of variates to generate, either an integer or a tuple holding the shape of the array of variates. If the given size is, e.g., (k, m), then k * m variates are drawn, where one variate is a vector of length len(colors), and the return value has shape (k, m, len(colors)). If size is an integer, the output has shape (size, len(colors)). Default is None, in which case a single variate is returned as an array with shape (len(colors),).

  • method (string, optional) – Specify the algorithm that is used to generate the variates. Must be ‘count’ or ‘marginals’ (the default). See the Notes for a description of the methods.

Returns:

variates – Array of variates drawn from the multivariate hypergeometric distribution.

Return type:

ndarray

See also

hypergeometric

Draw samples from the (univariate) hypergeometric distribution.

Notes

The two methods do not return the same sequence of variates.

The “count” algorithm is roughly equivalent to the following numpy code:

choices = np.repeat(np.arange(len(colors)), colors)
selection = np.random.choice(choices, nsample, replace=False)
variate = np.bincount(selection, minlength=len(colors))

The “count” algorithm uses a temporary array of integers with length sum(colors).

The “marginals” algorithm generates a variate by using repeated calls to the univariate hypergeometric sampler. It is roughly equivalent to:

variate = np.zeros(len(colors), dtype=np.int64)
# `remaining` is the cumulative sum of `colors` from the last
# element to the first; e.g. if `colors` is [3, 1, 5], then
# `remaining` is [9, 6, 5].
remaining = np.cumsum(colors[::-1])[::-1]
for i in range(len(colors)-1):
    if nsample < 1:
        break
    variate[i] = hypergeometric(colors[i], remaining[i+1],
                               nsample)
    nsample -= variate[i]
variate[-1] = nsample

The default method is “marginals”. For some cases (e.g. when colors contains relatively small integers), the “count” method can be significantly faster than the “marginals” method. If performance of the algorithm is important, test the two methods with typical inputs to decide which works best.
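The “marginals” pseudocode above can be turned into a runnable sketch. The helper name marginals_sketch is hypothetical and the code only illustrates the idea using Generator.hypergeometric; it is not the library's internal implementation and will not reproduce the same variates as multivariate_hypergeometric.

```python
import numpy as np


def marginals_sketch(rng, colors, nsample):
    """Illustrative version of the "marginals" idea (hypothetical helper)."""
    colors = np.asarray(colors, dtype=np.int64)
    variate = np.zeros(len(colors), dtype=np.int64)
    # remaining[i] is the number of items of type i or any later type.
    remaining = np.cumsum(colors[::-1])[::-1]
    for i in range(len(colors) - 1):
        if nsample < 1:
            break
        # Split the remaining draws between type i and all later types.
        variate[i] = rng.hypergeometric(colors[i], remaining[i + 1], nsample)
        nsample -= variate[i]
    # Whatever is left must belong to the last type.
    variate[-1] = nsample
    return variate


rng = np.random.default_rng(0)
colors = [16, 8, 4]
v = marginals_sketch(rng, colors, 6)
print(v, v.sum())
```

Each entry of the result is bounded by the corresponding entry of colors, and the entries always sum to nsample.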

Examples

>>> colors = [16, 8, 4]
>>> seed = 4861946401452
>>> gen = np.random.Generator(np.random.PCG64(seed))
>>> gen.multivariate_hypergeometric(colors, 6)
array([5, 0, 1])
>>> gen.multivariate_hypergeometric(colors, 6, size=3)
array([[5, 0, 1],
       [2, 2, 2],
       [3, 3, 0]])
>>> gen.multivariate_hypergeometric(colors, 6, size=(2, 2))
array([[[3, 2, 1],
        [3, 2, 1]],
       [[4, 1, 1],
        [3, 2, 1]]])
multivariate_normal(mean, cov, size=None, check_valid='warn', tol=1e-08, *, method='svd')

Draw random samples from a multivariate normal distribution.

The multivariate normal, multinormal or Gaussian distribution is a generalization of the one-dimensional normal distribution to higher dimensions. Such a distribution is specified by its mean and covariance matrix. These parameters are analogous to the mean (average or “center”) and variance (the squared standard deviation, or “width”) of the one-dimensional normal distribution.

Parameters:
  • mean (1-D array_like, of length N) – Mean of the N-dimensional distribution.

  • cov (2-D array_like, of shape (N, N)) – Covariance matrix of the distribution. It must be symmetric and positive-semidefinite for proper sampling.

  • size (int or tuple of ints, optional) – Given a shape of, for example, (m,n,k), m*n*k samples are generated, and packed in an m-by-n-by-k arrangement. Because each sample is N-dimensional, the output shape is (m,n,k,N). If no shape is specified, a single (N-D) sample is returned.

  • check_valid ({ 'warn', 'raise', 'ignore' }, optional) – Behavior when the covariance matrix is not positive semidefinite.

  • tol (float, optional) – Tolerance when checking the singular values in covariance matrix. cov is cast to double before the check.

  • method ({ 'svd', 'eigh', 'cholesky'}, optional) – The cov input is used to compute a factor matrix A such that A @ A.T = cov. This argument selects the method used to compute A. The default, ‘svd’, is the slowest. ‘cholesky’ is the fastest but less robust than ‘svd’. ‘eigh’ uses eigendecomposition and is faster than ‘svd’ but slower than ‘cholesky’.

Returns:

out – The drawn samples. If size was provided, the shape is size followed by N, e.g. (m,n,k,N) for size (m,n,k). If not, the shape is (N,).

In other words, each entry out[i,j,...,:] is an N-dimensional value drawn from the distribution.

Return type:

ndarray

Notes

The mean is a coordinate in N-dimensional space, which represents the location where samples are most likely to be generated. This is analogous to the peak of the bell curve for the one-dimensional or univariate normal distribution.

Covariance indicates the level to which two variables vary together. From the multivariate normal distribution, we draw N-dimensional samples, \(X = [x_1, x_2, ... x_N]\). The covariance matrix element \(C_{ij}\) is the covariance of \(x_i\) and \(x_j\). The element \(C_{ii}\) is the variance of \(x_i\) (i.e. its “spread”).

Instead of specifying the full covariance matrix, popular approximations include:

  • Spherical covariance (cov is a multiple of the identity matrix)

  • Diagonal covariance (cov has non-negative elements, and only on the diagonal)

This geometrical property can be seen in two dimensions by plotting generated data-points:

>>> mean = [0, 0]
>>> cov = [[1, 0], [0, 100]]  # diagonal covariance

Diagonal covariance means that points are oriented along x or y-axis:

>>> import matplotlib.pyplot as plt
>>> rng = np.random.default_rng()
>>> x, y = rng.multivariate_normal(mean, cov, 5000).T
>>> plt.plot(x, y, 'x')
>>> plt.axis('equal')
>>> plt.show()

Note that the covariance matrix must be positive semidefinite (a.k.a. nonnegative-definite). Otherwise, the behavior of this method is undefined and backwards compatibility is not guaranteed.

This function internally uses linear algebra routines, and thus results may not be identical (even up to precision) across architectures, OSes, or even builds. For example, this is likely if cov has multiple equal singular values and method is 'svd' (default). In this case, method='cholesky' may be more robust.
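The role of the factor matrix A can be illustrated with a short sketch. It assumes a positive-definite cov so that np.linalg.cholesky succeeds, and it shows the general technique (transforming standard normal draws) rather than the exact internal code path of multivariate_normal.

```python
import numpy as np

rng = np.random.default_rng(12345)
mean = np.array([0.0, 0.0])
cov = np.array([[6.0, -3.0], [-3.0, 3.5]])

# Lower-triangular factor with A @ A.T equal to cov.
A = np.linalg.cholesky(cov)

# Independent standard normal draws, one row per sample.
z = rng.standard_normal((800, 2))

# Each row of pts is mean + A @ z_row, so its covariance is A @ A.T = cov.
pts = mean + z @ A.T
print(np.cov(pts.T))
```

The sample covariance printed at the end should be close to cov, with the usual sampling noise.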

Examples

>>> mean = (1, 2)
>>> cov = [[1, 0], [0, 1]]
>>> rng = np.random.default_rng()
>>> x = rng.multivariate_normal(mean, cov, (3, 3))
>>> x.shape
(3, 3, 2)

We can use a different method other than the default to factorize cov:

>>> y = rng.multivariate_normal(mean, cov, (3, 3), method='cholesky')
>>> y.shape
(3, 3, 2)

Here we generate 800 samples from the bivariate normal distribution with mean [0, 0] and covariance matrix [[6, -3], [-3, 3.5]]. The expected variances of the first and second components of the sample are 6 and 3.5, respectively, and the expected correlation coefficient is -3/sqrt(6*3.5) ≈ -0.65465.

>>> cov = np.array([[6, -3], [-3, 3.5]])
>>> pts = rng.multivariate_normal([0, 0], cov, size=800)

Check that the mean, covariance, and correlation coefficient of the sample are close to the expected values:

>>> pts.mean(axis=0)
array([ 0.0326911 , -0.01280782])  # may vary
>>> np.cov(pts.T)
array([[ 5.96202397, -2.85602287],
       [-2.85602287,  3.47613949]])  # may vary
>>> np.corrcoef(pts.T)[0, 1]
-0.6273591314603949  # may vary

We can visualize this data with a scatter plot. The orientation of the point cloud illustrates the negative correlation of the components of this sample.

>>> import matplotlib.pyplot as plt
>>> plt.plot(pts[:, 0], pts[:, 1], '.', alpha=0.5)
>>> plt.axis('equal')
>>> plt.grid()
>>> plt.show()
negative_binomial(n, p, size=None)

Draw samples from a negative binomial distribution.

Samples are drawn from a negative binomial distribution with specified parameters, n successes and p probability of success where n is > 0 and p is in the interval (0, 1].

Parameters:
  • n (float or array_like of floats) – Parameter of the distribution, > 0.

  • p (float or array_like of floats) – Parameter of the distribution. Must satisfy 0 < p <= 1.

  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if n and p are both scalars. Otherwise, np.broadcast(n, p).size samples are drawn.

Returns:

out – Drawn samples from the parameterized negative binomial distribution, where each sample is equal to N, the number of failures that occurred before a total of n successes was reached.

Return type:

ndarray or scalar

Notes

The probability mass function of the negative binomial distribution is

\[P(N;n,p) = \frac{\Gamma(N+n)}{N!\Gamma(n)}p^{n}(1-p)^{N},\]

where \(n\) is the number of successes, \(p\) is the probability of success, \(N+n\) is the number of trials, and \(\Gamma\) is the gamma function. When \(n\) is an integer, \(\frac{\Gamma(N+n)}{N!\Gamma(n)} = \binom{N+n-1}{N}\), which is the more common form of this term in the pmf. The negative binomial distribution gives the probability of N failures given n successes, with a success on the last trial.
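As a quick check of the formula above, the gamma-function form of the pmf can be evaluated with the standard library and compared with the binomial-coefficient form for integer n. The helper name nbinom_pmf is hypothetical, and the sketch assumes 0 < p < 1.

```python
import math


def nbinom_pmf(N, n, p):
    """pmf of N failures before n successes, via log-gamma (assumes 0 < p < 1)."""
    log_coeff = math.lgamma(N + n) - math.lgamma(N + 1) - math.lgamma(n)
    return math.exp(log_coeff + n * math.log(p) + N * math.log1p(-p))


n, p = 3, 1 / 6  # e.g. waiting for the third "1" on a fair die
gamma_form = nbinom_pmf(4, n, p)
binom_form = math.comb(4 + n - 1, 4) * p**n * (1 - p) ** 4

# The probabilities over all N should add up to 1 (tail beyond 2000 is negligible here).
total = sum(nbinom_pmf(N, n, p) for N in range(2000))
print(gamma_form, binom_form, total)
```

Working in log space avoids overflow in the gamma function for large N + n.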

If one throws a die repeatedly until the third time a “1” appears, then the probability distribution of the number of non-“1”s that appear before the third “1” is a negative binomial distribution.

Because this method internally calls Generator.poisson with an intermediate random value, a ValueError is raised when the choice of \(n\) and \(p\) would result in the mean + 10 sigma of the sampled intermediate distribution exceeding the maximum acceptable value of the Generator.poisson method. This happens when \(p\) is too low (a lot of failures happen for every success) and \(n\) is too big (a lot of successes are allowed). Therefore, the \(n\) and \(p\) values must satisfy the constraint:

\[n\frac{1-p}{p}+10\sqrt{n}\frac{1-p}{p}<2^{63}-1-10\sqrt{2^{63}-1},\]

where the left side of the inequality is the derived mean + 10 sigma of a sample from the gamma distribution internally used as the \(lam\) parameter of a Poisson sample, and the right side is the maximum value of \(lam\) accepted by Generator.poisson.

Examples

Draw samples from the distribution:

A real-world example: a company drills wildcat oil exploration wells, each with an estimated probability of success of 0.1. What is the probability of having one success for each successive well, that is, what is the probability of a single success after drilling 5 wells, after 6 wells, etc.?

>>> rng = np.random.default_rng()
>>> s = rng.negative_binomial(1, 0.1, 100000)
>>> for i in range(1, 11):
...    probability = sum(s<i) / 100000.
...    print(i, "wells drilled, probability of one success =", probability)
noncentral_chisquare(df, nonc, size=None)

Draw samples from a noncentral chi-square distribution.

The noncentral \(\chi^2\) distribution is a generalization of the \(\chi^2\) distribution.

Parameters:
  • df (float or array_like of floats) – Degrees of freedom, must be > 0.

  • nonc (float or array_like of floats) – Non-centrality, must be non-negative.

  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if df and nonc are both scalars. Otherwise, np.broadcast(df, nonc).size samples are drawn.

Returns:

out – Drawn samples from the parameterized noncentral chi-square distribution.

Return type:

ndarray or scalar

Notes

The probability density function for the noncentral Chi-square distribution is

\[P(x;df,nonc) = \sum^{\infty}_{i=0} \frac{e^{-nonc/2}(nonc/2)^{i}}{i!} P_{Y_{df+2i}}(x),\]

where \(Y_{q}\) is the Chi-square with q degrees of freedom.
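The sum above says that a noncentral chi-square variate is a Poisson-weighted mixture of central chi-square variates. A minimal sketch of that reading, under the assumption that comparing sample means is an adequate sanity check (the theoretical mean is df + nonc):

```python
import numpy as np

rng = np.random.default_rng(2024)
df, nonc, size = 3.0, 20.0, 200_000

# Mixture construction: i ~ Poisson(nonc / 2), then a chi-square with df + 2i degrees of freedom.
i = rng.poisson(nonc / 2, size)
mixture = rng.chisquare(df + 2 * i)

# Direct sampling for comparison.
direct = rng.noncentral_chisquare(df, nonc, size)

print(mixture.mean(), direct.mean(), df + nonc)
```

This is an illustration of the density formula, not a claim about how the sampler is implemented internally.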

Examples

Draw values from the distribution and plot the histogram

>>> rng = np.random.default_rng()
>>> import matplotlib.pyplot as plt
>>> values = plt.hist(rng.noncentral_chisquare(3, 20, 100000),
...                   bins=200, density=True)
>>> plt.show()

Draw values from a noncentral chisquare with very small noncentrality, and compare to a chisquare.

>>> plt.figure()
>>> values = plt.hist(rng.noncentral_chisquare(3, .0000001, 100000),
...                   bins=np.arange(0., 25, .1), density=True)
>>> values2 = plt.hist(rng.chisquare(3, 100000),
...                    bins=np.arange(0., 25, .1), density=True)
>>> plt.plot(values[1][0:-1], values[0]-values2[0], 'ob')
>>> plt.show()

Demonstrate how large values of non-centrality lead to a more symmetric distribution.

>>> plt.figure()
>>> values = plt.hist(rng.noncentral_chisquare(3, 20, 100000),
...                   bins=200, density=True)
>>> plt.show()
noncentral_f(dfnum, dfden, nonc, size=None)

Draw samples from the noncentral F distribution.

Samples are drawn from an F distribution with specified parameters, dfnum (degrees of freedom in numerator) and dfden (degrees of freedom in denominator), where both parameters must be > 0. nonc is the non-centrality parameter.

Parameters:
  • dfnum (float or array_like of floats) – Numerator degrees of freedom, must be > 0.

  • dfden (float or array_like of floats) – Denominator degrees of freedom, must be > 0.

  • nonc (float or array_like of floats) – Non-centrality parameter, the sum of the squares of the numerator means, must be >= 0.

  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if dfnum, dfden, and nonc are all scalars. Otherwise, np.broadcast(dfnum, dfden, nonc).size samples are drawn.

Returns:

out – Drawn samples from the parameterized noncentral Fisher distribution.

Return type:

ndarray or scalar

Notes

When calculating the power of an experiment (power = probability of rejecting the null hypothesis when a specific alternative is true) the non-central F statistic becomes important. When the null hypothesis is true, the F statistic follows a central F distribution. When the null hypothesis is not true, it follows a non-central F distribution.

Examples

In a study, testing for a specific alternative to the null hypothesis requires use of the Noncentral F distribution. We need to calculate the area in the tail of the distribution that exceeds the value of the F distribution for the null hypothesis. We’ll plot the two probability distributions for comparison.

>>> rng = np.random.default_rng()
>>> dfnum = 3 # between group deg of freedom
>>> dfden = 20 # within groups degrees of freedom
>>> nonc = 3.0
>>> nc_vals = rng.noncentral_f(dfnum, dfden, nonc, 1000000)
>>> NF = np.histogram(nc_vals, bins=50, density=True)
>>> c_vals = rng.f(dfnum, dfden, 1000000)
>>> F = np.histogram(c_vals, bins=50, density=True)
>>> import matplotlib.pyplot as plt
>>> plt.plot(F[1][1:], F[0])
>>> plt.plot(NF[1][1:], NF[0])
>>> plt.show()
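The tail area mentioned in this example can also be estimated numerically. The sketch below assumes a 5% significance level (not stated in the example) and estimates the critical value from a central F sample rather than from an exact quantile function.

```python
import numpy as np

rng = np.random.default_rng(7)
dfnum, dfden, nonc = 3, 20, 3.0
n = 200_000

c_vals = rng.f(dfnum, dfden, n)                      # null hypothesis
nc_vals = rng.noncentral_f(dfnum, dfden, nonc, n)    # specific alternative

# Empirical critical value: the 95th percentile of the central F sample.
crit = np.quantile(c_vals, 0.95)

# Estimated power: fraction of the noncentral sample beyond the critical value.
power = np.mean(nc_vals > crit)
print(crit, power)
```

A larger nonc shifts the noncentral sample to the right and therefore increases the estimated power.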
normal(loc=0.0, scale=1.0, size=None)

Draw random samples from a normal (Gaussian) distribution.

The probability density function of the normal distribution, first derived by De Moivre and 200 years later by both Gauss and Laplace independently [2]_, is often called the bell curve because of its characteristic shape (see the example below).

The normal distribution occurs often in nature. For example, it describes the commonly occurring distribution of samples influenced by a large number of tiny, random disturbances, each with its own unique distribution [2]_.

Parameters:
  • loc (float or array_like of floats) – Mean (“centre”) of the distribution.

  • scale (float or array_like of floats) – Standard deviation (spread or “width”) of the distribution. Must be non-negative.

  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if loc and scale are both scalars. Otherwise, np.broadcast(loc, scale).size samples are drawn.

Returns:

out – Drawn samples from the parameterized normal distribution.

Return type:

ndarray or scalar

See also

scipy.stats.norm

probability density function, cumulative distribution function, etc.

Notes

The probability density for the Gaussian distribution is

\[p(x) = \frac{1}{\sqrt{ 2 \pi \sigma^2 }} e^{ - \frac{ (x - \mu)^2 } {2 \sigma^2} },\]

where \(\mu\) is the mean and \(\sigma\) the standard deviation. The square of the standard deviation, \(\sigma^2\), is called the variance.

The function has its peak at the mean, and its “spread” increases with the standard deviation (the function reaches 0.607 times its maximum at \(\mu + \sigma\) and \(\mu - \sigma\) [2]_). This implies that normal() is more likely to return samples lying close to the mean, rather than those far away.
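The 0.607 figure is the value of exp(-1/2), which follows from evaluating the density one standard deviation away from the mean. A small check, with normal_pdf as a hypothetical helper written directly from the formula above:

```python
import math


def normal_pdf(x, mu, sigma):
    """Gaussian density, transcribed from the formula in the Notes."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)


mu, sigma = 0.0, 0.1
ratio = normal_pdf(mu + sigma, mu, sigma) / normal_pdf(mu, mu, sigma)
print(ratio, math.exp(-0.5))
```

The ratio does not depend on the particular values chosen for mu and sigma.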

Examples

Draw samples from the distribution:

>>> mu, sigma = 0, 0.1 # mean and standard deviation
>>> rng = np.random.default_rng()
>>> s = rng.normal(mu, sigma, 1000)

Verify the mean and the standard deviation:

>>> abs(mu - np.mean(s))
0.0  # may vary
>>> abs(sigma - np.std(s, ddof=1))
0.0  # may vary

Display the histogram of the samples, along with the probability density function:

>>> import matplotlib.pyplot as plt
>>> count, bins, _ = plt.hist(s, 30, density=True)
>>> plt.plot(bins, 1/(sigma * np.sqrt(2 * np.pi)) *
...                np.exp( - (bins - mu)**2 / (2 * sigma**2) ),
...          linewidth=2, color='r')
>>> plt.show()

Two-by-four array of samples from the normal distribution with mean 3 and standard deviation 2.5:

>>> rng = np.random.default_rng()
>>> rng.normal(3, 2.5, size=(2, 4))
array([[-4.49401501,  4.00950034, -1.81814867,  7.29718677],   # random
       [ 0.39924804,  4.68456316,  4.99394529,  4.84057254]])  # random
pareto(a, size=None)

Draw samples from a Pareto II (AKA Lomax) distribution with specified shape.

Parameters:
  • a (float or array_like of floats) – Shape of the distribution. Must be positive.

  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if a is a scalar. Otherwise, np.array(a).size samples are drawn.

Returns:

out – Drawn samples from the Pareto II distribution.

Return type:

ndarray or scalar

See also

scipy.stats.pareto

Pareto I distribution

scipy.stats.lomax

Lomax (Pareto II) distribution

scipy.stats.genpareto

Generalized Pareto distribution

Notes

The probability density for the Pareto II distribution is

\[p(x) = \frac{a}{(x+1)^{a+1}} , x \ge 0\]

where \(a > 0\) is the shape.

The Pareto II distribution is a shifted and scaled version of the Pareto I distribution, which can be found in scipy.stats.pareto.
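The shift-and-scale relationship can be sketched as follows: adding 1 to a Pareto II draw and multiplying by a scale m gives a classical (Pareto I) variate with minimum m. The scale name m and the median check are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
a, m = 3.0, 2.0  # shape and (hypothetical) Pareto I scale

lomax = rng.pareto(a, 100_000)   # Pareto II samples, support x >= 0
classical = (lomax + 1) * m      # Pareto I samples, support x >= m

# Median of Pareto I with shape a and scale m is m * 2**(1/a).
expected_median = m * 2 ** (1 / a)
print(classical.min(), np.median(classical), expected_median)
```

The median is used for the comparison because it is less sensitive to the heavy tail than the mean.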

Examples

Draw samples from the distribution:

>>> a = 3.
>>> rng = np.random.default_rng()
>>> s = rng.pareto(a, 10000)

Display the histogram of the samples, along with the probability density function:

>>> import matplotlib.pyplot as plt
>>> x = np.linspace(0, 3, 50)
>>> pdf = a / (x+1)**(a+1)
>>> plt.hist(s, bins=x, density=True, label='histogram')
>>> plt.plot(x, pdf, linewidth=2, color='r', label='pdf')
>>> plt.xlim(x.min(), x.max())
>>> plt.legend()
>>> plt.show()
permutation(x, axis=0)

Randomly permute a sequence, or return a permuted range.

Parameters:
  • x (int or array_like) – If x is an integer, randomly permute np.arange(x). If x is an array, make a copy and shuffle the elements randomly.

  • axis (int, optional) – The axis which x is shuffled along. Default is 0.

Returns:

out – Permuted sequence or array range.

Return type:

ndarray

Examples

>>> rng = np.random.default_rng()
>>> rng.permutation(10)
array([1, 7, 4, 3, 0, 9, 2, 5, 8, 6]) # random
>>> rng.permutation([1, 4, 9, 12, 15])
array([15,  1,  9,  4, 12]) # random
>>> arr = np.arange(9).reshape((3, 3))
>>> rng.permutation(arr)
array([[6, 7, 8], # random
       [0, 1, 2],
       [3, 4, 5]])
>>> rng.permutation("abc")
Traceback (most recent call last):
    ...
numpy.exceptions.AxisError: axis 0 is out of bounds for array of dimension 0
>>> arr = np.arange(9).reshape((3, 3))
>>> rng.permutation(arr, axis=1)
array([[0, 2, 1], # random
       [3, 5, 4],
       [6, 8, 7]])
permuted(x, axis=None, out=None)

Randomly permute x along axis axis.

Unlike shuffle, each slice along the given axis is shuffled independently of the others.

Parameters:
  • x (array_like, at least one-dimensional) – Array to be shuffled.

  • axis (int, optional) – Slices of x in this axis are shuffled. Each slice is shuffled independently of the others. If axis is None, the flattened array is shuffled.

  • out (ndarray, optional) – If given, this is the destination of the shuffled array. If out is None, a shuffled copy of the array is returned.

Returns:

If out is None, a shuffled copy of x is returned. Otherwise, the shuffled array is stored in out, and out is returned

Return type:

ndarray

See also

shuffle, permutation

Notes

An important distinction between the methods shuffle and permuted is how they treat the axis parameter; see generator-handling-axis-parameter.
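The difference in how the two methods treat axis can be seen on a small array. This is a minimal sketch; the variable names are chosen for the illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.arange(24).reshape(3, 8)

# shuffle works in place and moves whole columns together when axis=1.
shuffled = x.copy()
rng.shuffle(shuffled, axis=1)

# permuted returns a copy and shuffles each row independently when axis=1.
permuted = rng.permuted(x, axis=1)

# Under shuffle, every original column still exists as a column.
same_columns = {tuple(c) for c in x.T} == {tuple(c) for c in shuffled.T}

# Under permuted, each row keeps its own elements (rows of x are already sorted).
rows_preserved = np.array_equal(np.sort(permuted, axis=1), x)
print(same_columns, rows_preserved)
```

With permuted, the columns are generally broken up, because each row receives its own permutation.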

Examples

Create a numpy.random.Generator instance:

>>> rng = np.random.default_rng()

Create a test array:

>>> x = np.arange(24).reshape(3, 8)
>>> x
array([[ 0,  1,  2,  3,  4,  5,  6,  7],
       [ 8,  9, 10, 11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20, 21, 22, 23]])

Shuffle the elements within each row of x:

>>> y = rng.permuted(x, axis=1)
>>> y
array([[ 4,  3,  6,  7,  1,  2,  5,  0],  # random
       [15, 10, 14,  9, 12, 11,  8, 13],
       [17, 16, 20, 21, 18, 22, 23, 19]])

x has not been modified:

>>> x
array([[ 0,  1,  2,  3,  4,  5,  6,  7],
       [ 8,  9, 10, 11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20, 21, 22, 23]])

To shuffle the elements within each row of x in-place, pass x as the out parameter:

>>> y = rng.permuted(x, axis=1, out=x)
>>> x
array([[ 3,  0,  4,  7,  1,  6,  2,  5],  # random
       [ 8, 14, 13,  9, 12, 11, 15, 10],
       [17, 18, 16, 22, 19, 23, 20, 21]])

Note that when the out parameter is given, the return value is out:

>>> y is x
True
poisson(lam=1.0, size=None)

Draw samples from a Poisson distribution.

The Poisson distribution is the limit of the binomial distribution for large N.

Parameters:
  • lam (float or array_like of floats) – Expected number of events occurring in a fixed-time interval, must be >= 0. A sequence must be broadcastable over the requested size.

  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if lam is a scalar. Otherwise, np.array(lam).size samples are drawn.

Returns:

out – Drawn samples from the parameterized Poisson distribution.

Return type:

ndarray or scalar

Notes

The probability mass function (PMF) of Poisson distribution is

\[f(k; \lambda)=\frac{\lambda^k e^{-\lambda}}{k!}\]

For events with an expected separation \(\lambda\) the Poisson distribution \(f(k; \lambda)\) describes the probability of \(k\) events occurring within the observed interval \(\lambda\).

Because the output is limited to the range of the C int64 type, a ValueError is raised when lam is within 10 sigma of the maximum representable value.
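The statement that the Poisson distribution is the large-N limit of the binomial can be checked numerically. The sketch below holds the mean N * p fixed at lam and compares the two pmfs for growing N; the helper names are hypothetical.

```python
import math


def binom_pmf(k, N, p):
    return math.comb(N, k) * p**k * (1 - p) ** (N - k)


def poisson_pmf(k, lam):
    return lam**k * math.exp(-lam) / math.factorial(k)


lam = 5.0
gaps = {}
for N in (10, 100, 10_000):
    # Largest pmf difference over k = 0..10 with p chosen so that N * p = lam.
    gaps[N] = max(abs(binom_pmf(k, N, lam / N) - poisson_pmf(k, lam)) for k in range(11))
    print(N, gaps[N])
```

The largest difference shrinks as N grows, which is the sense in which the binomial approaches the Poisson.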

Examples

Draw samples from the distribution:

>>> rng = np.random.default_rng()
>>> lam, size = 5, 10000
>>> s = rng.poisson(lam=lam, size=size)

Verify the mean and variance, which should be approximately lam:

>>> s.mean(), s.var()
(4.9917, 5.1088311)  # may vary

Display the histogram and probability mass function:

>>> import matplotlib.pyplot as plt
>>> from scipy import stats
>>> x = np.arange(0, 21)
>>> pmf = stats.poisson.pmf(x, mu=lam)
>>> plt.hist(s, bins=x, density=True, width=0.5)
>>> plt.stem(x, pmf, 'C1-')
>>> plt.show()

Draw 100 values each for lambda 100 and 500:

>>> s = rng.poisson(lam=(100., 500.), size=(100, 2))
power(a, size=None)

Draws samples in [0, 1] from a power distribution with positive exponent a - 1.

Also known as the power function distribution.

Parameters:
  • a (float or array_like of floats) – Parameter of the distribution. Must be positive.

  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if a is a scalar. Otherwise, np.array(a).size samples are drawn.

Returns:

out – Drawn samples from the parameterized power distribution.

Return type:

ndarray or scalar

Raises:

ValueError – If a <= 0.

Notes

The probability density function is

\[P(x; a) = ax^{a-1}, 0 \le x \le 1, a>0.\]

The power function distribution is just the inverse of the Pareto distribution. It may also be seen as a special case of the Beta distribution.
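Two consequences of the density above can be sketched in code. Its cumulative distribution function is x**a on [0, 1], so raising a uniform draw to the power 1/a gives a power-distributed variate, and the same distribution is Beta(a, 1). This illustrates the relationships only and is not a statement about how Generator.power is implemented.

```python
import numpy as np

rng = np.random.default_rng(3)
a, n = 5.0, 200_000

direct = rng.power(a, n)
via_inverse_cdf = rng.random(n) ** (1 / a)  # invert the CDF F(x) = x**a
via_beta = rng.beta(a, 1.0, n)              # Beta(a, 1) special case

expected_mean = a / (a + 1)
print(direct.mean(), via_inverse_cdf.mean(), via_beta.mean(), expected_mean)
```

All three sample means should agree with a / (a + 1) up to sampling noise.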

It is used, for example, in modeling the over-reporting of insurance claims.

Examples

Draw samples from the distribution:

>>> rng = np.random.default_rng()
>>> a = 5. # shape
>>> samples = 1000
>>> s = rng.power(a, samples)

Display the histogram of the samples, along with the probability density function:

>>> import matplotlib.pyplot as plt
>>> count, bins, _ = plt.hist(s, bins=30)
>>> x = np.linspace(0, 1, 100)
>>> y = a*x**(a-1.)
>>> normed_y = samples*np.diff(bins)[0]*y
>>> plt.plot(x, normed_y)
>>> plt.show()

Compare the power function distribution to the inverse of the Pareto.

>>> from scipy import stats
>>> rvs = rng.power(5, 1000000)
>>> rvsp = rng.pareto(5, 1000000)
>>> xx = np.linspace(0,1,100)
>>> powpdf = stats.powerlaw.pdf(xx,5)
>>> plt.figure()
>>> plt.hist(rvs, bins=50, density=True)
>>> plt.plot(xx,powpdf,'r-')
>>> plt.title('power(5)')
>>> plt.figure()
>>> plt.hist(1./(1.+rvsp), bins=50, density=True)
>>> plt.plot(xx,powpdf,'r-')
>>> plt.title('inverse of 1 + Generator.pareto(5)')
>>> plt.figure()
>>> plt.hist(1./stats.pareto.rvs(5, size=1000000), bins=50, density=True)
>>> plt.plot(xx,powpdf,'r-')
>>> plt.title('inverse of stats.pareto(5)')
random(size=None, dtype=np.float64, out=None)

Return random floats in the half-open interval [0.0, 1.0).

Results are from the “continuous uniform” distribution over the stated interval. To sample \(Unif[a, b), b > a\) use uniform or multiply the output of random by (b - a) and add a:

(b - a) * random() + a
Parameters:
  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.

  • dtype (dtype, optional) – Desired dtype of the result, only float64 and float32 are supported. Byteorder must be native. The default value is np.float64.

  • out (ndarray, optional) – Alternative output array in which to place the result. If size is not None, it must have the same shape as the provided size and must match the type of the output values.

Returns:

out – Array of random floats of shape size (unless size=None, in which case a single float is returned).

Return type:

float or ndarray of floats

See also

uniform

Draw samples from the parameterized uniform distribution.

Examples

>>> rng = np.random.default_rng()
>>> rng.random()
0.47108547995356098 # random
>>> type(rng.random())
<class 'float'>
>>> rng.random((5,))
array([ 0.30220482,  0.86820401,  0.1654503 ,  0.11659149,  0.54323428]) # random

Three-by-two array of random numbers from [-5, 0):

>>> 5 * rng.random((3, 2)) - 5
array([[-3.99149989, -0.52338984], # random
       [-2.99091858, -0.79479508],
       [-1.23204345, -1.75224494]])
rayleigh(scale=1.0, size=None)

Draw samples from a Rayleigh distribution.

The \(\chi\) and Weibull distributions are generalizations of the Rayleigh.

Parameters:
  • scale (float or array_like of floats, optional) – Scale, also equals the mode. Must be non-negative. Default is 1.

  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if scale is a scalar. Otherwise, np.array(scale).size samples are drawn.

Returns:

out – Drawn samples from the parameterized Rayleigh distribution.

Return type:

ndarray or scalar

Notes

The probability density function for the Rayleigh distribution is

\[P(x;scale) = \frac{x}{scale^2}e^{\frac{-x^2}{2 \cdotp scale^2}}\]

The Rayleigh distribution would arise, for example, if the East and North components of the wind velocity had identical zero-mean Gaussian distributions. Then the wind speed would have a Rayleigh distribution.
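The wind-speed description can be simulated directly. The sketch assumes independent East and North components, each normal with zero mean and standard deviation equal to scale, and compares the resulting speeds with direct Rayleigh draws through their means.

```python
import numpy as np

rng = np.random.default_rng(4)
scale, n = 3.0, 200_000

east = rng.normal(0.0, scale, n)
north = rng.normal(0.0, scale, n)
speed = np.hypot(east, north)      # magnitude of the velocity vector

direct = rng.rayleigh(scale, n)

# Mean of a Rayleigh distribution with the given scale.
expected_mean = scale * np.sqrt(np.pi / 2)
print(speed.mean(), direct.mean(), expected_mean)
```

Comparing histograms of speed and direct would show the same agreement in more detail.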

Examples

Draw values from the distribution and plot the histogram

>>> from matplotlib.pyplot import hist
>>> rng = np.random.default_rng()
>>> values = hist(rng.rayleigh(3, 100000), bins=200, density=True)

Wave heights tend to follow a Rayleigh distribution. If the mean wave height is 1 meter, what fraction of waves are likely to be larger than 3 meters?

>>> meanvalue = 1
>>> modevalue = np.sqrt(2 / np.pi) * meanvalue
>>> s = rng.rayleigh(modevalue, 1000000)

The percentage of waves larger than 3 meters is:

>>> 100.*sum(s>3)/1000000.
0.087300000000000003 # random
shuffle(x, axis=0)

Modify an array or sequence in-place by shuffling its contents.

The order of sub-arrays is changed but their contents remain the same.

Parameters:
  • x (ndarray or MutableSequence) – The array, list or mutable sequence to be shuffled.

  • axis (int, optional) – The axis which x is shuffled along. Default is 0. It is only supported on ndarray objects.

Return type:

None

See also

permuted, permutation

Notes

An important distinction between the methods shuffle and permuted is how they treat the axis parameter; see generator-handling-axis-parameter.

Examples

>>> rng = np.random.default_rng()
>>> arr = np.arange(10)
>>> arr
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> rng.shuffle(arr)
>>> arr
array([2, 0, 7, 5, 1, 4, 8, 9, 3, 6]) # random
>>> arr = np.arange(9).reshape((3, 3))
>>> arr
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
>>> rng.shuffle(arr)
>>> arr
array([[3, 4, 5], # random
       [6, 7, 8],
       [0, 1, 2]])
>>> arr = np.arange(9).reshape((3, 3))
>>> arr
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
>>> rng.shuffle(arr, axis=1)
>>> arr
array([[2, 0, 1], # random
       [5, 3, 4],
       [8, 6, 7]])
spawn(n_children)

Create new independent child generators.

See seedsequence-spawn for additional notes on spawning children.

Added in version 1.25.0.

Parameters:

n_children (int)

Returns:

child_generators

Return type:

list of Generators

Raises:

TypeError – When the underlying SeedSequence does not implement spawning.

See also

random.BitGenerator.spawn, random.SeedSequence.spawn

bit_generator

The bit generator instance used by the generator.

Examples

Starting from a seeded default generator:

>>> # High quality entropy created with: f"0x{secrets.randbits(128):x}"
>>> entropy = 0x3034c61a9ae04ff8cb62ab8ec2c4b501
>>> rng = np.random.default_rng(entropy)

Create two new generators for example for parallel execution:

>>> child_rng1, child_rng2 = rng.spawn(2)

Drawn numbers from each are independent but derived from the initial seeding entropy:

>>> rng.uniform(), child_rng1.uniform(), child_rng2.uniform()
(0.19029263503854454, 0.9475673279178444, 0.4702687338396767)

It is safe to spawn additional children from the original rng or the children:

>>> more_child_rngs = rng.spawn(20)
>>> nested_spawn = child_rng1.spawn(20)
standard_cauchy(size=None)

Draw samples from a standard Cauchy distribution with mode = 0.

Also known as the Lorentz distribution.

Parameters:

size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.

Returns:

samples – The drawn samples.

Return type:

ndarray or scalar

Notes

The probability density function for the full Cauchy distribution is

\[P(x; x_0, \gamma) = \frac{1}{\pi \gamma \bigl[ 1+ (\frac{x-x_0}{\gamma})^2 \bigr] }\]

and the Standard Cauchy distribution just sets \(x_0=0\) and \(\gamma=1\).

The Cauchy distribution arises in the solution to the driven harmonic oscillator problem, and also describes spectral line broadening. It also describes the distribution of values at which a line tilted at a random angle will cut the x axis.
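The tilted-line description can be sketched numerically. The setup is an assumption made for the illustration: the line passes through the point (0, 1) and its angle from the vertical is uniform on (-pi/2, pi/2), so it crosses the x axis at tan(theta). Quartiles are compared because the Cauchy distribution has no mean.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200_000

theta = rng.uniform(-np.pi / 2, np.pi / 2, n)  # angle from the vertical
crossings = np.tan(theta)                      # where each line cuts the x axis

direct = rng.standard_cauchy(n)

# Quartiles of the standard Cauchy distribution are -1, 0 and 1.
q_cross = np.quantile(crossings, [0.25, 0.5, 0.75])
q_direct = np.quantile(direct, [0.25, 0.5, 0.75])
print(q_cross, q_direct)
```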

When studying hypothesis tests that assume normality, seeing how the tests perform on data from a Cauchy distribution is a good indicator of their sensitivity to a heavy-tailed distribution, since the Cauchy looks very much like a Gaussian distribution, but with heavier tails.
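
The tilted-line description in the Notes can be checked numerically. The following is a minimal sketch, assuming the line passes through the point (0, 1) and that its angle from the vertical is uniform on (-π/2, π/2); the seed and sample size are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(12345)  # arbitrary seed, for reproducibility only
n = 200_000

# A line through (0, 1) tilted by angle theta from the vertical
# crosses the x axis at x = tan(theta).
theta = rng.uniform(-np.pi / 2, np.pi / 2, size=n)
x_line = np.tan(theta)

# Direct draws from the standard Cauchy distribution for comparison.
x_cauchy = rng.standard_cauchy(size=n)

# The mean of a Cauchy distribution is undefined, so compare quartiles instead;
# for the standard Cauchy they sit at -1, 0 and +1.
q_line = np.percentile(x_line, [25, 50, 75])
q_cauchy = np.percentile(x_cauchy, [25, 50, 75])
print(q_line, q_cauchy)
```

Quartiles are used because sample means of Cauchy data do not settle down as the sample grows.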

Examples

Draw samples and plot the distribution:

>>> import matplotlib.pyplot as plt
>>> rng = np.random.default_rng()
>>> s = rng.standard_cauchy(1000000)
>>> s = s[(s>-25) & (s<25)]  # truncate distribution so it plots well
>>> plt.hist(s, bins=100)
>>> plt.show()
standard_exponential(size=None, dtype=np.float64, method='zig', out=None)

Draw samples from the standard exponential distribution.

standard_exponential is identical to the exponential distribution with a scale parameter of 1.

Parameters:
  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.

  • dtype (dtype, optional) – Desired dtype of the result, only float64 and float32 are supported. Byteorder must be native. The default value is np.float64.

  • method (str, optional) – Either ‘inv’ or ‘zig’. ‘inv’ uses the inverse CDF method. ‘zig’ (the default) uses the much faster Ziggurat method of Marsaglia and Tsang.

  • out (ndarray, optional) – Alternative output array in which to place the result. If size is not None, it must have the same shape as the provided size and must match the type of the output values.

Returns:

out – Drawn samples.

Return type:

float or ndarray

Examples

Output a 3x8000 array:

>>> rng = np.random.default_rng()
>>> n = rng.standard_exponential((3, 8000))
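
The dtype, method and out parameters described above can be combined as in the sketch below. The seed, array shapes and the scale value beta are arbitrary illustrative choices, not values taken from the library documentation.

```python
import numpy as np

rng = np.random.default_rng(2024)  # arbitrary seed

# Single-precision samples using the inverse-CDF method.
x32 = rng.standard_exponential(size=5, dtype=np.float32, method='inv')

# Fill a preallocated float64 buffer; the returned array shares its memory.
buf = np.empty((3, 4), dtype=np.float64)
result = rng.standard_exponential(size=(3, 4), out=buf)

# Scaling standard draws by beta gives an exponential distribution with scale beta.
beta = 2.5
scaled = beta * rng.standard_exponential(size=100_000)
print(x32.dtype, np.shares_memory(result, buf), scaled.mean())
```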
standard_gamma(shape, size=None, dtype=np.float64, out=None)

Draw samples from a standard Gamma distribution.

Samples are drawn from a Gamma distribution with specified parameters, shape (sometimes designated “k”) and scale=1.

Parameters:
  • shape (float or array_like of floats) – Parameter, must be non-negative.

  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if shape is a scalar. Otherwise, np.array(shape).size samples are drawn.

  • dtype (dtype, optional) – Desired dtype of the result, only float64 and float32 are supported. Byteorder must be native. The default value is np.float64.

  • out (ndarray, optional) – Alternative output array in which to place the result. If size is not None, it must have the same shape as the provided size and must match the type of the output values.

Returns:

out – Drawn samples from the parameterized standard gamma distribution.

Return type:

ndarray or scalar

See also

scipy.stats.gamma

probability density function, cumulative distribution function, etc.

Notes

The probability density for the Gamma distribution is

\[p(x) = x^{k-1}\frac{e^{-x/\theta}}{\theta^k\Gamma(k)},\]

where \(k\) is the shape and \(\theta\) the scale, and \(\Gamma\) is the Gamma function.

The Gamma distribution is often used to model the times to failure of electronic components, and arises naturally in processes for which the waiting times between Poisson distributed events are relevant.
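
Because standard_gamma fixes the scale at 1, a sample with scale \(\theta\) can be obtained by multiplying the draws by \(\theta\). A minimal sketch, with an arbitrary seed and arbitrary values of k and theta, checks this against the known mean \(k\theta\) and variance \(k\theta^2\):

```python
import numpy as np

rng = np.random.default_rng(7)  # arbitrary seed
k, theta = 2.0, 3.0             # shape and scale

# standard_gamma fixes scale=1; multiply by theta to get a general gamma sample.
s = theta * rng.standard_gamma(k, size=200_000)

# For a Gamma(k, theta) distribution, mean = k*theta and variance = k*theta**2.
print(s.mean(), s.var())
```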

Examples

Draw samples from the distribution:

>>> shape, scale = 2., 1. # shape and scale
>>> rng = np.random.default_rng()
>>> s = rng.standard_gamma(shape, 1000000)

Display the histogram of the samples, along with the probability density function:

>>> import matplotlib.pyplot as plt
>>> import scipy.special as sps
>>> count, bins, _ = plt.hist(s, 50, density=True)
>>> y = bins**(shape-1) * ((np.exp(-bins/scale))/
...                       (sps.gamma(shape) * scale**shape))
>>> plt.plot(bins, y, linewidth=2, color='r')
>>> plt.show()
standard_normal(size=None, dtype=np.float64, out=None)

Draw samples from a standard Normal distribution (mean=0, stdev=1).

Parameters:
  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.

  • dtype (dtype, optional) – Desired dtype of the result, only float64 and float32 are supported. Byteorder must be native. The default value is np.float64.

  • out (ndarray, optional) – Alternative output array in which to place the result. If size is not None, it must have the same shape as the provided size and must match the type of the output values.

Returns:

out – A floating-point array of shape size of drawn samples, or a single sample if size was not specified.

Return type:

float or ndarray

See also

normal

Equivalent function with additional loc and scale arguments for setting the mean and standard deviation.

Notes

For random samples from the normal distribution with mean mu and standard deviation sigma, use one of:

mu + sigma * rng.standard_normal(size=...)
rng.normal(mu, sigma, size=...)

Examples

>>> rng = np.random.default_rng()
>>> rng.standard_normal()
2.1923875335537315 # random
>>> s = rng.standard_normal(8000)
>>> s
array([ 0.6888893 ,  0.78096262, -0.89086505, ...,  0.49876311,  # random
       -0.38672696, -0.4685006 ])                                # random
>>> s.shape
(8000,)
>>> s = rng.standard_normal(size=(3, 4, 2))
>>> s.shape
(3, 4, 2)

Two-by-four array of samples from the normal distribution with mean 3 and standard deviation 2.5:

>>> 3 + 2.5 * rng.standard_normal(size=(2, 4))
array([[-4.49401501,  4.00950034, -1.81814867,  7.29718677],   # random
       [ 0.39924804,  4.68456316,  4.99394529,  4.84057254]])  # random
standard_t(df, size=None)

Draw samples from a standard Student’s t distribution with df degrees of freedom.

A special case of the hyperbolic distribution. As df gets large, the result resembles that of the standard normal distribution (standard_normal).

Parameters:
  • df (float or array_like of floats) – Degrees of freedom, must be > 0.

  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if df is a scalar. Otherwise, np.array(df).size samples are drawn.

Returns:

out – Drawn samples from the parameterized standard Student’s t distribution.

Return type:

ndarray or scalar

Notes

The probability density function for the t distribution is

\[P(x, df) = \frac{\Gamma(\frac{df+1}{2})}{\sqrt{\pi df} \Gamma(\frac{df}{2})}\Bigl( 1+\frac{x^2}{df} \Bigr)^{-(df+1)/2}\]

The t test is based on an assumption that the data come from a Normal distribution. The t test provides a way to test whether the sample mean (that is the mean calculated from the data) is a good estimate of the true mean.

The derivation of the t-distribution was first published in 1908 by William Gosset while working for the Guinness Brewery in Dublin. Due to proprietary issues, he had to publish under a pseudonym, and so he used the name Student.

References

Examples

From Dalgaard page 83 [1], suppose the daily energy intake for 11 women in kilojoules (kJ) is:

>>> intake = np.array([5260., 5470, 5640, 6180, 6390, 6515, 6805, 7515, \
...                    7515, 8230, 8770])

Does their energy intake deviate systematically from the recommended value of 7725 kJ? Our null hypothesis will be the absence of deviation, and the alternate hypothesis will be the presence of an effect that could be either positive or negative, hence making our test 2-tailed.

Because we are estimating the mean and we have N=11 values in our sample, we have N-1=10 degrees of freedom. We set our significance level to 5% (equivalently, a 95% confidence level) and compute the t statistic using the empirical mean and empirical standard deviation of our intake. We use a ddof of 1 to base the computation of our empirical standard deviation on an unbiased estimate of the variance (note: the final estimate is not unbiased due to the concave nature of the square root).

>>> np.mean(intake)
6753.636363636364
>>> intake.std(ddof=1)
1142.1232221373727
>>> t = (np.mean(intake)-7725)/(intake.std(ddof=1)/np.sqrt(len(intake)))
>>> t
-2.8207540608310198

We draw 1000000 samples from Student’s t distribution with the appropriate degrees of freedom.

>>> import matplotlib.pyplot as plt
>>> rng = np.random.default_rng()
>>> s = rng.standard_t(10, size=1000000)
>>> h = plt.hist(s, bins=100, density=True)

Does our t statistic land in one of the two critical regions found at both tails of the distribution?

>>> np.sum(np.abs(t) < np.abs(s)) / float(len(s))
0.018318  #random < 0.05, statistic is in critical region

The probability value for this 2-tailed test is about 1.83%, which is lower than the 5% pre-determined significance threshold.

Therefore, the probability of observing values as extreme as our intake, conditional on the null hypothesis being true, is too low, and we reject the null hypothesis of no deviation.

triangular(left, mode, right, size=None)

Draw samples from the triangular distribution over the interval [left, right].

The triangular distribution is a continuous probability distribution with lower limit left, peak at mode, and upper limit right. Unlike the other distributions, these parameters directly define the shape of the pdf.

Parameters:
  • left (float or array_like of floats) – Lower limit.

  • mode (float or array_like of floats) – The value where the peak of the distribution occurs. The value must fulfill the condition left <= mode <= right.

  • right (float or array_like of floats) – Upper limit, must be larger than left.

  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if left, mode, and right are all scalars. Otherwise, np.broadcast(left, mode, right).size samples are drawn.

Returns:

out – Drawn samples from the parameterized triangular distribution.

Return type:

ndarray or scalar

Notes

The probability density function for the triangular distribution is

\[\begin{split}P(x;l, m, r) = \begin{cases} \frac{2(x-l)}{(r-l)(m-l)}& \text{for $l \leq x \leq m$},\\ \frac{2(r-x)}{(r-l)(r-m)}& \text{for $m \leq x \leq r$},\\ 0& \text{otherwise}. \end{cases}\end{split}\]

The triangular distribution is often used in ill-defined problems where the underlying distribution is not known, but some knowledge of the limits and mode exists. Often it is used in simulations.
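
The piecewise density above translates directly into code. The sketch below uses a hypothetical helper named triangular_pdf (not part of NumPy) and compares it with a normalized histogram of samples; the seed, parameters and bin count are arbitrary.

```python
import numpy as np

def triangular_pdf(x, left, mode, right):
    """Piecewise density from the formula above (requires left < mode < right)."""
    x = np.asarray(x, dtype=float)
    rising = 2 * (x - left) / ((right - left) * (mode - left))
    falling = 2 * (right - x) / ((right - left) * (right - mode))
    inside = (x >= left) & (x <= right)
    return np.where(inside, np.where(x <= mode, rising, falling), 0.0)

rng = np.random.default_rng(0)  # arbitrary seed
left, mode, right = -3.0, 0.0, 8.0
samples = rng.triangular(left, mode, right, size=200_000)

# Compare a normalized histogram with the analytic density at the bin centers.
density, edges = np.histogram(samples, bins=50, range=(left, right), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
max_abs_err = np.max(np.abs(density - triangular_pdf(centers, left, mode, right)))
print(max_abs_err)
```

The density peaks at the mode with height 2 / (right - left), which follows from requiring the triangle to have unit area.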

Examples

Draw values from the distribution and plot the histogram:

>>> import matplotlib.pyplot as plt
>>> rng = np.random.default_rng()
>>> h = plt.hist(rng.triangular(-3, 0, 8, 100000), bins=200,
...              density=True)
>>> plt.show()
uniform(low=0.0, high=1.0, size=None)

Draw samples from a uniform distribution.

Samples are uniformly distributed over the half-open interval [low, high) (includes low, but excludes high). In other words, any value within the given interval is equally likely to be drawn by uniform.

Parameters:
  • low (float or array_like of floats, optional) – Lower boundary of the output interval. All values generated will be greater than or equal to low. The default value is 0.

  • high (float or array_like of floats) – Upper boundary of the output interval. All values generated will be less than high. The high limit may be included in the returned array of floats due to floating-point rounding in the equation low + (high-low) * random_sample(). high - low must be non-negative. The default value is 1.0.

  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if low and high are both scalars. Otherwise, np.broadcast(low, high).size samples are drawn.

Returns:

out – Drawn samples from the parameterized uniform distribution.

Return type:

ndarray or scalar

See also

integers

Discrete uniform distribution, yielding integers.

random

Floats uniformly distributed over [0, 1).

Notes

The probability density function of the uniform distribution is

\[p(x) = \frac{1}{b - a}\]

anywhere within the interval [a, b), and zero elsewhere.

When high == low, values of low will be returned.
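
A short sketch of the behaviours described in these notes: the constant density, the degenerate case high == low, and broadcasting of array-valued bounds. The seed, bounds and bin count are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed
low, high = -1.0, 3.0

# Density is constant at 1 / (high - low) on [low, high).
s = rng.uniform(low, high, size=100_000)
density, _ = np.histogram(s, bins=8, range=(low, high), density=True)
print(density)

# Degenerate interval: with high == low, every draw equals low.
same = rng.uniform(2.0, 2.0, size=3)
print(same)

# Array-valued bounds broadcast, giving one draw per (low, high) pair.
per_pair = rng.uniform([0.0, 10.0], [1.0, 20.0])
print(per_pair)
```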

Examples

Draw samples from the distribution:

>>> rng = np.random.default_rng()
>>> s = rng.uniform(-1,0,1000)

All values are within the given interval:

>>> np.all(s >= -1)
True
>>> np.all(s < 0)
True

Display the histogram of the samples, along with the probability density function:

>>> import matplotlib.pyplot as plt
>>> count, bins, _ = plt.hist(s, 15, density=True)
>>> plt.plot(bins, np.ones_like(bins), linewidth=2, color='r')
>>> plt.show()
vonmises(mu, kappa, size=None)

Draw samples from a von Mises distribution.

Samples are drawn from a von Mises distribution with specified mode (mu) and concentration (kappa), on the interval [-pi, pi].

The von Mises distribution (also known as the circular normal distribution) is a continuous probability distribution on the unit circle. It may be thought of as the circular analogue of the normal distribution.

Parameters:
  • mu (float or array_like of floats) – Mode (“center”) of the distribution.

  • kappa (float or array_like of floats) – Concentration of the distribution, has to be >=0.

  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if mu and kappa are both scalars. Otherwise, np.broadcast(mu, kappa).size samples are drawn.

Returns:

out – Drawn samples from the parameterized von Mises distribution.

Return type:

ndarray or scalar

See also

scipy.stats.vonmises

probability density function, cumulative distribution function, etc.

Notes

The probability density for the von Mises distribution is

\[p(x) = \frac{e^{\kappa \cos(x-\mu)}}{2\pi I_0(\kappa)},\]

where \(\mu\) is the mode and \(\kappa\) the concentration, and \(I_0(\kappa)\) is the modified Bessel function of order 0.
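
The density above can be evaluated with NumPy alone, since np.i0 provides the modified Bessel function of order 0. The following sketch uses a hypothetical helper vonmises_pdf with arbitrary parameter values; it checks the normalization and recovers the mode from samples using a circular mean.

```python
import numpy as np

def vonmises_pdf(x, mu, kappa):
    """Density from the formula above; np.i0 is the modified Bessel function I_0."""
    return np.exp(kappa * np.cos(x - mu)) / (2 * np.pi * np.i0(kappa))

rng = np.random.default_rng(3)  # arbitrary seed
mu, kappa = 0.5, 4.0
s = rng.vonmises(mu, kappa, size=50_000)

# Angles live on a circle, so average unit vectors rather than raw angles.
circular_mean = np.angle(np.mean(np.exp(1j * s)))

# Riemann-sum check that the density integrates to 1 over one period.
x = np.linspace(-np.pi, np.pi, 2000, endpoint=False)
total = np.sum(vonmises_pdf(x, mu, kappa)) * (2 * np.pi / x.size)
print(circular_mean, total)
```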

The von Mises distribution is named for Richard Edler von Mises, who was born in Austria-Hungary, in what is now Ukraine. He fled to the United States in 1939 and became a professor at Harvard. He worked in probability theory, aerodynamics, fluid mechanics, and philosophy of science.

Examples

Draw samples from the distribution:

>>> mu, kappa = 0.0, 4.0 # mean and concentration
>>> rng = np.random.default_rng()
>>> s = rng.vonmises(mu, kappa, 1000)

Display the histogram of the samples, along with the probability density function:

>>> import matplotlib.pyplot as plt
>>> from scipy.special import i0
>>> plt.hist(s, 50, density=True)
>>> x = np.linspace(-np.pi, np.pi, num=51)
>>> y = np.exp(kappa*np.cos(x-mu))/(2*np.pi*i0(kappa))
>>> plt.plot(x, y, linewidth=2, color='r')
>>> plt.show()
wald(mean, scale, size=None)

Draw samples from a Wald, or inverse Gaussian, distribution.

As the scale approaches infinity, the distribution becomes more like a Gaussian. Some references claim that the Wald is an inverse Gaussian with mean equal to 1, but this is by no means universal.

The inverse Gaussian distribution was first studied in relationship to Brownian motion. In 1956 M.C.K. Tweedie used the name inverse Gaussian because there is an inverse relationship between the time to cover a unit distance and distance covered in unit time.

Parameters:
  • mean (float or array_like of floats) – Distribution mean, must be > 0.

  • scale (float or array_like of floats) – Scale parameter, must be > 0.

  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if mean and scale are both scalars. Otherwise, np.broadcast(mean, scale).size samples are drawn.

Returns:

out – Drawn samples from the parameterized Wald distribution.

Return type:

ndarray or scalar

Notes

The probability density function for the Wald distribution is

\[P(x;mean,scale) = \sqrt{\frac{scale}{2\pi x^3}}\, e^{\frac{-scale(x-mean)^2}{2\cdot mean^2 x}}\]

As noted above, the inverse Gaussian distribution first arose from attempts to model Brownian motion. It is also a competitor to the Weibull for use in reliability modeling and modeling stock returns and interest rate processes.
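
As a numerical check of the density given in these notes, the sketch below evaluates it with a hypothetical helper wald_pdf and compares sample moments with the known mean and variance (mean**3 / scale) of the inverse Gaussian distribution. The seed, parameters and integration grid are arbitrary choices.

```python
import numpy as np

def wald_pdf(x, mean, scale):
    """Inverse Gaussian density from the formula above, valid for x > 0."""
    x = np.asarray(x, dtype=float)
    return np.sqrt(scale / (2 * np.pi * x**3)) * np.exp(
        -scale * (x - mean) ** 2 / (2 * mean**2 * x)
    )

rng = np.random.default_rng(11)  # arbitrary seed
mean, scale = 3.0, 2.0
s = rng.wald(mean, scale, size=400_000)

# The sample mean should approach `mean`; the variance is mean**3 / scale.
print(s.mean(), s.var())

# Riemann-sum check that the density carries almost all of its mass on (0, 400].
x = np.linspace(1e-6, 400.0, 400_001)
total = np.sum(wald_pdf(x, mean, scale)) * (x[1] - x[0])
print(total)
```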

Examples

Draw values from the distribution and plot the histogram:

>>> import matplotlib.pyplot as plt
>>> rng = np.random.default_rng()
>>> h = plt.hist(rng.wald(3, 2, 100000), bins=200, density=True)
>>> plt.show()
weibull(a, size=None)

Draw samples from a Weibull distribution.

Draw samples from a 1-parameter Weibull distribution with the given shape parameter a.

\[X = (-\ln(U))^{1/a}\]

Here, U is drawn from the uniform distribution over (0,1].

The more common 2-parameter Weibull, including a scale parameter \(\lambda\), is just \(X = \lambda(-\ln(U))^{1/a}\).
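
The transformation above can be applied by hand and compared with scaled draws from weibull. In this minimal sketch the seed, the shape a and the scale lam are arbitrary, and the comparison uses the known Weibull mean \(\lambda\,\Gamma(1 + 1/a)\).

```python
import math
import numpy as np

rng = np.random.default_rng(5)  # arbitrary seed
a, lam = 5.0, 2.0               # shape and scale
n = 200_000

# rng.random() lies in [0, 1), so 1 - rng.random() lies in (0, 1] and log() stays finite.
u = 1.0 - rng.random(n)
x_manual = lam * (-np.log(u)) ** (1.0 / a)

# Equivalent route: scale draws from the 1-parameter generator.
x_scaled = lam * rng.weibull(a, size=n)

# Both sample means should approach lam * Gamma(1 + 1/a).
expected_mean = lam * math.gamma(1.0 + 1.0 / a)
print(x_manual.mean(), x_scaled.mean(), expected_mean)
```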

Parameters:
  • a (float or array_like of floats) – Shape parameter of the distribution. Must be nonnegative.

  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if a is a scalar. Otherwise, np.array(a).size samples are drawn.

Returns:

out – Drawn samples from the parameterized Weibull distribution.

Return type:

ndarray or scalar

See also

scipy.stats.weibull_max, scipy.stats.weibull_min, scipy.stats.genextreme, gumbel

Notes

The Weibull (or Type III asymptotic extreme value distribution for smallest values, SEV Type III, or Rosin-Rammler distribution) is one of a class of Generalized Extreme Value (GEV) distributions used in modeling extreme value problems. This class includes the Gumbel and Frechet distributions.

The probability density for the Weibull distribution is

\[p(x) = \frac{a} {\lambda}(\frac{x}{\lambda})^{a-1}e^{-(x/\lambda)^a},\]

where \(a\) is the shape and \(\lambda\) the scale.

The function has its peak (the mode) at \(\lambda(\frac{a-1}{a})^{1/a}\).

When a = 1, the Weibull distribution reduces to the exponential distribution.

Examples

Draw samples from the distribution:

>>> rng = np.random.default_rng()
>>> a = 5. # shape
>>> s = rng.weibull(a, 1000)

Display the histogram of the samples, along with the probability density function:

>>> import matplotlib.pyplot as plt
>>> def weibull(x, n, a):
...     return (a / n) * (x / n)**(a - 1) * np.exp(-(x / n)**a)
>>> count, bins, _ = plt.hist(s)
>>> x = np.linspace(0, 2, 1000)
>>> bin_spacing = np.mean(np.diff(bins))
>>> plt.plot(x, weibull(x, 1., 5.) * bin_spacing * s.size, label='Weibull PDF')
>>> plt.legend()
>>> plt.show()
zipf(a, size=None)

Draw samples from a Zipf distribution.

Samples are drawn from a Zipf distribution with specified parameter a > 1.

The Zipf distribution (also known as the zeta distribution) is a discrete probability distribution that satisfies Zipf’s law: the frequency of an item is inversely proportional to its rank in a frequency table.

Parameters:
  • a (float or array_like of floats) – Distribution parameter. Must be greater than 1.

  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if a is a scalar. Otherwise, np.array(a).size samples are drawn.

Returns:

out – Drawn samples from the parameterized Zipf distribution.

Return type:

ndarray or scalar

See also

scipy.stats.zipf

probability mass function, cumulative distribution function, etc.

Notes

The probability mass function (PMF) for the Zipf distribution is

\[p(k) = \frac{k^{-a}}{\zeta(a)},\]

for integers \(k \geq 1\), where \(\zeta\) is the Riemann Zeta function.

It is named for the American linguist George Kingsley Zipf, who noted that the frequency of any word in a sample of a language is inversely proportional to its rank in the frequency table.
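
The PMF can be checked without SciPy for the particular case a = 4, where the normalizing constant has the closed form \(\zeta(4) = \pi^4/90\). The sketch below compares expected and observed frequencies for the first three ranks; the seed and sample size are arbitrary.

```python
import math
import numpy as np

rng = np.random.default_rng(8)  # arbitrary seed
a, n = 4.0, 100_000
s = rng.zipf(a, size=n)

# For a = 4 the normalizing constant has the closed form zeta(4) = pi**4 / 90.
zeta4 = math.pi**4 / 90

# Expected probabilities p(k) = k**-a / zeta(a) for the first few ranks.
k = np.arange(1, 4)
expected = k**-a / zeta4
observed = np.array([(s == kk).mean() for kk in k])
print(expected, observed)
```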

Examples

Draw samples from the distribution:

>>> a = 4.0
>>> n = 20000
>>> rng = np.random.default_rng()
>>> s = rng.zipf(a, size=n)

Display the histogram of the samples, along with the expected histogram based on the probability mass function:

>>> import matplotlib.pyplot as plt
>>> from scipy.special import zeta

bincount provides a fast histogram for small integers.

>>> count = np.bincount(s)
>>> k = np.arange(1, s.max() + 1)
>>> plt.bar(k, count[1:], alpha=0.5, label='sample count')
>>> plt.plot(k, n*(k**-a)/zeta(a), 'k.-', alpha=0.5,
...          label='expected count')
>>> plt.semilogy()
>>> plt.grid(alpha=0.4)
>>> plt.legend()
>>> plt.title(f'Zipf sample, a={a}, size={n}')
>>> plt.show()
class westpa.core.sim_manager.MT19937(seed=None)

Bases: BitGenerator

Container for the Mersenne Twister pseudo-random number generator.

Parameters:

seed ({None, int, array_like[ints], SeedSequence}, optional) – A seed to initialize the BitGenerator. If None, then fresh, unpredictable entropy will be pulled from the OS. If an int or array_like[ints] is passed, then it will be passed to SeedSequence to derive the initial BitGenerator state. One may also pass in a SeedSequence instance.

lock

Lock instance that is shared so that the same bit generator can be used in multiple Generators without corrupting the state. Code that generates values from a bit generator should hold the bit generator’s lock.

Type:

threading.Lock

Notes

MT19937 provides a capsule containing function pointers that produce doubles, and unsigned 32- and 64-bit integers [1]. These are not directly consumable in Python and must be consumed by a Generator or similar object that supports low-level access.

The Python stdlib module “random” also contains a Mersenne Twister pseudo-random number generator.

State and Seeding

The MT19937 state vector consists of a 624-element array of 32-bit unsigned integers plus a single integer value between 0 and 624 that indexes the current position within the main array.

The input seed is processed by SeedSequence to fill the whole state. The first element is reset such that only its most significant bit is set.
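
The state layout described above can be inspected through the state property. The sketch below, with an arbitrary seed, reads the 624-word key array and position index, then restores a saved state to replay the same stream.

```python
import numpy as np
from numpy.random import Generator, MT19937

bit_gen = MT19937(1234)  # arbitrary seed
state = bit_gen.state

# The nested 'state' dict holds the 624-word key array and the position index.
key = state['state']['key']
pos = state['state']['pos']
print(state['bit_generator'], key.shape, key.dtype, pos)

# Saving and restoring the state replays the same stream.
rng = Generator(bit_gen)
first = rng.random(3)
bit_gen.state = state
replay = rng.random(3)
print(np.array_equal(first, replay))
```

Capturing the state before drawing is what makes the replay possible.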

Parallel Features

The preferred way to use a BitGenerator in parallel applications is to use the SeedSequence.spawn method to obtain entropy values, and to use these to generate new BitGenerators:

>>> from numpy.random import Generator, MT19937, SeedSequence
>>> sg = SeedSequence(1234)
>>> rg = [Generator(MT19937(s)) for s in sg.spawn(10)]

Another method is to use MT19937.jumped, which advances the state as if \(2^{128}\) random numbers have been generated ([1], [2]). This allows the original sequence to be split so that distinct segments can be used in each worker process. All generators should be chained to ensure that the segments come from the same sequence.

>>> from numpy.random import Generator, MT19937, SeedSequence
>>> sg = SeedSequence(1234)
>>> bit_generator = MT19937(sg)
>>> rg = []
>>> for _ in range(10):
...    rg.append(Generator(bit_generator))
...    # Chain the BitGenerators
...    bit_generator = bit_generator.jumped()

Compatibility Guarantee

MT19937 makes a guarantee that a fixed seed will always produce the same random integer stream.

References

jumped(jumps=1)

Returns a new bit generator with the state jumped.

The state of the returned bit generator is jumped as if 2**(128 * jumps) random numbers have been generated.

Parameters:

jumps (integer, positive) – Number of times to jump the state of the bit generator returned

Returns:

bit_generator – New instance of generator jumped jumps times

Return type:

MT19937

Notes

The jump step is computed using a modified version of Matsumoto’s implementation of Horner’s method. The step polynomial is precomputed to perform 2**128 steps. The jumped state has been verified to match the state produced using Matsumoto’s original code.

state

Get or set the PRNG state

Returns:

state – Dictionary containing the information required to describe the state of the PRNG

Return type:

dict

westpa.core.sim_manager.weight_dtype

alias of float64

class westpa.core.sim_manager.Segment(n_iter=None, seg_id=None, weight=None, endpoint_type=None, parent_id=None, wtg_parent_ids=None, pcoord=None, status=None, walltime=None, cputime=None, data=None)

Bases: object

A class wrapping segment data that must be passed through the work manager or data manager. Most fields are self-explanatory. One item worth noting is that a negative parent ID means that the segment starts from the initial state with ID -(segment.parent_id+1).
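
The negative-parent-ID convention can be illustrated with plain arithmetic. The helper below is hypothetical and does not use the westpa API; returning None for non-negative IDs is an assumption made for this sketch.

```python
def initial_state_id_from_parent(parent_id):
    """Hypothetical helper: map a negative parent ID to an initial state ID.

    Returns None for non-negative IDs (an assumption made for this sketch),
    which refer to a parent segment rather than an initial state.
    """
    if parent_id < 0:
        return -(parent_id + 1)
    return None

print(initial_state_id_from_parent(-1))  # initial state 0
print(initial_state_id_from_parent(-5))  # initial state 4
print(initial_state_id_from_parent(7))   # None: continues from segment 7
```

The offset of one lets initial state 0 be encoded as -1, which keeps it distinct from parent segment 0.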

SEG_ENDPOINT_CONTINUES = 1
SEG_ENDPOINT_MERGED = 2
SEG_ENDPOINT_RECYCLED = 3
SEG_ENDPOINT_UNSET = 0
SEG_INITPOINT_CONTINUES = 1
SEG_INITPOINT_NEWTRAJ = 2
SEG_INITPOINT_UNSET = 0
SEG_STATUS_COMPLETE = 2
SEG_STATUS_FAILED = 3
SEG_STATUS_PREPARED = 1
SEG_STATUS_UNSET = 0
endpoint_type_names = {0: 'SEG_ENDPOINT_UNSET', 1: 'SEG_ENDPOINT_CONTINUES', 2: 'SEG_ENDPOINT_MERGED', 3: 'SEG_ENDPOINT_RECYCLED'}
property endpoint_type_text
endpoint_types = {'SEG_ENDPOINT_CONTINUES': 1, 'SEG_ENDPOINT_MERGED': 2, 'SEG_ENDPOINT_RECYCLED': 3, 'SEG_ENDPOINT_UNSET': 0}
static final_pcoord(segment)

Return the final progress coordinate point of this segment.

static initial_pcoord(segment)

Return the initial progress coordinate point of this segment.

property initial_state_id
property initpoint_type
initpoint_type_names = {0: 'SEG_INITPOINT_UNSET', 1: 'SEG_INITPOINT_CONTINUES', 2: 'SEG_INITPOINT_NEWTRAJ'}
initpoint_types = {'SEG_INITPOINT_CONTINUES': 1, 'SEG_INITPOINT_NEWTRAJ': 2, 'SEG_INITPOINT_UNSET': 0}
status_names = {0: 'SEG_STATUS_UNSET', 1: 'SEG_STATUS_PREPARED', 2: 'SEG_STATUS_COMPLETE', 3: 'SEG_STATUS_FAILED'}
property status_text
statuses = {'SEG_STATUS_COMPLETE': 2, 'SEG_STATUS_FAILED': 3, 'SEG_STATUS_PREPARED': 1, 'SEG_STATUS_UNSET': 0}
class westpa.core.sim_manager.InitialState(state_id, basis_state_id, iter_created, iter_used=None, istate_type=None, istate_status=None, pcoord=None, basis_state=None, basis_auxref=None)

Bases: object

Describes an initial state for a new trajectory. These are generally constructed by appropriate modification of a basis state.

Variables:
  • state_id – Integer identifier of this state, usually set by the data manager.

  • basis_state_id – Identifier of the basis state from which this state was generated, or None.

  • basis_state – The BasisState from which this state was generated, or None.

  • iter_created – Iteration in which this state was generated (0 for simulation initialization).

  • iter_used – Iteration in which this state was used to initiate a trajectory (None for unused).

  • istate_type – Integer describing the type of this initial state (ISTATE_TYPE_BASIS for direct use of a basis state, ISTATE_TYPE_GENERATED for a state generated from a basis state, ISTATE_TYPE_RESTART for a state corresponding to the endpoint of a segment in another simulation, or ISTATE_TYPE_START for a state generated from a start state).

  • istate_status – Integer describing whether this initial state has been properly prepared.

  • pcoord – The representative progress coordinate of this state.

ISTATE_STATUS_FAILED = 2
ISTATE_STATUS_PENDING = 0
ISTATE_STATUS_PREPARED = 1
ISTATE_TYPE_BASIS = 1
ISTATE_TYPE_GENERATED = 2
ISTATE_TYPE_RESTART = 3
ISTATE_TYPE_START = 4
ISTATE_TYPE_UNSET = 0
ISTATE_UNUSED = 0
as_numpy_record()
istate_status_names = {0: 'ISTATE_STATUS_PENDING', 1: 'ISTATE_STATUS_PREPARED', 2: 'ISTATE_STATUS_FAILED'}
istate_statuses = {'ISTATE_STATUS_FAILED': 2, 'ISTATE_STATUS_PENDING': 0, 'ISTATE_STATUS_PREPARED': 1}
istate_type_names = {0: 'ISTATE_TYPE_UNSET', 1: 'ISTATE_TYPE_BASIS', 2: 'ISTATE_TYPE_GENERATED', 3: 'ISTATE_TYPE_RESTART', 4: 'ISTATE_TYPE_START'}
istate_types = {'ISTATE_TYPE_BASIS': 1, 'ISTATE_TYPE_GENERATED': 2, 'ISTATE_TYPE_RESTART': 3, 'ISTATE_TYPE_START': 4, 'ISTATE_TYPE_UNSET': 0}
westpa.core.sim_manager.grouper(n, iterable, fillvalue=None)

Collect data into fixed-length chunks or blocks
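
A minimal sketch of what such a helper typically looks like, assuming that grouper follows the well-known itertools "grouper" recipe, as its signature and description suggest. The name grouper_sketch is hypothetical and the actual implementation may differ.

```python
from itertools import zip_longest

def grouper_sketch(n, iterable, fillvalue=None):
    """Yield tuples of length n, padding the last one with fillvalue."""
    # n references to the same iterator, so each tuple consumes n successive items.
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

chunks = list(grouper_sketch(3, 'ABCDEFG', fillvalue='x'))
print(chunks)
```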

exception westpa.core.sim_manager.PropagationError

Bases: RuntimeError

class westpa.core.sim_manager.WESimManager(rc=None)

Bases: object

process_config()
register_callback(hook, function, priority=0)

Registers a callback to execute during the given hook into the simulation loop. The optional priority is used to order when the function is called relative to other registered callbacks.

invoke_callbacks(hook, *args, **kwargs)
load_plugins(plugins=None)
report_bin_statistics(bins, target_states, save_summary=False)
get_bstate_pcoords(basis_states, label='basis')

For each of the given basis_states, calculate progress coordinate values as necessary. The HDF5 file is not updated. The BasisState objects are explicitly copied from the futures in order to retain auxdata/restart files (under BasisState.data) from certain work managers (e.g., the processes work manager).

report_basis_states(basis_states, label='basis')
report_target_states(target_states)
initialize_simulation(basis_states, target_states, start_states, segs_per_state=1, suppress_we=False)

Initialize a new weighted ensemble simulation, taking segs_per_state initial states from each of the given basis_states.

w_init is the forward-facing version of this function.

prepare_iteration()
finalize_iteration()

Clean up after an iteration and prepare for the next.

get_istate_futures()

Add n_states initial states to the internal list of initial states assigned to recycled particles. Spare states are used if available; otherwise new states are created. If creating new initial states requires generation, then a set of futures is returned representing work manager tasks corresponding to the necessary generation work.

propagate()
save_bin_data()

Calculate and write flux and transition count matrices to HDF5. Population and rate matrices are likely useless at the single-tau level and are no longer written.

check_propagation()

Check for failures in propagation or initial state generation, and raise an exception if any are found.

run_we()

Run the weighted ensemble algorithm based on the binning in self.final_bins and the recycled particles in self.to_recycle, creating and committing the next iteration’s segments to storage as well.

prepare_new_iteration()

Commit data for the coming iteration to the HDF5 file.

run()
prepare_run()

Prepare a new run.

finalize_run()

Perform cleanup at the normal end of a run.

pre_propagation()
post_propagation()
pre_we()
post_we()

westpa.core.states module

class westpa.core.states.Segment(n_iter=None, seg_id=None, weight=None, endpoint_type=None, parent_id=None, wtg_parent_ids=None, pcoord=None, status=None, walltime=None, cputime=None, data=None)

Bases: object

A class wrapping segment data that must be passed through the work manager or data manager. Most fields are self-explanatory. One item worth noting is that a negative parent ID means that the segment starts from the initial state with ID -(segment.parent_id+1).

SEG_ENDPOINT_CONTINUES = 1
SEG_ENDPOINT_MERGED = 2
SEG_ENDPOINT_RECYCLED = 3
SEG_ENDPOINT_UNSET = 0
SEG_INITPOINT_CONTINUES = 1
SEG_INITPOINT_NEWTRAJ = 2
SEG_INITPOINT_UNSET = 0
SEG_STATUS_COMPLETE = 2
SEG_STATUS_FAILED = 3
SEG_STATUS_PREPARED = 1
SEG_STATUS_UNSET = 0
endpoint_type_names = {0: 'SEG_ENDPOINT_UNSET', 1: 'SEG_ENDPOINT_CONTINUES', 2: 'SEG_ENDPOINT_MERGED', 3: 'SEG_ENDPOINT_RECYCLED'}
property endpoint_type_text
endpoint_types = {'SEG_ENDPOINT_CONTINUES': 1, 'SEG_ENDPOINT_MERGED': 2, 'SEG_ENDPOINT_RECYCLED': 3, 'SEG_ENDPOINT_UNSET': 0}
static final_pcoord(segment)

Return the final progress coordinate point of this segment.

static initial_pcoord(segment)

Return the initial progress coordinate point of this segment.

property initial_state_id
property initpoint_type
initpoint_type_names = {0: 'SEG_INITPOINT_UNSET', 1: 'SEG_INITPOINT_CONTINUES', 2: 'SEG_INITPOINT_NEWTRAJ'}
initpoint_types = {'SEG_INITPOINT_CONTINUES': 1, 'SEG_INITPOINT_NEWTRAJ': 2, 'SEG_INITPOINT_UNSET': 0}
status_names = {0: 'SEG_STATUS_UNSET', 1: 'SEG_STATUS_PREPARED', 2: 'SEG_STATUS_COMPLETE', 3: 'SEG_STATUS_FAILED'}
property status_text
statuses = {'SEG_STATUS_COMPLETE': 2, 'SEG_STATUS_FAILED': 3, 'SEG_STATUS_PREPARED': 1, 'SEG_STATUS_UNSET': 0}
class westpa.core.states.BasisState(label, probability, pcoord=None, auxref=None, state_id=None)

Bases: object

Describes a basis (micro)state. These basis states are used to generate initial states for new trajectories, either at the beginning of the simulation (i.e. at w_init) or due to recycling.

Variables:
  • state_id – Integer identifier of this state, usually set by the data manager.

  • label – A descriptive label for this microstate (may be empty)

  • probability – Probability of this state to be selected when creating a new trajectory.

  • pcoord – The representative progress coordinate of this state.

  • auxref – A user-provided (string) reference for locating data associated with this state (usually a filesystem path).

classmethod states_to_file(states, fileobj)

Write a file defining basis states, which may then be read by states_from_file().

classmethod states_from_file(statefile)

Read a file defining basis states. Each line defines a state, and contains a label, the probability, and optionally a data reference, separated by whitespace, as in:

unbound    1.0

or:

unbound_0    0.6        state0.pdb
unbound_1    0.4        state1.pdb
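
The file layout described above can be read with a few lines of standard-library Python. This is only a sketch of the format, not the WESTPA implementation: the function name parse_basis_states and the dictionary keys are assumptions made for illustration.

```python
import io


def parse_basis_states(fileobj):
    """Parse 'label probability [auxref]' lines into a list of dicts."""
    states = []
    for line in fileobj:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        label = fields[0]
        probability = float(fields[1])
        auxref = fields[2] if len(fields) > 2 else None  # data reference is optional
        states.append({'label': label, 'probability': probability, 'auxref': auxref})
    return states


text = "unbound_0    0.6        state0.pdb\nunbound_1    0.4        state1.pdb\n"
states = parse_basis_states(io.StringIO(text))
total_probability = sum(state['probability'] for state in states)
```

Summing the probabilities, as in the last line, is a quick sanity check on a hand-written state file.
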
as_numpy_record()

Return the data for this state as a numpy record array.

class westpa.core.states.InitialState(state_id, basis_state_id, iter_created, iter_used=None, istate_type=None, istate_status=None, pcoord=None, basis_state=None, basis_auxref=None)

Bases: object

Describes an initial state for a new trajectory. These are generally constructed by appropriate modification of a basis state.

Variables:
  • state_id – Integer identifier of this state, usually set by the data manager.

  • basis_state_id – Identifier of the basis state from which this state was generated, or None.

  • basis_state – The BasisState from which this state was generated, or None.

  • iter_created – Iteration in which this state was generated (0 for simulation initialization).

  • iter_used – Iteration in which this state was used to initiate a trajectory (None for unused).

  • istate_type – Integer describing the type of this initial state (ISTATE_TYPE_BASIS for direct use of a basis state, ISTATE_TYPE_GENERATED for a state generated from a basis state, ISTATE_TYPE_RESTART for a state corresponding to the endpoint of a segment in another simulation, or ISTATE_TYPE_START for a state generated from a start state).

  • istate_status – Integer describing whether this initial state has been properly prepared.

  • pcoord – The representative progress coordinate of this state.

ISTATE_TYPE_UNSET = 0
ISTATE_TYPE_BASIS = 1
ISTATE_TYPE_GENERATED = 2
ISTATE_TYPE_RESTART = 3
ISTATE_TYPE_START = 4
ISTATE_UNUSED = 0
ISTATE_STATUS_PENDING = 0
ISTATE_STATUS_PREPARED = 1
ISTATE_STATUS_FAILED = 2
istate_types = {'ISTATE_TYPE_BASIS': 1, 'ISTATE_TYPE_GENERATED': 2, 'ISTATE_TYPE_RESTART': 3, 'ISTATE_TYPE_START': 4, 'ISTATE_TYPE_UNSET': 0}
istate_type_names = {0: 'ISTATE_TYPE_UNSET', 1: 'ISTATE_TYPE_BASIS', 2: 'ISTATE_TYPE_GENERATED', 3: 'ISTATE_TYPE_RESTART', 4: 'ISTATE_TYPE_START'}
istate_statuses = {'ISTATE_STATUS_FAILED': 2, 'ISTATE_STATUS_PENDING': 0, 'ISTATE_STATUS_PREPARED': 1}
istate_status_names = {0: 'ISTATE_STATUS_PENDING', 1: 'ISTATE_STATUS_PREPARED', 2: 'ISTATE_STATUS_FAILED'}
as_numpy_record()
class westpa.core.states.TargetState(label, pcoord, state_id=None)

Bases: object

Describes a target state.

Variables:
  • state_id – Integer identifier of this state, usually set by the data manager.

  • label – A descriptive label for this microstate (may be empty)

  • pcoord – The representative progress coordinate of this state.

classmethod states_to_file(states, fileobj)

Write a file defining target states, which may then be read by states_from_file().

classmethod states_from_file(statefile, dtype)

Read a file defining target states. Each line defines a state, and contains a label followed by a representative progress coordinate value, separated by whitespace, as in:

bound     0.02

for a single target and one-dimensional progress coordinates or:

bound    2.7    0.0
drift    100    50.0

for two targets and a two-dimensional progress coordinate.
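
A minimal sketch of reading this layout with the standard library follows. The function parse_target_states and its cast argument are hypothetical; cast stands in for the dtype argument of states_from_file() and is assumed here to be any callable that converts a string to a number.

```python
import io


def parse_target_states(fileobj, cast=float):
    """Parse 'label pcoord_0 [pcoord_1 ...]' lines into (label, pcoord) tuples."""
    targets = []
    for line in fileobj:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        label = fields[0]
        pcoord = tuple(cast(value) for value in fields[1:])  # one entry per dimension
        targets.append((label, pcoord))
    return targets


text = "bound    2.7    0.0\ndrift    100    50.0\n"
targets = parse_target_states(io.StringIO(text))
```

Each target ends up with as many progress coordinate values as there are columns after the label, two in this case.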

westpa.core.states.pare_basis_initial_states(basis_states, initial_states, segments=None)

Given iterables of basis and initial states (and optionally segments that use them), return minimal sets (as in __builtins__.set) of states needed to describe the history of the given segments and initial states.

westpa.core.states.return_state_type(state_obj)

Convenience function for returning the state ID and type of the state_obj pointer.

westpa.core.systems module

class westpa.core.systems.NopMapper

Bases: BinMapper

Put everything into one bin.

assign(coords, mask=None, output=None)
class westpa.core.systems.WESTSystem(rc=None)

Bases: object

A description of the system being simulated, including the dimensionality and data type of the progress coordinate, the number of progress coordinate entries expected from each segment, and binning. To construct a simulation, the user must subclass WESTSystem and set several instance variables.

At a minimum, the user must subclass WESTSystem and override initialize() to set the data type and dimensionality of progress coordinate data and define a bin mapper.

Variables:
  • pcoord_ndim – The number of dimensions in the progress coordinate. Defaults to 1 (i.e. a one-dimensional progress coordinate).

  • pcoord_dtype – The data type of the progress coordinate, which must be callable (e.g. np.float32 and int will work, but '<f4' and '<i8' will not). Defaults to np.float64.

  • pcoord_len – The length of the progress coordinate time series generated by each segment, including both the initial and final values. Defaults to 2 (i.e. only the initial and final progress coordinate values for a segment are returned from propagation).

  • bin_mapper – A bin mapper describing the progress coordinate space.

  • bin_target_counts – A vector of target counts, one per bin.

property bin_target_counts
initialize()

Prepare this system object for use in simulation or analysis, creating a bin space, setting replicas per bin, and so on. This function is called whenever a WEST tool creates an instance of the system driver.

prepare_run()

Prepare this system for use in a simulation run. Called by w_run in all worker processes.

finalize_run()

A hook for system-specific processing for the end of a simulation run (as defined by such things as maximum wallclock time, rather than perhaps more scientifically-significant definitions of “the end of a simulation run”)

new_pcoord_array(pcoord_len=None)

Return an appropriately-sized and -typed pcoord array for a timepoint, segment, or number of segments. If pcoord_len is not specified (or None), then a length appropriate for a segment is returned.
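
The following sketch shows the kind of array this method is described as returning, using the default values listed above. It is not the WESTPA source: the function name new_pcoord_array_sketch and the (length, ndim) shape are assumptions for illustration.

```python
import numpy as np

# Defaults taken from the variable descriptions above
pcoord_ndim = 1
pcoord_len = 2
pcoord_dtype = np.float64


def new_pcoord_array_sketch(n_points=None):
    """Allocate a zeroed array of shape (n_points, pcoord_ndim)."""
    if n_points is None:
        n_points = pcoord_len  # length appropriate for one segment
    return np.zeros((n_points, pcoord_ndim), dtype=pcoord_dtype)


segment_pcoord = new_pcoord_array_sketch()   # initial and final points
single_point = new_pcoord_array_sketch(1)    # one timepoint

# A callable dtype can convert values directly; a dtype string cannot
dtype_is_callable = callable(pcoord_dtype)
string_code_is_callable = callable('<f4')
```

The last two lines illustrate why pcoord_dtype must be a callable type object rather than a dtype string.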

new_region_set()

westpa.core.textio module

Miscellaneous routines to help with input and output of WEST-related data in text format

class westpa.core.textio.NumericTextOutputFormatter(output_file, mode='wt', emit_header=None)

Bases: object

comment_string = '# '
emit_header = True
close()
write(str)
writelines(sequence)
write_comment(line)

Writes a line beginning with the comment string

write_header(line)

Appends a line to those written when the file header is written. The appropriate comment string will be prepended, so line should not include a comment character.

westpa.core.we_driver module

class westpa.core.we_driver.Generator(bit_generator)

Bases: object

Container for the BitGenerators.

Generator exposes a number of methods for generating random numbers drawn from a variety of probability distributions. In addition to the distribution-specific arguments, each method takes a keyword argument size that defaults to None. If size is None, then a single value is generated and returned. If size is an integer, then a 1-D array filled with generated values is returned. If size is a tuple, then an array with that shape is filled and returned.

The function numpy.random.default_rng() will instantiate a Generator with numpy’s default BitGenerator.
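
As a brief illustration of the size argument described above, the sketch below draws from the same distribution with the three kinds of size value. The seed is arbitrary and is used only to make the run repeatable.

```python
import numpy as np

rng = np.random.default_rng(12345)  # arbitrary seed

single = rng.standard_normal()              # size=None: a single value
vector = rng.standard_normal(size=5)        # integer size: 1-D array
matrix = rng.standard_normal(size=(2, 3))   # tuple size: array of that shape

print(np.ndim(single), vector.shape, matrix.shape)
```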

No Compatibility Guarantee

Generator does not provide a version compatibility guarantee. In particular, as better algorithms evolve the bit stream may change.

Parameters:

bit_generator (BitGenerator) – BitGenerator to use as the core generator.

Notes

The Python stdlib module random contains a pseudo-random number generator with a number of methods that are similar to the ones available in Generator. It uses Mersenne Twister, and this bit generator can be accessed using MT19937. Generator, besides being NumPy-aware, has the advantage that it provides a much larger number of probability distributions to choose from.

Examples

>>> from numpy.random import Generator, PCG64
>>> rng = Generator(PCG64())
>>> rng.standard_normal()
-0.203  # random

See also

default_rng

Recommended constructor for Generator.

beta(a, b, size=None)

Draw samples from a Beta distribution.

The Beta distribution is a special case of the Dirichlet distribution, and is related to the Gamma distribution. It has the probability distribution function

\[f(x; \alpha, \beta) = \frac{1}{B(\alpha, \beta)} x^{\alpha - 1} (1 - x)^{\beta - 1},\]

where the normalization, B, is the beta function,

\[B(\alpha, \beta) = \int_0^1 t^{\alpha - 1} (1 - t)^{\beta - 1} dt.\]

It is often seen in Bayesian inference and order statistics.
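
The normalization integral can be checked numerically. The sketch below compares a midpoint-rule approximation of the integral above with the standard identity B(alpha, beta) = Gamma(alpha) Gamma(beta) / Gamma(alpha + beta), which is not stated in the text but is a well-known property of the beta function. All function names are illustrative.

```python
import math


def beta_function_numeric(alpha, beta, n=100_000):
    """Approximate B(alpha, beta) with the midpoint rule on [0, 1]."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h  # midpoint of the i-th subinterval
        total += t ** (alpha - 1) * (1 - t) ** (beta - 1)
    return h * total


def beta_function_gamma(alpha, beta):
    """Closed form of B(alpha, beta) via the gamma function."""
    return math.gamma(alpha) * math.gamma(beta) / math.gamma(alpha + beta)


def beta_pdf(x, alpha, beta):
    """Density f(x; alpha, beta) as written above."""
    return x ** (alpha - 1) * (1 - x) ** (beta - 1) / beta_function_gamma(alpha, beta)


numeric = beta_function_numeric(2.0, 3.0)
closed = beta_function_gamma(2.0, 3.0)  # equals 1/12 for these parameters
```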

Parameters:
  • a (float or array_like of floats) – Alpha, positive (>0).

  • b (float or array_like of floats) – Beta, positive (>0).

  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if a and b are both scalars. Otherwise, np.broadcast(a, b).size samples are drawn.

Returns:

out – Drawn samples from the parameterized beta distribution.

Return type:

ndarray or scalar

Examples

The beta distribution has mean a/(a+b). If a == b and both are > 1, the distribution is symmetric with mean 0.5.

>>> rng = np.random.default_rng()
>>> a, b, size = 2.0, 2.0, 10000
>>> sample = rng.beta(a=a, b=b, size=size)
>>> np.mean(sample)
0.5047328775385895  # may vary

Otherwise the distribution is skewed left or right according to whether a or b is greater. Swapping a and b mirrors the distribution about x = 0.5. For example:

>>> a, b, size = 2, 7, 10000
>>> sample_left = rng.beta(a=a, b=b, size=size)
>>> sample_right = rng.beta(a=b, b=a, size=size)
>>> m_left, m_right = np.mean(sample_left), np.mean(sample_right)
>>> print(m_left, m_right)
0.2238596793678923 0.7774613834041182  # may vary
>>> print(m_left - a/(a+b))
0.001637457145670096  # may vary
>>> print(m_right - b/(a+b))
-0.0003163943736596009  # may vary

Display the histogram of the two samples:

>>> import matplotlib.pyplot as plt
>>> plt.hist([sample_left, sample_right],
...          50, density=True, histtype='bar')
>>> plt.show()

binomial(n, p, size=None)

Draw samples from a binomial distribution.

Samples are drawn from a binomial distribution with specified parameters, n trials and p probability of success, where n is an integer >= 0 and p is in the interval [0,1]. (n may be input as a float, but it is truncated to an integer in use.)

Parameters:
  • n (int or array_like of ints) – Parameter of the distribution, >= 0. Floats are also accepted, but they will be truncated to integers.

  • p (float or array_like of floats) – Parameter of the distribution, >= 0 and <=1.

  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if n and p are both scalars. Otherwise, np.broadcast(n, p).size samples are drawn.

Returns:

out – Drawn samples from the parameterized binomial distribution, where each sample is equal to the number of successes over the n trials.

Return type:

ndarray or scalar

See also

scipy.stats.binom

probability density function, distribution or cumulative density function, etc.

Notes

The probability mass function (PMF) for the binomial distribution is

\[P(N) = \binom{n}{N}p^N(1-p)^{n-N},\]

where \(n\) is the number of trials, \(p\) is the probability of success, and \(N\) is the number of successes.

When estimating the standard error of a proportion in a population by using a random sample, the normal distribution works well unless the product p*n <=5, where p = population proportion estimate, and n = number of samples, in which case the binomial distribution is used instead. For example, a sample of 15 people shows 4 who are left handed, and 11 who are right handed. Then p = 4/15 = 27%. 0.27*15 = 4, so the binomial distribution should be used in this case.

Examples

Draw samples from the distribution:

>>> rng = np.random.default_rng()
>>> n, p, size = 10, .5, 10000
>>> s = rng.binomial(n, p, size)

Assume a company drills 9 wild-cat oil exploration wells, each with an estimated probability of success of p=0.1. All nine wells fail. What is the probability of that happening?

Over size = 20,000 trials the probability of this happening is on average:

>>> n, p, size = 9, 0.1, 20000
>>> np.sum(rng.binomial(n=n, p=p, size=size) == 0)/size
0.39015  # may vary
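
The simulated estimate can be compared with the exact value from the probability mass function given in the Notes. The sketch below uses only the standard library; the function name binomial_pmf is illustrative.

```python
import math


def binomial_pmf(N, n, p):
    """P(N) = C(n, N) * p**N * (1 - p)**(n - N)."""
    return math.comb(n, N) * p ** N * (1 - p) ** (n - N)


# Probability that all nine wells fail when each succeeds with p = 0.1
p_all_fail = binomial_pmf(0, 9, 0.1)

# The PMF sums to one over N = 0, ..., n
total = sum(binomial_pmf(N, 9, 0.1) for N in range(10))

print(round(p_all_fail, 4))
```

For N = 0 the formula reduces to (1 - p)**n, which is about 0.387 here and agrees with the sampled value to within sampling error.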

The following can be used to visualize a sample with n=100, p=0.4 and the corresponding probability mass function:

>>> import matplotlib.pyplot as plt
>>> from scipy.stats import binom
>>> n, p, size = 100, 0.4, 10000
>>> sample = rng.binomial(n, p, size=size)
>>> count, bins, _ = plt.hist(sample, 30, density=True)
>>> x = np.arange(n)
>>> y = binom.pmf(x, n, p)
>>> plt.plot(x, y, linewidth=2, color='r')
bit_generator

Gets the bit generator instance used by the generator

Returns:

bit_generator – The bit generator instance used by the generator

Return type:

BitGenerator

bytes(length)

Return random bytes.

Parameters:

length (int) – Number of random bytes.

Returns:

out – Bytes object of length length.

Return type:

bytes

Notes

This function generates random bytes from a discrete uniform distribution. The generated bytes are independent from the CPU’s native endianness.

Examples

>>> rng = np.random.default_rng()
>>> rng.bytes(10)
b'\xfeC\x9b\x86\x17\xf2\xa1\xafcp'  # random
chisquare(df, size=None)

Draw samples from a chi-square distribution.

When df independent random variables, each with standard normal distributions (mean 0, variance 1), are squared and summed, the resulting distribution is chi-square (see Notes). This distribution is often used in hypothesis testing.

Parameters:
  • df (float or array_like of floats) – Number of degrees of freedom, must be > 0.

  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if df is a scalar. Otherwise, np.array(df).size samples are drawn.

Returns:

out – Drawn samples from the parameterized chi-square distribution.

Return type:

ndarray or scalar

Raises:

ValueError – When df <= 0 or when an inappropriate size (e.g. size=-1) is given.

Notes

The variable obtained by summing the squares of k = df independent, standard normally distributed random variables:

\[Q = \sum_{i=1}^{k} X^2_i\]

is chi-square distributed, denoted

\[Q \sim \chi^2_k.\]

The probability density function of the chi-squared distribution is

\[p(x) = \frac{(1/2)^{k/2}}{\Gamma(k/2)} x^{k/2 - 1} e^{-x/2},\]

where \(\Gamma\) is the gamma function,

\[\Gamma(x) = \int_0^{\infty} t^{x - 1} e^{-t} dt.\]
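
The construction in these Notes can be reproduced directly: square and sum df standard normal draws, then compare with samples from chisquare(). The sketch relies on the standard facts that a chi-square variable with df degrees of freedom has mean df and variance 2*df; the seed and sample size are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed
df, n_samples = 3, 100_000

# Sum of squares of df independent standard normal variables, per row
normals = rng.standard_normal(size=(n_samples, df))
q_manual = (normals ** 2).sum(axis=1)

# Direct draws from the chi-square distribution for comparison
q_direct = rng.chisquare(df, size=n_samples)

print(q_manual.mean(), q_direct.mean())  # both should be close to df
```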

Examples

>>> rng = np.random.default_rng()
>>> rng.chisquare(2,4)
array([ 1.89920014,  9.00867716,  3.13710533,  5.62318272]) # random

The distribution of a chi-square random variable with 20 degrees of freedom looks as follows:

>>> import matplotlib.pyplot as plt
>>> import scipy.stats as stats
>>> s = rng.chisquare(20, 10000)
>>> count, bins, _ = plt.hist(s, 30, density=True)
>>> x = np.linspace(0, 60, 1000)
>>> plt.plot(x, stats.chi2.pdf(x, df=20))
>>> plt.xlim([0, 60])
>>> plt.show()
choice(a, size=None, replace=True, p=None, axis=0, shuffle=True)

Generates a random sample from a given array

Parameters:
  • a ({array_like, int}) – If an ndarray, a random sample is generated from its elements. If an int, the random sample is generated from np.arange(a).

  • size ({int, tuple[int]}, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn from the 1-d a. If a has more than one dimension, the size shape will be inserted into the axis dimension, so the output ndim will be a.ndim - 1 + len(size). Default is None, in which case a single value is returned.

  • replace (bool, optional) – Whether the sample is with or without replacement. Default is True, meaning that a value of a can be selected multiple times.

  • p (1-D array_like, optional) – The probabilities associated with each entry in a. If not given, the sample assumes a uniform distribution over all entries in a.

  • axis (int, optional) – The axis along which the selection is performed. The default, 0, selects by row.

  • shuffle (bool, optional) – Whether the sample is shuffled when sampling without replacement. Default is True, False provides a speedup.

Returns:

samples – The generated random samples

Return type:

single item or ndarray

Raises:

ValueError – If a is an int and less than zero, if p is not 1-dimensional, if a is array-like with a size 0, if p is not a vector of probabilities, if a and p have different lengths, or if replace=False and the sample size is greater than the population size.

Notes

Setting user-specified probabilities through p uses a more general but less efficient sampler than the default. The general sampler produces a different sample than the optimized sampler even if each element of p is 1 / len(a).

p must sum to 1 when cast to float64. To ensure this, you may wish to normalize using p = p / np.sum(p, dtype=float).

When passing a as an integer type and size is not specified, the return type is a native Python int.
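
A short sketch of the normalization advice above: unnormalized weights are converted to probabilities before being passed as p. The weights are made-up values and the seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed

# Hypothetical unnormalized weights; the second entry can never be drawn
weights = np.array([2.0, 0.0, 3.0, 5.0])
p = weights / np.sum(weights, dtype=float)  # now sums to 1

draws = rng.choice(len(weights), size=10_000, p=p)

# Empirical frequencies should approximate p
frequencies = np.bincount(draws, minlength=len(weights)) / draws.size
```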

Examples

Generate a uniform random sample from np.arange(5) of size 3:

>>> rng = np.random.default_rng()
>>> rng.choice(5, 3)
array([0, 3, 4]) # random
>>> #This is equivalent to rng.integers(0,5,3)

Generate a non-uniform random sample from np.arange(5) of size 3:

>>> rng.choice(5, 3, p=[0.1, 0, 0.3, 0.6, 0])
array([3, 3, 0]) # random

Generate a uniform random sample from np.arange(5) of size 3 without replacement:

>>> rng.choice(5, 3, replace=False)
array([3, 1, 0]) # random
>>> #This is equivalent to rng.permutation(np.arange(5))[:3]

Generate a uniform random sample from a 2-D array along the first axis (the default), without replacement:

>>> rng.choice([[0, 1, 2], [3, 4, 5], [6, 7, 8]], 2, replace=False)
array([[3, 4, 5], # random
       [0, 1, 2]])

Generate a non-uniform random sample from np.arange(5) of size 3 without replacement:

>>> rng.choice(5, 3, replace=False, p=[0.1, 0, 0.3, 0.6, 0])
array([2, 3, 0]) # random

Any of the above can be repeated with an arbitrary array-like instead of just integers. For instance:

>>> aa_milne_arr = ['pooh', 'rabbit', 'piglet', 'Christopher']
>>> rng.choice(aa_milne_arr, 5, p=[0.5, 0.1, 0.1, 0.3])
array(['pooh', 'pooh', 'pooh', 'Christopher', 'piglet'], # random
      dtype='<U11')
dirichlet(alpha, size=None)

Draw samples from the Dirichlet distribution.

Draw size samples of dimension k from a Dirichlet distribution. A Dirichlet-distributed random variable can be seen as a multivariate generalization of a Beta distribution. The Dirichlet distribution is a conjugate prior of a multinomial distribution in Bayesian inference.

Parameters:
  • alpha (sequence of floats, length k) – Parameter of the distribution (length k for sample of length k).

  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n), then m * n * k samples are drawn. Default is None, in which case a vector of length k is returned.

Returns:

samples – The drawn samples, of shape (size, k).

Return type:

ndarray

Raises:

ValueError – If any value in alpha is less than zero

Notes

The Dirichlet distribution is a distribution over vectors \(x\) that fulfil the conditions \(x_i>0\) and \(\sum_{i=1}^k x_i = 1\).

The probability density function \(p\) of a Dirichlet-distributed random vector \(X\) is proportional to

\[p(x) \propto \prod_{i=1}^{k}{x^{\alpha_i-1}_i},\]

where \(\alpha\) is a vector containing the positive concentration parameters.

The method uses the following property for computation: let \(Y\) be a random vector which has components that follow a standard gamma distribution, then \(X = \frac{1}{\sum_{i=1}^k{Y_i}} Y\) is Dirichlet-distributed.
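
The gamma construction described above can be sketched with standard_gamma(). This illustrates the stated property and is not NumPy's internal code. The comparison uses the standard result that component i of a Dirichlet vector has mean alpha_i divided by the sum of alpha.

```python
import numpy as np

rng = np.random.default_rng(7)  # arbitrary seed
alpha = np.array([10.0, 5.0, 3.0])
n_samples = 50_000

# Y has independent standard gamma components with shape parameters alpha
y = rng.standard_gamma(alpha, size=(n_samples, alpha.size))

# Normalizing each row by its sum gives Dirichlet-distributed vectors
x = y / y.sum(axis=1, keepdims=True)

expected_mean = alpha / alpha.sum()
print(x.mean(axis=0), expected_mean)
```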

Examples

Taking an example cited in Wikipedia, this distribution can be used if one wanted to cut strings (each of initial length 1.0) into K pieces with different lengths, where each piece had, on average, a designated average length, but allowing some variation in the relative sizes of the pieces.

>>> rng = np.random.default_rng()
>>> s = rng.dirichlet((10, 5, 3), 20).transpose()
>>> import matplotlib.pyplot as plt
>>> plt.barh(range(20), s[0])
>>> plt.barh(range(20), s[1], left=s[0], color='g')
>>> plt.barh(range(20), s[2], left=s[0]+s[1], color='r')
>>> plt.title("Lengths of Strings")
exponential(scale=1.0, size=None)

Draw samples from an exponential distribution.

Its probability density function is

\[f(x; \frac{1}{\beta}) = \frac{1}{\beta} \exp(-\frac{x}{\beta}),\]

for x > 0 and 0 elsewhere. \(\beta\) is the scale parameter, which is the inverse of the rate parameter \(\lambda = 1/\beta\). The rate parameter is an alternative, widely used parameterization of the exponential distribution [3].

The exponential distribution is a continuous analogue of the geometric distribution. It describes many common situations, such as the size of raindrops measured over many rainstorms [1], or the time between page requests to Wikipedia [2].

Parameters:
  • scale (float or array_like of floats) – The scale parameter, \(\beta = 1/\lambda\). Must be non-negative.

  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if scale is a scalar. Otherwise, np.array(scale).size samples are drawn.

Returns:

out – Drawn samples from the parameterized exponential distribution.

Return type:

ndarray or scalar

Examples

Assume a company has 10000 customer support agents and the time between customer calls is exponentially distributed and that the average time between customer calls is 4 minutes.

>>> scale, size = 4, 10000
>>> rng = np.random.default_rng()
>>> time_between_calls = rng.exponential(scale=scale, size=size)

What is the probability that a customer will call in the next 4 to 5 minutes?

>>> x = ((time_between_calls < 5).sum())/size
>>> y = ((time_between_calls < 4).sum())/size
>>> x - y
0.08  # may vary
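
The sampled estimate can be compared with the exact probability. Integrating the density above gives the cumulative distribution function F(t) = 1 - exp(-t/beta), so the answer is F(5) - F(4). The function name is illustrative.

```python
import math

scale = 4.0         # beta, the mean time between calls in minutes
rate = 1.0 / scale  # lambda, the equivalent rate parameter


def exponential_cdf(t, scale):
    """P(T <= t) for an exponential distribution with the given scale."""
    return 1.0 - math.exp(-t / scale)


# Probability that the next call arrives between 4 and 5 minutes from now
p_4_to_5 = exponential_cdf(5.0, scale) - exponential_cdf(4.0, scale)
print(round(p_4_to_5, 4))
```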

The corresponding distribution can be visualized as follows:

>>> import matplotlib.pyplot as plt
>>> scale, size = 4, 10000
>>> rng = np.random.default_rng()
>>> sample = rng.exponential(scale=scale, size=size)
>>> count, bins, _ = plt.hist(sample, 30, density=True)
>>> plt.plot(bins, scale**(-1)*np.exp(-scale**-1*bins), linewidth=2, color='r')
>>> plt.show()

References

f(dfnum, dfden, size=None)

Draw samples from an F distribution.

Samples are drawn from an F distribution with specified parameters, dfnum (degrees of freedom in numerator) and dfden (degrees of freedom in denominator), where both parameters must be greater than zero.

The random variate of the F distribution (also known as the Fisher distribution) is a continuous probability distribution that arises in ANOVA tests, and is the ratio of two chi-square variates.
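
The ratio construction can be sketched as follows, with each chi-square variate divided by its own degrees of freedom before the ratio is taken. The check uses the standard result that the mean of an F distribution is dfden / (dfden - 2) when dfden > 2. The seed and parameters are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed
dfnum, dfden, n_samples = 5, 20, 100_000

# Scaled chi-square variates for numerator and denominator
numerator = rng.chisquare(dfnum, size=n_samples) / dfnum
denominator = rng.chisquare(dfden, size=n_samples) / dfden
f_manual = numerator / denominator

# Direct draws from the F distribution for comparison
f_direct = rng.f(dfnum, dfden, size=n_samples)

expected_mean = dfden / (dfden - 2)
print(f_manual.mean(), f_direct.mean(), expected_mean)
```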

Parameters:
  • dfnum (float or array_like of floats) – Degrees of freedom in numerator, must be > 0.

  • dfden (float or array_like of floats) – Degrees of freedom in denominator, must be > 0.

  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if dfnum and dfden are both scalars. Otherwise, np.broadcast(dfnum, dfden).size samples are drawn.

Returns:

out – Drawn samples from the parameterized Fisher distribution.

Return type:

ndarray or scalar

See also

scipy.stats.f

probability density function, distribution or cumulative density function, etc.

Notes

The F statistic is used to compare in-group variances to between-group variances. Calculating the distribution depends on the sampling, and so it is a function of the respective degrees of freedom in the problem. The variable dfnum is the number of groups minus one, the between-groups degrees of freedom, while dfden is the within-groups degrees of freedom, the sum of the number of samples in each group minus the number of groups.

References

Examples

An example from Glantz[1], pp 47-40:

Two groups were studied: children of diabetics (25 people) and children of people without diabetes (25 controls). Fasting blood glucose was measured; the case group had a mean value of 86.1 and the controls had a mean value of 82.2. Standard deviations were 2.09 and 2.49 respectively. Are these data consistent with the null hypothesis that the parents' diabetic status does not affect their children’s blood glucose levels? Calculating the F statistic from the data gives a value of 36.01.

Draw samples from the distribution:

>>> dfnum = 1. # between group degrees of freedom
>>> dfden = 48. # within groups degrees of freedom
>>> rng = np.random.default_rng()
>>> s = rng.f(dfnum, dfden, 1000)

The lower bound for the top 1% of the samples is:

>>> np.sort(s)[-10]
7.61988120985 # random

So there is about a 1% chance that the F statistic will exceed 7.62. The measured value is 36, so the null hypothesis is rejected at the 1% level.

The corresponding probability density function for dfnum = 20 and dfden = 20 is:

>>> import matplotlib.pyplot as plt
>>> from scipy import stats
>>> dfnum, dfden, size = 20, 20, 10000
>>> s = rng.f(dfnum=dfnum, dfden=dfden, size=size)
>>> count, bins, _ = plt.hist(s, 30, density=True)
>>> x = np.linspace(0, 5, 1000)
>>> plt.plot(x, stats.f.pdf(x, dfnum, dfden))
>>> plt.xlim([0, 5])
>>> plt.show()
gamma(shape, scale=1.0, size=None)

Draw samples from a Gamma distribution.

Samples are drawn from a Gamma distribution with specified parameters, shape (sometimes designated “k”) and scale (sometimes designated “theta”), where both parameters are > 0.

Parameters:
  • shape (float or array_like of floats) – The shape of the gamma distribution. Must be non-negative.

  • scale (float or array_like of floats, optional) – The scale of the gamma distribution. Must be non-negative. Default is equal to 1.

  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if shape and scale are both scalars. Otherwise, np.broadcast(shape, scale).size samples are drawn.

Returns:

out – Drawn samples from the parameterized gamma distribution.

Return type:

ndarray or scalar

See also

scipy.stats.gamma

probability density function, distribution or cumulative density function, etc.

Notes

The probability density for the Gamma distribution is

\[p(x) = x^{k-1}\frac{e^{-x/\theta}}{\theta^k\Gamma(k)},\]

where \(k\) is the shape and \(\theta\) the scale, and \(\Gamma\) is the Gamma function.

The Gamma distribution is often used to model the times to failure of electronic components, and arises naturally in processes for which the waiting times between Poisson distributed events are relevant.

Examples

Draw samples from the distribution:

>>> shape, scale = 2., 2.  # mean=4, std=2*sqrt(2)
>>> rng = np.random.default_rng()
>>> s = rng.gamma(shape, scale, 1000)

Display the histogram of the samples, along with the probability density function:

>>> import matplotlib.pyplot as plt
>>> import scipy.special as sps
>>> count, bins, _ = plt.hist(s, 50, density=True)
>>> y = bins**(shape-1)*(np.exp(-bins/scale) /
...                      (sps.gamma(shape)*scale**shape))
>>> plt.plot(bins, y, linewidth=2, color='r')
>>> plt.show()
geometric(p, size=None)

Draw samples from the geometric distribution.

Bernoulli trials are experiments with one of two outcomes: success or failure (an example of such an experiment is flipping a coin). The geometric distribution models the number of trials that must be run in order to achieve success. It is therefore supported on the positive integers, k = 1, 2, ....

The probability mass function of the geometric distribution is

\[f(k) = (1 - p)^{k - 1} p\]

where p is the probability of success of an individual trial.
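
The mass function above is simple enough to evaluate directly. The sketch below checks that it sums to one and that its mean is 1/p, a standard property of this parameterization. The sums are truncated at k = 199, where the remaining tail is negligible for p = 0.35.

```python
def geometric_pmf(k, p):
    """f(k) = (1 - p)**(k - 1) * p for k = 1, 2, ..."""
    return (1 - p) ** (k - 1) * p


p = 0.35

# Probability of success on the very first trial
first_try = geometric_pmf(1, p)

# Truncated sums over the support
total = sum(geometric_pmf(k, p) for k in range(1, 200))
mean = sum(k * geometric_pmf(k, p) for k in range(1, 200))

print(first_try, total, mean)
```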

Parameters:
  • p (float or array_like of floats) – The probability of success of an individual trial.

  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if p is a scalar. Otherwise, np.array(p).size samples are drawn.

Returns:

out – Drawn samples from the parameterized geometric distribution.

Return type:

ndarray or scalar

Examples

Draw 10,000 values from the geometric distribution, with the probability of an individual success equal to p = 0.35:

>>> p, size = 0.35, 10000
>>> rng = np.random.default_rng()
>>> sample = rng.geometric(p=p, size=size)

What proportion of trials succeeded after a single run?

>>> (sample == 1).sum()/size
0.34889999999999999  # may vary

The geometric distribution with p=0.35 looks as follows:

>>> import matplotlib.pyplot as plt
>>> count, bins, _ = plt.hist(sample, bins=30, density=True)
>>> plt.plot(bins, (1-p)**(bins-1)*p)
>>> plt.xlim([0, 25])
>>> plt.show()
gumbel(loc=0.0, scale=1.0, size=None)

Draw samples from a Gumbel distribution.

Draw samples from a Gumbel distribution with specified location and scale. For more information on the Gumbel distribution, see Notes and References below.

Parameters:
  • loc (float or array_like of floats, optional) – The location of the mode of the distribution. Default is 0.

  • scale (float or array_like of floats, optional) – The scale parameter of the distribution. Default is 1. Must be non-negative.

  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if loc and scale are both scalars. Otherwise, np.broadcast(loc, scale).size samples are drawn.

Returns:

out – Drawn samples from the parameterized Gumbel distribution.

Return type:

ndarray or scalar

See also

scipy.stats.gumbel_l, scipy.stats.gumbel_r, scipy.stats.genextreme, weibull

Notes

The Gumbel (or Smallest Extreme Value (SEV) or the Smallest Extreme Value Type I) distribution is one of a class of Generalized Extreme Value (GEV) distributions used in modeling extreme value problems. The Gumbel is a special case of the Extreme Value Type I distribution for maximums from distributions with “exponential-like” tails.

The probability density for the Gumbel distribution is

\[p(x) = \frac{e^{-(x - \mu)/ \beta}}{\beta} e^{ -e^{-(x - \mu)/ \beta}},\]

where \(\mu\) is the mode, a location parameter, and \(\beta\) is the scale parameter.

The Gumbel (named for German mathematician Emil Julius Gumbel) was used very early in the hydrology literature, for modeling the occurrence of flood events. It is also used for modeling maximum wind speed and rainfall rates. It is a “fat-tailed” distribution - the probability of an event in the tail of the distribution is larger than if one used a Gaussian, hence the surprisingly frequent occurrence of 100-year floods. Floods were initially modeled as a Gaussian process, which underestimated the frequency of extreme events.

It is one of a class of extreme value distributions, the Generalized Extreme Value (GEV) distributions, which also includes the Weibull and Frechet.

The function has a mean of \(\mu + 0.57721\beta\) and a variance of \(\frac{\pi^2}{6}\beta^2\).
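
The mean and variance stated above can be checked numerically. The following is a minimal sketch; the seed, the parameter values, and the sample size are arbitrary choices made for illustration and are not part of the original example.

```python
import numpy as np

rng = np.random.default_rng(12345)  # fixed seed, chosen only for repeatability
mu, beta = 2.0, 0.5                 # illustrative location and scale

s = rng.gumbel(mu, beta, 200_000)

# Closed-form moments quoted in the Notes.
expected_mean = mu + 0.57721 * beta
expected_var = np.pi**2 / 6 * beta**2

print(s.mean(), expected_mean)
print(s.var(), expected_var)
```

With a sample this large, the two printed pairs should agree to roughly two decimal places.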

Examples

Draw samples from the distribution:

>>> rng = np.random.default_rng()
>>> mu, beta = 0, 0.1 # location and scale
>>> s = rng.gumbel(mu, beta, 1000)

Display the histogram of the samples, along with the probability density function:

>>> import matplotlib.pyplot as plt
>>> count, bins, _ = plt.hist(s, 30, density=True)
>>> plt.plot(bins, (1/beta)*np.exp(-(bins - mu)/beta)
...          * np.exp( -np.exp( -(bins - mu) /beta) ),
...          linewidth=2, color='r')
>>> plt.show()

Show how an extreme value distribution can arise from a Gaussian process and compare to a Gaussian:

>>> means = []
>>> maxima = []
>>> for i in range(0,1000) :
...    a = rng.normal(mu, beta, 1000)
...    means.append(a.mean())
...    maxima.append(a.max())
>>> count, bins, _ = plt.hist(maxima, 30, density=True)
>>> beta = np.std(maxima) * np.sqrt(6) / np.pi
>>> mu = np.mean(maxima) - 0.57721*beta
>>> plt.plot(bins, (1/beta)*np.exp(-(bins - mu)/beta)
...          * np.exp(-np.exp(-(bins - mu)/beta)),
...          linewidth=2, color='r')
>>> plt.plot(bins, 1/(beta * np.sqrt(2 * np.pi))
...          * np.exp(-(bins - mu)**2 / (2 * beta**2)),
...          linewidth=2, color='g')
>>> plt.show()
hypergeometric(ngood, nbad, nsample, size=None)

Draw samples from a Hypergeometric distribution.

Samples are drawn from a hypergeometric distribution with specified parameters, ngood (ways to make a good selection), nbad (ways to make a bad selection), and nsample (number of items sampled, which is less than or equal to the sum ngood + nbad).

Parameters:
  • ngood (int or array_like of ints) – Number of ways to make a good selection. Must be nonnegative and less than 10**9.

  • nbad (int or array_like of ints) – Number of ways to make a bad selection. Must be nonnegative and less than 10**9.

  • nsample (int or array_like of ints) – Number of items sampled. Must be nonnegative and less than or equal to ngood + nbad.

  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if ngood, nbad, and nsample are all scalars. Otherwise, np.broadcast(ngood, nbad, nsample).size samples are drawn.

Returns:

out – Drawn samples from the parameterized hypergeometric distribution. Each sample is the number of good items within a randomly selected subset of size nsample taken from a set of ngood good items and nbad bad items.

Return type:

ndarray or scalar

See also

multivariate_hypergeometric

Draw samples from the multivariate hypergeometric distribution.

scipy.stats.hypergeom

probability density function, distribution or cumulative density function, etc.

Notes

The probability mass function (PMF) for the Hypergeometric distribution is

\[P(x) = \frac{\binom{g}{x}\binom{b}{n-x}}{\binom{g+b}{n}},\]

where \(0 \le x \le n\) and \(n-b \le x \le g\)

for P(x) the probability of x good results in the drawn sample, g = ngood, b = nbad, and n = nsample.
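
As a rough check of the PMF above, it can be evaluated directly with binomial coefficients and compared with empirical frequencies. This is only a sketch: the helper name hypergeom_pmf, the seed, and the parameter values are assumptions introduced for illustration.

```python
from math import comb

import numpy as np


def hypergeom_pmf(x, g, b, n):
    """P(x) for g good items, b bad items, and n items drawn (hypothetical helper)."""
    if x < max(0, n - b) or x > min(n, g):
        return 0.0
    return comb(g, x) * comb(b, n - x) / comb(g + b, n)


g, b, n = 15, 15, 15
pmf = np.array([hypergeom_pmf(x, g, b, n) for x in range(n + 1)])

rng = np.random.default_rng(2024)
s = rng.hypergeometric(g, b, n, 200_000)
empirical = np.bincount(s, minlength=n + 1) / s.size

print(pmf.sum())                        # the PMF should sum to 1
print(np.abs(pmf - empirical).max())    # largest gap between formula and sample
```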

Consider an urn containing black and white marbles, of which ngood are black and nbad are white. If you draw nsample marbles without replacement, then the hypergeometric distribution describes the number of black marbles in the drawn sample.

Note that this distribution is very similar to the binomial distribution, except that in this case, samples are drawn without replacement, whereas in the Binomial case samples are drawn with replacement (or the sample space is infinite). As the sample space becomes large, this distribution approaches the binomial.

The arguments ngood and nbad each must be less than 10**9. For extremely large arguments, the algorithm that is used to compute the samples [4] breaks down because of loss of precision in floating point calculations. For such large values, if nsample is not also large, the distribution can be approximated with the binomial distribution, binomial(n=nsample, p=ngood/(ngood + nbad)).

References

Examples

Draw samples from the distribution:

>>> rng = np.random.default_rng()
>>> ngood, nbad, nsamp = 100, 2, 10
# number of good, number of bad, and number of samples
>>> s = rng.hypergeometric(ngood, nbad, nsamp, 1000)
>>> from matplotlib.pyplot import hist
>>> hist(s)
#   note that it is very unlikely to grab both bad items

Suppose you have an urn with 15 white and 15 black marbles. If you pull 15 marbles at random, how likely is it that 12 or more of them are one color?

>>> s = rng.hypergeometric(15, 15, 15, 100000)
>>> sum(s>=12)/100000. + sum(s<=3)/100000.
#   answer = 0.003 ... pretty unlikely!
integers(low, high=None, size=None, dtype=np.int64, endpoint=False)

Return random integers from low (inclusive) to high (exclusive), or if endpoint=True, low (inclusive) to high (inclusive). Replaces RandomState.randint (with endpoint=False) and RandomState.random_integers (with endpoint=True)

Return random integers from the “discrete uniform” distribution of the specified dtype. If high is None (the default), then results are from 0 to low.
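
The three interval conventions described above can be compared side by side. This is a small sketch with an arbitrary seed and sample size; the variable names are chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)

half_open = rng.integers(1, 7, size=10_000)               # values in [1, 7)
closed = rng.integers(1, 6, size=10_000, endpoint=True)   # values in [1, 6]
only_low = rng.integers(6, size=10_000)                   # high=None: values in [0, 6)

print(half_open.min(), half_open.max())
print(closed.min(), closed.max())
print(only_low.min(), only_low.max())
```

The first two calls describe the same six-sided die; the third shifts the range down to start at 0.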

Parameters:
  • low (int or array-like of ints) – Lowest (signed) integers to be drawn from the distribution (unless high=None, in which case this parameter is 0 and this value is used for high).

  • high (int or array-like of ints, optional) – If provided, one above the largest (signed) integer to be drawn from the distribution (see above for behavior if high=None). If array-like, must contain integer values.

  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.

  • dtype (dtype, optional) – Desired dtype of the result. Byteorder must be native. The default value is np.int64.

  • endpoint (bool, optional) – If true, sample from the interval [low, high] instead of the default [low, high). Defaults to False.

Returns:

out – size-shaped array of random integers from the appropriate distribution, or a single such random int if size is not provided.

Return type:

int or ndarray of ints

Notes

When using broadcasting with uint64 dtypes, the maximum value (2**64) cannot be represented as a standard integer type. The high array (or low if high is None) must have object dtype, e.g., array([2**64]).

Examples

>>> rng = np.random.default_rng()
>>> rng.integers(2, size=10)
array([1, 0, 0, 0, 1, 1, 0, 0, 1, 0])  # random
>>> rng.integers(1, size=10)
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

Generate a 2 x 4 array of ints between 0 and 4, inclusive:

>>> rng.integers(5, size=(2, 4))
array([[4, 0, 2, 1],
       [3, 2, 2, 0]])  # random

Generate a 1 x 3 array with 3 different upper bounds

>>> rng.integers(1, [3, 5, 10])
array([2, 2, 9])  # random

Generate a 1 by 3 array with 3 different lower bounds

>>> rng.integers([1, 5, 7], 10)
array([9, 8, 7])  # random

Generate a 2 by 4 array using broadcasting with dtype of uint8

>>> rng.integers([1, 3, 5, 7], [[10], [20]], dtype=np.uint8)
array([[ 8,  6,  9,  7],
       [ 1, 16,  9, 12]], dtype=uint8)  # random

laplace(loc=0.0, scale=1.0, size=None)

Draw samples from the Laplace or double exponential distribution with specified location (or mean) and scale (decay).

The Laplace distribution is similar to the Gaussian/normal distribution, but is sharper at the peak and has fatter tails. It represents the difference between two independent, identically distributed exponential random variables.
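
The relationship to exponential variables can be illustrated numerically. The sketch below assumes an arbitrary seed and decay value; it compares the difference of two independent exponential draws with direct Laplace draws, using the standard results that a Laplace variable with decay scale has variance 2 * scale**2 and mean absolute deviation scale.

```python
import numpy as np

rng = np.random.default_rng(99)
scale = 1.5          # illustrative decay parameter
n = 200_000

# Difference of two independent exponential variables with the same scale.
diff = rng.exponential(scale, n) - rng.exponential(scale, n)
# Direct draws centered at 0 with the same decay.
direct = rng.laplace(0.0, scale, n)

print(diff.var(), direct.var(), 2 * scale**2)
print(np.abs(diff).mean(), np.abs(direct).mean(), scale)
```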

Parameters:
  • loc (float or array_like of floats, optional) – The position, \(\mu\), of the distribution peak. Default is 0.

  • scale (float or array_like of floats, optional) – \(\lambda\), the exponential decay. Default is 1. Must be non-negative.

  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if loc and scale are both scalars. Otherwise, np.broadcast(loc, scale).size samples are drawn.

Returns:

out – Drawn samples from the parameterized Laplace distribution.

Return type:

ndarray or scalar

Notes

It has the probability density function

\[f(x; \mu, \lambda) = \frac{1}{2\lambda} \exp\left(-\frac{|x - \mu|}{\lambda}\right).\]

The first law of Laplace, from 1774, states that the frequency of an error can be expressed as an exponential function of the absolute magnitude of the error, which leads to the Laplace distribution. For many problems in economics and health sciences, this distribution seems to model the data better than the standard Gaussian distribution.

Examples

Draw samples from the distribution

>>> loc, scale = 0., 1.
>>> rng = np.random.default_rng()
>>> s = rng.laplace(loc, scale, 1000)

Display the histogram of the samples, along with the probability density function:

>>> import matplotlib.pyplot as plt
>>> count, bins, _ = plt.hist(s, 30, density=True)
>>> x = np.arange(-8., 8., .01)
>>> pdf = np.exp(-abs(x-loc)/scale)/(2.*scale)
>>> plt.plot(x, pdf)

Plot Gaussian for comparison:

>>> g = (1/(scale * np.sqrt(2 * np.pi)) *
...      np.exp(-(x - loc)**2 / (2 * scale**2)))
>>> plt.plot(x,g)
logistic(loc=0.0, scale=1.0, size=None)

Draw samples from a logistic distribution.

Samples are drawn from a logistic distribution with specified parameters, loc (location or mean, also median), and scale (>0).

Parameters:
  • loc (float or array_like of floats, optional) – Parameter of the distribution. Default is 0.

  • scale (float or array_like of floats, optional) – Parameter of the distribution. Must be non-negative. Default is 1.

  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if loc and scale are both scalars. Otherwise, np.broadcast(loc, scale).size samples are drawn.

Returns:

out – Drawn samples from the parameterized logistic distribution.

Return type:

ndarray or scalar

See also

scipy.stats.logistic

probability density function, distribution or cumulative density function, etc.

Notes

The probability density for the Logistic distribution is

\[P(x) = \frac{e^{-(x-\mu)/s}}{s(1+e^{-(x-\mu)/s})^2},\]

where \(\mu\) = location and \(s\) = scale.
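
The density above can be sanity-checked with a short sketch. The helper name logistic_pdf, the integration grid, and the seed are assumptions made for illustration; the variance s²π²/3 used for comparison is the standard result for the logistic distribution.

```python
import numpy as np


def logistic_pdf(x, mu, s):
    """Logistic density as written in the Notes (hypothetical helper)."""
    z = np.exp(-(x - mu) / s)
    return z / (s * (1.0 + z) ** 2)


mu, s = 10.0, 1.0
x = np.linspace(mu - 40 * s, mu + 40 * s, 400_001)
dx = x[1] - x[0]
area = np.sum(logistic_pdf(x, mu, s)) * dx   # simple Riemann sum over a wide grid

rng = np.random.default_rng(3)
samples = rng.logistic(mu, s, 200_000)

print(area)                                   # should be very close to 1
print(np.median(samples), mu)                 # the median equals the location
print(samples.var(), (np.pi * s) ** 2 / 3)    # variance of the logistic distribution
```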

The Logistic distribution is used in Extreme Value problems where it can act as a mixture of Gumbel distributions, in Epidemiology, and by the World Chess Federation (FIDE) where it is used in the Elo ranking system, assuming the performance of each player is a logistically distributed random variable.

Examples

Draw samples from the distribution:

>>> loc, scale = 10, 1
>>> rng = np.random.default_rng()
>>> s = rng.logistic(loc, scale, 10000)
>>> import matplotlib.pyplot as plt
>>> count, bins, _ = plt.hist(s, bins=50, label='Sampled data')

# plot sampled data against the exact distribution

>>> def logistic(x, loc, scale):
...     return np.exp((loc-x)/scale)/(scale*(1+np.exp((loc-x)/scale))**2)
>>> logistic_values  = logistic(bins, loc, scale)
>>> bin_spacing = np.mean(np.diff(bins))
>>> plt.plot(bins, logistic_values  * bin_spacing * s.size, label='Logistic PDF')
>>> plt.legend()
>>> plt.show()
lognormal(mean=0.0, sigma=1.0, size=None)

Draw samples from a log-normal distribution.

Draw samples from a log-normal distribution with specified mean, standard deviation, and array shape. Note that the mean and standard deviation are not the values for the distribution itself, but of the underlying normal distribution it is derived from.

Parameters:
  • mean (float or array_like of floats, optional) – Mean value of the underlying normal distribution. Default is 0.

  • sigma (float or array_like of floats, optional) – Standard deviation of the underlying normal distribution. Must be non-negative. Default is 1.

  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if mean and sigma are both scalars. Otherwise, np.broadcast(mean, sigma).size samples are drawn.

Returns:

out – Drawn samples from the parameterized log-normal distribution.

Return type:

ndarray or scalar

See also

scipy.stats.lognorm

probability density function, distribution, cumulative density function, etc.

Notes

A variable x has a log-normal distribution if log(x) is normally distributed. The probability density function for the log-normal distribution is:

\[p(x) = \frac{1}{\sigma x \sqrt{2\pi}} \exp\left(-\frac{(\ln(x)-\mu)^2}{2\sigma^2}\right)\]

where \(\mu\) is the mean and \(\sigma\) is the standard deviation of the normally distributed logarithm of the variable. A log-normal distribution results if a random variable is the product of a large number of independent, identically-distributed variables in the same way that a normal distribution results if the variable is the sum of a large number of independent, identically-distributed variables.
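
Because mean and sigma describe the underlying normal distribution rather than the log-normal variable itself, it can help to look at both. The following sketch uses an arbitrary seed and parameter values; exp(mu + sigma**2 / 2) is the standard expression for the mean of a log-normal variable.

```python
import numpy as np

rng = np.random.default_rng(11)
mu, sigma = 1.0, 0.5       # parameters of the underlying normal distribution
s = rng.lognormal(mu, sigma, 200_000)

log_s = np.log(s)
print(log_s.mean(), log_s.std())             # close to mu and sigma
print(s.mean(), np.exp(mu + sigma**2 / 2))   # mean of the log-normal variable itself
```

The sample mean of s is noticeably larger than exp(mu), which is the median of the distribution.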

Examples

Draw samples from the distribution:

>>> rng = np.random.default_rng()
>>> mu, sigma = 3., 1. # mean and standard deviation
>>> s = rng.lognormal(mu, sigma, 1000)

Display the histogram of the samples, along with the probability density function:

>>> import matplotlib.pyplot as plt
>>> count, bins, _ = plt.hist(s, 100, density=True, align='mid')
>>> x = np.linspace(min(bins), max(bins), 10000)
>>> pdf = (np.exp(-(np.log(x) - mu)**2 / (2 * sigma**2))
...        / (x * sigma * np.sqrt(2 * np.pi)))
>>> plt.plot(x, pdf, linewidth=2, color='r')
>>> plt.axis('tight')
>>> plt.show()

Demonstrate that the products of random samples drawn from a normal distribution (shifted here so that the values are positive) can be fit well by a log-normal probability density function.

>>> # Generate a thousand samples: each is the product of 100 random
>>> # values, drawn from a normal distribution centered at 10.
>>> b = []
>>> for i in range(1000):
...    a = 10. + rng.standard_normal(100)
...    b.append(np.prod(a))
>>> b = np.array(b) / np.min(b) # rescale so that the smallest value is 1
>>> count, bins, _ = plt.hist(b, 100, density=True, align='mid')
>>> sigma = np.std(np.log(b))
>>> mu = np.mean(np.log(b))
>>> x = np.linspace(min(bins), max(bins), 10000)
>>> pdf = (np.exp(-(np.log(x) - mu)**2 / (2 * sigma**2))
...        / (x * sigma * np.sqrt(2 * np.pi)))
>>> plt.plot(x, pdf, color='r', linewidth=2)
>>> plt.show()
logseries(p, size=None)

Draw samples from a logarithmic series distribution.

Samples are drawn from a log series distribution with specified shape parameter, 0 <= p < 1.

Parameters:
  • p (float or array_like of floats) – Shape parameter for the distribution. Must be in the range [0, 1).

  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if p is a scalar. Otherwise, np.array(p).size samples are drawn.

Returns:

out – Drawn samples from the parameterized logarithmic series distribution.

Return type:

ndarray or scalar

See also

scipy.stats.logser

probability density function, distribution or cumulative density function, etc.

Notes

The probability mass function for the Log Series distribution is

\[P(k) = \frac{-p^k}{k \ln(1-p)},\]

where p = probability.

The log series distribution is frequently used to represent species richness and occurrence, first proposed by Fisher, Corbet, and Williams in 1943 [2]. It may also be used to model the numbers of occupants seen in cars [3].
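
A quick numerical check of the mass function above is sketched below. The helper name logseries_pmf, the truncation at k = 199, and the seed are assumptions chosen for illustration.

```python
import numpy as np


def logseries_pmf(k, p):
    """Log series mass function as written in the Notes (hypothetical helper)."""
    return -(p**k) / (k * np.log(1 - p))


p = 0.6
k = np.arange(1, 200)          # the tail beyond k = 199 is negligible for p = 0.6
pmf = logseries_pmf(k, p)

rng = np.random.default_rng(5)
s = rng.logseries(p, 200_000)
empirical = np.bincount(s, minlength=6)[1:6] / s.size   # frequencies of k = 1..5

print(pmf.sum())
print(pmf[:5])
print(empirical)
```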

References

Examples

Draw samples from the distribution:

>>> a = .6
>>> rng = np.random.default_rng()
>>> s = rng.logseries(a, 10000)
>>> import matplotlib.pyplot as plt
>>> bins = np.arange(-.5, max(s) + .5 )
>>> count, bins, _ = plt.hist(s, bins=bins, label='Sample count')

# plot against distribution

>>> def logseries(k, p):
...     return -p**k/(k*np.log(1-p))
>>> centres = np.arange(1, max(s) + 1)
>>> plt.plot(centres, logseries(centres, a) * s.size, 'r', label='logseries PMF')
>>> plt.legend()
>>> plt.show()
multinomial(n, pvals, size=None)

Draw samples from a multinomial distribution.

The multinomial distribution is a multivariate generalization of the binomial distribution. Take an experiment with one of p possible outcomes. An example of such an experiment is throwing a die, where the outcome can be 1 through 6. Each sample drawn from the distribution represents n such experiments. Its values, X_i = [X_0, X_1, ..., X_p], represent the number of times the outcome was i.

Parameters:
  • n (int or array-like of ints) – Number of experiments.

  • pvals (array-like of floats) – Probabilities of each of the p different outcomes with shape (k0, k1, ..., kn, p). Each element pvals[i,j,...,:] must sum to 1 (however, the last element is always assumed to account for the remaining probability, as long as sum(pvals[..., :-1], axis=-1) <= 1.0). Must have at least 1 dimension where pvals.shape[-1] > 0.

  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn each with p elements. Default is None where the output size is determined by the broadcast shape of n and all but the final dimension of pvals, which is denoted as b=(b0, b1, ..., bq). If size is not None, then it must be compatible with the broadcast shape b. Specifically, size must have q or more elements and size[-(q-j):] must equal bj.

Returns:

out – The drawn samples, of shape size, if provided. When size is provided, the output shape is size + (p,). If not specified, the shape is determined by the broadcast shape of n and pvals, (b0, b1, ..., bq), augmented with the dimension of the multinomial, p, so that the output shape is (b0, b1, ..., bq, p).

Each entry out[i,j,...,:] is a p-dimensional value drawn from the distribution.
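
The shape rules above can be made concrete with a few calls. This is a sketch with an arbitrary seed; the variable names are chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
pvals = [1 / 6] * 6                              # p = 6 equally likely outcomes

a = rng.multinomial(20, pvals)                   # size=None, scalar n
b = rng.multinomial(20, pvals, size=(2, 3))      # explicit size
# n of shape (2,) broadcast against pvals of shape (2, 6)
c = rng.multinomial([10, 20], [[1 / 4] * 4 + [0] * 2, [1 / 6] * 6])

print(a.shape, b.shape, c.shape)
print(c.sum(axis=-1))                            # each row sums to its own n
```

In every case the last axis has length p, and the counts along it add up to the corresponding number of experiments.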

Return type:

ndarray

Examples

Throw a die 20 times:

>>> rng = np.random.default_rng()
>>> rng.multinomial(20, [1/6.]*6, size=1)
array([[4, 1, 7, 5, 2, 1]])  # random

It landed 4 times on 1, once on 2, etc.

Now, throw the die 20 times, and 20 times again:

>>> rng.multinomial(20, [1/6.]*6, size=2)
array([[3, 4, 3, 3, 4, 3],
       [2, 4, 3, 4, 0, 7]])  # random

For the first run, we threw 3 times 1, 4 times 2, etc. For the second, we threw 2 times 1, 4 times 2, etc.

Now, do one experiment throwing the die 10 times, and 10 times again, and another throwing the die 20 times, and 20 times again:

>>> rng.multinomial([[10], [20]], [1/6.]*6, size=(2, 2))
array([[[2, 4, 0, 1, 2, 1],
        [1, 3, 0, 3, 1, 2]],
       [[1, 4, 4, 4, 4, 3],
        [3, 3, 2, 5, 5, 2]]])  # random

The first array shows the outcomes of throwing the die 10 times, and the second shows the outcomes from throwing the die 20 times.

A loaded die is more likely to land on number 6:

>>> rng.multinomial(100, [1/7.]*5 + [2/7.])
array([11, 16, 14, 17, 16, 26])  # random

Simulate 10 throws of a 4-sided die and 20 throws of a 6-sided die

>>> rng.multinomial([10, 20],[[1/4]*4 + [0]*2, [1/6]*6])
array([[2, 1, 4, 3, 0, 0],
       [3, 3, 3, 6, 1, 4]], dtype=int64)  # random

Generate categorical random variates from two categories where the first has 3 outcomes and the second has 2.

>>> rng.multinomial(1, [[.1, .5, .4 ], [.3, .7, .0]])
array([[0, 0, 1],
       [0, 1, 0]], dtype=int64)  # random

argmax(axis=-1) is then used to return the categories.

>>> pvals = [[.1, .5, .4 ], [.3, .7, .0]]
>>> rvs = rng.multinomial(1, pvals, size=(4,2))
>>> rvs.argmax(axis=-1)
array([[0, 1],
       [2, 0],
       [2, 1],
       [2, 0]], dtype=int64)  # random

The same output dimension can be produced using broadcasting.

>>> rvs = rng.multinomial([[1]] * 4, pvals)
>>> rvs.argmax(axis=-1)
array([[0, 1],
       [2, 0],
       [2, 1],
       [2, 0]], dtype=int64)  # random

The probability inputs should be normalized. As an implementation detail, the value of the last entry is ignored and assumed to take up any leftover probability mass, but this should not be relied on. A biased coin which has twice as much weight on one side as on the other should be sampled like so:

>>> rng.multinomial(100, [1.0 / 3, 2.0 / 3])  # RIGHT
array([38, 62])  # random

not like:

>>> rng.multinomial(100, [1.0, 2.0])  # WRONG
Traceback (most recent call last):
ValueError: pvals < 0, pvals > 1 or pvals contains NaNs
multivariate_hypergeometric(colors, nsample, size=None, method='marginals')

Generate variates from a multivariate hypergeometric distribution.

The multivariate hypergeometric distribution is a generalization of the hypergeometric distribution.

Choose nsample items at random without replacement from a collection with N distinct types. N is the length of colors, and the values in colors are the number of occurrences of that type in the collection. The total number of items in the collection is sum(colors). Each random variate generated by this function is a vector of length N holding the counts of the different types that occurred in the nsample items.

The name colors comes from a common description of the distribution: it is the probability distribution of the number of marbles of each color selected without replacement from an urn containing marbles of different colors; colors[i] is the number of marbles in the urn with color i.

Parameters:
  • colors (sequence of integers) – The number of each type of item in the collection from which a sample is drawn. The values in colors must be nonnegative. To avoid loss of precision in the algorithm, sum(colors) must be less than 10**9 when method is “marginals”.

  • nsample (int) – The number of items selected. nsample must not be greater than sum(colors).

  • size (int or tuple of ints, optional) – The number of variates to generate, either an integer or a tuple holding the shape of the array of variates. If the given size is, e.g., (k, m), then k * m variates are drawn, where one variate is a vector of length len(colors), and the return value has shape (k, m, len(colors)). If size is an integer, the output has shape (size, len(colors)). Default is None, in which case a single variate is returned as an array with shape (len(colors),).

  • method (string, optional) – Specify the algorithm that is used to generate the variates. Must be ‘count’ or ‘marginals’ (the default). See the Notes for a description of the methods.

Returns:

variates – Array of variates drawn from the multivariate hypergeometric distribution.

Return type:

ndarray

See also

hypergeometric

Draw samples from the (univariate) hypergeometric distribution.

Notes

The two methods do not return the same sequence of variates.

The “count” algorithm is roughly equivalent to the following numpy code:

choices = np.repeat(np.arange(len(colors)), colors)
selection = np.random.choice(choices, nsample, replace=False)
variate = np.bincount(selection, minlength=len(colors))

The “count” algorithm uses a temporary array of integers with length sum(colors).

The “marginals” algorithm generates a variate by using repeated calls to the univariate hypergeometric sampler. It is roughly equivalent to:

variate = np.zeros(len(colors), dtype=np.int64)
# `remaining` is the cumulative sum of `colors` from the last
# element to the first; e.g. if `colors` is [3, 1, 5], then
# `remaining` is [9, 6, 5].
remaining = np.cumsum(colors[::-1])[::-1]
for i in range(len(colors)-1):
    if nsample < 1:
        break
    variate[i] = hypergeometric(colors[i], remaining[i+1],
                               nsample)
    nsample -= variate[i]
variate[-1] = nsample

The default method is “marginals”. For some cases (e.g. when colors contains relatively small integers), the “count” method can be significantly faster than the “marginals” method. If performance of the algorithm is important, test the two methods with typical inputs to decide which works best.
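
The two rough equivalents given above can be turned into runnable functions for experimentation. This is a sketch only: the function names count_sketch and marginals_sketch are hypothetical, both take an explicit Generator, and neither is the library's actual implementation.

```python
import numpy as np


def count_sketch(rng, colors, nsample):
    """Rough equivalent of method='count' (illustrative only)."""
    choices = np.repeat(np.arange(len(colors)), colors)
    selection = rng.choice(choices, nsample, replace=False)
    return np.bincount(selection, minlength=len(colors))


def marginals_sketch(rng, colors, nsample):
    """Rough equivalent of method='marginals' (illustrative only)."""
    colors = np.asarray(colors)
    variate = np.zeros(len(colors), dtype=np.int64)
    # Reverse cumulative sum: remaining[i] counts items of type i and later.
    remaining = np.cumsum(colors[::-1])[::-1]
    for i in range(len(colors) - 1):
        if nsample < 1:
            break
        # "Good" items are type i; "bad" items are all later types.
        variate[i] = rng.hypergeometric(colors[i], remaining[i + 1], nsample)
        nsample -= variate[i]
    variate[-1] = nsample
    return variate


rng = np.random.default_rng(42)
colors = [16, 8, 4]
v1 = count_sketch(rng, colors, 6)
v2 = marginals_sketch(rng, colors, 6)
print(v1, v2)
```

Both results are count vectors that sum to nsample, with no entry exceeding the corresponding value in colors.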

Examples

>>> colors = [16, 8, 4]
>>> seed = 4861946401452
>>> gen = np.random.Generator(np.random.PCG64(seed))
>>> gen.multivariate_hypergeometric(colors, 6)
array([5, 0, 1])
>>> gen.multivariate_hypergeometric(colors, 6, size=3)
array([[5, 0, 1],
       [2, 2, 2],
       [3, 3, 0]])
>>> gen.multivariate_hypergeometric(colors, 6, size=(2, 2))
array([[[3, 2, 1],
        [3, 2, 1]],
       [[4, 1, 1],
        [3, 2, 1]]])
multivariate_normal(mean, cov, size=None, check_valid='warn', tol=1e-08, *, method='svd')

Draw random samples from a multivariate normal distribution.

The multivariate normal, multinormal or Gaussian distribution is a generalization of the one-dimensional normal distribution to higher dimensions. Such a distribution is specified by its mean and covariance matrix. These parameters are analogous to the mean (average or “center”) and variance (the squared standard deviation, or “width”) of the one-dimensional normal distribution.

Parameters:
  • mean (1-D array_like, of length N) – Mean of the N-dimensional distribution.

  • cov (2-D array_like, of shape (N, N)) – Covariance matrix of the distribution. It must be symmetric and positive-semidefinite for proper sampling.

  • size (int or tuple of ints, optional) – Given a shape of, for example, (m,n,k), m*n*k samples are generated, and packed in an m-by-n-by-k arrangement. Because each sample is N-dimensional, the output shape is (m,n,k,N). If no shape is specified, a single (N-D) sample is returned.

  • check_valid ({ 'warn', 'raise', 'ignore' }, optional) – Behavior when the covariance matrix is not positive semidefinite.

  • tol (float, optional) – Tolerance when checking the singular values in covariance matrix. cov is cast to double before the check.

  • method ({ 'svd', 'eigh', 'cholesky'}, optional) – The cov input is used to compute a factor matrix A such that A @ A.T = cov. This argument is used to select the method used to compute the factor matrix A. The default method ‘svd’ is the slowest, while ‘cholesky’ is the fastest but less robust than the slowest method. The method eigh uses eigen decomposition to compute A and is faster than svd but slower than cholesky.

Returns:

out – The drawn samples, of shape size, if that was provided. If not, the shape is (N,).

In other words, each entry out[i,j,...,:] is an N-dimensional value drawn from the distribution.

Return type:

ndarray

Notes

The mean is a coordinate in N-dimensional space, which represents the location where samples are most likely to be generated. This is analogous to the peak of the bell curve for the one-dimensional or univariate normal distribution.

Covariance indicates the level to which two variables vary together. From the multivariate normal distribution, we draw N-dimensional samples, \(X = [x_1, x_2, ... x_N]\). The covariance matrix element \(C_{ij}\) is the covariance of \(x_i\) and \(x_j\). The element \(C_{ii}\) is the variance of \(x_i\) (i.e. its “spread”).

Instead of specifying the full covariance matrix, popular approximations include:

  • Spherical covariance (cov is a multiple of the identity matrix)

  • Diagonal covariance (cov has non-negative elements, and only on the diagonal)

This geometrical property can be seen in two dimensions by plotting generated data-points:

>>> mean = [0, 0]
>>> cov = [[1, 0], [0, 100]]  # diagonal covariance

Diagonal covariance means that points are oriented along the x- or y-axis:

>>> import matplotlib.pyplot as plt
>>> rng = np.random.default_rng()
>>> x, y = rng.multivariate_normal(mean, cov, 5000).T
>>> plt.plot(x, y, 'x')
>>> plt.axis('equal')
>>> plt.show()

Note that the covariance matrix must be positive semidefinite (a.k.a. nonnegative-definite). Otherwise, the behavior of this method is undefined and backwards compatibility is not guaranteed.

This function internally uses linear algebra routines, and thus results may not be identical (even up to precision) across architectures, OSes, or even builds. For example, this is likely if cov has multiple equal singular values and method is 'svd' (default). In this case, method='cholesky' may be more robust.
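
The role of the factor matrix A described for the method parameter can be illustrated by hand. The sketch below assumes a positive-definite covariance so that np.linalg.cholesky applies; it is not the library's internal code, and the seed and sample size are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(8)
mean = np.array([0.0, 0.0])
cov = np.array([[6.0, -3.0], [-3.0, 3.5]])

A = np.linalg.cholesky(cov)              # lower-triangular factor, A @ A.T == cov
z = rng.standard_normal((100_000, 2))    # independent standard normal draws
pts = mean + z @ A.T                     # each row then has covariance A @ A.T

print(np.allclose(A @ A.T, cov))
print(np.cov(pts.T))                     # close to cov
```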

Examples

>>> mean = (1, 2)
>>> cov = [[1, 0], [0, 1]]
>>> rng = np.random.default_rng()
>>> x = rng.multivariate_normal(mean, cov, (3, 3))
>>> x.shape
(3, 3, 2)

We can use a different method other than the default to factorize cov:

>>> y = rng.multivariate_normal(mean, cov, (3, 3), method='cholesky')
>>> y.shape
(3, 3, 2)

Here we generate 800 samples from the bivariate normal distribution with mean [0, 0] and covariance matrix [[6, -3], [-3, 3.5]]. The expected variances of the first and second components of the sample are 6 and 3.5, respectively, and the expected correlation coefficient is -3/sqrt(6*3.5) ≈ -0.65465.

>>> cov = np.array([[6, -3], [-3, 3.5]])
>>> pts = rng.multivariate_normal([0, 0], cov, size=800)

Check that the mean, covariance, and correlation coefficient of the sample are close to the expected values:

>>> pts.mean(axis=0)
array([ 0.0326911 , -0.01280782])  # may vary
>>> np.cov(pts.T)
array([[ 5.96202397, -2.85602287],
       [-2.85602287,  3.47613949]])  # may vary
>>> np.corrcoef(pts.T)[0, 1]
-0.6273591314603949  # may vary

We can visualize this data with a scatter plot. The orientation of the point cloud illustrates the negative correlation of the components of this sample.

>>> import matplotlib.pyplot as plt
>>> plt.plot(pts[:, 0], pts[:, 1], '.', alpha=0.5)
>>> plt.axis('equal')
>>> plt.grid()
>>> plt.show()
negative_binomial(n, p, size=None)

Draw samples from a negative binomial distribution.

Samples are drawn from a negative binomial distribution with specified parameters, n successes and p probability of success where n is > 0 and p is in the interval (0, 1].

Parameters:
  • n (float or array_like of floats) – Parameter of the distribution, > 0.

  • p (float or array_like of floats) – Parameter of the distribution. Must satisfy 0 < p <= 1.

  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if n and p are both scalars. Otherwise, np.broadcast(n, p).size samples are drawn.

Returns:

out – Drawn samples from the parameterized negative binomial distribution, where each sample is equal to N, the number of failures that occurred before a total of n successes was reached.

Return type:

ndarray or scalar

Notes

The probability mass function of the negative binomial distribution is

\[P(N;n,p) = \frac{\Gamma(N+n)}{N!\Gamma(n)}p^{n}(1-p)^{N},\]

where \(n\) is the number of successes, \(p\) is the probability of success, \(N+n\) is the number of trials, and \(\Gamma\) is the gamma function. When \(n\) is an integer, \(\frac{\Gamma(N+n)}{N!\Gamma(n)} = \binom{N+n-1}{N}\), which is the more common form of this term in the pmf. The negative binomial distribution gives the probability of N failures given n successes, with a success on the last trial.

If one throws a die repeatedly until the third time a “1” appears, then the probability distribution of the number of non-“1”s that appear before the third “1” is a negative binomial distribution.
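
The die example can be simulated directly. In the sketch below, the number of throws needed for each "1" is drawn with Generator.geometric, and the non-"1" throws are summed over the three successes. The seed and sample size are arbitrary, and n * (1 - p) / p is the standard mean of the negative binomial distribution.

```python
import numpy as np

rng = np.random.default_rng(21)
n, p = 3, 1 / 6
trials = 200_000

# Throws needed for each of the three "1"s; subtracting 1 leaves the non-"1" throws.
throws = rng.geometric(p, size=(trials, n))
failures_simulated = (throws - 1).sum(axis=1)

failures_direct = rng.negative_binomial(n, p, trials)

print(failures_simulated.mean(), failures_direct.mean(), n * (1 - p) / p)
```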

Because this method internally calls Generator.poisson with an intermediate random value, a ValueError is raised when the choice of \(n\) and \(p\) would result in the mean + 10 sigma of the sampled intermediate distribution exceeding the max acceptable value of the Generator.poisson method. This happens when \(p\) is too low (a lot of failures happen for every success) and \(n\) is too big (a lot of successes are allowed). Therefore, the \(n\) and \(p\) values must satisfy the constraint:

\[n\frac{1-p}{p}+10n\sqrt{n}\frac{1-p}{p}<2^{63}-1-10\sqrt{2^{63}-1},\]

where the left side of the inequality is the derived mean + 10 sigma of a sample from the gamma distribution internally used as the \(lam\) parameter of a Poisson sample, and the right side of the inequality is the constraint for maximum value of \(lam\) in Generator.poisson.
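
The gamma and Poisson relationship mentioned above can be sketched as a two-step draw. This assumes a gamma distribution with shape n and scale (1 - p) / p for the intermediate rate; it illustrates the idea and is not the library's actual code path. The seed and parameter values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(17)
n, p = 2.5, 0.3
size = 200_000

lam = rng.gamma(shape=n, scale=(1 - p) / p, size=size)   # intermediate random rate
mixture = rng.poisson(lam)                                # one Poisson draw per rate
direct = rng.negative_binomial(n, p, size)

print(mixture.mean(), direct.mean(), n * (1 - p) / p)
print(mixture.var(), direct.var(), n * (1 - p) / p**2)
```

The two samples should have matching mean and variance, up to sampling noise.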

Examples

Draw samples from the distribution:

A real world example. A company drills wild-cat oil exploration wells, each with an estimated probability of success of 0.1. What is the probability of having one success for each successive well, that is what is the probability of a single success after drilling 5 wells, after 6 wells, etc.?

>>> rng = np.random.default_rng()
>>> s = rng.negative_binomial(1, 0.1, 100000)
>>> for i in range(1, 11):
...    probability = sum(s<i) / 100000.
...    print(i, "wells drilled, probability of one success =", probability)
noncentral_chisquare(df, nonc, size=None)

Draw samples from a noncentral chi-square distribution.

The noncentral \(\chi^2\) distribution is a generalization of the \(\chi^2\) distribution.

Parameters:
  • df (float or array_like of floats) – Degrees of freedom, must be > 0.

  • nonc (float or array_like of floats) – Non-centrality, must be non-negative.

  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if df and nonc are both scalars. Otherwise, np.broadcast(df, nonc).size samples are drawn.

Returns:

out – Drawn samples from the parameterized noncentral chi-square distribution.

Return type:

ndarray or scalar

Notes

The probability density function for the noncentral Chi-square distribution is

\[P(x;df,nonc) = \sum^{\infty}_{i=0} \frac{e^{-nonc/2}(nonc/2)^{i}}{i!} P_{Y_{df+2i}}(x),\]

where \(Y_{q}\) is the Chi-square with q degrees of freedom.
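The sum in the density above describes a Poisson-weighted mixture of central chi-square distributions. As a rough sketch (seed, parameters, and sample size are arbitrary assumptions), one can draw from that mixture by hand and compare it with a direct draw; both should have mean \(df + nonc\).

```python
import numpy as np

rng = np.random.default_rng(2024)  # arbitrary fixed seed (assumption)
df, nonc, size = 3.0, 20.0, 200_000

# Direct draw from the noncentral chi-square distribution.
direct = rng.noncentral_chisquare(df, nonc, size)

# Mixture draw: i ~ Poisson(nonc / 2), then a central chi-square with
# df + 2i degrees of freedom, mirroring the terms of the sum.
i = rng.poisson(nonc / 2, size)
mixture = rng.chisquare(df + 2 * i)

print(direct.mean(), mixture.mean(), df + nonc)
```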

Examples

Draw values from the distribution and plot the histogram

>>> rng = np.random.default_rng()
>>> import matplotlib.pyplot as plt
>>> values = plt.hist(rng.noncentral_chisquare(3, 20, 100000),
...                   bins=200, density=True)
>>> plt.show()

Draw values from a noncentral chisquare with very small noncentrality, and compare to a chisquare.

>>> plt.figure()
>>> values = plt.hist(rng.noncentral_chisquare(3, .0000001, 100000),
...                   bins=np.arange(0., 25, .1), density=True)
>>> values2 = plt.hist(rng.chisquare(3, 100000),
...                    bins=np.arange(0., 25, .1), density=True)
>>> plt.plot(values[1][0:-1], values[0]-values2[0], 'ob')
>>> plt.show()

Demonstrate how large values of non-centrality lead to a more symmetric distribution.

>>> plt.figure()
>>> values = plt.hist(rng.noncentral_chisquare(3, 20, 100000),
...                   bins=200, density=True)
>>> plt.show()
noncentral_f(dfnum, dfden, nonc, size=None)

Draw samples from the noncentral F distribution.

Samples are drawn from an F distribution with specified parameters, dfnum (degrees of freedom in numerator) and dfden (degrees of freedom in denominator), where both parameters must be > 0. nonc is the non-centrality parameter.

Parameters:
  • dfnum (float or array_like of floats) – Numerator degrees of freedom, must be > 0.

  • dfden (float or array_like of floats) – Denominator degrees of freedom, must be > 0.

  • nonc (float or array_like of floats) – Non-centrality parameter, the sum of the squares of the numerator means, must be >= 0.

  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if dfnum, dfden, and nonc are all scalars. Otherwise, np.broadcast(dfnum, dfden, nonc).size samples are drawn.

Returns:

out – Drawn samples from the parameterized noncentral Fisher distribution.

Return type:

ndarray or scalar

Notes

When calculating the power of an experiment (power = probability of rejecting the null hypothesis when a specific alternative is true) the non-central F statistic becomes important. When the null hypothesis is true, the F statistic follows a central F distribution. When the null hypothesis is not true, it follows a non-central F distribution.
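The power calculation described above can be approximated by simulation. This is only a sketch under stated assumptions: the significance level alpha = 0.05, the degrees of freedom, the non-centrality, and the seed are illustrative choices, and the critical value is estimated from samples rather than computed exactly.

```python
import numpy as np

rng = np.random.default_rng(7)  # arbitrary fixed seed (assumption)
dfnum, dfden, nonc, size = 3, 20, 3.0, 200_000
alpha = 0.05  # hypothetical significance level

# Critical value: the (1 - alpha) quantile of the central F distribution,
# which is what the statistic follows when the null hypothesis is true.
critical_value = np.quantile(rng.f(dfnum, dfden, size), 1 - alpha)

# Estimated power: fraction of noncentral F samples beyond the critical value.
power = np.mean(rng.noncentral_f(dfnum, dfden, nonc, size) > critical_value)
print(critical_value, power)
```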

Examples

In a study, testing for a specific alternative to the null hypothesis requires use of the Noncentral F distribution. We need to calculate the area in the tail of the distribution that exceeds the value of the F distribution for the null hypothesis. We’ll plot the two probability distributions for comparison.

>>> rng = np.random.default_rng()
>>> dfnum = 3 # between group deg of freedom
>>> dfden = 20 # within groups degrees of freedom
>>> nonc = 3.0
>>> nc_vals = rng.noncentral_f(dfnum, dfden, nonc, 1000000)
>>> NF = np.histogram(nc_vals, bins=50, density=True)
>>> c_vals = rng.f(dfnum, dfden, 1000000)
>>> F = np.histogram(c_vals, bins=50, density=True)
>>> import matplotlib.pyplot as plt
>>> plt.plot(F[1][1:], F[0])
>>> plt.plot(NF[1][1:], NF[0])
>>> plt.show()
normal(loc=0.0, scale=1.0, size=None)

Draw random samples from a normal (Gaussian) distribution.

The probability density function of the normal distribution, first derived by De Moivre and 200 years later by both Gauss and Laplace independently [2]_, is often called the bell curve because of its characteristic shape (see the example below).

The normal distribution occurs often in nature. For example, it describes the commonly occurring distribution of samples influenced by a large number of tiny, random disturbances, each with its own unique distribution [2]_.

Parameters:
  • loc (float or array_like of floats) – Mean (“centre”) of the distribution.

  • scale (float or array_like of floats) – Standard deviation (spread or “width”) of the distribution. Must be non-negative.

  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if loc and scale are both scalars. Otherwise, np.broadcast(loc, scale).size samples are drawn.

Returns:

out – Drawn samples from the parameterized normal distribution.

Return type:

ndarray or scalar

See also

scipy.stats.norm

probability density function, distribution or cumulative distribution function, etc.

Notes

The probability density for the Gaussian distribution is

\[p(x) = \frac{1}{\sqrt{ 2 \pi \sigma^2 }} e^{ - \frac{ (x - \mu)^2 } {2 \sigma^2} },\]

where \(\mu\) is the mean and \(\sigma\) the standard deviation. The square of the standard deviation, \(\sigma^2\), is called the variance.

The function has its peak at the mean, and its “spread” increases with the standard deviation (the function reaches 0.607 times its maximum at \(\mu + \sigma\) and \(\mu - \sigma\) [2]_). This implies that normal() is more likely to return samples lying close to the mean, rather than those far away.
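The 0.607 figure is \(e^{-1/2}\), the ratio of the density one standard deviation away from the mean to the density at the mean. A small sketch follows; the helper name normal_pdf, the seed, and the parameter values are illustrative assumptions. It evaluates that ratio from the formula above and also estimates the fraction of samples falling within one standard deviation of the mean.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    """Gaussian density, written out from the formula in the Notes."""
    return np.exp(-((x - mu) ** 2) / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

mu, sigma = 0.0, 0.1

# Ratio of the density one sigma from the mean to the peak density.
ratio = normal_pdf(mu + sigma, mu, sigma) / normal_pdf(mu, mu, sigma)

rng = np.random.default_rng(0)  # arbitrary fixed seed (assumption)
s = rng.normal(mu, sigma, 100_000)

# Fraction of samples within one sigma of the mean (about 0.683 in theory).
within_one_sigma = np.mean(np.abs(s - mu) < sigma)
print(ratio, within_one_sigma)
```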

References

Examples

Draw samples from the distribution:

>>> mu, sigma = 0, 0.1 # mean and standard deviation
>>> rng = np.random.default_rng()
>>> s = rng.normal(mu, sigma, 1000)

Verify the mean and the standard deviation:

>>> abs(mu - np.mean(s))
0.0  # may vary
>>> abs(sigma - np.std(s, ddof=1))
0.0  # may vary

Display the histogram of the samples, along with the probability density function:

>>> import matplotlib.pyplot as plt
>>> count, bins, _ = plt.hist(s, 30, density=True)
>>> plt.plot(bins, 1/(sigma * np.sqrt(2 * np.pi)) *
...                np.exp( - (bins - mu)**2 / (2 * sigma**2) ),
...          linewidth=2, color='r')
>>> plt.show()

Two-by-four array of samples from the normal distribution with mean 3 and standard deviation 2.5:

>>> rng = np.random.default_rng()
>>> rng.normal(3, 2.5, size=(2, 4))
array([[-4.49401501,  4.00950034, -1.81814867,  7.29718677],   # random
       [ 0.39924804,  4.68456316,  4.99394529,  4.84057254]])  # random
pareto(a, size=None)

Draw samples from a Pareto II (AKA Lomax) distribution with specified shape.

Parameters:
  • a (float or array_like of floats) – Shape of the distribution. Must be positive.

  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if a is a scalar. Otherwise, np.array(a).size samples are drawn.

Returns:

out – Drawn samples from the Pareto II distribution.

Return type:

ndarray or scalar

See also

scipy.stats.pareto

Pareto I distribution

scipy.stats.lomax

Lomax (Pareto II) distribution

scipy.stats.genpareto

Generalized Pareto distribution

Notes

The probability density for the Pareto II distribution is

\[p(x) = \frac{a}{(x+1)^{a+1}} , x \ge 0\]

where \(a > 0\) is the shape.

The Pareto II distribution is a shifted and scaled version of the Pareto I distribution, which can be found in scipy.stats.pareto.
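The shift and scale relating the two families can be sketched as follows. The scale m (the minimum value of the Pareto I variate) is a hypothetical parameter introduced for illustration, and the seed and sample size are arbitrary; the check uses the Pareto I mean \(am/(a-1)\), valid for \(a > 1\).

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary fixed seed (assumption)
a, m = 3.0, 2.0  # shape, and a hypothetical Pareto I scale (minimum value)

lomax = rng.pareto(a, 200_000)   # Pareto II (Lomax) samples, support x >= 0
pareto_i = m * (1.0 + lomax)     # shifted and scaled, support x >= m

# For a > 1 the Pareto I mean is a * m / (a - 1).
expected_mean = a * m / (a - 1)
print(pareto_i.min(), pareto_i.mean(), expected_mean)
```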

Examples

Draw samples from the distribution:

>>> a = 3.
>>> rng = np.random.default_rng()
>>> s = rng.pareto(a, 10000)

Display the histogram of the samples, along with the probability density function:

>>> import matplotlib.pyplot as plt
>>> x = np.linspace(0, 3, 50)
>>> pdf = a / (x+1)**(a+1)
>>> plt.hist(s, bins=x, density=True, label='histogram')
>>> plt.plot(x, pdf, linewidth=2, color='r', label='pdf')
>>> plt.xlim(x.min(), x.max())
>>> plt.legend()
>>> plt.show()
permutation(x, axis=0)

Randomly permute a sequence, or return a permuted range.

Parameters:
  • x (int or array_like) – If x is an integer, randomly permute np.arange(x). If x is an array, make a copy and shuffle the elements randomly.

  • axis (int, optional) – The axis which x is shuffled along. Default is 0.

Returns:

out – Permuted sequence or array range.

Return type:

ndarray

Examples

>>> rng = np.random.default_rng()
>>> rng.permutation(10)
array([1, 7, 4, 3, 0, 9, 2, 5, 8, 6]) # random
>>> rng.permutation([1, 4, 9, 12, 15])
array([15,  1,  9,  4, 12]) # random
>>> arr = np.arange(9).reshape((3, 3))
>>> rng.permutation(arr)
array([[6, 7, 8], # random
       [0, 1, 2],
       [3, 4, 5]])
>>> rng.permutation("abc")
Traceback (most recent call last):
    ...
numpy.exceptions.AxisError: axis 0 is out of bounds for array of dimension 0
>>> arr = np.arange(9).reshape((3, 3))
>>> rng.permutation(arr, axis=1)
array([[0, 2, 1], # random
       [3, 5, 4],
       [6, 8, 7]])
permuted(x, axis=None, out=None)

Randomly permute x along axis axis.

Unlike shuffle, each slice along the given axis is shuffled independently of the others.

Parameters:
  • x (array_like, at least one-dimensional) – Array to be shuffled.

  • axis (int, optional) – Slices of x in this axis are shuffled. Each slice is shuffled independently of the others. If axis is None, the flattened array is shuffled.

  • out (ndarray, optional) – If given, this is the destination of the shuffled array. If out is None, a shuffled copy of the array is returned.

Returns:

If out is None, a shuffled copy of x is returned. Otherwise, the shuffled array is stored in out, and out is returned

Return type:

ndarray

See also

shuffle, permutation

Notes

The methods shuffle and permuted differ in how they treat the axis parameter. See generator-handling-axis-parameter for details.
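The difference in how the two methods handle axis can be seen on a small array. In this sketch (the seed is arbitrary), shuffle with axis=1 reorders whole columns, so every row receives the same reordering, while permuted with axis=1 reorders each row independently.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary fixed seed (assumption)
x = np.arange(24).reshape(3, 8)

# shuffle works in place and moves whole columns when axis=1.
shuffled = x.copy()
rng.shuffle(shuffled, axis=1)

# permuted returns a copy and reorders each row on its own when axis=1.
permuted = rng.permuted(x, axis=1)

# After shuffle, every column is still one of the original columns.
original_columns = {tuple(col) for col in x.T}
columns_intact = all(tuple(col) in original_columns for col in shuffled.T)

# After permuted, every row still holds the same values in some order
# (the rows of x are already sorted, so sorting recovers x).
rows_preserved = np.array_equal(np.sort(permuted, axis=1), x)
print(columns_intact, rows_preserved)
```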

Examples

Create a numpy.random.Generator instance:

>>> rng = np.random.default_rng()

Create a test array:

>>> x = np.arange(24).reshape(3, 8)
>>> x
array([[ 0,  1,  2,  3,  4,  5,  6,  7],
       [ 8,  9, 10, 11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20, 21, 22, 23]])

Shuffle the elements within each row of x:

>>> y = rng.permuted(x, axis=1)
>>> y
array([[ 4,  3,  6,  7,  1,  2,  5,  0],  # random
       [15, 10, 14,  9, 12, 11,  8, 13],
       [17, 16, 20, 21, 18, 22, 23, 19]])

x has not been modified:

>>> x
array([[ 0,  1,  2,  3,  4,  5,  6,  7],
       [ 8,  9, 10, 11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20, 21, 22, 23]])

To shuffle the elements within each row of x in-place, pass x as the out parameter:

>>> y = rng.permuted(x, axis=1, out=x)
>>> x
array([[ 3,  0,  4,  7,  1,  6,  2,  5],  # random
       [ 8, 14, 13,  9, 12, 11, 15, 10],
       [17, 18, 16, 22, 19, 23, 20, 21]])

Note that when the out parameter is given, the return value is out:

>>> y is x
True
poisson(lam=1.0, size=None)

Draw samples from a Poisson distribution.

The Poisson distribution is the limit of the binomial distribution for large N.

Parameters:
  • lam (float or array_like of floats) – Expected number of events occurring in a fixed-time interval, must be >= 0. A sequence must be broadcastable over the requested size.

  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if lam is a scalar. Otherwise, np.array(lam).size samples are drawn.

Returns:

out – Drawn samples from the parameterized Poisson distribution.

Return type:

ndarray or scalar

Notes

The probability mass function (PMF) of Poisson distribution is

\[f(k; \lambda)=\frac{\lambda^k e^{-\lambda}}{k!}\]

For events with an expected separation \(\lambda\) the Poisson distribution \(f(k; \lambda)\) describes the probability of \(k\) events occurring within the observed interval \(\lambda\).

Because the output is limited to the range of the C int64 type, a ValueError is raised when lam is within 10 sigma of the maximum representable value.
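The statement that the Poisson distribution is the large-N limit of the binomial distribution can be illustrated empirically. This is a sketch with arbitrary choices of seed, lam, and N: binomial samples with N trials and success probability lam / N are compared with Poisson samples by their observed frequencies.

```python
import numpy as np

rng = np.random.default_rng(3)  # arbitrary fixed seed (assumption)
lam, size = 5.0, 200_000
n_trials = 10_000  # "large N"; the success probability shrinks as lam / N

binom = rng.binomial(n_trials, lam / n_trials, size)
pois = rng.poisson(lam, size)

# Compare the empirical probabilities of observing k = 0..14 events.
k_max = 15
binom_freq = np.bincount(binom, minlength=k_max)[:k_max] / size
pois_freq = np.bincount(pois, minlength=k_max)[:k_max] / size
max_gap = np.abs(binom_freq - pois_freq).max()
print(max_gap)
```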

Examples

Draw samples from the distribution:

>>> rng = np.random.default_rng()
>>> lam, size = 5, 10000
>>> s = rng.poisson(lam=lam, size=size)

Verify the mean and variance, which should be approximately lam:

>>> s.mean(), s.var()
(4.9917, 5.1088311)  # may vary

Display the histogram and probability mass function:

>>> import matplotlib.pyplot as plt
>>> from scipy import stats
>>> x = np.arange(0, 21)
>>> pmf = stats.poisson.pmf(x, mu=lam)
>>> plt.hist(s, bins=x, density=True, width=0.5)
>>> plt.stem(x, pmf, 'C1-')
>>> plt.show()

Draw 100 values each for lambda 100 and 500:

>>> s = rng.poisson(lam=(100., 500.), size=(100, 2))
power(a, size=None)

Draws samples in [0, 1] from a power distribution with positive exponent a - 1.

Also known as the power function distribution.

Parameters:
  • a (float or array_like of floats) – Parameter of the distribution. Must be positive.

  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if a is a scalar. Otherwise, np.array(a).size samples are drawn.

Returns:

out – Drawn samples from the parameterized power distribution.

Return type:

ndarray or scalar

Raises:

ValueError – If a <= 0.

Notes

The probability density function is

\[P(x; a) = ax^{a-1}, 0 \le x \le 1, a>0.\]

The power function distribution is just the inverse of the Pareto distribution. It may also be seen as a special case of the Beta distribution.

It is used, for example, in modeling the over-reporting of insurance claims.
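Both relationships stated above can be checked by comparing sample means. In this sketch (seed and sizes are arbitrary), the Beta special case is taken to be Beta(a, 1), whose density is \(ax^{a-1}\), and "inverse of the Pareto" is taken to mean the reciprocal of a Pareto I variate with minimum 1, built as 1 + Generator.pareto.

```python
import numpy as np

rng = np.random.default_rng(11)  # arbitrary fixed seed (assumption)
a, size = 5.0, 200_000

power_samples = rng.power(a, size)
beta_samples = rng.beta(a, 1.0, size)            # Beta(a, 1): density a * x**(a - 1)
inv_pareto = 1.0 / (1.0 + rng.pareto(a, size))   # reciprocal of a Pareto I variate

expected_mean = a / (a + 1)  # mean of the power function distribution
print(power_samples.mean(), beta_samples.mean(), inv_pareto.mean(), expected_mean)
```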

Examples

Draw samples from the distribution:

>>> rng = np.random.default_rng()
>>> a = 5. # shape
>>> samples = 1000
>>> s = rng.power(a, samples)

Display the histogram of the samples, along with the probability density function:

>>> import matplotlib.pyplot as plt
>>> count, bins, _ = plt.hist(s, bins=30)
>>> x = np.linspace(0, 1, 100)
>>> y = a*x**(a-1.)
>>> normed_y = samples*np.diff(bins)[0]*y
>>> plt.plot(x, normed_y)
>>> plt.show()

Compare the power function distribution to the inverse of the Pareto.

>>> from scipy import stats
>>> rvs = rng.power(5, 1000000)
>>> rvsp = rng.pareto(5, 1000000)
>>> xx = np.linspace(0,1,100)
>>> powpdf = stats.powerlaw.pdf(xx,5)
>>> plt.figure()
>>> plt.hist(rvs, bins=50, density=True)
>>> plt.plot(xx,powpdf,'r-')
>>> plt.title('power(5)')
>>> plt.figure()
>>> plt.hist(1./(1.+rvsp), bins=50, density=True)
>>> plt.plot(xx,powpdf,'r-')
>>> plt.title('inverse of 1 + Generator.pareto(5)')
>>> plt.figure()
>>> rvsp1 = stats.pareto.rvs(5, size=1000000, random_state=rng)
>>> plt.hist(1./rvsp1, bins=50, density=True)
>>> plt.plot(xx,powpdf,'r-')
>>> plt.title('inverse of stats.pareto(5)')
random(size=None, dtype=np.float64, out=None)

Return random floats in the half-open interval [0.0, 1.0).

Results are from the “continuous uniform” distribution over the stated interval. To sample \(Unif[a, b), b > a\) use uniform or multiply the output of random by (b - a) and add a:

(b - a) * random() + a
Parameters:
  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.

  • dtype (dtype, optional) – Desired dtype of the result, only float64 and float32 are supported. Byteorder must be native. The default value is np.float64.

  • out (ndarray, optional) – Alternative output array in which to place the result. If size is not None, it must have the same shape as the provided size and must match the type of the output values.

Returns:

out – Array of random floats of shape size (unless size=None, in which case a single float is returned).

Return type:

float or ndarray of floats

See also

uniform

Draw samples from the parameterized uniform distribution.

Examples

>>> rng = np.random.default_rng()
>>> rng.random()
0.47108547995356098 # random
>>> type(rng.random())
<class 'float'>
>>> rng.random((5,))
array([ 0.30220482,  0.86820401,  0.1654503 ,  0.11659149,  0.54323428]) # random

Three-by-two array of random numbers from [-5, 0):

>>> 5 * rng.random((3, 2)) - 5
array([[-3.99149989, -0.52338984], # random
       [-2.99091858, -0.79479508],
       [-1.23204345, -1.75224494]])
rayleigh(scale=1.0, size=None)

Draw samples from a Rayleigh distribution.

The \(\chi\) and Weibull distributions are generalizations of the Rayleigh.

Parameters:
  • scale (float or array_like of floats, optional) – Scale, also equals the mode. Must be non-negative. Default is 1.

  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if scale is a scalar. Otherwise, np.array(scale).size samples are drawn.

Returns:

out – Drawn samples from the parameterized Rayleigh distribution.

Return type:

ndarray or scalar

Notes

The probability density function for the Rayleigh distribution is

\[P(x;scale) = \frac{x}{scale^2}e^{\frac{-x^2}{2 \cdotp scale^2}}\]

The Rayleigh distribution would arise, for example, if the East and North components of the wind velocity had identical zero-mean Gaussian distributions. Then the wind speed would have a Rayleigh distribution.
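The wind-speed construction can be sketched directly. The component standard deviation, seed, and sample size below are arbitrary assumptions; the speed computed from two independent zero-mean Gaussian components is compared with a direct Rayleigh draw using the Rayleigh mean \(scale\sqrt{\pi/2}\).

```python
import numpy as np

rng = np.random.default_rng(5)  # arbitrary fixed seed (assumption)
sigma, size = 3.0, 200_000

# East and North velocity components: identical zero-mean Gaussians.
east = rng.normal(0.0, sigma, size)
north = rng.normal(0.0, sigma, size)
speed = np.hypot(east, north)  # magnitude of the velocity vector

direct = rng.rayleigh(sigma, size)
expected_mean = sigma * np.sqrt(np.pi / 2)  # mean of a Rayleigh distribution
print(speed.mean(), direct.mean(), expected_mean)
```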

Examples

Draw values from the distribution and plot the histogram

>>> from matplotlib.pyplot import hist
>>> rng = np.random.default_rng()
>>> values = hist(rng.rayleigh(3, 100000), bins=200, density=True)

Wave heights tend to follow a Rayleigh distribution. If the mean wave height is 1 meter, what fraction of waves are likely to be larger than 3 meters?

>>> meanvalue = 1
>>> modevalue = np.sqrt(2 / np.pi) * meanvalue
>>> s = rng.rayleigh(modevalue, 1000000)

The percentage of waves larger than 3 meters is:

>>> 100.*sum(s>3)/1000000.
0.087300000000000003 # random
shuffle(x, axis=0)

Modify an array or sequence in-place by shuffling its contents.

The order of sub-arrays is changed but their contents remain the same.

Parameters:
  • x (ndarray or MutableSequence) – The array, list or mutable sequence to be shuffled.

  • axis (int, optional) – The axis which x is shuffled along. Default is 0. It is only supported on ndarray objects.

Return type:

None

See also

permuted, permutation

Notes

The methods shuffle and permuted differ in how they treat the axis parameter. See generator-handling-axis-parameter for details.

Examples

>>> rng = np.random.default_rng()
>>> arr = np.arange(10)
>>> arr
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> rng.shuffle(arr)
>>> arr
array([2, 0, 7, 5, 1, 4, 8, 9, 3, 6]) # random
>>> arr = np.arange(9).reshape((3, 3))
>>> arr
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
>>> rng.shuffle(arr)
>>> arr
array([[3, 4, 5], # random
       [6, 7, 8],
       [0, 1, 2]])
>>> arr = np.arange(9).reshape((3, 3))
>>> arr
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
>>> rng.shuffle(arr, axis=1)
>>> arr
array([[2, 0, 1], # random
       [5, 3, 4],
       [8, 6, 7]])
spawn(n_children)

Create new independent child generators.

See seedsequence-spawn for additional notes on spawning children.

Added in version 1.25.0.

Parameters:

n_children (int)

Returns:

child_generators

Return type:

list of Generators

Raises:

TypeError – When the underlying SeedSequence does not implement spawning.

See also

random.BitGenerator.spawn, random.SeedSequence.spawn

bit_generator

The bit generator instance used by the generator.

Examples

Starting from a seeded default generator:

>>> # High quality entropy created with: f"0x{secrets.randbits(128):x}"
>>> entropy = 0x3034c61a9ae04ff8cb62ab8ec2c4b501
>>> rng = np.random.default_rng(entropy)

Create two new generators for example for parallel execution:

>>> child_rng1, child_rng2 = rng.spawn(2)

Drawn numbers from each are independent but derived from the initial seeding entropy:

>>> rng.uniform(), child_rng1.uniform(), child_rng2.uniform()
(0.19029263503854454, 0.9475673279178444, 0.4702687338396767)

It is safe to spawn additional children from the original rng or the children:

>>> more_child_rngs = rng.spawn(20)
>>> nested_spawn = child_rng1.spawn(20)
standard_cauchy(size=None)

Draw samples from a standard Cauchy distribution with mode = 0.

Also known as the Lorentz distribution.

Parameters:

size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.

Returns:

samples – The drawn samples.

Return type:

ndarray or scalar

Notes

The probability density function for the full Cauchy distribution is

\[P(x; x_0, \gamma) = \frac{1}{\pi \gamma \bigl[ 1+ (\frac{x-x_0}{\gamma})^2 \bigr] }\]

and the Standard Cauchy distribution just sets \(x_0=0\) and \(\gamma=1\).

The Cauchy distribution arises in the solution to the driven harmonic oscillator problem, and also describes spectral line broadening. It also describes the distribution of values at which a line tilted at a random angle will cut the x axis.

When studying hypothesis tests that assume normality, seeing how the tests perform on data from a Cauchy distribution is a good indicator of their sensitivity to a heavy-tailed distribution, since the Cauchy looks very much like a Gaussian distribution, but with heavier tails.
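The tilted-line description can be turned into a short simulation. This sketch assumes the line passes through the point (0, 1) and is tilted from the vertical by an angle drawn uniformly from \((-\pi/2, \pi/2)\), so it crosses the x axis at the tangent of that angle; the seed and sample size are arbitrary. Because the mean of a Cauchy distribution is undefined, quartiles are compared instead.

```python
import numpy as np

rng = np.random.default_rng(8)  # arbitrary fixed seed (assumption)
size = 200_000

# A line through (0, 1), tilted from the vertical by a uniformly random
# angle, crosses the x axis at tan(angle).
angles = rng.uniform(-np.pi / 2, np.pi / 2, size)
crossings = np.tan(angles)

direct = rng.standard_cauchy(size)

# Quartiles of the standard Cauchy distribution are -1, 0 and 1.
q_crossings = np.quantile(crossings, [0.25, 0.5, 0.75])
q_direct = np.quantile(direct, [0.25, 0.5, 0.75])
print(q_crossings, q_direct)
```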

Examples

Draw samples and plot the distribution:

>>> import matplotlib.pyplot as plt
>>> rng = np.random.default_rng()
>>> s = rng.standard_cauchy(1000000)
>>> s = s[(s>-25) & (s<25)]  # truncate distribution so it plots well
>>> plt.hist(s, bins=100)
>>> plt.show()
standard_exponential(size=None, dtype=np.float64, method='zig', out=None)

Draw samples from the standard exponential distribution.

standard_exponential is identical to the exponential distribution with a scale parameter of 1.

Parameters:
  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.

  • dtype (dtype, optional) – Desired dtype of the result, only float64 and float32 are supported. Byteorder must be native. The default value is np.float64.

  • method (str, optional) – Either ‘inv’ or ‘zig’. ‘inv’ uses the inverse CDF method. ‘zig’ (the default) uses the much faster Ziggurat method of Marsaglia and Tsang.

  • out (ndarray, optional) – Alternative output array in which to place the result. If size is not None, it must have the same shape as the provided size and must match the type of the output values.

Returns:

out – Drawn samples.

Return type:

float or ndarray

Examples

Output a 3x8000 array:

>>> rng = np.random.default_rng()
>>> n = rng.standard_exponential((3, 8000))
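The method parameter selects between two sampling algorithms. The sketch below calls both and, for comparison, applies the textbook inverse-CDF transform by hand: if U is uniform on [0, 1), then -log(1 - U) is exponential with scale 1. The manual transform illustrates the idea only and is not claimed to match NumPy's internal implementation; the seed and sample size are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(13)  # arbitrary fixed seed (assumption)
size = 200_000

# Hand-written inverse-CDF transform; log1p(-u) computes log(1 - u).
u = rng.random(size)
manual = -np.log1p(-u)

zig = rng.standard_exponential(size)                # default method='zig'
inv = rng.standard_exponential(size, method='inv')  # inverse CDF method

# All three should have a mean close to 1.
print(manual.mean(), zig.mean(), inv.mean())
```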
standard_gamma(shape, size=None, dtype=np.float64, out=None)

Draw samples from a standard Gamma distribution.

Samples are drawn from a Gamma distribution with specified parameters, shape (sometimes designated “k”) and scale=1.

Parameters:
  • shape (float or array_like of floats) – Parameter, must be non-negative.

  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if shape is a scalar. Otherwise, np.array(shape).size samples are drawn.

  • dtype (dtype, optional) – Desired dtype of the result, only float64 and float32 are supported. Byteorder must be native. The default value is np.float64.

  • out (ndarray, optional) – Alternative output array in which to place the result. If size is not None, it must have the same shape as the provided size and must match the type of the output values.

Returns:

out – Drawn samples from the parameterized standard gamma distribution.

Return type:

ndarray or scalar

See also

scipy.stats.gamma

probability density function, distribution or cumulative distribution function, etc.

Notes

The probability density for the Gamma distribution is

\[p(x) = x^{k-1}\frac{e^{-x/\theta}}{\theta^k\Gamma(k)},\]

where \(k\) is the shape and \(\theta\) the scale, and \(\Gamma\) is the Gamma function.

The Gamma distribution is often used to model the times to failure of electronic components, and arises naturally in processes for which the waiting times between Poisson distributed events are relevant.
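The link to waiting times between Poisson-distributed events can be sketched for an integer shape. Assuming a unit-rate process, the gaps between events are standard exponential, and the waiting time until the k-th event is the sum of k such gaps, which should match standard_gamma(k) in mean and variance (both equal to k). The seed, k, and sample size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(21)  # arbitrary fixed seed (assumption)
k, size = 2, 200_000

# Waiting time until the k-th event: sum of k independent exponential gaps.
waiting = rng.standard_exponential((size, k)).sum(axis=1)
direct = rng.standard_gamma(k, size)

print(waiting.mean(), direct.mean(), waiting.var(), direct.var())
```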

Examples

Draw samples from the distribution:

>>> shape, scale = 2., 1. # shape and scale (standard_gamma fixes scale at 1)
>>> rng = np.random.default_rng()
>>> s = rng.standard_gamma(shape, 1000000)

Display the histogram of the samples, along with the probability density function:

>>> import matplotlib.pyplot as plt
>>> import scipy.special as sps
>>> count, bins, _ = plt.hist(s, 50, density=True)
>>> y = bins**(shape-1) * ((np.exp(-bins/scale))/
...                       (sps.gamma(shape) * scale**shape))
>>> plt.plot(bins, y, linewidth=2, color='r')
>>> plt.show()
standard_normal(size=None, dtype=np.float64, out=None)

Draw samples from a standard Normal distribution (mean=0, stdev=1).

Parameters:
  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.

  • dtype (dtype, optional) – Desired dtype of the result, only float64 and float32 are supported. Byteorder must be native. The default value is np.float64.

  • out (ndarray, optional) – Alternative output array in which to place the result. If size is not None, it must have the same shape as the provided size and must match the type of the output values.

Returns:

out – A floating-point array of shape size of drawn samples, or a single sample if size was not specified.

Return type:

float or ndarray

See also

normal

Equivalent function with additional loc and scale arguments for setting the mean and standard deviation.

Notes

For random samples from the normal distribution with mean mu and standard deviation sigma, use one of:

mu + sigma * rng.standard_normal(size=...)
rng.normal(mu, sigma, size=...)

Examples

>>> rng = np.random.default_rng()
>>> rng.standard_normal()
2.1923875335537315 # random
>>> s = rng.standard_normal(8000)
>>> s
array([ 0.6888893 ,  0.78096262, -0.89086505, ...,  0.49876311,  # random
       -0.38672696, -0.4685006 ])                                # random
>>> s.shape
(8000,)
>>> s = rng.standard_normal(size=(3, 4, 2))
>>> s.shape
(3, 4, 2)

Two-by-four array of samples from the normal distribution with mean 3 and standard deviation 2.5:

>>> 3 + 2.5 * rng.standard_normal(size=(2, 4))
array([[-4.49401501,  4.00950034, -1.81814867,  7.29718677],   # random
       [ 0.39924804,  4.68456316,  4.99394529,  4.84057254]])  # random
standard_t(df, size=None)

Draw samples from a standard Student’s t distribution with df degrees of freedom.

A special case of the hyperbolic distribution. As df gets large, the result resembles that of the standard normal distribution (standard_normal).

Parameters:
  • df (float or array_like of floats) – Degrees of freedom, must be > 0.

  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if df is a scalar. Otherwise, np.array(df).size samples are drawn.

Returns:

out – Drawn samples from the parameterized standard Student’s t distribution.

Return type:

ndarray or scalar

Notes

The probability density function for the t distribution is

\[P(x, df) = \frac{\Gamma(\frac{df+1}{2})}{\sqrt{\pi df} \Gamma(\frac{df}{2})}\Bigl( 1+\frac{x^2}{df} \Bigr)^{-(df+1)/2}\]

The t test is based on an assumption that the data come from a Normal distribution. The t test provides a way to test whether the sample mean (that is the mean calculated from the data) is a good estimate of the true mean.

The derivation of the t-distribution was first published in 1908 by William Gosset while working for the Guinness Brewery in Dublin. Due to proprietary issues, he had to publish under a pseudonym, and so he used the name Student.
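The connection to normally distributed data can be made concrete with the standard construction of a t variate, which is not spelled out in the text above and is stated here as background: if Z is standard normal and V is an independent chi-square variate with df degrees of freedom, then \(Z / \sqrt{V/df}\) follows Student's t distribution with df degrees of freedom. A minimal sketch with arbitrary seed and sample size compares the sample variance with the theoretical value \(df/(df-2)\).

```python
import numpy as np

rng = np.random.default_rng(17)  # arbitrary fixed seed (assumption)
df, size = 10, 200_000

# Build t variates from a standard normal and an independent chi-square.
z = rng.standard_normal(size)
v = rng.chisquare(df, size)
constructed = z / np.sqrt(v / df)

direct = rng.standard_t(df, size)
expected_var = df / (df - 2)  # variance of Student's t, valid for df > 2
print(constructed.var(), direct.var(), expected_var)
```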

References

Examples

From Dalgaard page 83 [1]_, suppose the daily energy intake for 11 women in kilojoules (kJ) is:

>>> intake = np.array([5260., 5470, 5640, 6180, 6390, 6515, 6805, 7515, \
...                    7515, 8230, 8770])

Does their energy intake deviate systematically from the recommended value of 7725 kJ? Our null hypothesis will be the absence of deviation, and the alternate hypothesis will be the presence of an effect that could be either positive or negative, hence making our test 2-tailed.

Because we are estimating the mean and we have N=11 values in our sample, we have N-1=10 degrees of freedom. We set our significance level to 5% (a 95% confidence level) and compute the t statistic using the empirical mean and empirical standard deviation of our intake. We use a ddof of 1 to base the computation of our empirical standard deviation on an unbiased estimate of the variance (note: the final estimate is not unbiased due to the concave nature of the square root).

>>> np.mean(intake)
6753.636363636364
>>> intake.std(ddof=1)
1142.1232221373727
>>> t = (np.mean(intake)-7725)/(intake.std(ddof=1)/np.sqrt(len(intake)))
>>> t
-2.8207540608310198

We draw 1000000 samples from Student’s t distribution with the adequate degrees of freedom.

>>> import matplotlib.pyplot as plt
>>> rng = np.random.default_rng()
>>> s = rng.standard_t(10, size=1000000)
>>> h = plt.hist(s, bins=100, density=True)

Does our t statistic land in one of the two critical regions found at both tails of the distribution?

>>> np.sum(np.abs(t) < np.abs(s)) / float(len(s))
0.018318  #random < 0.05, statistic is in critical region

The probability value for this 2-tailed test is about 1.83%, which is lower than the 5% pre-determined significance threshold.

Therefore, the probability of observing values as extreme as our intake conditionally on the null hypothesis being true is too low, and we reject the null hypothesis of no deviation.

triangular(left, mode, right, size=None)

Draw samples from the triangular distribution over the interval [left, right].

The triangular distribution is a continuous probability distribution with lower limit left, peak at mode, and upper limit right. Unlike the other distributions, these parameters directly define the shape of the pdf.

Parameters:
  • left (float or array_like of floats) – Lower limit.

  • mode (float or array_like of floats) – The value where the peak of the distribution occurs. The value must fulfill the condition left <= mode <= right.

  • right (float or array_like of floats) – Upper limit, must be larger than left.

  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if left, mode, and right are all scalars. Otherwise, np.broadcast(left, mode, right).size samples are drawn.

Returns:

out – Drawn samples from the parameterized triangular distribution.

Return type:

ndarray or scalar

Notes

The probability density function for the triangular distribution is

\[\begin{split}P(x;l, m, r) = \begin{cases} \frac{2(x-l)}{(r-l)(m-l)}& \text{for $l \leq x \leq m$},\\ \frac{2(r-x)}{(r-l)(r-m)}& \text{for $m \leq x \leq r$},\\ 0& \text{otherwise}. \end{cases}\end{split}\]

The triangular distribution is often used in ill-defined problems where the underlying distribution is not known, but some knowledge of the limits and mode exists. Often it is used in simulations.
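
As a rough check of the density above, the following sketch evaluates the piecewise formula directly and compares the sample mean with (left + mode + right) / 3, the standard mean of a triangular distribution (a textbook result, not stated in the text). The helper name `triangular_pdf` and the seed are assumptions, and the helper assumes left < mode < right strictly.

```python
import numpy as np

def triangular_pdf(x, left, mode, right):
    """Piecewise triangular density; assumes left < mode < right."""
    x = np.asarray(x, dtype=float)
    rising = 2 * (x - left) / ((right - left) * (mode - left))
    falling = 2 * (right - x) / ((right - left) * (right - mode))
    inside = (x >= left) & (x <= right)
    # Rising branch up to the mode, falling branch after it, zero outside.
    return np.where(inside, np.where(x <= mode, rising, falling), 0.0)

left, mode, right = -3.0, 0.0, 8.0
rng = np.random.default_rng(0)  # fixed seed: an assumption for repeatability
samples = rng.triangular(left, mode, right, size=100_000)

peak_density = float(triangular_pdf(mode, left, mode, right))  # 2 / (right - left)
sample_mean = float(samples.mean())
expected_mean = (left + mode + right) / 3
```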

Examples

Draw values from the distribution and plot the histogram:

>>> import matplotlib.pyplot as plt
>>> rng = np.random.default_rng()
>>> h = plt.hist(rng.triangular(-3, 0, 8, 100000), bins=200,
...              density=True)
>>> plt.show()

uniform(low=0.0, high=1.0, size=None)

Draw samples from a uniform distribution.

Samples are uniformly distributed over the half-open interval [low, high) (includes low, but excludes high). In other words, any value within the given interval is equally likely to be drawn by uniform.

Parameters:
  • low (float or array_like of floats, optional) – Lower boundary of the output interval. All values generated will be greater than or equal to low. The default value is 0.

  • high (float or array_like of floats) – Upper boundary of the output interval. Values generated will normally be less than high, but the high limit may be included in the returned array of floats due to floating-point rounding in the equation low + (high-low) * random_sample(). high - low must be non-negative. The default value is 1.0.

  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if low and high are both scalars. Otherwise, np.broadcast(low, high).size samples are drawn.

Returns:

out – Drawn samples from the parameterized uniform distribution.

Return type:

ndarray or scalar

See also

integers

Discrete uniform distribution, yielding integers.

random

Floats uniformly distributed over [0, 1).

Notes

The probability density function of the uniform distribution is

\[p(x) = \frac{1}{b - a}\]

anywhere within the interval [a, b), where a = low and b = high, and zero elsewhere.

When high == low, values of low will be returned.
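
The scaling relation quoted in the parameter description, low + (high - low) * U, and the degenerate case high == low can be illustrated with a short sketch. The seed and variable names are assumptions made for repeatability.

```python
import numpy as np

rng = np.random.default_rng(2024)  # arbitrary seed, assumed for repeatability
low, high = -1.0, 0.0

# Manual transform: low + (high - low) * U, with U drawn from [0, 1).
manual = low + (high - low) * rng.random(1000)
# Direct call; drawn from the same distribution as the manual transform.
direct = rng.uniform(low, high, size=1000)

# Degenerate interval: with high == low every draw equals low.
constant = rng.uniform(2.5, 2.5, size=4)
```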

Examples

Draw samples from the distribution:

>>> rng = np.random.default_rng()
>>> s = rng.uniform(-1,0,1000)

All values are within the given interval:

>>> np.all(s >= -1)
True
>>> np.all(s < 0)
True

Display the histogram of the samples, along with the probability density function:

>>> import matplotlib.pyplot as plt
>>> count, bins, _ = plt.hist(s, 15, density=True)
>>> plt.plot(bins, np.ones_like(bins), linewidth=2, color='r')
>>> plt.show()

vonmises(mu, kappa, size=None)

Draw samples from a von Mises distribution.

Samples are drawn from a von Mises distribution with specified mode (mu) and concentration (kappa), on the interval [-pi, pi].

The von Mises distribution (also known as the circular normal distribution) is a continuous probability distribution on the unit circle. It may be thought of as the circular analogue of the normal distribution.

Parameters:
  • mu (float or array_like of floats) – Mode (“center”) of the distribution.

  • kappa (float or array_like of floats) – Concentration of the distribution, has to be >=0.

  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if mu and kappa are both scalars. Otherwise, np.broadcast(mu, kappa).size samples are drawn.

Returns:

out – Drawn samples from the parameterized von Mises distribution.

Return type:

ndarray or scalar

See also

scipy.stats.vonmises

probability density function, distribution, or cumulative distribution function, etc.

Notes

The probability density for the von Mises distribution is

\[p(x) = \frac{e^{\kappa \cos(x-\mu)}}{2\pi I_0(\kappa)},\]

where \(\mu\) is the mode and \(\kappa\) the concentration, and \(I_0(\kappa)\) is the modified Bessel function of order 0.
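
The density above can be checked numerically without SciPy, since NumPy exposes the order-0 modified Bessel function as `np.i0`. The sketch below (the helper name `vonmises_pdf` is an assumption) confirms that the density integrates to roughly 1 over [-pi, pi] and peaks at mu.

```python
import numpy as np

def vonmises_pdf(x, mu, kappa):
    """Von Mises density; np.i0 is the modified Bessel function of order 0."""
    return np.exp(kappa * np.cos(x - mu)) / (2 * np.pi * np.i0(kappa))

mu, kappa = 0.0, 4.0
x = np.linspace(-np.pi, np.pi, 2001)
y = vonmises_pdf(x, mu, kappa)

# Trapezoidal rule written out by hand; the result should be close to 1.
area = float(np.sum((y[:-1] + y[1:]) / 2 * np.diff(x)))
# Grid point where the density is largest; should coincide with mu.
mode_location = float(x[np.argmax(y)])
```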

The von Mises distribution is named for Richard Edler von Mises, who was born in Austria-Hungary, in what is now Ukraine. He fled to the United States in 1939 and became a professor at Harvard. He worked in probability theory, aerodynamics, fluid mechanics, and philosophy of science.

Examples

Draw samples from the distribution:

>>> mu, kappa = 0.0, 4.0 # mode and concentration
>>> rng = np.random.default_rng()
>>> s = rng.vonmises(mu, kappa, 1000)

Display the histogram of the samples, along with the probability density function:

>>> import matplotlib.pyplot as plt
>>> from scipy.special import i0
>>> plt.hist(s, 50, density=True)
>>> x = np.linspace(-np.pi, np.pi, num=51)
>>> y = np.exp(kappa*np.cos(x-mu))/(2*np.pi*i0(kappa))
>>> plt.plot(x, y, linewidth=2, color='r')
>>> plt.show()

wald(mean, scale, size=None)

Draw samples from a Wald, or inverse Gaussian, distribution.

As the scale approaches infinity, the distribution becomes more like a Gaussian. Some references claim that the Wald is an inverse Gaussian with mean equal to 1, but this is by no means universal.

The inverse Gaussian distribution was first studied in relationship to Brownian motion. In 1956 M.C.K. Tweedie used the name inverse Gaussian because there is an inverse relationship between the time to cover a unit distance and distance covered in unit time.

Parameters:
  • mean (float or array_like of floats) – Distribution mean, must be > 0.

  • scale (float or array_like of floats) – Scale parameter, must be > 0.

  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if mean and scale are both scalars. Otherwise, np.broadcast(mean, scale).size samples are drawn.

Returns:

out – Drawn samples from the parameterized Wald distribution.

Return type:

ndarray or scalar

Notes

The probability density function for the Wald distribution is

\[P(x;\text{mean},\text{scale}) = \sqrt{\frac{\text{scale}}{2\pi x^3}}\exp\left(\frac{-\text{scale}\,(x-\text{mean})^2}{2\cdot \text{mean}^2\, x}\right)\]

As noted above, the inverse Gaussian distribution first arose from attempts to model Brownian motion. It is also a competitor to the Weibull for use in reliability modeling and in modeling stock returns and interest rate processes.
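
A quick sanity check on the two parameters: for an inverse Gaussian with this parameterization, the mean equals `mean` and the variance equals mean**3 / scale. The variance formula is a standard result that is not stated in the text above. The sketch below compares both with sample moments; the seed is an arbitrary assumption.

```python
import numpy as np

mean, scale = 3.0, 2.0
rng = np.random.default_rng(7)  # arbitrary seed (assumption)
samples = rng.wald(mean, scale, size=200_000)

sample_mean = float(samples.mean())
sample_var = float(samples.var())
expected_var = mean**3 / scale  # standard inverse Gaussian variance
```

Because the distribution is heavy-tailed, the sample variance converges more slowly than the sample mean.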

Examples

Draw values from the distribution and plot the histogram:

>>> import matplotlib.pyplot as plt
>>> rng = np.random.default_rng()
>>> h = plt.hist(rng.wald(3, 2, 100000), bins=200, density=True)
>>> plt.show()

weibull(a, size=None)

Draw samples from a Weibull distribution.

Draw samples from a 1-parameter Weibull distribution with the given shape parameter a.

\[X = (-\ln(U))^{1/a}\]

Here, U is drawn from the uniform distribution over (0,1].

The more common 2-parameter Weibull, which includes a scale parameter \(\lambda\), is just \(X = \lambda(-\ln(U))^{1/a}\).
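
The transform above translates directly into inverse-transform sampling. The sketch below draws 2-parameter Weibull samples both by hand and by scaling the 1-parameter draw, then compares each against the standard mean lambda * Gamma(1 + 1/a) (a textbook result, not stated in the text). The helper name `weibull_two_param` and the seed are assumptions.

```python
import math
import numpy as np

def weibull_two_param(rng, a, lam, size):
    """Inverse-transform sampling: X = lam * (-ln U)**(1/a), U in (0, 1]."""
    u = 1.0 - rng.random(size)  # rng.random() is in [0, 1), so 1 - U is in (0, 1]
    return lam * (-np.log(u)) ** (1.0 / a)

a, lam = 5.0, 2.0
rng = np.random.default_rng(11)  # arbitrary seed (assumption)
manual = weibull_two_param(rng, a, lam, 100_000)
scaled = lam * rng.weibull(a, 100_000)  # scale the 1-parameter draw instead

expected_mean = lam * math.gamma(1.0 + 1.0 / a)
```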

Parameters:
  • a (float or array_like of floats) – Shape parameter of the distribution. Must be nonnegative.

  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if a is a scalar. Otherwise, np.array(a).size samples are drawn.

Returns:

out – Drawn samples from the parameterized Weibull distribution.

Return type:

ndarray or scalar

See also

scipy.stats.weibull_max, scipy.stats.weibull_min, scipy.stats.genextreme, gumbel

Notes

The Weibull (or Type III asymptotic extreme value distribution for smallest values, SEV Type III, or Rosin-Rammler distribution) is one of a class of Generalized Extreme Value (GEV) distributions used in modeling extreme value problems. This class includes the Gumbel and Frechet distributions.

The probability density for the Weibull distribution is

\[p(x) = \frac{a}{\lambda}\left(\frac{x}{\lambda}\right)^{a-1}e^{-(x/\lambda)^a},\]

where \(a\) is the shape and \(\lambda\) the scale.

The function has its peak (the mode) at \(\lambda\left(\frac{a-1}{a}\right)^{1/a}\) for \(a > 1\).

When a = 1, the Weibull distribution reduces to the exponential distribution.

Examples

Draw samples from the distribution:

>>> rng = np.random.default_rng()
>>> a = 5. # shape
>>> s = rng.weibull(a, 1000)

Display the histogram of the samples, along with the probability density function:

>>> import matplotlib.pyplot as plt
>>> def weibull(x, n, a):
...     # Two-parameter Weibull PDF with scale n and shape a
...     return (a / n) * (x / n)**(a - 1) * np.exp(-(x / n)**a)
>>> count, bins, _ = plt.hist(s)  # histogram of the samples drawn above
>>> x = np.linspace(0, 2, 1000)
>>> bin_spacing = np.mean(np.diff(bins))
>>> plt.plot(x, weibull(x, 1., a) * bin_spacing * s.size, label='Weibull PDF')
>>> plt.legend()
>>> plt.show()

zipf(a, size=None)

Draw samples from a Zipf distribution.

Samples are drawn from a Zipf distribution with specified parameter a > 1.

The Zipf distribution (also known as the zeta distribution) is a discrete probability distribution that satisfies Zipf’s law: the frequency of an item is inversely proportional to its rank in a frequency table.

Parameters:
  • a (float or array_like of floats) – Distribution parameter. Must be greater than 1.

  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if a is a scalar. Otherwise, np.array(a).size samples are drawn.

Returns:

out – Drawn samples from the parameterized Zipf distribution.

Return type:

ndarray or scalar

See also

scipy.stats.zipf

probability mass function, distribution, or cumulative distribution function, etc.

Notes

The probability mass function (PMF) for the Zipf distribution is

\[p(k) = \frac{k^{-a}}{\zeta(a)},\]

for integers \(k \geq 1\), where \(\zeta\) is the Riemann Zeta function.
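
For a = 4 the normalizing constant has the closed form zeta(4) = pi**4 / 90, so the PMF can be checked without SciPy. The sketch below (the helper name `zipf_pmf` and the seed are assumptions) compares the observed frequency of k = 1 with p(1).

```python
import math
import numpy as np

a = 4.0
zeta_4 = math.pi**4 / 90  # closed form of the Riemann zeta function at 4

def zipf_pmf(k, a, zeta_a):
    """p(k) = k**(-a) / zeta(a) for integers k >= 1."""
    return k ** (-a) / zeta_a

rng = np.random.default_rng(3)  # arbitrary seed (assumption)
samples = rng.zipf(a, size=200_000)

observed_p1 = float(np.mean(samples == 1))
expected_p1 = zipf_pmf(1.0, a, zeta_4)
# Partial sum of the PMF; it approaches 1 as more terms are included.
partial_sum = sum(zipf_pmf(float(k), a, zeta_4) for k in range(1, 1001))
```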

It is named for the American linguist George Kingsley Zipf, who noted that the frequency of any word in a sample of a language is inversely proportional to its rank in the frequency table.

Examples

Draw samples from the distribution:

>>> a = 4.0
>>> n = 20000
>>> rng = np.random.default_rng()
>>> s = rng.zipf(a, size=n)

Display the histogram of the samples, along with the expected histogram based on the probability mass function:

>>> import matplotlib.pyplot as plt
>>> from scipy.special import zeta

bincount provides a fast histogram for small integers.

>>> count = np.bincount(s)
>>> k = np.arange(1, s.max() + 1)
>>> plt.bar(k, count[1:], alpha=0.5, label='sample count')
>>> plt.plot(k, n*(k**-a)/zeta(a), 'k.-', alpha=0.5,
...          label='expected count')
>>> plt.semilogy()
>>> plt.grid(alpha=0.4)
>>> plt.legend()
>>> plt.title(f'Zipf sample, a={a}, size={n}')
>>> plt.show()

class westpa.core.we_driver.MT19937(seed=None)

Bases: BitGenerator

Container for the Mersenne Twister pseudo-random number generator.

Parameters:

seed ({None, int, array_like[ints], SeedSequence}, optional) – A seed to initialize the BitGenerator. If None, then fresh, unpredictable entropy will be pulled from the OS. If an int or array_like[ints] is passed, then it will be passed to SeedSequence to derive the initial BitGenerator state. One may also pass in a SeedSequence instance.

lock

Lock instance that is shared so that the same bit generator can be used in multiple Generators without corrupting the state. Code that generates values from a bit generator should hold the bit generator’s lock.

Type:

threading.Lock

Notes

MT19937 provides a capsule containing function pointers that produce doubles, and unsigned 32- and 64-bit integers [1]. These are not directly consumable in Python and must be consumed by a Generator or similar object that supports low-level access.

The Python stdlib module “random” also contains a Mersenne Twister pseudo-random number generator.

State and Seeding

The MT19937 state vector consists of a 624-element array of 32-bit unsigned integers plus a single integer value between 0 and 624 that indexes the current position within the main array.

The input seed is processed by SeedSequence to fill the whole state. The first element is reset such that only its most significant bit is set.

Parallel Features

The preferred way to use a BitGenerator in parallel applications is to use the SeedSequence.spawn method to obtain entropy values, and to use these to generate new BitGenerators:

>>> from numpy.random import Generator, MT19937, SeedSequence
>>> sg = SeedSequence(1234)
>>> rg = [Generator(MT19937(s)) for s in sg.spawn(10)]

Another method is to use MT19937.jumped, which advances the state as if \(2^{128}\) random numbers have been generated ([1], [2]). This allows the original sequence to be split so that distinct segments can be used in each worker process. All generators should be chained to ensure that the segments come from the same sequence.

>>> from numpy.random import Generator, MT19937, SeedSequence
>>> sg = SeedSequence(1234)
>>> bit_generator = MT19937(sg)
>>> rg = []
>>> for _ in range(10):
...    rg.append(Generator(bit_generator))
...    # Chain the BitGenerators
...    bit_generator = bit_generator.jumped()

Compatibility Guarantee

MT19937 makes a guarantee that a fixed seed will always produce the same random integer stream.

References

jumped(jumps=1)

Returns a new bit generator with the state jumped.

The state of the returned bit generator is jumped as if jumps * 2**128 random numbers have been generated (that is, the 2**128-step jump is applied jumps times).

Parameters:

jumps (integer, positive) – Number of times to jump the state of the returned bit generator.

Returns:

bit_generator – New instance of the bit generator, with its state jumped jumps times.

Return type:

MT19937

Notes

The jump step is computed using a modified version of Matsumoto’s implementation of Horner’s method. The step polynomial is precomputed to perform 2**128 steps. The jumped state has been verified to match the state produced using Matsumoto’s original code.

References

state

Get or set the PRNG state

Returns:

state – Dictionary containing the information required to describe the state of the PRNG

Return type:

dict
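
Because the state is exposed as a plain dictionary, it can be saved and restored to replay a stream. A minimal sketch using the NumPy classes directly (the seed and variable names are arbitrary assumptions):

```python
import numpy as np
from numpy.random import Generator, MT19937

bit_generator = MT19937(1234)
gen = Generator(bit_generator)

saved = bit_generator.state      # dict snapshot of the PRNG state
first = gen.random(5)

bit_generator.state = saved      # rewind to the snapshot
replayed = gen.random(5)         # reproduces the same five values
```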

class westpa.core.we_driver.Segment(n_iter=None, seg_id=None, weight=None, endpoint_type=None, parent_id=None, wtg_parent_ids=None, pcoord=None, status=None, walltime=None, cputime=None, data=None)

Bases: object

A class wrapping segment data that must be passed through the work manager or data manager. Most fields are self-explanatory. One item worth noting is that a negative parent ID means that the segment starts from the initial state with ID -(segment.parent_id+1).
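
The negative-parent-ID convention described above can be spelled out with two small helpers. Both function names are hypothetical and are not part of WESTPA; only the formula -(parent_id + 1) comes from the description above.

```python
def initial_state_id_from_parent(parent_id):
    """Return the initial state ID encoded in a negative parent ID, else None."""
    if parent_id < 0:
        return -(parent_id + 1)
    return None  # non-negative IDs refer to a parent segment instead

def parent_id_from_initial_state(state_id):
    """Inverse mapping: encode an initial state ID as a negative parent ID."""
    return -(state_id + 1)

# Initial state 0 is encoded as parent ID -1, state 4 as -5, and so on.
encoded = parent_id_from_initial_state(4)
decoded = initial_state_id_from_parent(encoded)
```

The offset of one is what allows initial state 0 to be represented, since -0 would be indistinguishable from segment 0.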

SEG_ENDPOINT_CONTINUES = 1
SEG_ENDPOINT_MERGED = 2
SEG_ENDPOINT_RECYCLED = 3
SEG_ENDPOINT_UNSET = 0
SEG_INITPOINT_CONTINUES = 1
SEG_INITPOINT_NEWTRAJ = 2
SEG_INITPOINT_UNSET = 0
SEG_STATUS_COMPLETE = 2
SEG_STATUS_FAILED = 3
SEG_STATUS_PREPARED = 1
SEG_STATUS_UNSET = 0
endpoint_type_names = {0: 'SEG_ENDPOINT_UNSET', 1: 'SEG_ENDPOINT_CONTINUES', 2: 'SEG_ENDPOINT_MERGED', 3: 'SEG_ENDPOINT_RECYCLED'}
property endpoint_type_text
endpoint_types = {'SEG_ENDPOINT_CONTINUES': 1, 'SEG_ENDPOINT_MERGED': 2, 'SEG_ENDPOINT_RECYCLED': 3, 'SEG_ENDPOINT_UNSET': 0}
static final_pcoord(segment)

Return the final progress coordinate point of this segment.

static initial_pcoord(segment)

Return the initial progress coordinate point of this segment.

property initial_state_id
property initpoint_type
initpoint_type_names = {0: 'SEG_INITPOINT_UNSET', 1: 'SEG_INITPOINT_CONTINUES', 2: 'SEG_INITPOINT_NEWTRAJ'}
initpoint_types = {'SEG_INITPOINT_CONTINUES': 1, 'SEG_INITPOINT_NEWTRAJ': 2, 'SEG_INITPOINT_UNSET': 0}
status_names = {0: 'SEG_STATUS_UNSET', 1: 'SEG_STATUS_PREPARED', 2: 'SEG_STATUS_COMPLETE', 3: 'SEG_STATUS_FAILED'}
property status_text
statuses = {'SEG_STATUS_COMPLETE': 2, 'SEG_STATUS_FAILED': 3, 'SEG_STATUS_PREPARED': 1, 'SEG_STATUS_UNSET': 0}
class westpa.core.we_driver.InitialState(state_id, basis_state_id, iter_created, iter_used=None, istate_type=None, istate_status=None, pcoord=None, basis_state=None, basis_auxref=None)

Bases: object

Describes an initial state for a new trajectory. These are generally constructed by appropriate modification of a basis state.

Variables:
  • state_id – Integer identifier of this state, usually set by the data manager.

  • basis_state_id – Identifier of the basis state from which this state was generated, or None.

  • basis_state – The BasisState from which this state was generated, or None.

  • iter_created – Iteration in which this state was generated (0 for simulation initialization).

  • iter_used – Iteration in which this state was used to initiate a trajectory (None for unused).

  • istate_type – Integer describing the type of this initial state (ISTATE_TYPE_BASIS for direct use of a basis state, ISTATE_TYPE_GENERATED for a state generated from a basis state, ISTATE_TYPE_RESTART for a state corresponding to the endpoint of a segment in another simulation, or ISTATE_TYPE_START for a state generated from a start state).

  • istate_status – Integer describing whether this initial state has been properly prepared.

  • pcoord – The representative progress coordinate of this state.

ISTATE_STATUS_FAILED = 2
ISTATE_STATUS_PENDING = 0
ISTATE_STATUS_PREPARED = 1
ISTATE_TYPE_BASIS = 1
ISTATE_TYPE_GENERATED = 2
ISTATE_TYPE_RESTART = 3
ISTATE_TYPE_START = 4
ISTATE_TYPE_UNSET = 0
ISTATE_UNUSED = 0
as_numpy_record()
istate_status_names = {0: 'ISTATE_STATUS_PENDING', 1: 'ISTATE_STATUS_PREPARED', 2: 'ISTATE_STATUS_FAILED'}
istate_statuses = {'ISTATE_STATUS_FAILED': 2, 'ISTATE_STATUS_PENDING': 0, 'ISTATE_STATUS_PREPARED': 1}
istate_type_names = {0: 'ISTATE_TYPE_UNSET', 1: 'ISTATE_TYPE_BASIS', 2: 'ISTATE_TYPE_GENERATED', 3: 'ISTATE_TYPE_RESTART', 4: 'ISTATE_TYPE_START'}
istate_types = {'ISTATE_TYPE_BASIS': 1, 'ISTATE_TYPE_GENERATED': 2, 'ISTATE_TYPE_RESTART': 3, 'ISTATE_TYPE_START': 4, 'ISTATE_TYPE_UNSET': 0}
exception westpa.core.we_driver.ConsistencyError

Bases: RuntimeError

exception westpa.core.we_driver.AccuracyError

Bases: RuntimeError

class westpa.core.we_driver.NewWeightEntry(source_type, weight, prev_seg_id=None, prev_init_pcoord=None, prev_final_pcoord=None, new_init_pcoord=None, target_state_id=None, initial_state_id=None)

Bases: object

NW_SOURCE_RECYCLED = 0
class westpa.core.we_driver.WEDriver(rc=None, system=None)

Bases: object

A class implementing Huber & Kim’s weighted ensemble algorithm over Segment objects. This class handles all binning, recycling, and preparation of new Segment objects for the next iteration. Binning is accomplished using system.bin_mapper, and per-bin target counts are from system.bin_target_counts.

The workflow is as follows:

  1. Call new_iteration() every new iteration, providing any recycling targets that are in force and any available initial states for recycling.

  2. Call assign() to assign segments to bins based on their initial and end points. This returns the number of walkers that were recycled.

  3. Call run_we(), optionally providing a set of initial states that will be used to recycle walkers.

Note the presence of flux_matrix, transition_matrix, current_iter_segments, next_iter_segments, recycling_segments, initial_binning, final_binning, next_iter_binning, and new_weights (to be documented soon).
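
For orientation only, the split/merge idea behind weighted ensemble resampling can be shown with a toy that operates on bare weights in a single bin. This is not WESTPA's implementation: the function name, the split rule (halve the heaviest walker), and the merge rule (fuse the two lightest) are simplifying assumptions, and the weight thresholds listed below are ignored.

```python
def resample_bin_weights(weights, target_count):
    """Toy split/merge of walker weights in one bin (not WESTPA's code)."""
    if target_count < 1:
        raise ValueError('target_count must be at least 1')
    walkers = sorted(weights)
    if not walkers:
        return walkers  # an empty bin has nothing to split
    # Split: halve the heaviest walker until the bin reaches the target count.
    while len(walkers) < target_count:
        heaviest = walkers.pop()
        walkers += [heaviest / 2.0, heaviest / 2.0]
        walkers.sort()
    # Merge: fuse the two lightest walkers; the merged walker carries the
    # summed weight (a real simulation must also choose which trajectory
    # survives, with probability proportional to weight).
    while len(walkers) > target_count:
        merged_weight = walkers.pop(0) + walkers.pop(0)
        walkers.append(merged_weight)
        walkers.sort()
    return walkers

split = resample_bin_weights([0.5, 0.25], target_count=4)
merged = resample_bin_weights([0.1, 0.1, 0.3, 0.5], target_count=2)
```

In both directions the total weight in the bin is unchanged, which is the property that keeps the ensemble statistically unbiased.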

weight_split_threshold = 2.0
weight_merge_cutoff = 1.0
largest_allowed_weight = 1.0
smallest_allowed_weight = 1e-310
process_config()
property next_iter_segments

Newly-created segments for the next iteration

property current_iter_segments

Segments for the current iteration

property next_iter_assignments

Bin assignments (indices) for initial points of next iteration.

property current_iter_assignments

Bin assignments (indices) for endpoints of current iteration.

property recycling_segments

Segments designated for recycling

property n_recycled_segs

Number of segments recycled this iteration

property n_istates_needed

Number of initial states needed to support recycling for this iteration

check_threshold_configs()

Check to see if weight thresholds parameters are valid

clear()

Explicitly delete all Segment-related state.

new_iteration(initial_states=None, target_states=None, new_weights=None, bin_mapper=None, bin_target_counts=None)

Prepare for a new iteration. initial_states is a sequence of all InitialState objects valid for use in generating new segments for the next iteration (after the one being begun with the call to new_iteration); that is, these are states available to recycle to. Target states which generate recycling events are specified in target_states, a sequence of TargetState objects. Both initial_states and target_states may be empty as required.

The optional new_weights is a sequence of NewWeightEntry objects which will be used to construct the initial flux matrix.

The given bin_mapper will be used for assignment, and bin_target_counts used for splitting/merging target counts; each will be obtained from the system object if omitted or None.

add_initial_states(initial_states)

Add newly-prepared initial states to the pool available for recycling.

property all_initial_states

Return an iterator over all initial states (available or used)

assign(segments, initializing=False)

Assign segments to initial and final bins, and update the (internal) lists of used and available initial states. If initializing is True, then the “final” bin assignments will be identical to the initial bin assignments, a condition required for seeding a new iteration from pre-existing segments.

populate_initial(initial_states, weights, system=None)

Create walkers for a new weighted ensemble simulation.

One segment is created for each provided initial state, then binned and split/merged as necessary. After this function is called, next_iter_segments will yield the new segments to create, used_initial_states will contain data about which of the provided initial states were used, and avail_initial_states will contain data about which initial states were unused (because their corresponding walkers were merged out of existence).

rebin_current(parent_segments)

Reconstruct walkers for the current iteration based on (presumably) new binning. The previous iteration’s segments must be provided (as parent_segments) in order to update endpoint types appropriately.

construct_next()

Construct walkers for the next iteration, by running weighted ensemble recycling and bin/split/merge on the segments previously assigned to bins using assign. Enough unused initial states must be present in self.avail_initial_states for every recycled walker to be assigned an initial state.

After this function completes, self.flux_matrix contains a valid flux matrix for this iteration (including any contributions from recycling from the previous iteration), and self.next_iter_segments contains a list of segments ready for the next iteration, with appropriate values set for weight, endpoint type, parent walkers, and so on.

westpa.core.wm_ops module

westpa.core.wm_ops.get_pcoord(state)
westpa.core.wm_ops.gen_istate(basis_state, initial_state)
westpa.core.wm_ops.prep_iter(n_iter, segments)
westpa.core.wm_ops.post_iter(n_iter, segments)
westpa.core.wm_ops.propagate(basis_states, initial_states, segments)

westpa.core.yamlcfg module

YAML-based configuration files for WESTPA

westpa.core.yamlcfg.YLoader

alias of CLoader

class westpa.core.yamlcfg.NopMapper

Bases: BinMapper

Put everything into one bin.

assign(coords, mask=None, output=None)
exception westpa.core.yamlcfg.ConfigValueWarning

Bases: UserWarning

westpa.core.yamlcfg.warn_dubious_config_entry(entry, value, expected_type=None, category=<class 'westpa.core.yamlcfg.ConfigValueWarning'>, stacklevel=1)
westpa.core.yamlcfg.check_bool(value, action='warn')

Check that the given value is boolean in type. If not, either issue a warning (if action=='warn') or raise an exception (if action=='raise').
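
The warn-or-raise behaviour described above might look like the following standalone sketch. The function name `check_bool_sketch`, the message text, and the choice of `UserWarning` and `TypeError` are assumptions, not WESTPA's actual implementation.

```python
import warnings

def check_bool_sketch(value, action='warn'):
    """Hypothetical stand-in: warn or raise when value is not a bool."""
    if action not in ('warn', 'raise'):
        raise ValueError(f'invalid action {action!r}')
    if not isinstance(value, bool):
        message = f'dubious boolean value {value!r}'
        if action == 'warn':
            warnings.warn(message, UserWarning)
        else:
            raise TypeError(message)  # exception type is an assumption
    return value

# A string such as 'yes' is not a bool, so it triggers exactly one warning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    check_bool_sketch('yes', action='warn')
n_warnings = len(caught)
```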

exception westpa.core.yamlcfg.ConfigItemMissing(key, message=None)

Bases: KeyError

exception westpa.core.yamlcfg.ConfigItemTypeError(key, expected_type, message=None)

Bases: TypeError

exception westpa.core.yamlcfg.ConfigValueError(key, value, message=None)

Bases: ValueError

class westpa.core.yamlcfg.YAMLConfig

Bases: object

preload_config_files = ['/etc/westpa/westrc', '/home/docs/.westrc']
update_from_file(file, required=True)
require(key, type_=None)

Ensure that a configuration item with the given key is present. If the optional type_ is given, additionally require that the item has that type.
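
The presence-and-type check described above can be sketched against a plain dictionary. The function `require_sketch` and the key `max_iterations` are made up for illustration. The built-in exceptions are used because the documented ConfigItemMissing and ConfigItemTypeError derive from KeyError and TypeError respectively.

```python
def require_sketch(config, key, type_=None):
    """Hypothetical stand-in for a require()-style lookup on a plain dict."""
    if key not in config:
        raise KeyError(key)  # ConfigItemMissing derives from KeyError
    value = config[key]
    if type_ is not None and not isinstance(value, type_):
        # ConfigItemTypeError derives from TypeError
        raise TypeError(f'{key!r} should be of type {type_.__name__}')
    return value

settings = {'max_iterations': 100}  # made-up key, for illustration only
max_iterations = require_sketch(settings, 'max_iterations', int)
```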

require_type_if_present(key, type_)

Ensure that the configuration item with the given key has the given type.

coerce_type_if_present(key, type_)
get(key, default=None)
get_typed(key, type_, default=<object object>)
get_path(key, default=<object object>, expandvars=True, expanduser=True, realpath=True, abspath=True)
get_pathlist(key, default=<object object>, sep=':', expandvars=True, expanduser=True, realpath=True, abspath=True)
get_python_object(key, default=<object object>, path=None)
get_choice(key, choices, default=<object object>, value_transform=None)
class westpa.core.yamlcfg.YAMLSystem(rc=None)

Bases: object

A description of the system being simulated, including the dimensionality and data type of the progress coordinate, the number of progress coordinate entries expected from each segment, and binning. To construct a simulation, the user must subclass WESTSystem and set several instance variables.

At a minimum, the user must subclass WESTSystem and override initialize() to set the data type and dimensionality of progress coordinate data and define a bin mapper.

Variables:
  • pcoord_ndim – The number of dimensions in the progress coordinate. Defaults to 1 (i.e. a one-dimensional progress coordinate).

  • pcoord_dtype – The data type of the progress coordinate, which must be callable (e.g. np.float32 and int will work, but '<f4' and '<i8' will not). Defaults to np.float64.

  • pcoord_len – The length of the progress coordinate time series generated by each segment, including both the initial and final values. Defaults to 2 (i.e. only the initial and final progress coordinate values for a segment are returned from propagation).

  • bin_mapper – A bin mapper describing the progress coordinate space.

  • bin_target_counts – A vector of target counts, one per bin.

property bin_target_counts
initialize()

Prepare this system object for use in simulation or analysis, creating a bin space, setting replicas per bin, and so on. This function is called whenever a WEST tool creates an instance of the system driver.

prepare_run()

Prepare this system for use in a simulation run. Called by w_run in all worker processes.

finalize_run()

A hook for system-specific processing at the end of a simulation run (as defined by such things as maximum wallclock time, rather than perhaps more scientifically significant definitions of “the end of a simulation run”).

new_pcoord_array(pcoord_len=None)

Return an appropriately-sized and -typed pcoord array for a timepoint, segment, or number of segments. If pcoord_len is not specified (or None), then a length appropriate for a segment is returned.
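
A rough picture of the array this produces, using the defaults listed under Variables above (pcoord_len of 2, pcoord_ndim of 1, np.float64) and the [seg_id][time][dimension] layout described for the pcoord dataset. The function name and the `n_segs` parameter are hypothetical and do not reflect the actual method signature.

```python
import numpy as np

def new_pcoord_array_sketch(pcoord_len=2, pcoord_ndim=1,
                            pcoord_dtype=np.float64, n_segs=None):
    """Hypothetical sketch: zero-filled pcoord storage for one or many segments."""
    if n_segs is None:
        shape = (pcoord_len, pcoord_ndim)          # [time][dimension]
    else:
        shape = (n_segs, pcoord_len, pcoord_ndim)  # [seg_id][time][dimension]
    return np.zeros(shape, dtype=pcoord_dtype)

one_segment = new_pcoord_array_sketch()
three_segments = new_pcoord_array_sketch(n_segs=3)
```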

new_region_set()