westpa.mclib package

Module contents

A package for performing Monte Carlo bootstrap estimates of statistics.

westpa.mclib.mcbs_correltime(dataset, alpha, n_sets=None)

Calculate the correlation time of the given dataset, significant to the (1-alpha) level, using the method described in Huber & Kim, “Weighted-ensemble Brownian dynamics simulations for protein association reactions” (1996), doi:10.1016/S0006-3495(96)79552-8. An appropriate balance between space and speed is chosen based on the size of the input data.

Returns 0 for data statistically uncorrelated with (1-alpha) confidence, otherwise the correlation length. (Thus, the appropriate stride for blocking is the result of this function plus one.)

westpa.mclib.get_bssize(alpha): Return a bootstrap data set size appropriate for the given confidence level.

westpa.mclib.mcbs_ci(dataset, estimator, alpha, dlen, n_sets=None, args=None, kwargs=None, sort=<function msort>)

Perform a Monte Carlo bootstrap estimate for the (1-alpha) confidence interval on the given dataset with the given estimator. This routine is not appropriate for time-correlated data.

Returns (estimate, ci_lb, ci_ub) where estimate is the application of the given estimator to the input dataset, and ci_lb and ci_ub are the lower and upper limits, respectively, of the (1-alpha) confidence interval on estimate.

estimator is called as estimator(dataset, *args, **kwargs). Common estimators include:

numpy.mean – calculate the confidence interval on the mean of dataset
numpy.median – calculate a confidence interval on the median of dataset
numpy.std – calculate a confidence interval on the standard deviation of datset.

n_sets is the number of synthetic data sets to generate using the given estimator, which will be chosen using `get_bssize()`_ if n_sets is not given.

sort can be used to override the sorting routine used to calculate the confidence interval, which should only be necessary for estimators returning vectors rather than scalars.

westpa.mclib.mcbs_ci_correl(estimator_datasets, estimator, alpha, n_sets=None, args=None, autocorrel_alpha=None, autocorrel_n_sets=None, subsample=None, do_correl=True, mcbs_enable=None, estimator_kwargs={})

Perform a Monte Carlo bootstrap estimate for the (1-alpha) confidence interval on the given dataset with the given estimator. This routine is appropriate for time-correlated data, using the method described in Huber & Kim, “Weighted-ensemble Brownian dynamics simulations for protein association reactions” (1996), doi:10.1016/S0006-3495(96)79552-8 to determine a statistically-significant correlation time and then reducing the dataset by a factor of that correlation time before running a “classic” Monte Carlo bootstrap.

Returns (estimate, ci_lb, ci_ub, correl_time) where estimate is the application of the given estimator to the input dataset, ci_lb and ci_ub are the lower and upper limits, respectively, of the (1-alpha) confidence interval on estimate, and correl_time is the correlation time of the dataset, significant to (1-autocorrel_alpha).

estimator is called as estimator(dataset, *args, **kwargs). Common estimators include:

np.mean – calculate the confidence interval on the mean of dataset
np.median – calculate a confidence interval on the median of dataset
np.std – calculate a confidence interval on the standard deviation of datset.

n_sets is the number of synthetic data sets to generate using the given estimator, which will be chosen using `get_bssize()`_ if n_sets is not given.

autocorrel_alpha (which defaults to alpha) can be used to adjust the significance level of the autocorrelation calculation. Note that too high a significance level (too low an alpha) for evaluating the significance of autocorrelation values can result in a failure to detect correlation if the autocorrelation function is noisy.

The given subsample function is used, if provided, to subsample the dataset prior to running the full Monte Carlo bootstrap. If none is provided, then a random entry from each correlated block is used as the value for that block. Other reasonable choices include np.mean, np.median, (lambda x: x[0]) or (lambda x: x[-1]). In particular, using subsample=np.mean will converge to the block averaged mean and standard error, while accounting for any non-normality in the distribution of the mean.