westpa.mclib package
Module contents
A package for performing Monte Carlo bootstrap estimates of statistics.
- westpa.mclib.mcbs_correltime(dataset, alpha, n_sets=None)
Calculate the correlation time of the given
dataset
, significant to the (1-alpha
) level, using the method described in Huber & Kim, “Weighted-ensemble Brownian dynamics simulations for protein association reactions” (1996), doi:10.1016/S0006-3495(96)79552-8. An appropriate balance between space and speed is chosen based on the size of the input data.Returns 0 for data statistically uncorrelated with (1-alpha) confidence, otherwise the correlation length. (Thus, the appropriate stride for blocking is the result of this function plus one.)
- westpa.mclib.get_bssize(alpha)
Return a bootstrap data set size appropriate for the given confidence level.
- westpa.mclib.mcbs_ci(dataset, estimator, alpha, dlen, n_sets=None, args=None, kwargs=None, sort=<function msort>)
Perform a Monte Carlo bootstrap estimate for the (1-
alpha
) confidence interval on the givendataset
with the givenestimator
. This routine is not appropriate for time-correlated data.Returns
(estimate, ci_lb, ci_ub)
whereestimate
is the application of the givenestimator
to the inputdataset
, andci_lb
andci_ub
are the lower and upper limits, respectively, of the (1-alpha
) confidence interval onestimate
.estimator
is called asestimator(dataset, *args, **kwargs)
. Common estimators include:numpy.mean – calculate the confidence interval on the mean of
dataset
numpy.median – calculate a confidence interval on the median of
dataset
numpy.std – calculate a confidence interval on the standard deviation of
datset
.
n_sets
is the number of synthetic data sets to generate using the givenestimator
, which will be chosen using `get_bssize()`_ ifn_sets
is not given.sort
can be used to override the sorting routine used to calculate the confidence interval, which should only be necessary for estimators returning vectors rather than scalars.
- westpa.mclib.mcbs_ci_correl(estimator_datasets, estimator, alpha, n_sets=None, args=None, autocorrel_alpha=None, autocorrel_n_sets=None, subsample=None, do_correl=True, mcbs_enable=None, estimator_kwargs={})
Perform a Monte Carlo bootstrap estimate for the (1-
alpha
) confidence interval on the givendataset
with the givenestimator
. This routine is appropriate for time-correlated data, using the method described in Huber & Kim, “Weighted-ensemble Brownian dynamics simulations for protein association reactions” (1996), doi:10.1016/S0006-3495(96)79552-8 to determine a statistically-significant correlation time and then reducing the dataset by a factor of that correlation time before running a “classic” Monte Carlo bootstrap.Returns
(estimate, ci_lb, ci_ub, correl_time)
whereestimate
is the application of the givenestimator
to the inputdataset
,ci_lb
andci_ub
are the lower and upper limits, respectively, of the (1-alpha
) confidence interval onestimate
, andcorrel_time
is the correlation time of the dataset, significant to (1-autocorrel_alpha
).estimator
is called asestimator(dataset, *args, **kwargs)
. Common estimators include:np.mean – calculate the confidence interval on the mean of
dataset
np.median – calculate a confidence interval on the median of
dataset
np.std – calculate a confidence interval on the standard deviation of
datset
.
n_sets
is the number of synthetic data sets to generate using the givenestimator
, which will be chosen using `get_bssize()`_ ifn_sets
is not given.autocorrel_alpha
(which defaults toalpha
) can be used to adjust the significance level of the autocorrelation calculation. Note that too high a significance level (too low an alpha) for evaluating the significance of autocorrelation values can result in a failure to detect correlation if the autocorrelation function is noisy.The given
subsample
function is used, if provided, to subsample the dataset prior to running the full Monte Carlo bootstrap. If none is provided, then a random entry from each correlated block is used as the value for that block. Other reasonable choices includenp.mean
,np.median
,(lambda x: x[0])
or(lambda x: x[-1])
. In particular, usingsubsample=np.mean
will converge to the block averaged mean and standard error, while accounting for any non-normality in the distribution of the mean.