deeptime.markov.tools.estimation.bootstrap_trajectories

deeptime.markov.tools.estimation.bootstrap_trajectories(trajs, correlation_length)

Generates randomly resampled trajectory segments.

Parameters:

trajs (array-like or array-like of array-like) – Single or multiple trajectories. Every trajectory is assumed to be a statistically independent realization. Note that this is often not true and is a weakness of the present bootstrapping approach.
correlation_length (int) – Correlation length (also known as the statistical inefficiency) of the data. If set to < 1 or > L, where L is the longest trajectory length, the bootstrapping will sample full trajectories. We suggest selecting the largest implied timescale or relaxation timescale as a conservative estimate of the correlation length. If this timescale is unknown, use full trajectories (set correlation_length to < 1) or come up with a rough estimate. For computing the error on specific observables, one may use shorter timescales, because the relevant correlation length is the integral of the autocorrelation function of the observables of interest [3]. The slowest implied timescale is an upper bound for that correlation length, and therefore a conservative estimate [4].
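As a rough illustration of the observable-specific correlation length mentioned for correlation_length, the sketch below estimates an integrated autocorrelation time for a scalar time series. The function name `integrated_autocorrelation_time`, the convention 1 + 2 * sum of the normalized autocorrelation function, and the truncation at the first non-positive value are assumptions made for this example; they are not part of deeptime and other conventions exist.

```python
def integrated_autocorrelation_time(x):
    """Rough estimate of 1 + 2 * sum_t C(t) for a scalar time series x.

    C(t) is the normalized autocorrelation function. The sum is truncated
    at the first non-positive C(t), which is a common heuristic and an
    assumption of this sketch.
    """
    n = len(x)
    mean = sum(x) / n
    dx = [v - mean for v in x]
    var = sum(d * d for d in dx) / n
    if var == 0:
        return 1.0  # constant signal: nothing to correlate
    tau = 1.0
    for lag in range(1, n):
        # Normalized autocorrelation at this lag.
        c = sum(dx[i] * dx[i + lag] for i in range(n - lag)) / ((n - lag) * var)
        if c <= 0:
            break
        tau += 2.0 * c
    return tau


# A slowly switching two-state signal has a correlation time above 1.
signal = ([0] * 10 + [1] * 10) * 5
tau = integrated_autocorrelation_time(signal)
```

A value obtained this way could be rounded up to an integer and used as correlation_length when only that observable matters; the slowest implied timescale remains the more conservative choice.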

Notes

This function can be called multiple times in order to generate randomly resampled trajectory data. To compute error bars on your observable of interest, call this function repeatedly to generate resampled trajectories, and pass each resampled set to your estimator. The standard deviation of the observable over these resampled data sets serves as a model for its standard error.
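The workflow described above can be sketched as follows. The names `bootstrap_standard_error`, `resample_full_trajectories` and `fraction_in_state_0` are hypothetical and only serve this illustration. To keep the sketch runnable without deeptime installed, a stand-in resampler that draws whole trajectories with replacement is used; in practice one would pass bootstrap_trajectories as the `resample` argument.

```python
import random
import statistics


def bootstrap_standard_error(trajs, correlation_length, observable, resample,
                             n_samples=100):
    """Standard deviation of `observable` over resampled trajectory sets.

    `resample(trajs, correlation_length)` must return a list of trajectories,
    which is the role bootstrap_trajectories plays in the text above.
    """
    values = [observable(resample(trajs, correlation_length))
              for _ in range(n_samples)]
    return statistics.stdev(values)


def resample_full_trajectories(trajs, correlation_length):
    # Hypothetical stand-in: draws whole trajectories with replacement and
    # ignores correlation_length. Swap in bootstrap_trajectories in real use.
    return [random.choice(trajs) for _ in trajs]


def fraction_in_state_0(trajs):
    # Example observable: fraction of all frames spent in state 0.
    frames = [s for traj in trajs for s in traj]
    return frames.count(0) / len(frames)


random.seed(0)
trajs = [[0, 0, 1, 1, 0], [1, 1, 1, 0, 0], [0, 1, 0, 1, 1], [0, 0, 0, 0, 1]]
err = bootstrap_standard_error(trajs, 0, fraction_in_state_0,
                               resample_full_trajectories, n_samples=200)
print(err)
```

Passing the resampler as an argument keeps the error-bar loop independent of how the resampled data are generated, so the same loop works with any estimator that maps trajectories to a number.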

Implements a moving block bootstrapping procedure [1] for generating randomly resampled count matrices from discrete trajectories. The correlation length determines the size of the trajectory blocks that remain contiguous. For a single trajectory of length N with correlation length t_corr < N, we sample floor(N/t_corr) subtrajectories of length t_corr, each starting at time t, where t is a uniform random number in [0, N-t_corr-1]. When multiple trajectories are available, N is the total number of time steps over all trajectories, and the algorithm generates resampled data with a total of N (or slightly more) time steps. Each trajectory of length n_i is selected with probability proportional to its length, i.e. n_i/N. Trajectories of length n_i <= t_corr are returned completely. For longer trajectories, segments of length t_corr are randomly generated.
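The procedure described above can be sketched in plain Python as follows. This is an illustration written from the description in these notes, not the library's implementation; the function name `moving_block_resample` and the `rng` argument are assumptions of the sketch.

```python
import random


def moving_block_resample(trajs, correlation_length, rng=random):
    """Sketch of the resampling scheme described in the notes above."""
    lengths = [len(t) for t in trajs]
    total = sum(lengths)
    longest = max(lengths)
    # Out-of-range correlation lengths fall back to sampling full trajectories.
    if correlation_length < 1 or correlation_length > longest:
        correlation_length = longest
    segments = []
    accumulated = 0
    # Keep drawing blocks until at least `total` time steps are collected.
    while accumulated < total:
        # Select a trajectory with probability proportional to its length.
        traj = rng.choices(trajs, weights=lengths, k=1)[0]
        n = len(traj)
        if n <= correlation_length:
            segment = traj  # short trajectories are returned completely
        else:
            # Start time t uniform in [0, n - t_corr - 1], as in the text.
            start = rng.randrange(0, n - correlation_length)
            segment = traj[start:start + correlation_length]
        segments.append(segment)
        accumulated += len(segment)
    return segments


rng = random.Random(42)
trajs = [list(range(10)), list(range(100, 104))]
segments = moving_block_resample(trajs, 3, rng)
```

Because whole blocks are appended until the target is reached, the resampled data can exceed the original number of time steps by up to one block, which matches the "N (or slightly more)" remark above.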

Note that, like all error models for correlated time series data, bootstrapping only gives you a model for the error under a number of assumptions [2]. The most critical decisions are (1) whether this approach is meaningful at all (it is only if the trajectories are statistically independent realizations), and (2) which timescale to select for the correlation length (see the description of the correlation_length parameter). Note that transition matrix sampling from the Dirichlet distribution is a much better option from a theoretical point of view, but may also be computationally more demanding.

References