class BayesianMSM

class deeptime.markov.msm.BayesianMSM(n_samples: int = 100, n_steps: Optional[int] = None, reversible: bool = True, stationary_distribution_constraint: Optional[ndarray] = None, sparse: bool = False, maxiter: int = 1000000, maxerr: float = 1e-08, lagtime: Optional[int] = None)

Bayesian estimator for MSMs given discrete trajectory statistics.

Implementation following [1].

Parameters:
  • n_samples (int, optional, default=100) – Number of sampled transition matrices used in estimation of confidences.

  • n_steps (int, optional, default=None) – Number of Gibbs sampling steps for each transition matrix. If None, n_steps is determined automatically as the square root of the number of states in the full state space of the count matrix. This is a heuristic for the number of steps needed to decorrelate successive samples.

  • reversible (bool, optional, default=True) – If True, estimate a reversible MSM; otherwise a non-reversible MSM.

  • stationary_distribution_constraint (ndarray, optional, default=None) – Stationary vector on the full set of states. Assign zero stationary probabilities to states for which the stationary vector is unknown. Estimation will be made such that the resulting ensemble of transition matrices is defined on the intersection of the states with positive stationary vector and the largest connected set (undirected in the default case).

  • sparse (bool, optional, default=False) – If true compute count matrix, transition matrix and all derived quantities using sparse matrix algebra. In this case python sparse matrices will be returned by the corresponding functions instead of numpy arrays. This behavior is suggested for very large numbers of states (e.g. > 4000) because it is likely to be much more efficient.

  • maxiter (int, optional, default=1000000) – Only relevant when reversible=True; sets the maximum number of iterations before the transition matrix estimation method exits.

  • maxerr (float, optional, default=1e-8) – Only relevant when reversible=True. Convergence tolerance for transition matrix estimation. This specifies the maximum change of the Euclidean norm of relative stationary probabilities (\(x_i = \sum_k x_{ik}\)). The relative stationary probability changes \(e_i = (x_i^{(1)} - x_i^{(2)})/(x_i^{(1)} + x_i^{(2)})\) are used in order to track changes in small probabilities. The Euclidean norm of the change vector, \(|e_i|_2\), is compared to maxerr.

  • lagtime (int, optional, default=None) – The lagtime that is used when fitting directly from discrete trajectories.
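
For instance, an estimator with sparse algebra and a larger number of samples could be configured as follows (a minimal sketch using only the parameters documented above):

>>> from deeptime.markov.msm import BayesianMSM
>>> estimator = BayesianMSM(n_samples=250, reversible=True, sparse=True, lagtime=2)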

References

[1] Trendelkamp-Schroer, B., Wu, H., Paul, F., Noé, F. (2015). Estimation and uncertainty of reversible Markov models. The Journal of Chemical Physics, 143(17), 174101.

Examples

Note that the following example is only qualitatively and not quantitatively reproducible because it involves random numbers.

We build a Bayesian Markov model for the following two trajectories at lag time \(\tau=2\):

>>> import numpy as np
>>> import deeptime
>>> dtrajs = [np.array([0,1,2,2,2,2,1,2,2,2,1,0,0,0,0,0,0,0]), np.array([0,0,0,0,1,1,2,2,2,2,2,2,2,1,0,0])]
>>> counts = deeptime.markov.TransitionCountEstimator(lagtime=2, count_mode="effective").fit(dtrajs).fetch_model()
>>> mm = deeptime.markov.msm.BayesianMSM().fit(counts).fetch_model()

The resulting Model contains a prior MSM as well as a list of sample MSMs. Its transition matrix comes from a maximum likelihood estimation. We can access, e.g., the transition matrix as follows:

>>> print(mm.prior.transition_matrix)  
[[ 0.70000001  0.16463699  0.135363  ]
 [ 0.38169055  0.          0.61830945]
 [ 0.12023989  0.23690297  0.64285714]]

Furthermore, the Bayesian MSM posterior returned by fetch_model() can compute probability distributions and statistics for all quantities offered by the MSM object. This works as follows: the BayesianMSMPosterior.gather_stats() method takes as argument the name of the quantity you want to evaluate and returns a statistics summary over that quantity:

>>> print(mm.gather_stats('transition_matrix').mean)  
[[ 0.71108663  0.15947371  0.12943966]
 [ 0.41076105  0.          0.58923895]
 [ 0.13079372  0.23005443  0.63915185]]

Likewise, the standard deviation by element:

>>> print(mm.gather_stats('transition_matrix').std)  
[[ 0.13707029  0.09479627  0.09200214]
 [ 0.15247454  0.          0.15247454]
 [ 0.07701315  0.09385258  0.1119089 ]]

The L and R attributes of the statistics object hold the lower and upper bounds of the 95% (2 sigma) confidence interval. You can control the percentile using the conf argument in this function:

>>> stats = mm.gather_stats('transition_matrix')
>>> print(stats.L)  
[[ 0.44083423  0.03926518  0.0242113 ]
 [ 0.14102544  0.          0.30729828]
 [ 0.02440188  0.07629456  0.43682481]]
>>> print(stats.R)  
[[ 0.93571706  0.37522581  0.40180041]
 [ 0.69307665  0.          0.8649215 ]
 [ 0.31029752  0.44035732  0.85994006]]

If you want to compute expectations of functions that require arguments, just pass these arguments as well:

>>> print(mm.gather_stats('mfpt', A=0, B=2)) 
12.9049811296

And if you want to histogram the distribution or compute more complex statistical moments such as the covariance between different quantities, just get the full sample of your quantity of interest and evaluate it at will:

>>> samples = mm.gather_stats('mfpt', store_samples=True, A=0, B=2).samples
>>> print(samples[:4]) 
[7.9763615793248155, 8.6540958274695701, 26.295326015231058, 17.909895469938899]
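
Since the samples are plain Python floats, they can be post-processed directly with numpy, e.g. to obtain empirical percentiles or a histogram (a minimal sketch operating on the samples gathered above):

>>> mfpt_samples = np.asarray(samples)
>>> lower, upper = np.percentile(mfpt_samples, [2.5, 97.5])  # empirical 95% interval
>>> hist, edges = np.histogram(mfpt_samples, bins=10)  # histogram of the MFPT distribution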

Internally, the posterior object holds 100 transition matrices (the number can be controlled via n_samples) that were computed by the transition matrix sampling method. All of the above statistics iterate over these 100 transition matrices and evaluate the requested function with the given parameters on each of them.

Attributes

has_model

Property reporting whether this estimator contains an estimated model.

model

Shortcut to fetch_model().

reversible

If True, compute a reversible MarkovStateModel; otherwise a non-reversible MarkovStateModel.

sparse

If true compute count matrix, transition matrix and all derived quantities using sparse matrix algebra.

stationary_distribution_constraint

The stationary distribution constraint that can either be None (no constraint) or constrains the count and transition matrices to states with positive stationary vector entries.

Methods

fetch_model()

Yields the most recently estimated model.

fit(data[, callback])

Performs the estimation on either a count matrix or a previously estimated TransitionCountModel.

fit_fetch(data, **kwargs)

Fits the internal model on data and subsequently fetches it in one call.

fit_from_counts(counts[, callback])

Fits a Bayesian MSM on a count model or a count matrix.

fit_from_discrete_timeseries(discrete_timeseries)

Fits a BayesianMSM directly on timeseries data.

fit_from_msm(msm[, callback])

Fits a Bayesian posterior from a given Markov state model.

get_params([deep])

Get the parameters.

sample(prior, n_samples[, n_steps, callback])

Performs sampling based on a prior.

set_params(**params)

Set the parameters of this estimator.

fetch_model() Optional[BayesianMSMPosterior]

Yields the most recently estimated model.

Returns:

model – The estimated model or None if fit was not called.

Return type:

BayesianMSMPosterior or None
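
Before fit() has been called, fetch_model() therefore yields None (a minimal sketch, assuming deeptime is imported as in the examples above):

>>> estimator = deeptime.markov.msm.BayesianMSM()
>>> print(estimator.fetch_model())
None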

fit(data, callback: Optional[Callable] = None, **kw)

Performs the estimation on either a count matrix or a previously estimated TransitionCountModel.

Parameters:
  • data ((N,N) count matrix or TransitionCountModel or MaximumLikelihoodMSM or MarkovStateModel) – a count matrix or a transition count model that was estimated from data

  • callback (callable, optional, default=None) – Function to be called to indicate progress of sampling.

  • ignore_counting_mode (bool, optional, default=False) – If True, the method does not raise if the counting mode is not of the “effective” family. Use with caution.

Returns:

self – Reference to self.

Return type:

BayesianMSM
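
Mirroring the class-level example above, fit() can be called with a transition count model obtained from effective counting (sketch reusing the dtrajs defined there):

>>> counts = deeptime.markov.TransitionCountEstimator(lagtime=2, count_mode="effective").fit(dtrajs).fetch_model()
>>> posterior = deeptime.markov.msm.BayesianMSM().fit(counts).fetch_model()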

fit_fetch(data, **kwargs)

Fits the internal model on data and subsequently fetches it in one call.

Parameters:
  • data (array_like) – Data that is used to fit the model.

  • **kwargs – Additional arguments to fit().

Returns:

The estimated model.

Return type:

model
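
This is equivalent to calling fit() followed by fetch_model() (sketch reusing the discrete trajectories from the examples above):

>>> posterior = deeptime.markov.msm.BayesianMSM(lagtime=2).fit_fetch(dtrajs)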

fit_from_counts(counts, callback=None, **kw)

Fits a Bayesian MSM on a count model or a count matrix.

Parameters:
  • counts (TransitionCountModel or (n, n) ndarray) – The transition counts.

  • callback (callable, optional, default=None) – Function that is called to indicate progress of sampling.

  • **kw – Optional keyword parameters.

Returns:

self – Reference to self.

Return type:

BayesianMSM
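
A sketch analogous to the class-level example, passing a count model estimated with effective counting:

>>> counts = deeptime.markov.TransitionCountEstimator(lagtime=2, count_mode="effective").fit(dtrajs).fetch_model()
>>> posterior = deeptime.markov.msm.BayesianMSM().fit_from_counts(counts).fetch_model()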

fit_from_discrete_timeseries(discrete_timeseries: Union[ndarray, List[ndarray]], lagtime: Optional[int] = None, count_mode: str = 'effective', callback=None, **kw)

Fits a BayesianMSM directly on timeseries data.

Parameters:
  • discrete_timeseries (list of ndarray) – Discrete trajectories.

  • lagtime (int, optional, default=None) – The lagtime that is used for estimation. If None, uses the instance’s lagtime attribute.

  • count_mode (str, default='effective') – The counting mode. Should be of the effective kind, otherwise the results may be heavily biased.

  • callback (callable, optional, default=None) – Function to be called to indicate progress of sampling.

  • **kw – Optional keyword parameters.

Returns:

self – Reference to self.

Return type:

BayesianMSM
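
For example (sketch, assuming discrete trajectories as in the examples above):

>>> posterior = deeptime.markov.msm.BayesianMSM().fit_from_discrete_timeseries(dtrajs, lagtime=2).fetch_model()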

fit_from_msm(msm: MarkovStateModel, callback=None, **kw)

Fits a Bayesian posterior from a given Markov state model. The MSM must contain a count model in order to produce confidences. Note that the count model should be produced using effective counting, otherwise counts are correlated and the computed confidences are wrong.

Parameters:
  • msm (MarkovStateModel) – The Markov state model to use as sampling start point.

  • callback (callable, optional, default=None) – Function to be called to indicate progress of sampling.

  • ignore_counting_mode (bool, optional, default=False) – If True, the method does not raise if the counting mode is not of the “effective” family. Use with caution.

Returns:

self – Reference to self.

Return type:

BayesianMSM
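
A sketch assuming a maximum likelihood MSM has been estimated from effective counts beforehand (the MaximumLikelihoodMSM usage is assumed to be analogous to the count-based example above):

>>> counts = deeptime.markov.TransitionCountEstimator(lagtime=2, count_mode="effective").fit(dtrajs).fetch_model()
>>> mle_msm = deeptime.markov.msm.MaximumLikelihoodMSM().fit(counts).fetch_model()
>>> posterior = deeptime.markov.msm.BayesianMSM().fit_from_msm(mle_msm).fetch_model()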

get_params(deep=False)

Get the parameters.

Returns:

params – Parameter names mapped to their values.

Return type:

mapping of string to any
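
A minimal sketch, assuming the returned mapping contains the constructor arguments by name:

>>> params = deeptime.markov.msm.BayesianMSM(n_samples=50).get_params()
>>> params['n_samples']
50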

sample(prior: MarkovStateModel, n_samples: int, n_steps: Optional[int] = None, callback=None)

Performs sampling based on a prior.

Parameters:
  • prior (MarkovStateModel) – The MSM that is used as initial sampling point.

  • n_samples (int) – The number of samples to draw.

  • n_steps (int, optional, default=None) – The number of sampling steps for each transition matrix. If None, determined by \(\sqrt{\mathrm{n\_states}}\).

  • callback (callable, optional, default=None) – Callback function that indicates progress of sampling.

Returns:

samples – The generated samples.

Return type:

list of MarkovStateModel

Examples

This method can in particular be used to append samples to an already estimated posterior:

>>> import numpy as np
>>> import deeptime as dt
>>> dtrajs = [np.array([0,1,2,2,2,2,1,2,2,2,1,0,0,0,0,0,0,0]),
...           np.array([0,0,0,0,1,1,2,2,2,2,2,2,2,1,0,0])]
>>> estimator = dt.markov.msm.BayesianMSM(lagtime=1)
>>> posterior = estimator.fit_fetch(dtrajs)
>>> n_samples = len(posterior.samples)
>>> posterior.samples.extend(estimator.sample(posterior.prior, n_samples=23))
>>> assert len(posterior.samples) == n_samples + 23

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

object
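
For a flat estimator like BayesianMSM this amounts to updating constructor arguments by name (a minimal sketch):

>>> estimator = deeptime.markov.msm.BayesianMSM()
>>> _ = estimator.set_params(n_samples=50, reversible=False)  # set_params returns the estimator itself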

property has_model: bool

Property reporting whether this estimator contains an estimated model. This assumes that the model attribute is initialized to None as long as no model has been estimated.

Type:

bool

property model

Shortcut to fetch_model().

property reversible: bool

If True, compute a reversible MarkovStateModel; otherwise a non-reversible MarkovStateModel.

property sparse: bool

If true compute count matrix, transition matrix and all derived quantities using sparse matrix algebra. In this case python sparse matrices will be returned by the corresponding functions instead of numpy arrays. This behavior is suggested for very large numbers of states (e.g. > 4000) because it is likely to be much more efficient.

property stationary_distribution_constraint: Optional[ndarray]

The stationary distribution constraint that can either be None (no constraint) or constrains the count and transition matrices to states with positive stationary vector entries.

Getter:

Retrieves the currently configured constraint, can be None.

Setter:

Sets a stationary distribution constraint by assigning a stationary vector as value. The estimated count and transition matrices are restricted to states that have positive entries in the stationary vector. In case the vector is not normalized, setting it here implicitly copies and normalizes it.

Type:

ndarray or None
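
A sketch of using the setter with a partially known stationary vector, where a zero entry marks a state whose stationary probability is unknown:

>>> estimator = deeptime.markov.msm.BayesianMSM()
>>> pi = np.array([0.6, 0.4, 0.0])  # stationary probability of the last state is unknown
>>> estimator.stationary_distribution_constraint = pi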