Markov state models

Here we introduce several types of Markov state models as well as analysis tools related to them. At their core, Markov state models are stochastic models describing chains of events in which the state at a particular point in time depends only on the state immediately preceding it. That is, considering the chain of events \((\ldots,X_{t-2}, X_{t-1}, X_t)\) with a set of possible states \(S\), the probability of encountering a particular state \(X_{t+1}\in S\) is a conditional probability depending only on \(X_t\in S\).

A great deal is written about MSMs in the literature, so we omit many crucial discussions here. The 2018 review by Husic and Pande [1] is a good place to start for a high-level discussion of Markov state models and a chronology of their development in the context of molecular kinetics. Figure 3 is particularly helpful for understanding the many “flavors” of MSM analyses developed. A comprehensive overview of the mathematics was presented by Prinz et al [2], including the MLE estimator used in Maximum Likelihood MSMs. This content is also covered in Chapter 4 of a useful book on Markov state models [3], which is a valuable resource for many aspects of Markov state modeling (see book Figure 1.1).

The standard formulation, which is also employed here, assumes that \(S\) is discrete and of finite cardinality. When related back to continuous-space processes, this means that the discrete states correspond to a Voronoi tessellation of state space, and membership of a configuration in a state can be expressed via indicator functions.
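As a minimal sketch of such an indicator-function assignment (using plain numpy rather than the deeptime API; the cluster centers here are hypothetical), a continuous configuration is mapped to the discrete state whose center is nearest, i.e., to its Voronoi cell:

```python
import numpy as np

# Hypothetical cluster centers defining a Voronoi tessellation of a 2D state space.
centers = np.array([[0.0, 0.0],
                    [1.0, 0.0],
                    [0.0, 1.0]])

def assign_state(x):
    # Indicator-function style assignment: a point belongs to the state
    # whose center is nearest, i.e., to that center's Voronoi cell.
    return int(np.argmin(np.linalg.norm(centers - x, axis=1)))

assign_state(np.array([0.9, 0.1]))  # closest to centers[1]
```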

These conditional probabilities are often collected into a so-called transition matrix \(P\in \mathbb{R}^{n\times n}\), where \(n = |S|\) is the number of states. Assuming that \(S\) is represented by the enumeration \(\{1,\ldots,n\}\), its entries are

\[P_{ij} = \mathbb{P}(X_{t+1}=j \mid X_t = i)\quad\forall t, \]

i.e., the probability of transitioning to state \(j\) given one is currently in state \(i\). This also means that \(P\) is a row-stochastic matrix:

\[\sum_{j=1}^n P_{ij} = 1 \quad\forall i=1,\ldots,n. \]
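These two properties can be illustrated with a small example (a sketch in plain numpy; the matrix entries are made up for illustration): each row of \(P\) sums to one, and a chain is sampled by drawing \(X_{t+1}\) from the row of \(P\) indexed by the current state \(X_t\) alone, which is exactly the Markov property.

```python
import numpy as np

rng = np.random.default_rng(42)

# An example transition matrix over n = 3 states; each row sums to one.
P = np.array([[0.9, 0.05, 0.05],
              [0.1, 0.8,  0.1 ],
              [0.0, 0.2,  0.8 ]])
assert np.allclose(P.sum(axis=1), 1.0)  # row-stochasticity

def simulate(P, x0, n_steps, rng):
    # Sample a realization of the chain: X_{t+1} is drawn from row P[X_t],
    # i.e., it depends only on the current state.
    traj = [x0]
    for _ in range(n_steps - 1):
        traj.append(rng.choice(len(P), p=P[traj[-1]]))
    return np.asarray(traj)

traj = simulate(P, x0=0, n_steps=1000, rng=rng)
```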

If a Markov state model is available, interesting dynamical quantities can be computed, e.g., mean first passage times and fluxes between (sets of) states [4], relaxation timescales, and metastable decompositions of the Markov states [5].
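Two of these quantities follow directly from the spectral decomposition of \(P\): the stationary distribution \(\pi\) is the left eigenvector for eigenvalue \(1\), and the implied relaxation timescales are \(t_i = -\tau/\ln \lambda_i\) for the remaining eigenvalues \(\lambda_i\) at lag time \(\tau\). A minimal sketch in plain numpy (not the deeptime API, which offers these as model properties):

```python
import numpy as np

P = np.array([[0.9, 0.05, 0.05],
              [0.1, 0.8,  0.1 ],
              [0.0, 0.2,  0.8 ]])
lagtime = 1

# Left eigenvectors of P are eigenvectors of P^T; sort by decreasing eigenvalue.
evals, levecs = np.linalg.eig(P.T)
order = np.argsort(-np.real(evals))
evals = np.real(evals[order])

# Stationary distribution: left eigenvector for eigenvalue 1, normalized to sum 1.
pi = np.real(levecs[:, order[0]])
pi /= pi.sum()

# Implied timescales from the non-unit eigenvalues: t_i = -lagtime / ln(lambda_i).
timescales = -lagtime / np.log(evals[1:])
```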

The goal of the deeptime.markov package is to provide tools to estimate and analyze Markov state models from discrete-state timeseries data. If the data's domain is not discrete, clustering can be employed to assign each frame to a state.

In the following, we introduce the core object, the MarkovStateModel, as well as a variety of estimators.

When estimating a MSM from time series data, it is important to collect statistics over the encountered state transitions. This is covered in transition counting.
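To make the counting step concrete, here is a sketch in plain numpy (deeptime provides dedicated transition counting utilities; the short trajectory below is made up): a sliding window over a discrete trajectory yields a count matrix \(C\), where \(C_{ij}\) counts observed transitions \(i \to j\) at the chosen lag time, and a naive non-reversible transition matrix estimate is obtained by normalizing its rows.

```python
import numpy as np

# A discrete-state trajectory, e.g., obtained by clustering continuous data.
dtraj = np.array([0, 0, 1, 1, 1, 2, 2, 0, 0, 1])
n_states = 3
lagtime = 1

# Sliding-window count matrix: C[i, j] counts observed transitions i -> j
# separated by `lagtime` steps.
C = np.zeros((n_states, n_states))
for i, j in zip(dtraj[:-lagtime], dtraj[lagtime:]):
    C[i, j] += 1

# Naive (non-reversible) maximum-likelihood estimate: normalize each row.
P_hat = C / C.sum(axis=1, keepdims=True)
```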

Furthermore, deeptime implements Augmented Markov models [6], which can be used when experimental data is available, as well as Observable Operator Model MSMs [7], which provide an unbiased estimate of the MSM transition matrix that corrects for the effect of starting out of equilibrium, even when short lag times are used.

Multiensemble MSMs

Deeptime offers the TRAM method [8] for estimating multiensemble MSMs. These are collections of MSMs based on simulations governed by biased dynamics (e.g., replica-exchange or umbrella-sampling simulations).

References