class MaximumLikelihoodMSM

class deeptime.markov.msm.MaximumLikelihoodMSM(reversible: bool = True, stationary_distribution_constraint: Optional[ndarray] = None, sparse: bool = False, allow_disconnected: bool = False, maxiter: int = 1000000, maxerr: float = 1e-08, connectivity_threshold: float = 0, transition_matrix_tolerance: float = 1e-06, lagtime=None, use_lcc: bool = False)

Maximum likelihood estimator for MSMs (MarkovStateModel) given discrete trajectories or statistics thereof. This estimator produces MSM collections (MarkovStateModelCollection), which contain as many MSMs as there are connected sets in the counting. By default, a collection of MSMs behaves exactly like an ordinary MSM on the largest connected set. The connected set can be switched, which makes the collection behave like an MSM on the selected state subset.

Implementation according to [1].

Parameters:
  • reversible (bool, optional, default=True) – If true, compute a reversible MarkovStateModel, else a non-reversible MarkovStateModel.

  • stationary_distribution_constraint ((N,) ndarray, optional, default=None) – Stationary vector on the full set of states. Estimation will be made such that the resulting transition matrix has this distribution as an equilibrium distribution. Set probabilities to zero for states which should be excluded from the analysis.

  • sparse (bool, optional, default=False) – If true compute count matrix, transition matrix and all derived quantities using sparse matrix algebra. In this case python sparse matrices will be returned by the corresponding functions instead of numpy arrays. This behavior is suggested for very large numbers of states (e.g. > 4000) because it is likely to be much more efficient.

  • allow_disconnected (bool, optional, default=False) – If set to true, the resulting transition matrix may have disconnected and transient states, and the estimated stationary distribution is only meaningful on the respective connected sets.

  • maxiter (int, optional, default=1000000) – Only used when reversible=True. Sets the maximum number of iterations before the transition matrix estimation method exits.

  • maxerr (float, optional, default=1e-8) – Only used when reversible=True. Convergence tolerance for transition matrix estimation. This specifies the maximum change of the Euclidean norm of relative stationary probabilities (\(x_i = \sum_k x_{ik}\)). The relative stationary probability changes \(e_i = (x_i^{(1)} - x_i^{(2)})/(x_i^{(1)} + x_i^{(2)})\) are used in order to track changes in small probabilities. The Euclidean norm of the change vector, \(\|e\|_2\), is compared to maxerr.

  • transition_matrix_tolerance (float, default=1e-6) – The tolerance under which a matrix is still considered a transition matrix (only non-negative elements and row sums of 1).

  • connectivity_threshold (float, optional, default=0.) – Number of counts required to consider two states connected.

  • lagtime (int, optional, default=None) – Optional lagtime that can be provided at estimator level if fitting from timeseries directly.

  • use_lcc (bool, default=False) – If set to true, this will restrict the resulting MSM collection to only contain the largest connected state-space component.
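The convergence criterion described for maxerr can be illustrated with a few lines of NumPy. This is a minimal sketch of the stated formula only, not deeptime's implementation; the function name relative_change_norm and the example vectors are hypothetical.

```python
import numpy as np


def relative_change_norm(x_old, x_new):
    """Euclidean norm of e_i = (x_old_i - x_new_i) / (x_old_i + x_new_i).

    Hypothetical helper mirroring the formula in the maxerr description.
    """
    x_old = np.asarray(x_old, dtype=float)
    x_new = np.asarray(x_new, dtype=float)
    e = (x_old - x_new) / (x_old + x_new)
    return np.linalg.norm(e)


# Two successive iterates (made-up numbers): only the tiny third entry changes.
x_old = np.array([0.5, 0.5, 1e-6])
x_new = np.array([0.5, 0.5, 2e-6])

abs_change = np.linalg.norm(x_old - x_new)   # tiny in absolute terms
err = relative_change_norm(x_old, x_new)     # large in relative terms
converged = err < 1e-8                       # compare against maxerr
```

The absolute change here is on the order of 1e-6, while the relative change is about one third, which is why a relative measure keeps small stationary probabilities from being ignored when judging convergence.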

References

Attributes

allow_disconnected

If set to true, the resulting transition matrix may have disconnected and transient states.

has_model

Property reporting whether this estimator contains an estimated model.

model

Shortcut to fetch_model().

reversible

If true, compute a reversible MarkovStateModel, else a non-reversible MarkovStateModel.

sparse

If true compute count matrix, transition matrix and all derived quantities using sparse matrix algebra.

stationary_distribution_constraint

The stationary distribution constraint that can either be None (no constraint) or constrains the count and transition matrices to states with positive stationary vector entries.

Methods

fetch_model()

Yields the most recent MarkovStateModelCollection that was estimated.

fit(data, *args, **kw)

Fits a new Markov state model according to data.

fit_fetch(data, **kwargs)

Fits the internal model on data and subsequently fetches it in one call.

fit_from_counts(counts)

Fits a model from counts in form of a (n, n) count matrix, a TransitionCountModel or an instance of TransitionCountEstimator, which has been fit on data previously.

fit_from_discrete_timeseries(...[, count_mode])

Fits a model directly from discrete time series data.

get_params([deep])

Get the parameters.

set_params(**params)

Set the parameters of this estimator.

fetch_model() Optional[MarkovStateModelCollection]

Yields the most recent MarkovStateModelCollection that was estimated. Can be None if fit was not called.

Returns:

model – The most recent Markov state model collection or None.

Return type:

MarkovStateModelCollection or None

fit(data, *args, **kw)

Fits a new Markov state model according to data.

Parameters:
  • data (TransitionCountModel or (n, n) ndarray or discrete timeseries) –

    Input data. This can be a TransitionCountModel, a 2-dimensional ndarray which is interpreted as a count matrix, or a discrete timeseries (or a list thereof).

    In the case of a timeseries, a lagtime must be provided, either in the keyword arguments or at estimator level. The keyword argument “count_mode” can also be used in this case; it defaults to “sliding”. See also fit_from_discrete_timeseries().

  • *args – Dummy parameters for scikit-learn compatibility.

  • **kw – Parameters for scikit-learn compatibility and optionally lagtime if fitting with time series data.

Returns:

self – Reference to self.

Return type:

MaximumLikelihoodMSM

See also

TransitionCountModel

Transition count model

TransitionCountEstimator

Estimating transition count models from data

Examples

This example demonstrates how to fit a Markov state model collection from data which decomposes into two disjoint sets of states with corresponding transition matrices.

>>> import numpy as np
>>> from deeptime.markov import TransitionCountEstimator  # used further below
>>> from deeptime.markov.msm import MarkovStateModel, MaximumLikelihoodMSM  # import MSM and estimator
>>> msm1 = MarkovStateModel([[.7, .3], [.3, .7]])  # create first MSM
>>> msm2 = MarkovStateModel([[.9, .05, .05], [.3, .6, .1], [.1, .1, .8]])  # create second MSM

Now, simulate a trajectory where the states of msm2 are shifted by a fixed number 2, i.e., msm1 describes states [0, 1] and msm2 describes states [2, 3, 4] in the generated trajectory.

>>> traj = np.concatenate([msm1.simulate(1000000), 2 + msm2.simulate(1000000)])  # simulate trajectory

Given the trajectory, we fit a collection of MSMs:

>>> model = MaximumLikelihoodMSM(reversible=True).fit(traj, lagtime=1).fetch_model()

The model behaves like an MSM on the largest connected set, but the behavior can be changed by selecting, e.g., the second largest connected set:

>>> model.state_symbols()
array([2, 3, 4])
>>> model.select(1)  # change to second largest connected set
>>> model.state_symbols()
array([0, 1])

The number of models contained in the collection is:

>>> model.n_connected_msms
2

Alternatively, one can fit with a previously estimated count model (that can be restricted to a subset of states):

>>> counts = TransitionCountEstimator(lagtime=1, count_mode="sliding").fit(traj).fetch_model()
>>> counts = counts.submodel([0, 1])  # select submodel with state symbols [0, 1]
>>> msm = MaximumLikelihoodMSM(reversible=True).fit(counts).fetch_model()
>>> msm.state_symbols()
array([0, 1])

And this is the only model in the collection:

>>> msm.n_connected_msms
1

fit_fetch(data, **kwargs)

Fits the internal model on data and subsequently fetches it in one call.

Parameters:
  • data (array_like) – Data that is used to fit the model.

  • **kwargs – Additional arguments to fit().

Returns:

The estimated model.

Return type:

model

fit_from_counts(counts: Union[ndarray, TransitionCountEstimator, TransitionCountModel])

Fits a model from counts in form of a (n, n) count matrix, a TransitionCountModel or an instance of TransitionCountEstimator, which has been fit on data previously.

Parameters:

counts ((n, n) ndarray or TransitionCountModel or TransitionCountEstimator) – The transition counts to fit the model from.

Returns:

self – Reference to self.

Return type:

MaximumLikelihoodMSM
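To make the link between a count matrix and a transition matrix concrete: without a reversibility constraint, the maximum likelihood transition matrix of a Markov chain is the row-normalized count matrix, \(T_{ij} = c_{ij} / \sum_k c_{ik}\). The sketch below uses plain NumPy and a made-up count matrix; it is an illustration of that formula, not the code path deeptime takes, and it assumes every row has at least one count.

```python
import numpy as np

# Made-up (n, n) count matrix: counts[i, j] = observed transitions i -> j.
counts = np.array([[8, 2],
                   [3, 7]], dtype=float)

# Non-reversible maximum likelihood estimate: divide each row by its sum.
# Assumes no row sums to zero (every state has outgoing counts).
row_sums = counts.sum(axis=1, keepdims=True)
transition_matrix = counts / row_sums

# Each row of the result is a probability distribution.
row_totals = transition_matrix.sum(axis=1)
```

With reversible=True the estimate is instead obtained iteratively (see maxiter and maxerr) and in general differs from plain row normalization.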

fit_from_discrete_timeseries(discrete_timeseries: Union[ndarray, List[ndarray]], lagtime: int, count_mode: str = 'sliding')

Fits a model directly from discrete time series data. This type of data can either be a single trajectory in form of a 1d integer numpy array or a list thereof.

Parameters:
  • discrete_timeseries (ndarray or list of ndarray) – Discrete timeseries data.

  • lagtime (int) – The lag time under which to estimate state transitions and ultimately also the transition matrix.

  • count_mode (str, default="sliding") – The count mode to use for estimating transition counts. For maximum-likelihood estimation, the recommended choice is “sliding”. If the MSM should be used for sampling in a BayesianMSM, the recommended choice is “effective”, which yields transition counts that are statistically uncorrelated. A description can be found in [2].

Returns:

self – Reference to self.

Return type:

MaximumLikelihoodMSM
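The idea behind the “sliding” count mode can be sketched in NumPy: every pair of frames (t, t + lagtime) contributes one transition count. The helper name sliding_count_matrix and the toy trajectory are hypothetical, and the library's own counting may differ in details such as data types or sparse output.

```python
import numpy as np


def sliding_count_matrix(dtraj, lagtime, n_states):
    """Count transitions dtraj[t] -> dtraj[t + lagtime] for every t.

    Hypothetical helper for illustration; lagtime must be >= 1.
    """
    dtraj = np.asarray(dtraj)
    counts = np.zeros((n_states, n_states), dtype=int)
    # np.add.at accumulates correctly even when index pairs repeat.
    np.add.at(counts, (dtraj[:-lagtime], dtraj[lagtime:]), 1)
    return counts


dtraj = [0, 0, 1, 1, 0, 1]  # toy discrete trajectory over states {0, 1}
counts_lag1 = sliding_count_matrix(dtraj, lagtime=1, n_states=2)
counts_lag2 = sliding_count_matrix(dtraj, lagtime=2, n_states=2)
```

A trajectory of length T yields T - lagtime counts in this scheme, so windows overlap for lagtime > 1 and the resulting counts are statistically correlated.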

get_params(deep=False)

Get the parameters.

Returns:

params – Parameter names mapped to their values.

Return type:

mapping of string to any

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

object

property allow_disconnected: bool

If set to true, the resulting transition matrix may have disconnected and transient states.
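What "connected" means can be illustrated by decomposing a count matrix into sets of states that can all reach each other. The following is a small NumPy sketch, not deeptime's algorithm: the function name strongly_connected_sets is hypothetical, and treating a count strictly greater than the threshold as a connection is an assumption.

```python
import numpy as np


def strongly_connected_sets(count_matrix, threshold=0.0):
    """Group states into mutually reachable sets, largest first.

    Hypothetical helper; 'count > threshold' as the edge rule is an assumption.
    """
    c = np.asarray(count_matrix)
    n = c.shape[0]
    # reach[i, j] is True if j is reachable from i (every state reaches itself).
    reach = (c > threshold) | np.eye(n, dtype=bool)
    while True:
        step = (reach.astype(int) @ reach.astype(int)) > 0
        new_reach = reach | step
        if np.array_equal(new_reach, reach):
            break
        reach = new_reach
    mutual = reach & reach.T  # i and j reach each other
    sets, seen = [], set()
    for i in range(n):
        if i not in seen:
            members = [int(j) for j in np.flatnonzero(mutual[i])]
            sets.append(members)
            seen.update(members)
    return sorted(sets, key=len, reverse=True)


# Made-up counts: states {0, 1} and {2, 3, 4} never exchange transitions.
counts = np.array([[7, 3, 0, 0, 0],
                   [3, 7, 0, 0, 0],
                   [0, 0, 9, 1, 1],
                   [0, 0, 3, 6, 1],
                   [0, 0, 1, 1, 8]])
sets = strongly_connected_sets(counts)
```

In this toy case the decomposition yields one three-state set and one two-state set, mirroring the two models in the collection of the fit() example.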

property has_model: bool

Property reporting whether this estimator contains an estimated model. This assumes that the model is initialized with None otherwise.

Type:

bool

property model

Shortcut to fetch_model().

property reversible: bool

If true, compute a reversible MarkovStateModel, else a non-reversible MarkovStateModel.
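Reversibility here refers to detailed balance, \(\pi_i T_{ij} = \pi_j T_{ji}\), with \(\pi\) the stationary distribution of the transition matrix \(T\). The NumPy sketch below checks this property for two made-up matrices; the helper names are hypothetical, and the check assumes a single connected set so that the stationary distribution is unique.

```python
import numpy as np


def stationary_distribution(T):
    """Left eigenvector of T for the eigenvalue closest to 1, normalized to sum 1."""
    eigvals, eigvecs = np.linalg.eig(T.T)
    idx = np.argmin(np.abs(eigvals - 1.0))
    pi = np.real(eigvecs[:, idx])
    return pi / pi.sum()


def is_reversible(T, tol=1e-8):
    """Check detailed balance pi_i T_ij == pi_j T_ji up to an absolute tolerance."""
    pi = stationary_distribution(T)
    flux = pi[:, None] * T  # flux[i, j] = pi_i * T_ij
    return bool(np.allclose(flux, flux.T, rtol=0.0, atol=tol))


# Symmetric two-state matrix: satisfies detailed balance.
T_rev = np.array([[0.7, 0.3],
                  [0.3, 0.7]])

# Three-state matrix with a preferred direction 0 -> 1 -> 2 -> 0: violates it.
T_cyc = np.array([[0.1, 0.8, 0.1],
                  [0.1, 0.1, 0.8],
                  [0.8, 0.1, 0.1]])

rev_flag = is_reversible(T_rev)
cyc_flag = is_reversible(T_cyc)
```

Both matrices are valid transition matrices with a uniform stationary distribution; only the first has symmetric probability flux between every pair of states.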

property sparse: bool

If true compute count matrix, transition matrix and all derived quantities using sparse matrix algebra. In this case python sparse matrices will be returned by the corresponding functions instead of numpy arrays. This behavior is suggested for very large numbers of states (e.g. > 4000) because it is likely to be much more efficient.

property stationary_distribution_constraint: Optional[ndarray]

The stationary distribution constraint that can either be None (no constraint) or constrains the count and transition matrices to states with positive stationary vector entries.

Getter:

Yields the currently configured constraint vector, can be None.

Setter:

Sets a stationary distribution constraint by giving a stationary vector as value. The estimated count- and transition-matrices are restricted to states that have positive entries. In case the vector is not normalized, setting it here implicitly copies and normalizes it.

Type:

ndarray or None
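The effect of the setter described above (normalization, then restriction to states with positive entries) can be sketched with NumPy. The vector and count matrix are made up, and the code only mimics the described behavior; it does not call into deeptime.

```python
import numpy as np

# Unnormalized constraint vector on four states; state 1 is excluded (entry 0).
pi = np.array([2.0, 0.0, 1.0, 1.0])

# Work on a normalized copy, leaving the input untouched.
pi_normalized = pi / pi.sum()

# States that remain in the analysis: those with positive probability.
active_states = np.flatnonzero(pi_normalized > 0)

# Restrict a made-up 4x4 count matrix to the rows and columns of active states.
counts = np.arange(16).reshape(4, 4)
restricted_counts = counts[np.ix_(active_states, active_states)]
```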