function metastable_from_data

deeptime.markov.hmm.init.discrete.metastable_from_data(dtrajs, n_hidden_states, lagtime, stride=1, mode='largest-regularized', reversible: bool = True, stationary: bool = False, separate_symbols=None, states: Optional[ndarray] = None, regularize: bool = True, connectivity_threshold: Union[str, float] = 0.0)

Estimates an initial guess HMM from given discrete trajectories.

Following the procedure described in [1], an MSM is first estimated and then coarse-grained with PCCA+ [2]. After estimating the MSM, this method calls metastable_from_msm().
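The first step, estimating an MSM from the discrete trajectories, can be illustrated in its simplest form. The following is a minimal plain-Python sketch, assuming integer state labels 0..n_states-1. It counts transitions at the given lagtime and row-normalizes the counts into a non-reversible maximum-likelihood transition matrix. The helpers count_matrix and row_normalize are hypothetical and not part of deeptime; the sketch leaves out the mode and regularization handling and the PCCA+ coarse-graining.

```python
# Hypothetical helpers, not part of deeptime: a plain-Python sketch of
# counting transitions and estimating a simple (non-reversible) MSM.

def count_matrix(dtrajs, lagtime, n_states):
    """Count transitions (s[t], s[t + lagtime]) for every t in every trajectory."""
    counts = [[0] * n_states for _ in range(n_states)]
    for traj in dtrajs:
        for t in range(len(traj) - lagtime):
            counts[traj[t]][traj[t + lagtime]] += 1
    return counts


def row_normalize(counts):
    """Maximum-likelihood estimate T[i][j] = C[i][j] / sum_k C[i][k].

    Rows without any outgoing counts are left as all zeros.
    """
    transition_matrix = []
    for row in counts:
        total = sum(row)
        transition_matrix.append([c / total for c in row] if total > 0 else [0.0] * len(row))
    return transition_matrix


dtrajs = [[0, 0, 1, 1, 0], [1, 1, 1, 0]]
C = count_matrix(dtrajs, lagtime=1, n_states=2)
T = row_normalize(C)
print(C)  # [[1, 1], [2, 3]]
print(T)  # [[0.5, 0.5], [0.4, 0.6]]
```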

Parameters:
  • dtrajs (array_like or list of array_like) – A discrete trajectory or a list of discrete trajectories.

  • n_hidden_states (int) – Number of hidden states.

  • lagtime (int) – The lagtime at which transitions are counted.

  • stride (int or str, optional, default=1) –

    Stride between two lagged trajectories extracted from the input trajectories. Given a trajectory s[t], stride and lag result in the trajectories

    s[0], s[lag], s[2 lag], ...

    s[stride], s[stride + lag], s[stride + 2 lag], ...

    Setting stride=1 uses all data (useful for a maximum-likelihood estimator), whereas a Bayesian estimator requires a longer stride so that the trajectories are statistically uncorrelated. Setting stride='effective' uses the largest neglected timescale as an estimate of the correlation time and sets the stride accordingly.

  • mode (str, optional, default='largest-regularized') –

    The mode at which the Markov state model is estimated. Since the process is assumed to be reversible and finite statistics might lead to unconnected regions in state space, a subselection can automatically be made and the count matrix can be regularized. The following options are available:

    For regularization, each of the options can be suffixed by ‘-regularized’, e.g., ‘largest-regularized’. This means that every pair of states with counts in at least one direction receives nonzero counts in both directions, and everything is reversibly connected. In particular, a prior of the form

    \[b_{ij}=\left \{ \begin{array}{rl} \alpha & \text{, if }c_{ij}+c_{ji}>0, \\ 0 & \text{, otherwise,} \end{array} \right . \]

    with \(\alpha=10^{-3}\) is added, and all non-reversibly connected components are artificially connected by adding backward paths.

  • reversible (bool, optional, default=True) – Whether the HMM transition matrix is estimated so that it is reversible.

  • stationary (bool, optional, default=False) – If True, the initial distribution of hidden states is self-consistently computed as the stationary distribution of the transition matrix. If False, it is estimated from the starting states. Only set this to True if you are sure that the observation trajectories are initiated from a global equilibrium distribution.

  • separate_symbols (array_like, optional, default=None) – Force the given set of observed states to stay in a separate hidden state. The remaining n_hidden_states - 1 hidden states are assigned by a metastable decomposition.

  • states ((dtype=int) ndarray, optional, default=None) – Artificially restrict the count model to a selection of states, even before regularization.

  • regularize (bool, optional, default=True) – If True, ensures that the hidden initial distribution and transition matrix have nonzero probabilities by setting zero entries to eps and then renormalizing. This avoids zeros that would cause estimation algorithms to crash or get stuck in suboptimal states.

  • connectivity_threshold (float or '1/n', optional, default=0.) – Connectivity threshold. Counts that are below the specified value are disregarded when finding connected sets. In case of ‘1/n’, the threshold resolves to \(1 / \mathrm{n\_states\_full}\).
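The way stride and lag split a trajectory, as described for the stride parameter, can be sketched as follows. This is an illustration under an assumption: the shifts are taken to be 0, stride, 2 * stride, ... as long as they stay below lag, since the description only shows the first two subtrajectories. The helper lagged_trajectories is hypothetical and does not handle stride='effective'.

```python
# Hypothetical helper illustrating the stride/lag splitting; it is not
# deeptime's implementation and does not handle stride='effective'.

def lagged_trajectories(s, lag, stride=1):
    """Return the subtrajectories s[shift], s[shift + lag], s[shift + 2 lag], ...

    Assumes the shifts are 0, stride, 2 * stride, ... while they stay below
    lag. Subtrajectories with a single frame are dropped because they
    contain no transition.
    """
    result = []
    for shift in range(0, lag, stride):
        sub = s[shift::lag]
        if len(sub) > 1:
            result.append(sub)
    return result


s = list(range(10))
print(lagged_trajectories(s, lag=3, stride=1))  # [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
print(lagged_trajectories(s, lag=3, stride=2))  # [[0, 3, 6, 9], [2, 5, 8]]
```

With stride=1 the subtrajectories contain 3 + 2 + 2 = 7 transitions, which equals len(s) - lag, so every lagged pair in the data is used exactly once; a larger stride drops some of them.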
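The prior used by the ‘-regularized’ modes can be written out directly from its definition, \(b_{ij}=\alpha\) if \(c_{ij}+c_{ji}>0\) and 0 otherwise, with \(\alpha=10^{-3}\). The following is a minimal plain-Python sketch of that formula only. The helper names regularization_prior and add_prior are hypothetical, and the step that adds backward paths between non-reversibly connected components is not shown.

```python
# Hypothetical helpers sketching only the prior b_ij from the mode description;
# the step that adds backward paths between components is not shown.

def regularization_prior(counts, alpha=1e-3):
    """b[i][j] = alpha if c[i][j] + c[j][i] > 0, else 0."""
    n = len(counts)
    return [[alpha if counts[i][j] + counts[j][i] > 0 else 0.0 for j in range(n)]
            for i in range(n)]


def add_prior(counts, prior):
    """Element-wise sum of the count matrix and the prior."""
    return [[c + b for c, b in zip(c_row, b_row)] for c_row, b_row in zip(counts, prior)]


C = [[5, 0, 0],
     [2, 3, 0],
     [0, 0, 4]]
B = regularization_prior(C)
C_reg = add_prior(C, B)
```

Here c_01 = 0, but the pair (0, 1) receives alpha in both directions because c_10 = 2 > 0, so the two states become reversibly connected. The pair (0, 2) stays at zero because no counts were observed in either direction.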
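The two choices offered by the stationary parameter can be contrasted with a small sketch. With stationary=True the initial distribution π satisfies πP = π; the sketch approximates it by power iteration, assuming an irreducible, aperiodic, row-stochastic P. With stationary=False it is estimated from the starting states; the sketch simply takes the normalized histogram of the first state of each trajectory. Both helpers are hypothetical and are not deeptime's solvers.

```python
# Hypothetical helpers contrasting the two initial-distribution choices.

def stationary_distribution(P, tol=1e-12, max_iter=10_000):
    """Approximate pi with pi P = pi by power iteration.

    Assumes P is a row-stochastic, irreducible and aperiodic matrix given
    as a list of rows.
    """
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        # one step of the chain: new_pi[j] = sum_i pi[i] * P[i][j]
        new_pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new_pi, pi)) < tol:
            return new_pi
        pi = new_pi
    return pi


def empirical_initial_distribution(start_states, n):
    """Normalized histogram of the first state of each trajectory."""
    counts = [0] * n
    for s in start_states:
        counts[s] += 1
    total = sum(counts)
    return [c / total for c in counts]


P = [[0.9, 0.1],
     [0.2, 0.8]]
print(stationary_distribution(P))                       # close to [2/3, 1/3]
print(empirical_initial_distribution([0, 0, 0, 1], 2))  # [0.75, 0.25]
```

The two results differ whenever the trajectories do not start from equilibrium, which is why stationary=True is only appropriate for equilibrium-initiated data.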
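The effect of regularize=True on a probability vector or transition matrix can be sketched as follows: zero entries are replaced by a small eps and each vector or row is renormalized so that it still sums to one. The eps value and the helper names are assumptions for illustration; deeptime's actual eps and implementation may differ.

```python
# Hypothetical helpers; the eps value is an assumption, not deeptime's default.

def regularize_distribution(p, eps=1e-8):
    """Set zero entries of a probability vector to eps, then renormalize."""
    lifted = [x if x > 0 else eps for x in p]
    total = sum(lifted)
    return [x / total for x in lifted]


def regularize_transition_matrix(P, eps=1e-8):
    """Apply the same treatment row by row so every row stays stochastic."""
    return [regularize_distribution(row, eps) for row in P]


pi0 = regularize_distribution([1.0, 0.0], eps=1e-3)
P = regularize_transition_matrix([[1.0, 0.0], [0.5, 0.5]], eps=1e-3)
```

After this step no hidden state has exactly zero probability, so iterative estimators can still move probability mass into it.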
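The connectivity_threshold rule can be illustrated on a small count matrix. This sketch assumes that connected sets are the strongly connected components of the directed graph with an edge i to j whenever the count C[i][j] is positive and not below the threshold, and it resolves ‘1/n’ to 1 / n with n the full number of states. The helpers resolve_threshold and connected_sets are hypothetical and are not deeptime's TransitionCountModel methods.

```python
# Hypothetical helpers illustrating connectivity_threshold; not deeptime's code.

def resolve_threshold(threshold, n_states):
    """Resolve the '1/n' shorthand to 1 / n_states; pass numbers through."""
    if threshold == '1/n':
        return 1.0 / n_states
    return float(threshold)


def connected_sets(C, threshold=0.0):
    """Strongly connected sets of the graph with an edge i -> j whenever
    C[i][j] > 0 and C[i][j] >= threshold (counts below it are ignored)."""
    n = len(C)
    thr = resolve_threshold(threshold, n)
    adj = [[j for j in range(n) if C[i][j] > 0 and C[i][j] >= thr] for i in range(n)]

    def reachable(start):
        # depth-first search; the start state always reaches itself
        seen, stack = {start}, [start]
        while stack:
            i = stack.pop()
            for j in adj[i]:
                if j not in seen:
                    seen.add(j)
                    stack.append(j)
        return seen

    reach = [reachable(i) for i in range(n)]
    sets, assigned = [], set()
    for i in range(n):
        if i in assigned:
            continue
        # i and j share a set iff each is reachable from the other
        component = sorted(j for j in reach[i] if i in reach[j])
        assigned.update(component)
        sets.append(component)
    return sets


C = [[2, 1, 0],
     [1, 2, 0.5],
     [0, 0.5, 3]]
print(connected_sets(C, 0.0))    # [[0, 1, 2]]
print(connected_sets(C, 1.0))    # [[0, 1], [2]]
print(connected_sets(C, '1/n'))  # [[0, 1, 2]]
```

Raising the threshold to 1.0 discards the weak 0.5 counts between states 1 and 2, which splits state 2 off; with ‘1/n’ the threshold is 1/3, so those counts are kept.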

Returns:

hmm_init – An initial guess for the HMM

Return type:

HiddenMarkovModel

See also

DiscreteOutputModel

The type of output model this heuristic uses.

metastable_from_msm()

Initial guess from an already existing MSM.

deeptime.markov.hmm.init.gaussian.from_data()

Initial guess with Gaussian output model.

References