class HiddenMarkovModel

class deeptime.markov.hmm.HiddenMarkovModel(transition_model, output_model: ndarray | OutputModel, initial_distribution: ndarray | None = None, likelihoods: ndarray | None = None, state_probabilities: List[ndarray] | None = None, initial_count: ndarray | None = None, hidden_state_trajectories: Iterable[ndarray] | None = None, stride: int | str = 1, observation_symbols: ndarray | None = None, observation_symbols_full: ndarray | None = None)

Hidden Markov state model consisting of a transition model (MSM) on the hidden states, an output model which maps from the hidden states to a distribution of observable states, and optionally an initial distribution on the hidden states. Some properties require a crisp assignment to states in the observable space, in which case only a discrete output model can be used.

Parameters:
  • transition_model ((m,m) ndarray or MarkovStateModel) – Transition matrix for hidden (macro) states

  • output_model ((m,n) ndarray or OutputModel) – Observation probability matrix from hidden to observable (micro) states, or an OutputModel instance that yields the mapping from hidden to observable states.

  • initial_distribution ((m,) ndarray, optional, default=None) – Initial distribution of the hidden (macro) states. Default is uniform.

  • likelihoods ((k,) ndarray, optional, default=None) – Likelihood progression of the HMM as it was trained for k iterations with Baum-Welch.

  • state_probabilities (list of ndarray, optional, default=None) – List of state probabilities for each trajectory that the model was trained on (gammas).

  • initial_count (ndarray, optional, default=None) – Initial counts of the hidden (macro) states, computed from the gamma output of the Baum-Welch algorithm

  • hidden_state_trajectories (list of ndarray, optional, default=None) – When estimating the HMM the data’s most likely hidden state trajectory is determined and can be saved with the model by providing this argument.

  • stride (int or str('effective'), optional, default=1) – Stride that was used to subsample the discrete trajectories while estimating the HMM. Either an integer, which sets the step between retained frames, or ‘effective’, in which case a stride is estimated at which subsequent discrete trajectory elements are uncorrelated.

  • observation_symbols (array_like, optional, default=None) – Sorted unique symbols in observations. If None, it is assumed that all possible observations are made and the state symbols are set to an iota range over the number of observation states.

  • observation_symbols_full (array_like, optional, default=None) – Full set of symbols in observations. If None, it is assumed to coincide with observation_symbols.
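As a rough illustration of the shapes these parameters take for a discrete output model, the sketch below validates a dense (m,m) hidden transition matrix and an (m,n) output probability matrix with NumPy and applies the uniform default for the initial distribution. The helper `check_hmm_parameters` is hypothetical and not part of deeptime; it only mirrors the constraints described above.

```python
import numpy as np


def check_hmm_parameters(transition_matrix, output_probabilities, initial_distribution=None):
    """Hypothetical helper: validate dense discrete-HMM parameters and apply defaults."""
    P = np.asarray(transition_matrix, dtype=float)
    chi = np.asarray(output_probabilities, dtype=float)
    # Hidden (macro) transition matrix: square (m, m) and row-stochastic.
    if P.ndim != 2 or P.shape[0] != P.shape[1] or not np.allclose(P.sum(axis=1), 1.0):
        raise ValueError("transition matrix must be square and row-stochastic")
    m = P.shape[0]
    # Output matrix: one probability distribution over n observable states per hidden state.
    if chi.ndim != 2 or chi.shape[0] != m or not np.allclose(chi.sum(axis=1), 1.0):
        raise ValueError("output probabilities must be (m, n) and row-stochastic")
    if initial_distribution is None:
        # Documented default: uniform distribution over the hidden states.
        initial_distribution = np.full(m, 1.0 / m)
    return P, chi, np.asarray(initial_distribution, dtype=float)


P, chi, p0 = check_hmm_parameters(
    [[0.9, 0.1], [0.2, 0.8]],            # 2 hidden states
    [[0.7, 0.3, 0.0], [0.0, 0.4, 0.6]],  # 3 observable states
)
```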

References

[1] Frank Noé, Hao Wu, Jan-Hendrik Prinz, and Nuria Plattner. Projected and hidden Markov models for calculating kinetics and metastable states of complex molecules. The Journal of Chemical Physics, 139(18):11B609_1, 2013.

See also

init.discrete.metastable_from_data

initial guess from data with discrete output model

init.discrete.metastable_from_msm

initial guess from MSM with discrete output model

init.gaussian.from_data

initial guess from data with Gaussian output model

MaximumLikelihoodHMM

maximum likelihood estimation of HMMs

BayesianHMM

Bayesian sampling of models for confidences.

Attributes

count_model

Yields the count model for the hidden states.

eigenvectors_left_obs

Left eigenvectors in observation space.

eigenvectors_right_obs

Right eigenvectors in observation space.

hidden_state_trajectories

Training trajectories mapped to hidden states after estimation.

initial_count

The hidden initial counts, can be None.

initial_distribution

The initial distribution of this HMM over the hidden states.

lagtime

The lagtime this model was estimated at.

lifetimes

Lifetimes of states of the hidden transition matrix

likelihood

The estimated likelihood of this model based on the training data.

likelihoods

If the model comes from the MaximumLikelihoodHMM estimator, this property contains the sequence of likelihoods generated from the fitting iteration.

metastable_assignments

Computes the assignment to metastable sets for observable states

metastable_distributions

Returns the output probability distributions. Identical to output_probabilities.

metastable_memberships

Computes the memberships of observable states to metastable sets by Bayesian inversion.

metastable_sets

Computes, for each metastable set, the observable states it contains.

n_hidden_states

The number of hidden states.

n_observation_states

The number of observed (micro) states.

observation_symbols

The symbols represented by this HMM in observation space.

observation_symbols_full

All symbols that the original model contained (original before taking any submodel).

output_model

The selected output model for this HMM.

output_probabilities

Returns the probabilities for each hidden state to map to a particular observation state.

state_probabilities

List of state probabilities for each trajectory that the model was trained on (gammas in the Baum-Welch algorithm).

stationary_distribution_obs

The stationary distribution in observable space.

stride

The stride parameter which was used to subsample the discrete trajectories when estimating the hidden Markov state model.

transition_counts

The transition counts for the hidden states as estimated in the fitting procedure.

transition_model

Yields the transition model for the hidden states.

Methods

ck_test(models[, include_lag0, err_est, ...])

Performs a Chapman-Kolmogorov test on a list of HMMs.

collect_observations_in_state(observations, ...)

Collect a vector of all observations belonging to a specified hidden state.

compute_observation_likelihood(data)

Computes the likelihood of observed data under this model.

compute_viterbi_paths(observations[, ...])

Computes the Viterbi paths using the current HMM model.

copy()

Makes a deep copy of this model.

correlation_obs(a[, b, maxtime, k, ncv])

Time-correlation for equilibrium experiment based on observable state vectors a and b.

expectation_obs(a)

Equilibrium expectation value of a given observable state vector.

fingerprint_correlation_obs(a[, b, k, ncv])

Dynamical fingerprint for equilibrium time-correlation experiment based on observable state vectors a and b.

fingerprint_relaxation_obs(p0, a[, k, ncv])

Dynamical fingerprint for perturbation/relaxation experiment based on observable state vector and distribution.

get_params([deep])

Get the parameters.

nonempty_obs(dtrajs)

Computes the set of visited observable states given a set of discrete trajectories.

propagate(p0, k)

Propagates the initial distribution p0 defined on observable space k times.

relaxation_obs(p0, a[, maxtime, k, ncv])

Simulates a perturbation-relaxation experiment based on observable state vector and distribution.

sample_by_observation_probabilities(dtrajs, ...)

Generates samples according to the current observation probability distribution.

set_params(**params)

Set the parameters of this estimator.

simulate(n_steps[, start, stop, dt])

Generates a realization of the Hidden Markov Model

states_largest([directed, ...])

Selects hidden states which represent the largest connected set.

states_populous([strong, connectivity_threshold])

Retrieves the hidden states which are most populated and connected.

submodel([states, obs])

Returns a HMM with restricted state space

submodel_disconnect([connectivity_threshold])

Disconnects sets of hidden states that are barely connected

submodel_largest([directed, ...])

Returns the largest connected sub-HMM.

submodel_populous([directed, ...])

Returns the most populous connected sub-HMM.

timescales([k])

Yields the timescales of the hidden transition model.

transform_discrete_trajectories_to_observed_symbols(dtrajs)

A list of integer arrays with the discrete trajectories mapped to the currently used set of observation symbols.

transition_matrix_obs([k])

Computes the transition matrix between observed states

ck_test(models, include_lag0=True, err_est=False, progress=None)

Performs a Chapman-Kolmogorov test on a list of HMMs.

Parameters:
  • models (list of HiddenMarkovModel) – list of models to test against

  • include_lag0 (bool, optional, default=True) – Whether to include lagtime $\tau = 0$.

  • err_est (bool, optional, default=False) – Whether to include observable evaluations on estimate samples.

  • progress – Optional progress bar, tested for tqdm.

Returns:

ck_test – Test results.

Return type:

ChapmanKolmogorovTest

collect_observations_in_state(observations: List[ndarray], state_index: int)

Collect a vector of all observations belonging to a specified hidden state.

Parameters:
  • observations (list of numpy.array) – List of observed trajectories.

  • state_index (int) – The index of the hidden state for which corresponding observations are to be retrieved.

Returns:

collected_observations – The collected vector of observations belonging to the specified hidden state.

Return type:

numpy.array with shape (nsamples,)

Raises:

RuntimeError – Raised if the HMM does not yet have hidden state trajectories associated with it.
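Assuming the hidden state trajectories are aligned frame by frame with the observations (for example Viterbi paths), collecting the observations of one hidden state reduces to boolean masking. The free function `collect_in_state` below is a hypothetical NumPy sketch, not the method's actual implementation; it mimics the documented RuntimeError when no hidden trajectories are available.

```python
import numpy as np


def collect_in_state(observations, hidden_state_trajectories, state_index):
    """Hypothetical helper: gather all observations whose frame is assigned to one hidden state."""
    if hidden_state_trajectories is None:
        # Mirrors the documented behavior when no hidden trajectories are attached.
        raise RuntimeError("no hidden state trajectories available")
    collected = [
        np.asarray(obs)[np.asarray(hidden) == state_index]
        for obs, hidden in zip(observations, hidden_state_trajectories)
    ]
    return np.concatenate(collected)


obs = [np.array([0, 1, 2, 2]), np.array([1, 0])]
hidden = [np.array([0, 0, 1, 1]), np.array([1, 0])]
in_state_1 = collect_in_state(obs, hidden, state_index=1)
```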

compute_observation_likelihood(data: ndarray | List[ndarray])

Computes the likelihood of observed data under this model.

Internally, the forward pass of the Baum-Welch algorithm is used.

Parameters:

data (array_like or list of array_like) – The observations

Returns:

likelihood – The computed likelihood.

Return type:

float
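For a discrete output model, the forward pass can be sketched as the scaled forward recursion below. It is an illustrative NumPy version, not deeptime's code: it returns the log-likelihood (accumulating the per-step normalization constants to avoid underflow on long trajectories), and whether the library method reports the likelihood or its logarithm should be checked against the installed version.

```python
import numpy as np


def forward_log_likelihood(P, chi, p0, obs):
    """Scaled forward recursion for a discrete-output HMM; returns log p(obs | model).

    P: (m, m) hidden transition matrix, chi: (m, n) output probabilities,
    p0: (m,) initial hidden distribution, obs: sequence of observation indices.
    """
    P, chi, p0 = (np.asarray(x, dtype=float) for x in (P, chi, p0))
    obs = np.asarray(obs, dtype=int)
    alpha = p0 * chi[:, obs[0]]  # alpha_0(i) = p0(i) * chi(i, o_0)
    log_lik = 0.0
    for o in obs[1:]:
        scale = alpha.sum()
        log_lik += np.log(scale)
        # alpha_t(j) = sum_i alpha_{t-1}(i) P(i, j) * chi(j, o_t), using the rescaled alpha
        alpha = (alpha / scale) @ P * chi[:, o]
    return log_lik + np.log(alpha.sum())


ll = forward_log_likelihood([[0.9, 0.1], [0.2, 0.8]], [[0.7, 0.3], [0.1, 0.9]], [0.5, 0.5], [0, 1, 1])
```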

compute_viterbi_paths(observations, map_observations_to_submodel: bool = False) → List[ndarray]

Computes the Viterbi paths using the current HMM model.

Note: In case of sub-modeling a discrete state HMM, the observation sequence must be mapped to the active states of that sub-model. This can either be done by hand beforehand or by activating the map_observations_to_submodel flag.

Parameters:
  • observations (list of array_like or array_like) – observations

  • map_observations_to_submodel (bool, optional, default = False) – If True and in case of a discrete output model, activates automatic mapping to the active sub-model states

Returns:

paths – The computed Viterbi paths.

Return type:

list of np.ndarray
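The Viterbi algorithm finds the single most likely hidden state path given the observations. The following log-space sketch handles one trajectory with a discrete output model; the name `viterbi_path` and the dense-matrix inputs are assumptions for illustration, and deeptime's internal implementation may differ (for example, it also handles other output models and sub-model mapping).

```python
import numpy as np


def viterbi_path(P, chi, p0, obs):
    """Most likely hidden state path of a discrete-output HMM (log-space Viterbi)."""
    with np.errstate(divide="ignore"):  # log(0) = -inf marks impossible transitions
        log_P = np.log(np.asarray(P, dtype=float))
        log_chi = np.log(np.asarray(chi, dtype=float))
        log_p0 = np.log(np.asarray(p0, dtype=float))
    obs = np.asarray(obs, dtype=int)
    n_frames, m = len(obs), log_P.shape[0]
    delta = log_p0 + log_chi[:, obs[0]]
    backpointers = np.zeros((n_frames, m), dtype=int)
    for t in range(1, n_frames):
        # scores[i, j]: best log-probability of being in i at t-1 and moving to j
        scores = delta[:, None] + log_P
        backpointers[t] = np.argmax(scores, axis=0)
        delta = scores[backpointers[t], np.arange(m)] + log_chi[:, obs[t]]
    # Trace the best final state back to the start.
    path = np.empty(n_frames, dtype=int)
    path[-1] = np.argmax(delta)
    for t in range(n_frames - 1, 0, -1):
        path[t - 1] = backpointers[t, path[t]]
    return path


path = viterbi_path([[0.9, 0.1], [0.1, 0.9]], [[0.9, 0.1], [0.1, 0.9]], [0.5, 0.5], [0, 0, 1, 1, 1])
```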

copy() → Model

Makes a deep copy of this model.

Returns:

copy – A new copy of this model.

Return type:

Model

correlation_obs(a, b=None, maxtime=None, k=None, ncv=None)

Time-correlation for equilibrium experiment based on observable state vectors a and b.

expectation_obs(a)

Equilibrium expectation value of a given observable state vector.

fingerprint_correlation_obs(a, b=None, k=None, ncv=None)

Dynamical fingerprint for equilibrium time-correlation experiment based on observable state vectors a and b.

fingerprint_relaxation_obs(p0, a, k=None, ncv=None)

Dynamical fingerprint for perturbation/relaxation experiment based on observable state vector and distribution.

get_params(deep=False)

Get the parameters.

Returns:

params – Parameter names mapped to their values.

Return type:

mapping of string to any

nonempty_obs(dtrajs) → ndarray

Computes the set of visited observable states given a set of discrete trajectories.

Parameters:

dtrajs (array_like) – observable trajectory

Returns:

symbols – The observation symbols which are visited.

Return type:

np.ndarray
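At its core, determining the visited observable states is a set union over all frames. The sketch below, with the hypothetical name `visited_symbols`, shows only that step with NumPy; whether deeptime additionally translates the result through observation_symbols is not shown here.

```python
import numpy as np


def visited_symbols(dtrajs):
    """Sketch: sorted unique states visited by a 1D array or a list of 1D arrays."""
    if isinstance(dtrajs, np.ndarray):
        dtrajs = [dtrajs]  # treat a single trajectory as a list of one
    return np.unique(np.concatenate([np.asarray(d) for d in dtrajs]))


symbols = visited_symbols([np.array([0, 2, 2]), np.array([5, 0])])
```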

propagate(p0, k)

Propagates the initial distribution p0 defined on observable space k times.

That is, it computes the product

$$p_k = p_0^T P^k$$

If the lag time of the transition matrix $P$ is $\tau$, this yields the probability distribution at time $k \tau$.

Parameters:
  • p0 (ndarray(n)) – Initial distribution. Vector of size of the active set.

  • k (int) – Number of time steps

Returns:

pk – Distribution after k steps

Return type:

ndarray(n)
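The product above can be written directly with NumPy. In this sketch `P` is simply a row-stochastic matrix supplied by the caller; how deeptime builds the observable-space matrix it propagates with is not shown, and the function name is hypothetical.

```python
import numpy as np


def propagate_distribution(p0, P, k):
    """Sketch: p_k = p_0^T P^k for a row-stochastic transition matrix P."""
    p0 = np.asarray(p0, dtype=float)
    return p0 @ np.linalg.matrix_power(np.asarray(P, dtype=float), k)


P = np.array([[0.9, 0.1], [0.2, 0.8]])
pk = propagate_distribution([1.0, 0.0], P, k=2)  # distribution at time 2 * tau
```

Starting fully in state 0, two steps give 0.9 * 0.9 + 0.1 * 0.2 = 0.83 in state 0 and 0.17 in state 1.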

relaxation_obs(p0, a, maxtime=None, k=None, ncv=None)

Simulates a perturbation-relaxation experiment based on observable state vector and distribution.

sample_by_observation_probabilities(dtrajs, nsample)

Generates samples according to the current observation probability distribution.

Notes

Sampling from off-sample trajectories might yield -1 indices: discrete observable states are drawn from the output probability distributions, and off-sample trajectories might not contain all of the drawn observable states.

Parameters:
  • dtrajs (discrete trajectory) – Input observation trajectory or list of trajectories

  • nsample (int) – Number of samples per distribution.

Returns:

indexes – List of the sampled indices by distribution. Each element is an index array with a number of rows equal to nsample, with rows consisting of a tuple (i, t), where i is the index of the trajectory and t is the time index within the trajectory.

Return type:

list of length m of ndarray with shape (nsample, 2)
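One way to realize the sampling described above: for each hidden state, draw nsample observable states from its output distribution, then pick a random frame (i, t) showing that state, writing (-1, -1) when no such frame exists (the situation the Notes describe). The sketch below is a hypothetical NumPy illustration under those assumptions, not deeptime's implementation.

```python
import numpy as np


def sample_frames_by_output(dtrajs, chi, nsample, seed=None):
    """Sketch: per hidden state, draw observable states from chi[j] and pick matching frames."""
    rng = np.random.default_rng(seed)
    chi = np.asarray(chi, dtype=float)
    # Index every frame (trajectory i, time t) by its observable state.
    frames = {}
    for i, dtraj in enumerate(dtrajs):
        for t, s in enumerate(dtraj):
            frames.setdefault(int(s), []).append((i, t))
    result = []
    for row in chi:
        drawn = rng.choice(len(row), size=nsample, p=row)
        samples = np.full((nsample, 2), -1, dtype=int)  # -1 marks states absent from dtrajs
        for r, s in enumerate(drawn):
            candidates = frames.get(int(s))
            if candidates:
                samples[r] = candidates[rng.integers(len(candidates))]
        result.append(samples)
    return result


dtrajs = [np.array([0, 1, 2, 2]), np.array([1, 0])]
out = sample_frames_by_output(dtrajs, [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]], nsample=5, seed=0)
```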

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

object

simulate(n_steps, start=None, stop=None, dt=1)

Generates a realization of the Hidden Markov Model

Parameters:
  • n_steps (int) – trajectory length in steps of the lag time

  • start (int, optional, default = None) – starting hidden state. If not given, will sample from the stationary distribution of the hidden transition matrix.

  • stop (int or int-array-like, optional, default = None) – stopping hidden set. If given, the trajectory will be stopped before n_steps steps once a hidden state of the stop set is reached.

  • dt (int) – trajectory will be saved every dt time steps. Internally, the dt’th power of P is taken to ensure a more efficient simulation.

Returns:

  • htraj ((n_steps/dt,) ndarray) – The hidden state trajectory with length n_steps/dt

  • otraj ((n_steps/dt,) ndarray) – The observable state discrete trajectory with length n_steps/dt
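A simulation of this kind can be sketched as a Markov chain on the hidden states (stepping with the dt'th power of P, as described) followed by one emission per saved frame. This NumPy sketch assumes a dense transition matrix and a discrete output matrix; the function name is hypothetical, and the stop set is only checked on saved frames, which may differ from deeptime's handling when dt > 1.

```python
import numpy as np


def simulate_hmm(P, chi, n_steps, start=None, stop=None, dt=1, seed=None):
    """Sketch: sample a hidden and an observed trajectory of a discrete-output HMM."""
    rng = np.random.default_rng(seed)
    P = np.asarray(P, dtype=float)
    chi = np.asarray(chi, dtype=float)
    m, n = chi.shape
    if start is None:
        # Start from the stationary distribution (left eigenvector of P for eigenvalue 1).
        evals, evecs = np.linalg.eig(P.T)
        pi = np.abs(np.real(evecs[:, np.argmin(np.abs(evals - 1.0))]))
        start = rng.choice(m, p=pi / pi.sum())
    stop_set = set() if stop is None else set(np.atleast_1d(stop).tolist())
    P_dt = np.linalg.matrix_power(P, dt)  # one saved frame every dt steps
    hidden = [int(start)]
    for _ in range(n_steps // dt - 1):
        if hidden[-1] in stop_set:
            break  # end early once a state of the stop set has been reached
        hidden.append(int(rng.choice(m, p=P_dt[hidden[-1]])))
    htraj = np.array(hidden)
    # Each saved hidden frame emits one observable state from its output distribution.
    otraj = np.array([rng.choice(n, p=chi[h]) for h in htraj])
    return htraj, otraj


htraj, otraj = simulate_hmm([[0.9, 0.1], [0.2, 0.8]], [[1.0, 0.0], [0.0, 1.0]], n_steps=50, seed=0)
```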

states_largest(directed=True, connectivity_threshold='1/n') → ndarray

Selects hidden states which represent the largest connected set.

Parameters:
  • directed (bool, optional, default=True) – Whether the connectivity is strong (directed) or weak (undirected)

  • connectivity_threshold (str or int, optional, default='1/n') – A connectivity threshold which can be employed to only consider edges with a certain minimum weight.

Returns:

The largest connected set of hidden states.

Return type:

ndarray

states_populous(strong=True, connectivity_threshold='1/n')

Retrieves the hidden states which are most populated and connected.

Parameters:
  • strong (bool, optional, default=True) – Whether the connectivity is evaluated based on a directed or on an undirected graph.

  • connectivity_threshold (str or int, optional, default='1/n') – Minimum weight so that two states are considered connected.

Returns:

states – Most populated set of states

Return type:

np.ndarray

submodel(states: ndarray | None = None, obs: ndarray | None = None)

Returns a HMM with restricted state space

Parameters:
  • states (None or int-array) –

    Hidden states to restrict the model to. In addition to specifying the subset, possible options are:

    • int-array: indices of states to restrict onto

    • None : all states - don’t restrict

  • obs (None or int-array) –

    Observed states to restrict the model to. In addition to specifying an array with the state labels to be observed, possible options are:

    • int-array: indices of states to restrict onto

    • None : all states - don’t restrict

Returns:

hmm – The restricted HMM.

Return type:

HiddenMarkovModel

submodel_disconnect(connectivity_threshold='1/n')

Disconnects sets of hidden states that are barely connected

Runs a connectivity check excluding all transition counts below connectivity_threshold. The transition matrix and stationary distribution will be re-estimated. Note that the resulting transition matrix may have both strongly and weakly connected subsets.

Parameters:

connectivity_threshold (float or '1/n') – minimum number of counts to consider a connection between two states. Counts lower than that will count zero in the connectivity check and may thus separate the resulting transition matrix. The default evaluates to 1/n_states.

Returns:

hmm – The restricted HMM.

Return type:

HiddenMarkovModel

submodel_largest(directed=True, connectivity_threshold='1/n', observe_nonempty=True, dtrajs=None)

Returns the largest connected sub-HMM. By default this means that the largest connected set of hidden states and the set of visited observable states is selected.

Parameters:
  • directed (bool, optional, default=True) – Whether the connectivity is based on a directed graph (strong connectivity) or undirected (weak connectivity)

  • connectivity_threshold (str or int, optional, default='1/n') – The connectivity threshold required to consider two hidden states connected.

  • observe_nonempty (bool, optional, default=True) – Whether the observable state set should be restricted to visited observable states. If True, dtrajs must be provided.

  • dtrajs (array_like, optional, default=None) – Observable state trajectory or a list thereof to evaluate visited observable states.

Returns:

sub_hmm – The restricted HMM.

Return type:

HiddenMarkovModel

submodel_populous(directed=True, connectivity_threshold='1/n', observe_nonempty=True, dtrajs=None)

Returns the most populous connected sub-HMM.

Parameters:
  • directed (bool, optional, default=True) – Whether the connectivity is based on a directed graph (strong connectivity) or undirected (weak connectivity)

  • connectivity_threshold (str or int, optional, default='1/n') – The connectivity threshold required to consider two hidden states connected.

  • observe_nonempty (bool, optional, default=True) – Whether the observable state set should be restricted to visited observable states. If True, dtrajs must be provided.

  • dtrajs (array_like, optional, default=None) – Observable state trajectory or a list thereof to evaluate visited observable states.

Returns:

hmm – The restricted HMM.

Return type:

HiddenMarkovModel

timescales(k=None)

Yields the timescales of the hidden transition model. See MarkovStateModel.timescales.
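As a rough sketch, assuming the standard implied-timescale definition $t_i = -\tau / \ln |\lambda_i|$ over the non-stationary eigenvalues $\lambda_i$ of the hidden transition matrix (see MarkovStateModel.timescales for the authoritative behavior, including how many timescales k selects), the computation looks like this; the function name is hypothetical.

```python
import numpy as np


def implied_timescales(P, lagtime=1):
    """Sketch: t_i = -lagtime / ln|lambda_i| for all but the stationary eigenvalue."""
    evals = np.linalg.eigvals(np.asarray(P, dtype=float))
    evals = evals[np.argsort(-np.abs(evals))]  # decreasing magnitude; evals[0] is 1
    return -lagtime / np.log(np.abs(evals[1:]))


ts = implied_timescales([[0.9, 0.1], [0.2, 0.8]], lagtime=1)
```

For this 2x2 matrix the eigenvalues are 1 and 0.7, so the single timescale is $-1/\ln 0.7 \approx 2.8$ lag times.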

transform_discrete_trajectories_to_observed_symbols(dtrajs)

A list of integer arrays with the discrete trajectories mapped to the currently used set of observation symbols. For example, if the model has been restricted to the largest connected set (connectivity=’largest’), the indices are given within that connected set; frames that do not correspond to a considered symbol are set to -1.

Parameters:

dtrajs (array_like or list of array_like) – discretized trajectories

Returns:

Curated discretized trajectories so that unconsidered symbols are mapped to -1.

Return type:

array_like or list of array_like
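Assuming observation_symbols is sorted and unique (as documented for the constructor), the mapping of raw symbols to indices within the active set, with -1 for symbols outside it, can be sketched with `np.searchsorted`. The helper name is hypothetical and handles a single trajectory; a list of trajectories would be mapped element by element.

```python
import numpy as np


def to_active_indices(dtraj, observation_symbols):
    """Sketch: map raw symbols to their index in the sorted active symbol set, or -1."""
    dtraj = np.asarray(dtraj)
    symbols = np.asarray(observation_symbols)  # assumed sorted and unique
    idx = np.clip(np.searchsorted(symbols, dtraj), 0, len(symbols) - 1)
    # A frame is kept only if the symbol at the candidate index really matches it.
    return np.where(symbols[idx] == dtraj, idx, -1)


mapped = to_active_indices([0, 1, 3, 4, 2, 5], observation_symbols=[1, 3, 4])
```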

transition_matrix_obs(k=1) → ndarray

Computes the transition matrix between observed states

Transition matrices for longer lag times than the one used to parametrize this HMM can be obtained by setting the k option. Note that an HMM is not Markovian in observation space, so transition matrices at longer lag times cannot be computed using the Chapman-Kolmogorov equality, i.e.:

$$P(k\tau) \neq P^k(\tau)$$

This function computes the correct transition matrix using the metastable (coarse) transition matrix $P_c$ as:

$$P(k\tau) = \Pi^{-1} \chi^{\top} \Pi_c P_c^k(\tau) \chi$$

where $\chi$ is the output probability matrix, $\Pi_c$ is a diagonal matrix with the metastable-state (coarse) stationary distribution, and $\Pi$ is a diagonal matrix with the observable-state stationary distribution.

Parameters:

k (int, optional, default=1) – Multiple of the lag time. By default (k=1), the transition matrix at the lag time used to construct this HMM is returned. If a higher power is given, the transition matrix at lag time $k\tau$ is computed with the formula above.
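The formula above can be evaluated term by term with dense NumPy arrays. This is a sketch, not deeptime's implementation: it assumes a discrete output matrix, takes the coarse stationary distribution from an eigendecomposition, and requires every observable state to have nonzero stationary probability so that $\Pi^{-1}$ exists. The function names are hypothetical.

```python
import numpy as np


def stationary_distribution(P):
    """Left eigenvector of P for eigenvalue 1, normalized to sum to one."""
    evals, evecs = np.linalg.eig(P.T)
    pi = np.abs(np.real(evecs[:, np.argmin(np.abs(evals - 1.0))]))
    return pi / pi.sum()


def transition_matrix_obs_sketch(P_c, chi, k=1):
    """Sketch of P(k tau) = Pi^{-1} chi^T Pi_c P_c^k(tau) chi."""
    P_c = np.asarray(P_c, dtype=float)
    chi = np.asarray(chi, dtype=float)
    pi_c = stationary_distribution(P_c)
    pi_obs = pi_c @ chi  # observable stationary distribution; assumed strictly positive
    return (np.diag(1.0 / pi_obs) @ chi.T @ np.diag(pi_c)
            @ np.linalg.matrix_power(P_c, k) @ chi)


P_c = [[0.9, 0.1], [0.2, 0.8]]
chi = [[0.7, 0.3, 0.0], [0.0, 0.4, 0.6]]
T1 = transition_matrix_obs_sketch(P_c, chi, k=1)
T2 = transition_matrix_obs_sketch(P_c, chi, k=2)
```

Both results are row-stochastic, yet for this example T2 differs from T1 @ T1, which is exactly the inequality stated above.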

property count_model

Yields the count model for the hidden states. The count matrix is estimated from Viterbi paths.

Returns:

count_model – The count model for the hidden states.

Return type:

deeptime.markov.TransitionCountModel
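Counting transitions along hidden state paths amounts to accumulating pairs (s_t, s_{t+lag}). The sketch below uses sliding-window counting with `np.add.at`; the function name is hypothetical, and whether deeptime counts with a sliding window, at which lag, or on strided paths is not confirmed here.

```python
import numpy as np


def count_transitions(hidden_paths, n_states, lag=1):
    """Sketch: sliding-window transition counts C[i, j] from hidden state paths (lag >= 1)."""
    counts = np.zeros((n_states, n_states), dtype=int)
    for path in hidden_paths:
        path = np.asarray(path, dtype=int)
        # Every pair (path[t], path[t + lag]) adds one count; repeated pairs accumulate.
        np.add.at(counts, (path[:-lag], path[lag:]), 1)
    return counts


C = count_transitions([[0, 0, 1, 1, 0], [1, 1]], n_states=2)
```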

property eigenvectors_left_obs

Left eigenvectors in observation space. Only available with a discrete output model.

Returns:

Left eigenvectors in observation space.

property eigenvectors_right_obs

Right eigenvectors in observation space. Only available with a discrete output model.

Returns:

Right eigenvectors in observation space.

property hidden_state_trajectories: List[ndarray] | None

Training trajectories mapped to hidden states after estimation.

Returns:

The hidden state trajectories; can be None if not provided in the constructor.

property initial_count: ndarray | None

The hidden initial counts, can be None.

Returns:

Initial counts.

property initial_distribution: ndarray

The initial distribution of this HMM over the hidden states.

Returns:

The initial distribution.

property lagtime: int

The lagtime this model was estimated at.

Returns:

lagtime – The lagtime.

Return type:

int

property lifetimes: ndarray

Lifetimes of states of the hidden transition matrix

Returns:

l – state lifetimes in units of the input trajectory time step, defined by $-\tau / \ln |p_{ii}|$, $i = 1, \ldots, n_\mathrm{states}$, where $p_{ii}$ are the diagonal entries of the hidden transition matrix.

Return type:

ndarray(n_states)
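The lifetime definition above depends only on the diagonal of the hidden transition matrix, so a direct NumPy transcription is short. The function name is hypothetical and `lagtime` plays the role of $\tau$.

```python
import numpy as np


def state_lifetimes(P, lagtime=1):
    """Sketch: l_i = -lagtime / ln|p_ii| from the diagonal of a transition matrix."""
    p_ii = np.diag(np.asarray(P, dtype=float))
    return -lagtime / np.log(np.abs(p_ii))


lifetimes = state_lifetimes([[0.9, 0.1], [0.2, 0.8]], lagtime=2)
```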

property likelihood: float | None

The estimated likelihood of this model based on the training data. Only available if the sequence of likelihoods is provided.

Returns:

The estimated likelihood, otherwise None.

property likelihoods: ndarray | None

If the model comes from the MaximumLikelihoodHMM estimator, this property contains the sequence of likelihoods generated from the fitting iteration.

Returns:

Sequence of likelihoods, otherwise None.

property metastable_assignments

Computes the assignment to metastable sets for observable states

Notes

This is only recommended for visualization purposes. You cannot compute any actual quantity of the coarse-grained kinetics without employing the fuzzy memberships!

Returns:

For each observable state, the metastable state it is located in.

Return type:

ndarray((n,), dtype=int)

property metastable_distributions

Returns the output probability distributions. Identical to output_probabilities.

Returns:

Pout – output probability matrix from hidden to observable discrete states

Return type:

ndarray (m,n)

property metastable_memberships

Computes the memberships of observable states to metastable sets by Bayesian inversion. [1]

Returns:

M – A matrix containing the probability or membership of each observable state to be assigned to each metastable or hidden state. The row sums of M are 1.

Return type:

ndarray((n,m))
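Bayesian inversion here means applying Bayes' rule with the hidden stationary distribution as prior: $M_{ij} = \pi_{c,j} \chi_{ji} / \pi_i$, i.e. $M = \Pi^{-1} \chi^{\top} \Pi_c$, the same factors that appear in the transition_matrix_obs formula. The sketch below assumes a discrete output matrix and strictly positive observable stationary probabilities; the function name is hypothetical and deeptime's handling of unvisited states may differ.

```python
import numpy as np


def memberships_by_bayes(pi_c, chi):
    """Sketch: M = Pi^{-1} chi^T Pi_c (rows: observable states, columns: hidden states)."""
    pi_c = np.asarray(pi_c, dtype=float)
    chi = np.asarray(chi, dtype=float)
    pi_obs = pi_c @ chi                # p(observable state i)
    joint = chi.T * pi_c[None, :]      # p(obs i, hidden j) = chi[j, i] * pi_c[j]
    return joint / pi_obs[:, None]     # p(hidden j | obs i); each row sums to 1


M = memberships_by_bayes([2 / 3, 1 / 3], [[0.7, 0.3, 0.0], [0.0, 0.4, 0.6]])
```

Observable state 1 is emitted by both hidden states, so its membership is split (0.6 and 0.4), while states 0 and 2 belong entirely to one hidden state.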

property metastable_sets

Computes, for each metastable set, the observable states it contains.

Notes

This is only recommended for visualization purposes. You cannot compute any actual quantity of the coarse-grained kinetics without employing the fuzzy memberships!

Returns:

sets – A list whose length equals the number of metastable states. Each element is an array of the observable state indices contained in that set.

Return type:

list of int-arrays
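A crisp assignment and the resulting sets can be sketched as follows, assuming each observable state is assigned to the hidden state with the largest output probability for it (an argmax over the columns of the (m,n) output matrix); that choice of criterion is an assumption rather than something this page confirms. As the Notes stress, such crisp sets are for visualization only.

```python
import numpy as np


def crisp_assignments_and_sets(chi):
    """Sketch: assign each observable state to its most probable emitting hidden state."""
    chi = np.asarray(chi, dtype=float)
    # For each observable state (column), pick the hidden state with the largest output probability.
    assignments = np.argmax(chi, axis=0)
    # Group observable state indices by their assigned metastable state.
    sets = [np.flatnonzero(assignments == j) for j in range(chi.shape[0])]
    return assignments, sets


assignments, sets = crisp_assignments_and_sets([[0.7, 0.3, 0.0], [0.0, 0.4, 0.6]])
```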

property n_hidden_states: int

The number of hidden states. Can also be retrieved from the output model as well as from the transition model.

Returns:

Number of hidden states

property n_observation_states: int

The number of observed (micro) states. It coincides with the size of the second axis of the output probability matrix in case of a discrete output model.

Returns:

Number of observed (micro) states

property observation_symbols: ndarray | None

The symbols represented by this HMM in observation space. This is None if the output model has no discrete observations.

Returns:

The list of observation symbols or None.

property observation_symbols_full: ndarray | None

All symbols that the original model contained (original before taking any submodel).

Returns:

The list of observation symbols or None, if there are no discrete symbols or None was provided.

property output_model: OutputModel

The selected output model for this HMM. The output model can map from the hidden states to observable states and can also be fitted to data.

Returns:

The output model

property output_probabilities: ndarray

Returns the probabilities for each hidden state to map to a particular observation state. Only available if the underlying output model is a DiscreteOutputModel.

Returns:

probabilities – a (M,N) row-stochastic matrix mapping from each hidden to each observation state

Return type:

np.ndarray

property state_probabilities: List[ndarray] | None

List of state probabilities for each trajectory that the model was trained on (gammas in the Baum-Welch algorithm).

Returns:

List of state probabilities if provided in the constructor, otherwise None.

property stationary_distribution_obs

The stationary distribution in observable space. Only available with a discrete output model.

Returns:

stationary distribution in observation space if available
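For a discrete output model, the stationary distribution in observable space follows from projecting the hidden stationary distribution through the output probabilities, $\pi = \pi_c \chi$ (the same relation used for $\Pi$ in transition_matrix_obs). This NumPy sketch, with a hypothetical function name, computes $\pi_c$ from an eigendecomposition and then projects it.

```python
import numpy as np


def stationary_distribution_obs_sketch(P_c, chi):
    """Sketch: pi_obs = pi_c @ chi, with pi_c the stationary distribution of the hidden chain."""
    P_c = np.asarray(P_c, dtype=float)
    evals, evecs = np.linalg.eig(P_c.T)
    pi_c = np.abs(np.real(evecs[:, np.argmin(np.abs(evals - 1.0))]))
    pi_c /= pi_c.sum()
    return pi_c @ np.asarray(chi, dtype=float)


pi_obs = stationary_distribution_obs_sketch([[0.9, 0.1], [0.2, 0.8]], [[0.7, 0.3, 0.0], [0.0, 0.4, 0.6]])
```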

property stride

The stride parameter which was used to subsample the discrete trajectories when estimating the hidden Markov state model. Can either be an integer value or ‘effective’, in which case a stride is estimated at which subsequent states are uncorrelated.

Returns:

stride – The stride parameter.

Return type:

int or str

property transition_counts: ndarray | None

The transition counts for the hidden states as estimated in the fitting procedure.

Returns:

The transition counts, can be None if the transition model has no count model.

property transition_model

Yields the transition model for the hidden states.

Returns:

model – The transition model.

Return type:

deeptime.markov.msm.MarkovStateModel