class HiddenMarkovModel

class deeptime.markov.hmm.HiddenMarkovModel(transition_model, output_model: ndarray | OutputModel, initial_distribution: ndarray | None = None, likelihoods: ndarray | None = None, state_probabilities: List[ndarray] | None = None, initial_count: ndarray | None = None, hidden_state_trajectories: Iterable[ndarray] | None = None, stride: int | str = 1, observation_symbols: ndarray | None = None, observation_symbols_full: ndarray | None = None)¶

Hidden Markov state model consisting of a transition model (MSM) on the hidden states, an output model which maps from the hidden states to a distribution of observable states, and optionally an initial distribution on the hidden states. Some properties require a crisp assignment to states in the observable space, in which case only a discrete output model can be used.

Parameters:

transition_model ((m,m) ndarray or MarkovStateModel) – Transition matrix for hidden (macro) states
output_model ((m,n) ndarray or OutputModel) – observation probability matrix from hidden to observable (micro) states or OutputModel instance which yields the mapping from hidden to observable state.
initial_distribution ((m,) ndarray, optional, default=None) – Initial distribution of the hidden (macro) states. Default is uniform.
likelihoods ((k,) ndarray, optional, default=None) – Likelihood progression of the HMM as it was trained for k iterations with Baum-Welch.
state_probabilities (list of ndarray, optional, default=None) – List of state probabilities for each trajectory that the model was trained on (gammas).
initial_count (ndarray, optional, default=None) – Initial counts of the hidden (macro) states, computed from the gamma output of the Baum-Welch algorithm
hidden_state_trajectories (list of ndarray, optional, default=None) – When estimating the HMM the data’s most likely hidden state trajectory is determined and can be saved with the model by providing this argument.
stride (int or str('effective'), optional, default=1) – Stride which was used to subsample discrete trajectories while estimating a HMM. Can either be an integer value which determines the offset or ‘effective’, which makes an estimate of a stride at which subsequent discrete trajectory elements are uncorrelated.
observation_symbols (array_like, optional, default=None) – Sorted unique symbols in observations. If None, it is assumed that all possible observations are made and the state symbols are set to the integer range 0, ..., n-1, where n is the number of observation states.
observation_symbols_full (array_like, optional, default=None) – Full set of symbols in observations. If None, it is assumed to coincide with observation_symbols.

References

[1] Frank Noé, Hao Wu, Jan-Hendrik Prinz, and Nuria Plattner. Projected and hidden Markov models for calculating kinetics and metastable states of complex molecules. The Journal of Chemical Physics, 139(18):11B609_1, 2013.

See also

init.discrete.metastable_from_data: initial guess from data with discrete output model
init.discrete.metastable_from_msm: initial guess from MSM with discrete output model
init.gaussian.from_data: initial guess from data with Gaussian output model
MaximumLikelihoodHMM: maximum likelihood estimation of HMMs
BayesianHMM: Bayesian sampling of models for confidences.

Attributes

`count_model`	Yields the count model for the hidden (macro) states.
`eigenvectors_left_obs`	Left eigenvectors in observation space.
`eigenvectors_right_obs`	Right eigenvectors in observation space.
`hidden_state_trajectories`	Training trajectories mapped to hidden states after estimation.
`initial_count`	The hidden initial counts, can be None.
`initial_distribution`	The initial distribution of this HMM over the hidden states.
`lagtime`	The lagtime this model was estimated at.
`lifetimes`	Lifetimes of states of the hidden transition matrix
`likelihood`	The estimated likelihood of this model based on the training data.
`likelihoods`	If the model comes from the MaximumLikelihoodHMM estimator, this property contains the sequence of likelihoods generated from the fitting iteration.
`metastable_assignments`	Computes the assignment to metastable sets for observable states
`metastable_distributions`	Returns the output probability distributions. Identical to output_probabilities.
`metastable_memberships`	Computes the memberships of observable states to metastable sets by Bayesian inversion.
`metastable_sets`	Computes, for each metastable set, the observable states it contains.
`n_hidden_states`	The number of hidden states.
`n_observation_states`	Property determining the number of observed (micro) states.
`observation_symbols`	The symbols represented by this HMM in observation space.
`observation_symbols_full`	All symbols that the original model contained (original before taking any submodel).
`output_model`	The selected output model for this HMM.
`output_probabilities`	Returns the probabilities for each hidden state to map to a particular observation state.
`state_probabilities`	List of state probabilities for each trajectory that the model was trained on (gammas in the Baum-Welch algo).
`stationary_distribution_obs`	The stationary distribution in observable space.
`stride`	The stride parameter which was used to subsample the discrete trajectories when estimating the hidden Markov model.
`transition_counts`	The transition counts for the hidden states as estimated in the fitting procedure.
`transition_model`	Yields the transition model for the hidden states.

Methods

`ck_test`(models[, include_lag0, err_est, ...])	Performs a Chapman-Kolmogorov test on a list of HMMs.
`collect_observations_in_state`(observations, ...)	Collect a vector of all observations belonging to a specified hidden state.
`compute_observation_likelihood`(data)	Computes the likelihood of observed data under this model.
`compute_viterbi_paths`(observations[, ...])	Computes the Viterbi paths using the current HMM model.
`copy`()	Makes a deep copy of this model.
`correlation_obs`(a[, b, maxtime, k, ncv])	Time-correlation for equilibrium experiment based on observable state vectors a and b.
`expectation_obs`(a)	Equilibrium expectation value of a given observable state vector.
`fingerprint_correlation_obs`(a[, b, k, ncv])	Dynamical fingerprint for equilibrium time-correlation experiment based on observable state vectors a and b.
`fingerprint_relaxation_obs`(p0, a[, k, ncv])	Dynamical fingerprint for perturbation/relaxation experiment based on observable state vector and distribution.
`get_params`([deep])	Get the parameters.
`nonempty_obs`(dtrajs)	Computes the set of visited observable states given a set of discrete trajectories.
`propagate`(p0, k)	Propagates the initial distribution p0 defined on observable space k times.
`relaxation_obs`(p0, a[, maxtime, k, ncv])	Simulates a perturbation-relaxation experiment based on observable state vector and distribution.
`sample_by_observation_probabilities`(dtrajs, ...)	Generates samples according to the current observation probability distribution.
`set_params`(**params)	Set the parameters of this estimator.
`simulate`(n_steps[, start, stop, dt])	Generates a realization of the Hidden Markov Model
`states_largest`([directed, ...])	Selects hidden states which represent the largest connected set.
`states_populous`([strong, connectivity_threshold])	Retrieves the hidden states which are most populated and connected.
`submodel`([states, obs])	Returns a HMM with restricted state space
`submodel_disconnect`([connectivity_threshold])	Disconnects sets of hidden states that are barely connected
`submodel_largest`([directed, ...])	Returns the largest connected sub-HMM.
`submodel_populous`([directed, ...])	Returns the most populous connected sub-HMM.
`timescales`([k])	Yields the timescales of the hidden transition model.
`transform_discrete_trajectories_to_observed_symbols`(dtrajs)	A list of integer arrays with the discrete trajectories mapped to the currently used set of observation symbols.
`transition_matrix_obs`([k])	Computes the transition matrix between observed states

ck_test(models, include_lag0=True, err_est=False, progress=None)¶

Performs a Chapman-Kolmogorov test on a list of HMMs.

Parameters:

models (list of HiddenMarkovModel) – list of models to test against
include_lag0 (bool, optional, default=True) – Whether to include lagtime $\tau = 0$ .
err_est (bool, optional, default=False) – Whether to include observable evaluations on estimate samples.
progress – Optional progress bar, tested for tqdm.

Returns:

ck_test – Test results.

Return type:

ChapmanKolmogorovTest

See also

deeptime.util.validation.ck_test

collect_observations_in_state(observations: List[ndarray], state_index: int)¶

Collect a vector of all observations belonging to a specified hidden state.

Parameters:

observations (list of numpy.array) – List of observed trajectories.
state_index (int) – The index of the hidden state for which corresponding observations are to be retrieved.

Returns:

collected_observations – The collected vector of observations belonging to the specified hidden state.

Return type:

numpy.array with shape (nsamples,)

Raises:

RuntimeError – A RuntimeError is raised if the HMM model does not yet have a hidden state trajectory associated with it.

compute_observation_likelihood(data: ndarray | List[ndarray])¶

Computes the likelihood of observed data under this model.

Internally, the forward pass of the Baum-Welch algorithm is used.

Parameters:: data (array_like or list of array_like) – The observations
Returns:: likelihood – The computed likelihood.
Return type:: float
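
The forward pass mentioned above can be sketched with NumPy alone. This is a minimal illustration for a discrete output model, not the library's implementation: the function name `forward_log_likelihood` and the toy matrices are hypothetical, and the sketch returns the log-likelihood for numerical stability, which may differ in scale from the value the library reports.

```python
import numpy as np

def forward_log_likelihood(P, B, p0, obs):
    """Scaled forward recursion; returns log P(obs | model).

    P: (m, m) hidden transition matrix, B: (m, n) output probabilities,
    p0: (m,) initial distribution, obs: sequence of observation indices.
    """
    alpha = p0 * B[:, obs[0]]          # joint weight of first symbol and hidden state
    scale = alpha.sum()
    log_lik = np.log(scale)
    alpha = alpha / scale              # rescale to avoid underflow on long sequences
    for o in obs[1:]:
        alpha = (alpha @ P) * B[:, o]  # propagate one step, then weight by emission
        scale = alpha.sum()
        log_lik += np.log(scale)
        alpha = alpha / scale
    return log_lik

# Toy model (assumed values, chosen only for illustration)
P = np.array([[0.9, 0.1], [0.2, 0.8]])
B = np.array([[0.80, 0.15, 0.05], [0.10, 0.20, 0.70]])
p0 = np.array([0.5, 0.5])
log_lik = forward_log_likelihood(P, B, p0, [0, 0, 1, 2, 2])
```

The accumulated logarithms of the scaling factors sum to the log-likelihood of the whole sequence, so no product of small probabilities is ever formed explicitly.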

compute_viterbi_paths(observations, map_observations_to_submodel: bool = False) → List[ndarray]¶

Computes the Viterbi paths using the current HMM model.

Note: In case of sub-modeling a discrete state HMM, the observation sequence must be mapped to the active states of that sub-model. This can either be done by hand beforehand or by activating the map_observations_to_submodel flag.

Parameters:

observations (list of array_like or array_like) – observations
map_observations_to_submodel (bool, optional, default = False) – If True and in case of a discrete output model, activates automatic mapping to the active sub-model states

Returns:

paths – the computed viterbi paths

Return type:

list of np.ndarray
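
For orientation, the Viterbi recursion for a discrete output model can be sketched as follows. The function name `viterbi` and the toy parameters are assumptions made for this example; the library's own routine may differ in details such as handling of zero probabilities or sub-model mapping.

```python
import numpy as np

def viterbi(P, B, p0, obs):
    """Most likely hidden path for one observation sequence (log-space Viterbi)."""
    n_steps, n_hidden = len(obs), P.shape[0]
    log_P, log_B = np.log(P), np.log(B)
    delta = np.log(p0) + log_B[:, obs[0]]        # best log-score of paths ending in each state
    backptr = np.zeros((n_steps, n_hidden), dtype=int)
    for t in range(1, n_steps):
        scores = delta[:, None] + log_P          # scores[i, j]: best path ending in i, then i -> j
        backptr[t] = scores.argmax(axis=0)       # best predecessor for each state j
        delta = scores.max(axis=0) + log_B[:, obs[t]]
    path = np.empty(n_steps, dtype=int)
    path[-1] = delta.argmax()
    for t in range(n_steps - 1, 0, -1):          # trace the back-pointers
        path[t - 1] = backptr[t, path[t]]
    return path

# Toy model with strictly positive entries (assumed values)
P = np.array([[0.9, 0.1], [0.1, 0.9]])
B = np.array([[0.9, 0.1], [0.1, 0.9]])
p0 = np.array([0.5, 0.5])
path = viterbi(P, B, p0, [0, 0, 0, 1, 1, 1])
```

With these parameters the observations strongly favour a single switch of the hidden state halfway through the sequence.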

copy() → Model¶

Makes a deep copy of this model.

Returns:: A new copy of this model.
Return type:: copy

correlation_obs(a, b=None, maxtime=None, k=None, ncv=None)¶: Time-correlation for equilibrium experiment based on observable state vectors a and b.

See also

deeptime.markov.msm.MarkovStateModel.correlation

expectation_obs(a)¶: Equilibrium expectation value of a given observable state vector.

See also

deeptime.markov.msm.MarkovStateModel.expectation

fingerprint_correlation_obs(a, b=None, k=None, ncv=None)¶: Dynamical fingerprint for equilibrium time-correlation experiment based on observable state vectors a and b.

See also

deeptime.markov.msm.MarkovStateModel.fingerprint_correlation

fingerprint_relaxation_obs(p0, a, k=None, ncv=None)¶: Dynamical fingerprint for perturbation/relaxation experiment based on observable state vector and distribution.

See also

deeptime.markov.msm.MarkovStateModel.fingerprint_relaxation

get_params(deep=False)¶

Get the parameters.

Returns:: params – Parameter names mapped to their values.
Return type:: mapping of string to any

nonempty_obs(dtrajs) → ndarray¶

Computes the set of visited observable states given a set of discrete trajectories.

Parameters:: dtrajs (array_like) – observable trajectory
Returns:: symbols – The observation symbols which are visited.
Return type:: np.ndarray

propagate(p0, k)¶

Propagates the initial distribution p0 defined on observable space k times.

Computes the product

$$p_k^\top = p_0^\top P^k$$

If the lag time of transition matrix $P$ is $\tau$ , this will provide the probability distribution at time $k \tau$ .

Parameters:

p0 (ndarray(n)) – Initial distribution. Vector of size of the active set.
k (int) – Number of time steps

Returns:

pk – Distribution after k steps

Return type:

ndarray(n)
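
As a small worked example of the product above, the following NumPy sketch propagates a distribution with a generic row-stochastic matrix. The matrix and initial distribution are assumed toy values; how the library obtains $P$ on the observable space is not shown here.

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.2, 0.8]])   # row-stochastic transition matrix at lag time tau
p0 = np.array([1.0, 0.0])    # all probability mass starts in state 0
k = 3

# p_k^T = p_0^T P^k: the distribution at time k * tau
pk = p0 @ np.linalg.matrix_power(P, k)
```

Because every row of $P$ sums to one, the propagated vector remains a probability distribution.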

relaxation_obs(p0, a, maxtime=None, k=None, ncv=None)¶: Simulates a perturbation-relaxation experiment based on observable state vector and distribution.

See also

deeptime.markov.msm.MarkovStateModel.relaxation

sample_by_observation_probabilities(dtrajs, nsample)¶

Generates samples according to the current observation probability distribution.

Notes

Sampling from off-sample-trajectories might yield -1 indices as discrete observable states are drawn from output probability distributions and off-sample trajectories might not contain all drawn observable states.

Parameters:

dtrajs (discrete trajectory) – Input observation trajectory or list of trajectories
nsample (int) – Number of samples per distribution.

Returns:

indexes – List of the sampled indices by distribution. Each element is an index array with a number of rows equal to nsample, with rows consisting of a tuple (i, t), where i is the index of the trajectory and t is the time index within the trajectory.

Return type:

length m list of ndarray( (nsample, 2) )

set_params(**params)¶

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:: **params (dict) – Estimator parameters.
Returns:: self – Estimator instance.
Return type:: object

simulate(n_steps, start=None, stop=None, dt=1)¶

Generates a realization of the Hidden Markov Model

Parameters:

n_steps (int) – trajectory length in steps of the lag time
start (int, optional, default = None) – starting hidden state. If not given, will sample from the stationary distribution of the hidden transition matrix.
stop (int or int-array-like, optional, default = None) – stopping hidden set. If given, the trajectory will be stopped before n_steps steps once a hidden state of the stop set is reached
dt (int) – trajectory will be saved every dt time steps. Internally, the dt’th power of P is taken to ensure a more efficient simulation.

Returns:

htraj ((n_steps/dt, ) ndarray) – The hidden state trajectory with length n_steps/dt
otraj ((n_steps/dt, ) ndarray) – The observable state discrete trajectory with length n_steps/dt
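
A realization of a discrete-output HMM can be sketched as below. This is a simplified illustration under stated assumptions: the helper `simulate_hmm` is hypothetical, it requires an explicit start state, and it ignores the stop and dt options described above.

```python
import numpy as np

def simulate_hmm(P, B, n_steps, start, rng):
    """Draw a hidden trajectory from P and emit one observable symbol per step from B."""
    htraj = np.empty(n_steps, dtype=int)
    otraj = np.empty(n_steps, dtype=int)
    state = start
    for t in range(n_steps):
        htraj[t] = state
        otraj[t] = rng.choice(B.shape[1], p=B[state])  # emit from the current hidden state
        state = rng.choice(P.shape[0], p=P[state])     # jump to the next hidden state
    return htraj, otraj

# Toy model (assumed values)
P = np.array([[0.9, 0.1], [0.2, 0.8]])
B = np.array([[0.80, 0.15, 0.05], [0.10, 0.20, 0.70]])
rng = np.random.default_rng(0)
htraj, otraj = simulate_hmm(P, B, n_steps=100, start=0, rng=rng)
```

Both returned arrays have the same length, one entry per simulated step.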

states_largest(directed=True, connectivity_threshold='1/n') → ndarray¶

Selects hidden states which represent the largest connected set.

Parameters:

directed (bool, optional, default=True) – Whether the connectivity is strong (directed) or weak (undirected)
connectivity_threshold (str or int, optional, default='1/n') – A connectivity threshold which can be employed to only consider edges with a certain minimum weight.

Return type:

The largest connected set of hidden states

states_populous(strong=True, connectivity_threshold='1/n')¶

Retrieves the hidden states which are most populated and connected.

Parameters:

strong (bool, optional, default=True) – Whether the connectivity is evaluated based on a directed or on an undirected graph.
connectivity_threshold (str or int, optional, default='1/n') – Minimum weight so that two states are considered connected.

Returns:

states – Most populated set of states

Return type:

np.ndarray

submodel(states: ndarray | None = None, obs: ndarray | None = None)¶

Returns a HMM with restricted state space

Parameters:

states (None or int-array) –
Hidden states to restrict the model to. In addition to specifying the subset, possible options are:
- int-array: indices of states to restrict onto
- None : all states - don’t restrict
obs (None or int-array) –
Observed states to restrict the model to. In addition to specifying an array with the state labels to be observed, possible options are:
- int-array: indices of states to restrict onto
- None : all states - don’t restrict

Returns:

hmm – The restricted HMM.

Return type:

HiddenMarkovModel

submodel_disconnect(connectivity_threshold='1/n')¶

Disconnects sets of hidden states that are barely connected

Runs a connectivity check excluding all transition counts below connectivity_threshold. The transition matrix and stationary distribution will be re-estimated. Note that the resulting transition matrix may have both strongly and weakly connected subsets.

Parameters:: connectivity_threshold (float or '1/n') – minimum number of counts to consider a connection between two states. Counts below this threshold are treated as zero in the connectivity check and may thus separate the resulting transition matrix. The default evaluates to 1/n_states.
Returns:: hmm – The restricted HMM.
Return type:: HiddenMarkovModel

submodel_largest(directed=True, connectivity_threshold='1/n', observe_nonempty=True, dtrajs=None)¶

Returns the largest connected sub-HMM. By default this means that the largest connected set of hidden states and the set of visited observable states is selected.

Parameters:

directed (bool, optional, default=True) – Whether the connectivity is based on a directed graph (strong connectivity) or undirected (weak connectivity)
connectivity_threshold (str or int, optional, default='1/n') – The connectivity threshold required to consider two hidden states connected.
observe_nonempty (bool, optional, default=True) – Whether the observable state set should be restricted to visited observable states. If True, dtrajs must be provided.
dtrajs (array_like, optional, default=None) – Observable state trajectory or a list thereof to evaluate visited observable states.

Returns:

sub_hmm – The restricted HMM.

Return type:

HiddenMarkovModel

submodel_populous(directed=True, connectivity_threshold='1/n', observe_nonempty=True, dtrajs=None)¶

Returns the most populous connected sub-HMM.

Parameters:

directed (bool, optional, default=True) – Whether the connectivity is based on a directed graph (strong connectivity) or undirected (weak connectivity)
connectivity_threshold (str or int, optional, default='1/n') – The connectivity threshold required to consider two hidden states connected.
observe_nonempty (bool, optional, default=True) – Whether the observable state set should be restricted to visited observable states. If True, dtrajs must be provided.
dtrajs (array_like, optional, default=None) – Observable state trajectory or a list thereof to evaluate visited observable states.

Returns:

hmm – The restricted HMM.

Return type:

HiddenMarkovModel

timescales(k=None)¶: Yields the timescales of the hidden transition model. See MarkovStateModel.timescales.

transform_discrete_trajectories_to_observed_symbols(dtrajs)¶

A list of integer arrays with the discrete trajectories mapped to the currently used set of observation symbols. For example, if there has been a subselection of the model for connectivity=’largest’, the indices will be given within the connected set. Frames that do not correspond to a considered symbol are set to -1.

Parameters:: dtrajs (array_like or list of array_like) – discretized trajectories
Returns:: Curated discretized trajectories so that unconsidered symbols are mapped to -1.
Return type:: array_like or list of array_like
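
The mapping described above can be illustrated with a lookup table. The helper name `map_to_observed_symbols` and its `n_full` argument (the size of the full symbol set) are assumptions made for this sketch, not part of the library's interface.

```python
import numpy as np

def map_to_observed_symbols(dtraj, observation_symbols, n_full):
    """Map raw symbols to indices within observation_symbols; others become -1."""
    lookup = np.full(n_full, -1, dtype=int)
    lookup[observation_symbols] = np.arange(len(observation_symbols))
    return lookup[np.asarray(dtraj)]

symbols = np.array([0, 2, 3])           # symbols retained by a (hypothetical) submodel
dtraj = np.array([0, 1, 2, 3, 4, 2])    # raw discrete trajectory over symbols 0..4
mapped = map_to_observed_symbols(dtraj, symbols, n_full=5)
```

Symbols 1 and 4 are not part of the retained set, so the corresponding frames are marked with -1.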

transition_matrix_obs(k=1) → ndarray¶

Computes the transition matrix between observed states

Transition matrices for longer lag times than the one used to parametrize this HMM can be obtained by setting the k option. Note that a HMM is not Markovian in the observable space, thus we cannot compute transition matrices at longer lag times using the Chapman-Kolmogorov equality, i.e.:

$$P(k \tau) \neq P^k(\tau)$$

This function computes the correct transition matrix using the metastable (coarse) transition matrix $P_c$ as:

$$P(k \tau) = \Pi^{-1} \chi^{\top} \Pi_c P_c^k(\tau) \chi$$

where $\chi$ is the output probability matrix, $\Pi_c$ is a diagonal matrix with the metastable-state (coarse) stationary distribution and $\Pi$ is a diagonal matrix with the observable-state stationary distribution.

Parameters:: k (int, optional, default=1) – Multiple of the lag time. By default (k=1), the transition matrix at the lag time used to construct this HMM will be returned. If a higher power is given, the transition matrix at lag time $k \tau$ is computed from the expression above.
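
The expression for $P(k \tau)$ can be checked numerically with a small NumPy sketch. All matrices below are assumed toy values, and the local function `transition_matrix_obs` only mirrors the formula from this section; it is not the library method.

```python
import numpy as np

P_c = np.array([[0.9, 0.1],
                [0.2, 0.8]])             # hidden (coarse) transition matrix
chi = np.array([[0.80, 0.15, 0.05],
                [0.10, 0.20, 0.70]])     # output probabilities, rows sum to 1

# Stationary distribution of P_c: left eigenvector for eigenvalue 1
eigvals, eigvecs = np.linalg.eig(P_c.T)
pi_c = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
pi_c = pi_c / pi_c.sum()
pi_obs = pi_c @ chi                      # stationary distribution on observable states

def transition_matrix_obs(k=1):
    """P(k tau) = Pi^{-1} chi^T Pi_c P_c^k chi."""
    P_c_k = np.linalg.matrix_power(P_c, k)
    return np.diag(1.0 / pi_obs) @ chi.T @ np.diag(pi_c) @ P_c_k @ chi

P_obs_1 = transition_matrix_obs(1)
P_obs_2 = transition_matrix_obs(2)
```

In this toy model `P_obs_2` is row-stochastic and leaves `pi_obs` invariant, yet it differs from `P_obs_1 @ P_obs_1`, which illustrates why the Chapman-Kolmogorov equality cannot be applied on the observable states.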

property count_model¶

Yields the count model for the hidden (macro) states. The count matrix is estimated from Viterbi paths.

Returns:: count_model – The count model for the hidden states.
Return type:: deeptime.markov.TransitionCountModel

property eigenvectors_left_obs¶

Left eigenvectors in observation space. Only available with a discrete output model.

Return type:: Left eigenvectors in observation space.

property eigenvectors_right_obs¶

Right eigenvectors in observation space. Only available with a discrete output model.

Return type:: Right eigenvectors in observation space.

property hidden_state_trajectories: List[ndarray] | None¶

Training trajectories mapped to hidden states after estimation.

Return type:: hidden state trajectories, can be None if not provided in constructor.

property initial_count: ndarray | None¶

The hidden initial counts, can be None.

Return type:: Initial counts.

property initial_distribution: ndarray¶

The initial distribution of this HMM over the hidden states.

Return type:: The initial distribution.

property lagtime: int¶

The lagtime this model was estimated at.

Returns:: lagtime – The lagtime.
Return type:: int

property lifetimes: ndarray¶

Lifetimes of states of the hidden transition matrix

Returns:: l – state lifetimes in units of the input trajectory time step, defined by $-\tau / \ln \mid p_{ii} \mid, i = 1,...,n_\mathrm{states}$ , where $p_{ii}$ are the diagonal entries of the hidden transition matrix.
Return type:: ndarray(n_states)
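
The lifetime formula can be evaluated directly from the diagonal of the hidden transition matrix. The matrix and the lag time of 10 trajectory steps are assumed toy values for this sketch.

```python
import numpy as np

tau = 10                                 # lag time in trajectory time steps (assumed)
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])               # hidden transition matrix (assumed)

# lifetime_i = -tau / ln|p_ii|
lifetimes = -tau / np.log(np.abs(np.diag(P)))
```

A state with a larger self-transition probability has a longer lifetime, so the first state outlives the second here.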

property likelihood: float | None¶

The estimated likelihood of this model based on the training data. Only available if the sequence of likelihoods is provided.

Return type:: The estimated likelihood, otherwise None.

property likelihoods: ndarray | None¶

If the model comes from the MaximumLikelihoodHMM estimator, this property contains the sequence of likelihoods generated from the fitting iteration.

Return type:: Sequence of likelihoods, otherwise None.

property metastable_assignments¶

Computes the assignment to metastable sets for observable states

Notes

This is only recommended for visualization purposes. You cannot compute any actual quantity of the coarse-grained kinetics without employing the fuzzy memberships!

Returns:: For each observable state, the metastable state it is located in.
Return type:: ndarray((n,), dtype=int)

See also

output_probabilities

property metastable_distributions¶

Returns the output probability distributions. Identical to output_probabilities.

Returns:: Pout – output probability matrix from hidden to observable discrete states
Return type:: ndarray (m,n)

See also

output_probabilities

property metastable_memberships¶

Computes the memberships of observable states to metastable sets by Bayesian inversion. [1]

Returns:: M – A matrix containing the probability or membership of each observable state to be assigned to each metastable or hidden state. The row sums of M are 1.
Return type:: ndarray((n,m))
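
One way to read the Bayesian inversion is $M_{ij} = \pi_{c,j} \chi_{ji} / \pi_i$, with $\pi_c$ the hidden stationary distribution, $\chi$ the output probability matrix and $\pi$ the observable stationary distribution. The sketch below evaluates this expression for assumed toy values; it illustrates the idea and is not taken from the library source.

```python
import numpy as np

pi_c = np.array([2 / 3, 1 / 3])          # hidden stationary distribution (assumed)
chi = np.array([[0.80, 0.15, 0.05],
                [0.10, 0.20, 0.70]])     # (m, n) output probabilities (assumed)

pi_obs = pi_c @ chi                      # observable stationary distribution
# Bayes' rule: P(hidden j | observed i) = pi_c[j] * chi[j, i] / pi_obs[i]
M = (pi_c[:, None] * chi).T / pi_obs[:, None]
```

Each row of `M` is a probability distribution over the hidden states, which matches the statement that the row sums of M are 1.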

property metastable_sets¶

Computes, for each metastable set, the observable states it contains.

Notes

This is only recommended for visualization purposes. You cannot compute any actual quantity of the coarse-grained kinetics without employing the fuzzy memberships!

Returns:: sets – A list of length equal to the number of metastable states. Each element is an array with the observable state indexes contained in it.
Return type:: list of int-arrays

property n_hidden_states: int¶

The number of hidden states. Can also be retrieved from the output model as well as from the transition model.

Return type:: Number of hidden states

property n_observation_states: int¶

Property determining the number of observed (micro) states. It coincides with the size of the second axis of the observation probabilities matrix in case of a discrete output model.

Return type:: Number of observed (micro) states

property observation_symbols: ndarray | None¶

The symbols represented by this HMM in observation space. Is None if the output model has no discrete observations.

Return type:: The list of observation symbols or None.

property observation_symbols_full: ndarray | None¶

All symbols that the original model contained (original before taking any submodel).

Return type:: The list of observation symbols or None, if there are no discrete symbols or None was provided.

property output_model: OutputModel¶

The selected output model for this HMM. The output model can map from the hidden states to observable states and can also be fitted to data.

Return type:: The output model

property output_probabilities: ndarray¶

Returns the probabilities for each hidden state to map to a particular observation state. Only available if the underlying output model is a DiscreteOutputModel.

Returns:: probabilities – a (M,N) row-stochastic matrix mapping from each hidden to each observation state
Return type:: np.ndarray

property state_probabilities: List[ndarray] | None¶

List of state probabilities for each trajectory that the model was trained on (gammas in the Baum-Welch algo).

Return type:: List of state probabilities if initially provided in the constructor.

property stationary_distribution_obs¶

The stationary distribution in observable space. Only available with a discrete output model.

Return type:: stationary distribution in observation space if available

property stride¶

The stride parameter which was used to subsample the discrete trajectories when estimating the hidden Markov model. Can either be an integer value or ‘effective’, in which case a stride is estimated at which subsequent states are uncorrelated.

Returns:: stride – The stride parameter.
Return type:: int or str

property transition_counts: ndarray | None¶

The transition counts for the hidden states as estimated in the fitting procedure.

Return type:: The transition counts, can be None if the transition model has no count model.

property transition_model¶

Yields the transition model for the hidden states.

Returns:: model – The transition model.
Return type:: deeptime.markov.msm.MarkovStateModel