class TransitionCountModel

class deeptime.markov.TransitionCountModel(count_matrix, counting_mode: Optional[str] = None, lagtime: int = 1, state_histogram: Optional[ndarray] = None, state_symbols: Optional[ndarray] = None, count_matrix_full=None, state_histogram_full: Optional[ndarray] = None)

Statistics, count matrices, and connectivity from discrete trajectories. These statistics can be used to, e.g., construct MSMs. This model can create submodels (see submodel()) that are restricted to a certain selection of states. This subselection can be made by

  • analyzing the connected sets of the count matrix (connected_sets()),

  • pruning states by thresholding with a mincount_connectivity parameter,

  • or simply providing a subset of states manually.

Parameters:
  • count_matrix ((N, N) ndarray or sparse matrix) – The count matrix. In case it was estimated with ‘sliding’, it contains a factor of lagtime more counts than are statistically uncorrelated.

  • counting_mode (str, optional, default=None) – If not None, one of ‘sliding’, ‘sample’, or ‘effective’. Indicates the counting method that was used to estimate the count matrix. In case of ‘sliding’, a sliding window of the size of the lagtime was used to count transitions, so the matrix contains a factor of lagtime more counts than are statistically uncorrelated. Such a matrix is fine for maximum likelihood estimation, but it yields error estimates that are far too small when used for uncertainty calculations. For uncertainty calculations, use the effective count matrix (see effective_count_matrix), divide this count matrix by the lag time tau, or use ‘effective’ as the counting mode during estimation.

  • lagtime (int, optional, default=1) – The time offset that was used to count transitions between states.

  • state_histogram (array_like, optional, default=None) – Histogram over the visited states in discretized trajectories.

  • state_symbols (array_like or DiscreteStatesManager, optional, default=None) – Symbols of the original discrete trajectory that are represented in the counting model. If None, the symbols are assumed to represent the data, i.e., an iota range over the number of states. Subselection of the model also subselects the symbols.

  • count_matrix_full (array_like, optional, default=None) – Count matrix for all state symbols. If None, the count matrix provided as first argument is assumed to take that role.

  • state_histogram_full (array_like, optional, default=None) – Histogram over all state symbols. If None, the provided state_histogram is assumed to take that role.
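The overcounting described for ‘sliding’ can be made concrete with a small NumPy-only sketch. The helper `count_transitions` and its `stride` argument are hypothetical and not part of deeptime; a stride equal to the lag time is only meant to resemble counting with non-overlapping windows.

```python
import numpy as np

def count_transitions(dtraj, n_states, lagtime, stride=1):
    """Count transitions i -> j at the given lag time.

    stride=1 mimics sliding-window counting; stride=lagtime uses
    non-overlapping windows. Hypothetical helper, not a deeptime function.
    """
    dtraj = np.asarray(dtraj)
    starts = np.arange(0, len(dtraj) - lagtime, stride)
    counts = np.zeros((n_states, n_states), dtype=int)
    # np.add.at accumulates correctly even when index pairs repeat
    np.add.at(counts, (dtraj[starts], dtraj[starts + lagtime]), 1)
    return counts

dtraj = np.array([0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0])
lagtime = 2

sliding = count_transitions(dtraj, n_states=2, lagtime=lagtime)
strided = count_transitions(dtraj, n_states=2, lagtime=lagtime, stride=lagtime)

# Simple correction mentioned above: divide the sliding counts by the lag time.
scaled = sliding / lagtime
```

Here the sliding window collects `len(dtraj) - lagtime` transitions, while the strided variant collects roughly a factor of `lagtime` fewer; `scaled` brings the sliding counts back to a comparable magnitude.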

Attributes

count_matrix

The count matrix, possibly restricted to a subset of states.

count_matrix_full

The count matrix on the full set of discrete states, irrespective of whether they are selected or not.

counting_mode

The counting mode that was used to estimate the contained count matrix.

is_full_model

Determine whether this counting model refers to the full model that represents all states of the data.

lagtime

The lag time at which the Markov model was estimated.

n_states

Number of states

n_states_full

Full number of states represented in the underlying data.

selected_count_fraction

The fraction of counts represented in this count model.

selected_state_fraction

The fraction of states represented in this count model.

state_histogram

Histogram of discrete state counts. Can be None if no statistics were provided.

state_histogram_full

Histogram over all states in the trajectories.

state_symbols

Symbols for states that are represented in this count model.

state_symbols_with_blank

Symbols for states that are represented in this count model plus a state -1 for states which are not represented in this count model.

states

The states in this model, i.e., an iota range from 0 (inclusive) to n_states (exclusive).

total_count

Total number of counts

visited_set

The set of visited states.

Methods

connected_sets([connectivity_threshold, ...])

Computes the connected sets of the counting matrix.

copy()

Makes a deep copy of this model.

count_matrix_histogram()

Computes a histogram over states represented in the count matrix.

get_params([deep])

Get the parameters.

is_connected([directed])

Dispatches to tools.analysis.is_connected.

set_params(**params)

Set the parameters of this estimator.

states_to_symbols(states)

Converts a list of states to a list of symbols which can be related back to the original data.

submodel(states)

This returns a count model that is restricted to a selection of states.

submodel_largest([connectivity_threshold, ...])

Restricts this model to the submodel corresponding to the largest connected set of states after eliminating states that fall below the specified connectivity threshold.

symbols_to_states(symbols)

Converts a set of symbols to state indices in this count model instance.

transform_discrete_trajectories_to_submodel(dtrajs)

Maps the discrete trajectories to the currently used set of symbols.

connected_sets(connectivity_threshold: float = 0.0, directed: bool = True, probability_constraint: Optional[ndarray] = None, sort_by_population: bool = False) → List[ndarray]

Computes the connected sets of the counting matrix. A threshold can be set that fixes the number of counts required to consider two states connected (states are always considered self-connected, regardless of the connectivity threshold). With sliding-window counting, the number of counts is inflated by a factor of lagtime. With ‘sliding-effective’ counting, the sliding-window counts were divided by the lagtime and can therefore also lie in the open interval (0, 1). The same holds for ‘effective’ counting.

Parameters:
  • connectivity_threshold (float, optional, default=0.) – Number of counts required to consider two states connected. When the count matrix was estimated with effective mode or sliding-effective mode, a threshold of \(1 / \mathrm{n\_states\_full}\) is commonly used.

  • directed (bool, optional, default=True) – Whether to compute connected sets for the directed (True) or the undirected (False) transition graph.

  • probability_constraint ((N,) ndarray, optional, default=None) – Constraint on the whole state space. Counts involving states that have zero probability are set to zero.

  • sort_by_population (bool, optional, default=False) – If True, the resulting list of sets is ordered by decreasing population (total counts) rather than by size.

Returns:

A list of arrays containing integers (states), each array representing a connected set. The list is ordered decreasingly by the size of the individual components.
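As a rough illustration of what connected sets of a count matrix are, the following NumPy-only sketch thresholds the counts and groups mutually reachable states. The function `connected_sets_sketch` is hypothetical and is not the library's implementation; it assumes a small dense matrix and that counts below the threshold are disregarded.

```python
import numpy as np

def connected_sets_sketch(count_matrix, connectivity_threshold=0.0, directed=True):
    """Group states that can reach each other via sufficiently large counts.

    Illustrative only: dense matrices, cubic cost, not deeptime code.
    """
    C = np.asarray(count_matrix, dtype=float)
    n = C.shape[0]
    # keep only positive counts that are not below the threshold
    adjacency = (C > 0) & (C >= connectivity_threshold)
    if not directed:
        adjacency = adjacency | adjacency.T
    # every state is connected to itself, regardless of the threshold
    reach = adjacency | np.eye(n, dtype=bool)
    # Warshall's algorithm: transitive closure of the reachability relation
    for k in range(n):
        reach |= reach[:, [k]] & reach[[k], :]
    mutual = reach & reach.T  # i reaches j and j reaches i
    sets, seen = [], np.zeros(n, dtype=bool)
    for i in range(n):
        if not seen[i]:
            members = np.flatnonzero(mutual[i])
            seen[members] = True
            sets.append(members)
    # largest set first
    return sorted(sets, key=len, reverse=True)

C = np.array([[10, 1, 0],
              [2, 5, 0],
              [0, 0, 7]])

default_sets = connected_sets_sketch(C)
strict_sets = connected_sets_sketch(C, connectivity_threshold=2)
undirected_sets = connected_sets_sketch(C, connectivity_threshold=2, directed=False)
```

With a threshold of 2 the single count from state 0 to state 1 is dropped, so the two states are no longer mutually reachable in the directed graph but remain connected in the undirected one.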

copy() → Model

Makes a deep copy of this model.

Returns:

copy – A new copy of this model.

Return type:

Model

count_matrix_histogram() → ndarray

Computes a histogram over states represented in the count matrix. The magnitude of the returned values depends on the mode that was used for counting.

Returns:

A histogram over the collected counts per state.

Return type:

(n_states,) ndarray

get_params(deep=False)

Get the parameters.

Returns:

params – Parameter names mapped to their values.

Return type:

mapping of string to any

is_connected(directed: bool = True) → bool

Dispatches to tools.analysis.is_connected.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

object

states_to_symbols(states: ndarray) → ndarray

Converts a list of states to a list of symbols which can be related back to the original data.

Parameters:

states ((N,) ndarray) – Array of states.

Returns:

symbols – Array of symbols.

Return type:

(N,) ndarray
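Conceptually, a state is a position within the model's symbol array, so mapping states to symbols amounts to a lookup. The sketch below uses plain NumPy with made-up symbols and does not call deeptime.

```python
import numpy as np

# Assumed example: a submodel that kept the original symbols 2, 5 and 7.
state_symbols = np.array([2, 5, 7])

# State indices are positions in state_symbols, so the mapping is a lookup.
states = np.array([0, 2, 1])
symbols = state_symbols[states]
```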

submodel(states: ndarray)

This returns a count model that is restricted to a selection of states.

Parameters:

states (array_like) – The states to restrict to.

Returns:

submodel – A submodel restricted to the requested states.

Return type:

TransitionCountModel

See also

submodel_largest
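Restricting a count model to a selection of states can be pictured as slicing rows and columns of the count matrix and subselecting the symbols accordingly. This is a NumPy-only sketch with made-up data, not the library's code.

```python
import numpy as np

count_matrix_full = np.array([[10, 1, 0],
                              [2, 5, 0],
                              [0, 0, 7]])
state_symbols_full = np.arange(3)

# Hypothetical selection: keep states 0 and 2.
states = np.array([0, 2])

# np.ix_ builds an open mesh so that rows and columns are selected together.
sub_counts = count_matrix_full[np.ix_(states, states)]
sub_symbols = state_symbols_full[states]
```

Transitions into or out of the dropped state are no longer part of `sub_counts`.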

submodel_largest(connectivity_threshold: Union[str, float] = 0.0, directed: Optional[bool] = None, probability_constraint: Optional[ndarray] = None, sort_by_population: bool = False)

Restricts this model to the submodel corresponding to the largest connected set of states after eliminating states that fall below the specified connectivity threshold. Note that the connectivity threshold only applies to the implied graph weight between states; singleton sets are always considered connected.

Parameters:
  • connectivity_threshold (float or '1/n', optional, default=0.) – Connectivity threshold. Counts that are below the specified value are disregarded when finding connected sets. In case of ‘1/n’, the threshold is resolved to \(1 / \mathrm{n\_states\_full}\).

  • directed (bool, optional, default=None) – Whether to look for connected sets in a directed graph or in an undirected one. If None, the choice depends on whether a probability constraint is given: undirected if one is given, directed otherwise.

  • probability_constraint ((N,) ndarray, optional, default=None) – Constraint on the whole state space (n_states_full). Only states that have positive probability are considered.

  • sort_by_population (bool, optional, default=False) – If True, the connected set with the largest population is selected.

Returns:

submodel – The submodel.

Return type:

TransitionCountModel
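The defaults described for connectivity_threshold and directed can be summarized in a few lines of plain Python. The function name `resolve_connectivity_options` is hypothetical; the sketch only restates the documented rules and is not taken from the library.

```python
def resolve_connectivity_options(connectivity_threshold, directed,
                                 probability_constraint, n_states_full):
    """Resolve '1/n' and the directed=None default as documented above."""
    if isinstance(connectivity_threshold, str):
        if connectivity_threshold != "1/n":
            raise ValueError("the only supported string threshold is '1/n'")
        connectivity_threshold = 1.0 / n_states_full
    if directed is None:
        # undirected if a probability constraint is given, directed otherwise
        directed = probability_constraint is None
    return float(connectivity_threshold), directed

threshold, directed = resolve_connectivity_options("1/n", None, None, n_states_full=4)
constrained = resolve_connectivity_options(0.0, None, [0.5, 0.5], n_states_full=2)
```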

symbols_to_states(symbols)

Converts a set of symbols to state indices in this count model instance. Symbols that are no longer present in this model are discarded, so the result may be ordered differently from the input or be shorter than it.

Parameters:

symbols (array_like) – the symbols to be mapped to state indices

Returns:

states – An array of states.

Return type:

ndarray
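One way to reproduce the described behavior is a membership test against the model's symbols. Whether deeptime does it exactly this way is an assumption; the data below is made up.

```python
import numpy as np

state_symbols = np.array([2, 5, 7])   # assumed symbols represented in the model
requested = np.array([7, 3, 2])       # symbol 3 is not represented

# Positions of the model's symbols that occur among the requested ones.
states = np.flatnonzero(np.isin(state_symbols, requested))
```

The result has two entries for three requested symbols, and it follows the order of the model's symbols rather than the order of the request.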

transform_discrete_trajectories_to_submodel(dtrajs)

Returns the discrete trajectories mapped to the currently used set of symbols, as integer arrays. For example, if the model has been restricted to the largest connected set, the indices are given within that connected set, and frames that do not correspond to a considered symbol are set to -1.

Parameters:

dtrajs (array_like or list of array_like) – discretized trajectories

Returns:

Curated discretized trajectories so that unconsidered symbols are mapped to -1.

Return type:

array_like or list of array_like
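The mapping to -1 can be sketched with a lookup table over all original symbols. The helper `map_to_submodel` is hypothetical and assumes that symbols are integers in the range 0 to n_states_full - 1; it is not the library's implementation.

```python
import numpy as np

def map_to_submodel(dtrajs, state_symbols, n_states_full):
    """Map original symbols to submodel state indices, -1 where not represented."""
    lookup = np.full(n_states_full, -1, dtype=int)
    lookup[state_symbols] = np.arange(len(state_symbols))
    return [lookup[np.asarray(dtraj)] for dtraj in dtrajs]

state_symbols = np.array([0, 2, 3])   # assumed: symbol 1 was dropped
dtrajs = [np.array([0, 1, 2, 3]), np.array([3, 3, 1])]

mapped = map_to_submodel(dtrajs, state_symbols, n_states_full=4)
```

Frames that visited the dropped symbol 1 become -1, and the remaining symbols are renumbered consecutively.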

property count_matrix

The count matrix, possibly restricted to a subset of states.

Attention: This count matrix could have been obtained by sliding a window of length tau (the lag time) across the data. It then contains a factor of tau more counts than are statistically uncorrelated. Such a matrix is fine for maximum likelihood estimation, but it yields error estimates that are far too small when used for uncertainty calculations. For uncertainty calculations, use effective counting during estimation, or divide this count matrix by tau.

property count_matrix_full

The count matrix on the full set of discrete states, irrespective of whether they are selected or not.

property counting_mode: Optional[str]

The counting mode that was used to estimate the contained count matrix. One of None, ‘sliding’, ‘sample’, or ‘effective’.

property is_full_model: bool

Determine whether this counting model refers to the full model that represents all states of the data.

Returns:

is_full_model – Whether this counting model represents all states of the data.

Return type:

bool

property lagtime: int

The lag time at which the Markov model was estimated.

property n_states: int

Number of states

property n_states_full: int

Full number of states represented in the underlying data.

property selected_count_fraction: float

The fraction of counts represented in this count model.

property selected_state_fraction: float

The fraction of states represented in this count model.
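One plausible reading of the two fractions is sketched below with NumPy and made-up histograms: the state fraction compares the number of selected states to the number of all states, and the count fraction compares their histogram totals. That the library computes them from the state histograms in this way is an assumption.

```python
import numpy as np

state_histogram_full = np.array([40, 35, 20, 5])   # assumed visits per state
selected_states = np.array([0, 1, 2])               # assumed submodel selection

state_fraction = len(selected_states) / len(state_histogram_full)
count_fraction = (state_histogram_full[selected_states].sum()
                  / state_histogram_full.sum())
```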

property state_histogram: Optional[ndarray]

Histogram of discrete state counts. Can be None if no statistics were provided.

property state_histogram_full: Optional[ndarray]

Histogram over all states in the trajectories.

property state_symbols: ndarray

Symbols for states that are represented in this count model.

property state_symbols_with_blank

Symbols for states that are represented in this count model plus a state -1 for states which are not represented in this count model.

property states: ndarray

The states in this model, i.e., an iota range from 0 (inclusive) to n_states (exclusive). See also: state_symbols.

property total_count: int

Total number of counts

property visited_set: ndarray

The set of visited states.