class TransitionCountModel¶
- class deeptime.markov.TransitionCountModel(count_matrix, counting_mode: Optional[str] = None, lagtime: int = 1, state_histogram: Optional[ndarray] = None, state_symbols: Optional[ndarray] = None, count_matrix_full=None, state_histogram_full: Optional[ndarray] = None)¶
Statistics, count matrices, and connectivity from discrete trajectories. These statistics can be used, e.g., to construct MSMs. This model can create submodels (see submodel()) that are restricted to a certain selection of states. This subselection can be made by analyzing the connected sets of the count matrix (connected_sets()), by pruning states by thresholding with a mincount_connectivity parameter, or simply by providing a subset of states manually.
- Parameters:
count_matrix ((N, N) ndarray or sparse matrix) – The count matrix. In case it was estimated with ‘sliding’, it contains a factor of lagtime more counts than are statistically uncorrelated.
counting_mode (str, optional, default=None) – If not None, one of ‘sliding’, ‘sample’, or ‘effective’. Indicates the counting method that was used to estimate the count matrix. In case of ‘sliding’, a sliding window of the size of the lagtime was used to count transitions. The count matrix therefore contains a factor of lagtime more counts than are statistically uncorrelated. It’s fine to use this matrix for maximum likelihood estimation, but it will give far too small errors if you use it for uncertainty calculations. In order to do uncertainty calculations, use the effective count matrix (see effective_count_matrix), divide this count matrix by tau (the lag time), or use ‘effective’ as estimation parameter.
lagtime (int, optional, default=1) – The time offset which was used to count transitions between states.
state_histogram (array_like, optional, default=None) – Histogram over the visited states in discretized trajectories.
state_symbols (array_like or DiscreteStatesManager, optional, default=None) – Symbols of the original discrete trajectory that are represented in the counting model. If None, the symbols are assumed to represent the data, i.e., an iota range over the number of states. Subselection of the model also subselects the symbols.
count_matrix_full (array_like, optional, default=None) – Count matrix for all state symbols. If None, the count matrix provided as first argument is assumed to take that role.
state_histogram_full (array_like, optional, default=None) – Histogram over all state symbols. If None, the provided state_histogram is assumed to take that role.
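The difference between ‘sliding’ and ‘sample’ counting described above can be illustrated with a minimal pure-Python sketch. The helper `count_transitions` is hypothetical and is not deeptime's implementation; it only shows why a sliding window yields roughly a factor of lagtime more counts than sampling at the lag time.

```python
def count_transitions(dtraj, lagtime, n_states, mode="sliding"):
    """Count transitions i -> j at the given lag time (illustrative sketch only)."""
    # 'sliding' moves the window one frame at a time; 'sample' jumps by the lag time.
    step = 1 if mode == "sliding" else lagtime
    counts = [[0] * n_states for _ in range(n_states)]
    for t in range(0, len(dtraj) - lagtime, step):
        counts[dtraj[t]][dtraj[t + lagtime]] += 1
    return counts


dtraj = [0, 0, 1, 1, 0, 1, 1, 0, 0]
sliding = count_transitions(dtraj, lagtime=2, n_states=2, mode="sliding")
sample = count_transitions(dtraj, lagtime=2, n_states=2, mode="sample")

total_sliding = sum(map(sum, sliding))  # 7 overlapping (correlated) transitions
total_sample = sum(map(sum, sample))    # 4 non-overlapping transitions
```

For this short trajectory the sliding window collects 7 transitions and sampling collects 4; for long trajectories the ratio approaches the lag time.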
Attributes
count_matrix – The count matrix, possibly restricted to a subset of states.
count_matrix_full – The count matrix on full set of discrete states, irrespective as to whether they are selected or not.
counting_mode – The counting mode that was used to estimate the contained count matrix.
is_full_model – Determine whether this counting model refers to the full model that represents all states of the data.
lagtime – The lag time at which the Markov model was estimated.
n_states – Number of states.
n_states_full – Full number of states represented in the underlying data.
selected_count_fraction – The fraction of counts represented in this count model.
selected_state_fraction – The fraction of states represented in this count model.
state_histogram – Histogram of discrete state counts, can be None in case no statistics were provided.
state_histogram_full – Histogram over all states in the trajectories.
state_symbols – Symbols for states that are represented in this count model.
state_symbols_with_blank – Symbols for states that are represented in this count model plus a state -1 for states which are not represented in this count model.
states – The states in this model, i.e., an iota range from 0 (inclusive) to n_states (exclusive).
total_count – Total number of counts.
visited_set – The set of visited states.
Methods
connected_sets([connectivity_threshold, ...]) – Computes the connected sets of the counting matrix.
copy() – Makes a deep copy of this model.
count_matrix_histogram() – Computes a histogram over states represented in the count matrix.
get_params([deep]) – Get the parameters.
is_connected([directed]) – Dispatches to tools.analysis.is_connected.
set_params(**params) – Set the parameters of this estimator.
states_to_symbols(states) – Converts a list of states to a list of symbols which can be related back to the original data.
submodel(states) – This returns a count model that is restricted to a selection of states.
submodel_largest([connectivity_threshold, ...]) – Restricts this model to the submodel corresponding to the largest connected set of states after eliminating states that fall below the specified connectivity threshold.
symbols_to_states(symbols) – Converts a set of symbols to state indices in this count model instance.
transform_discrete_trajectories_to_submodel(dtrajs) – A list of integer arrays with the discrete trajectories mapped to the currently used set of symbols.
- connected_sets(connectivity_threshold: float = 0.0, directed: bool = True, probability_constraint: Optional[ndarray] = None, sort_by_population: bool = False) → List[ndarray] ¶
Computes the connected sets of the counting matrix. A threshold can be set fixing a number of counts required to consider two states connected (states are always considered self-connected, regardless of the connectivity threshold). In case of sliding window counting, the number of counts is increased by a factor of lagtime. In case of ‘sliding-effective’ counting, the number of sliding window counts was divided by the lagtime, so entries can also lie in the open interval (0, 1). The same holds for ‘effective’ counting.
- Parameters:
connectivity_threshold (float, optional, default=0.) – Number of counts required to consider two states connected. When the count matrix was estimated with effective mode or sliding-effective mode, a threshold of \(1 / \mathrm{n\_states\_full}\) is commonly used.
directed (bool, optional, default=True) – Whether to compute connected sets for the directed or the undirected transition graph.
probability_constraint ((N,) ndarray, optional, default=None) – Constraint on the whole state space. Counts that involve states with no probability are set to zero.
sort_by_population (bool, optional, default=False) – If True, the resulting list of sets is ordered by decreasing number of counts.
- Returns:
A list of arrays containing integers (states), each array representing a connected set. The list is
ordered decreasingly by the size of the individual components.
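As a rough illustration of what a connectivity threshold does in the directed case, the following pure-Python sketch computes strongly connected sets of a small count matrix. It is not deeptime's implementation: the function and its reachability-based algorithm are illustrative, and it assumes that counts below the threshold are disregarded and that zero counts never connect two states.

```python
def connected_sets(count_matrix, connectivity_threshold=0.0):
    """Strongly connected sets of the directed graph implied by a count matrix."""
    n = len(count_matrix)

    def has_edge(i, j):
        # Assumption: counts below the threshold are disregarded,
        # and zero counts never connect two states.
        c = count_matrix[i][j]
        return c > 0 and c >= connectivity_threshold

    def reachable_from(start):
        seen, stack = {start}, [start]  # every state reaches itself
        while stack:
            i = stack.pop()
            for j in range(n):
                if j not in seen and has_edge(i, j):
                    seen.add(j)
                    stack.append(j)
        return seen

    reach = [reachable_from(i) for i in range(n)]
    sets, assigned = [], set()
    for i in range(n):
        if i in assigned:
            continue
        # States that reach each other in both directions form one connected set.
        component = sorted(j for j in reach[i] if i in reach[j])
        assigned.update(component)
        sets.append(component)
    # Largest set first; the sort is stable, so ties keep their state order.
    return sorted(sets, key=len, reverse=True)


counts = [
    [5, 2, 0, 0],
    [3, 4, 1, 0],
    [0, 0, 6, 2],
    [0, 0, 1, 7],
]
sets_default = connected_sets(counts)
sets_thresholded = connected_sets(counts, connectivity_threshold=2.0)
```

With the default threshold the states split into {0, 1} and {2, 3}. Raising the threshold to 2 removes the single-count transition from state 3 back to state 2, so states 2 and 3 become singleton sets.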
- copy() → Model ¶
Makes a deep copy of this model.
- Returns:
copy – A new copy of this model.
- Return type:
Model
- count_matrix_histogram() → ndarray ¶
Computes a histogram over states represented in the count matrix. The magnitude of the returned values depends on the mode which was used for counting.
- Returns:
A histogram over the collected counts per state.
- Return type:
(n_states,) ndarray
- get_params(deep=False)¶
Get the parameters.
- Returns:
params – Parameter names mapped to their values.
- Return type:
mapping of string to any
- is_connected(directed: bool = True) → bool ¶
Dispatches to tools.analysis.is_connected.
- set_params(**params)¶
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
**params (dict) – Estimator parameters.
- Returns:
self – Estimator instance.
- Return type:
object
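The `<component>__<parameter>` naming convention can be sketched in a few lines of plain Python. The classes `Inner` and `Outer` and the free function `set_params` below are hypothetical stand-ins used only to show how a double underscore routes a value to a nested object; they are not part of deeptime.

```python
class Inner:
    """Hypothetical nested component."""

    def __init__(self, lagtime=1):
        self.lagtime = lagtime


class Outer:
    """Hypothetical object holding a nested component."""

    def __init__(self, inner, mode="sliding"):
        self.inner = inner
        self.mode = mode


def set_params(obj, **params):
    """Set attributes on obj, descending into nested objects on '__'."""
    for key, value in params.items():
        name, sep, rest = key.partition("__")
        if sep:
            # '<component>__<parameter>': forward the remainder to the component.
            set_params(getattr(obj, name), **{rest: value})
        else:
            setattr(obj, name, value)
    return obj


pipeline = Outer(Inner())
set_params(pipeline, mode="effective", inner__lagtime=10)
```

After the call, `pipeline.mode` is updated directly, while `inner__lagtime` is routed to the nested `Inner` instance.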
- states_to_symbols(states: ndarray) → ndarray ¶
Converts a list of states to a list of symbols which can be related back to the original data.
- Parameters:
states ((N,) ndarray) – Array of states.
- Returns:
symbols – Array of symbols.
- Return type:
(N,) ndarray
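Conceptually, a state is an index into the array of state symbols, so the conversion amounts to a lookup. A minimal sketch with plain lists, where the symbol values are made up for illustration:

```python
# Hypothetical submodel that kept the original symbols 2, 5 and 7.
state_symbols = [2, 5, 7]


def states_to_symbols(states, state_symbols):
    """Look up the original symbol for each state index (illustrative sketch)."""
    return [state_symbols[state] for state in states]


symbols = states_to_symbols([0, 2], state_symbols)
```

Here states 0 and 2 map back to the original symbols 2 and 7.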
- submodel(states: ndarray)¶
This returns a count model that is restricted to a selection of states.
- Parameters:
states (array_like) – The states to restrict to.
- Returns:
submodel – A submodel restricted to the requested states.
- Return type:
TransitionCountModel
- submodel_largest(connectivity_threshold: Union[str, float] = 0.0, directed: Optional[bool] = None, probability_constraint: Optional[ndarray] = None, sort_by_population: bool = False)¶
Restricts this model to the submodel corresponding to the largest connected set of states after eliminating states that fall below the specified connectivity threshold. Note that the connectivity threshold only applies to the implied graph weight between states; singleton sets are always considered connected.
- Parameters:
connectivity_threshold (float or '1/n', optional, default=0.) – Connectivity threshold. Counts that are below the specified value are disregarded when finding connected sets. In case of ‘1/n’, the threshold gets resolved to \(1 / n\_states\_full\).
directed (bool, optional, default=None) – Whether to look for connected sets in a directed graph or in an undirected one. If None, the choice depends on whether a probability constraint is given: with a constraint it defaults to the undirected case, otherwise to the directed case.
probability_constraint ((N,) ndarray, optional, default=None) – Constraint on the whole state space (n_states_full). Only considers states that have positive probability.
sort_by_population (bool, optional, default=False) – If True, the connected set with the largest population is used.
- Returns:
submodel – The submodel.
- Return type:
TransitionCountModel
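The way the documented defaults resolve can be sketched as follows. The helper `resolve_options` is hypothetical and only mirrors the parameter descriptions above: ‘1/n’ becomes 1 / n_states_full, and an unset `directed` flag depends on whether a probability constraint is given.

```python
def resolve_options(connectivity_threshold, directed, probability_constraint, n_states_full):
    """Resolve '1/n' and the default for 'directed' (illustrative sketch)."""
    if connectivity_threshold == "1/n":
        connectivity_threshold = 1.0 / n_states_full
    if directed is None:
        # Undirected if a probability constraint is given, directed otherwise.
        directed = probability_constraint is None
    return float(connectivity_threshold), directed


without_constraint = resolve_options("1/n", None, None, n_states_full=4)
with_constraint = resolve_options(0.0, None, [0.5, 0.5, 0.0, 0.0], n_states_full=4)
```

With four states in the full model, ‘1/n’ resolves to 0.25. Passing a probability constraint without setting `directed` switches the sketch to the undirected case.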
- symbols_to_states(symbols)¶
Converts a set of symbols to state indices in this count model instance. Symbols which are no longer present in this model are discarded, so the order may change and the result may be shorter than the input.
- Parameters:
symbols (array_like) – the symbols to be mapped to state indices
- Returns:
states – An array of states.
- Return type:
ndarray
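The discarding and reordering described above can be seen in a small sketch. It assumes that states are returned in ascending state order, which is an assumption of this example rather than a documented guarantee, and the symbol values are made up.

```python
def symbols_to_states(symbols, state_symbols):
    """Map symbols to state indices, dropping unknown symbols (illustrative sketch)."""
    wanted = set(symbols)
    # Assumption: states are reported in ascending state order.
    return [state for state, symbol in enumerate(state_symbols) if symbol in wanted]


# Hypothetical submodel that kept the original symbols 2, 5 and 7.
state_symbols = [2, 5, 7]
states = symbols_to_states([7, 3, 2], state_symbols)
```

Symbol 3 is not part of the model and is dropped, and the remaining states come back as 0 and 2 even though symbol 7 was listed first.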
- transform_discrete_trajectories_to_submodel(dtrajs)¶
A list of integer arrays with the discrete trajectories mapped to the currently used set of symbols. For example, if there has been a subselection of the model for connectivity=’largest’, the indices will be given within the connected set. Frames that do not correspond to a considered symbol are set to -1.
- Parameters:
dtrajs (array_like or list of array_like) – discretized trajectories
- Returns:
Curated discretized trajectories so that unconsidered symbols are mapped to -1.
- Return type:
array_like or list of array_like
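A minimal sketch of this mapping with plain lists is shown below. The helper name and the example data are illustrative and do not reflect deeptime's implementation.

```python
def transform_to_submodel(dtrajs, state_symbols):
    """Map original symbols to submodel state indices, using -1 for dropped symbols."""
    symbol_to_state = {symbol: state for state, symbol in enumerate(state_symbols)}
    return [[symbol_to_state.get(frame, -1) for frame in dtraj] for dtraj in dtrajs]


# Hypothetical submodel that kept only the original symbols 1 and 3.
state_symbols = [1, 3]
dtrajs = [[0, 1, 1, 3, 2], [3, 3, 0]]
mapped = transform_to_submodel(dtrajs, state_symbols)
```

Frames with symbols 1 and 3 become states 0 and 1, and all frames with other symbols are set to -1.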
- property count_matrix¶
The count matrix, possibly restricted to a subset of states.
Attention: This count matrix could have been obtained by sliding a window of length tau (the lag time) across the data. It then contains a factor of tau more counts than are statistically uncorrelated. It’s fine to use this matrix for maximum likelihood estimation, but it will give far too small errors if you use it for uncertainty calculations. In order to do uncertainty calculations, use effective counting during estimation, or divide this count matrix by tau.
- property count_matrix_full¶
The count matrix on full set of discrete states, irrespective as to whether they are selected or not.
- property counting_mode: Optional[str]¶
The counting mode that was used to estimate the contained count matrix. Either None or one of ‘sliding’, ‘sample’, ‘effective’.
- property is_full_model: bool¶
Determine whether this counting model refers to the full model that represents all states of the data.
- Returns:
is_full_model – Whether this counting model represents all states of the data.
- Return type:
bool
- property lagtime: int¶
The lag time at which the Markov model was estimated.
- property n_states: int¶
Number of states
- property n_states_full: int¶
Full number of states represented in the underlying data.
- property selected_count_fraction: float¶
The fraction of counts represented in this count model.
- property selected_state_fraction: float¶
The fraction of states represented in this count model.
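One plausible reading of the two fractions is sketched below. It assumes that the count fraction compares the histogram of the selected states with the full histogram and that the state fraction is n_states / n_states_full; both are assumptions of this example, and the numbers are made up.

```python
# Hypothetical full histogram over four states; the submodel keeps states 0, 1 and 2.
state_histogram_full = [40, 35, 20, 5]
selected_states = [0, 1, 2]
state_histogram = [state_histogram_full[s] for s in selected_states]

# Assumption: fraction of histogram counts that fall into selected states.
selected_count_fraction = sum(state_histogram) / sum(state_histogram_full)
# Assumption: number of selected states over the full number of states.
selected_state_fraction = len(selected_states) / len(state_histogram_full)
```

In this example the submodel retains 75% of the states but 95% of the counts, because the discarded state is rarely visited.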
- property state_histogram: Optional[ndarray]¶
Histogram of discrete state counts. Can be None if no statistics were provided.
- property state_histogram_full: Optional[ndarray]¶
Histogram over all states in the trajectories.
- property state_symbols: ndarray¶
Symbols for states that are represented in this count model.
- property state_symbols_with_blank¶
Symbols for states that are represented in this count model plus a state -1 for states which are not represented in this count model.
- property states: ndarray¶
The states in this model, i.e., an iota range from 0 (inclusive) to n_states (exclusive). See also: state_symbols.
- property total_count: int¶
Total number of counts
- property visited_set: ndarray¶
The set of visited states.