class TransitionCountEstimator

class deeptime.markov.TransitionCountEstimator(lagtime: int, count_mode: str, n_states=None, sparse=False)

Estimator which produces a TransitionCountModel from discretized trajectories. The count mode determines which pairs of frames are counted as transitions and can be one of:

  • sample: A trajectory of length T (frames \(0, \ldots, T-1\)) will have \(n = \lfloor (T-1)/\tau \rfloor\), i.e., roughly \(T/\tau\), counts at time indices

    \[(0 \rightarrow \tau), (\tau \rightarrow 2 \tau), \ldots, ((n-1) \tau \rightarrow n \tau) \]
  • sliding: A trajectory of length T will have \(T-\tau\) counts at time indices

    \[(0 \rightarrow \tau), (1 \rightarrow \tau+1), ..., (T-\tau-1 \rightarrow T-1) \]

    This overestimates the actual count values by a factor of “lagtime”. For maximum-likelihood MSMs this plays no role, but it leads to wrong error bars in uncertainty estimation.

  • sliding-effective: Same as sliding mode, except that the resulting count matrix is divided by the lagtime after counting. This can be shown to provide a likelihood that is the geometric average over shifted subsamples of the trajectory, \((s_1, s_{\tau+1}, \ldots), (s_2, s_{\tau+2}, \ldots)\), etc. This geometric average converges to the correct likelihood in the statistical limit [1].

  • effective: Uses an estimate of the transition counts that are statistically uncorrelated. Recommended when used with a Bayesian MSM. A description of the estimation procedure can be found in [2].
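To make the index patterns above concrete, here is a minimal pure-NumPy sketch of the "sample", "sliding", and "sliding-effective" modes for a single trajectory. The helper `sketch_counts` is hypothetical and is not deeptime's implementation; "effective" counting is deliberately left out because it relies on the statistical-inefficiency estimate described in [2]. The usage lines check the identity behind the sliding-effective argument: the sliding counts equal the sum of the sample counts of the \(\tau\) shifted subsamples, so dividing by \(\tau\) averages them.

```python
import numpy as np


def sketch_counts(dtraj, lagtime: int, mode: str, n_states: int) -> np.ndarray:
    """Hypothetical helper: dense transition counts for one discrete trajectory."""
    dtraj = np.asarray(dtraj)
    if mode == "sample":
        # Stride with the lag time, then count consecutive strided frames:
        # (0 -> tau), (tau -> 2 tau), ...
        strided = dtraj[::lagtime]
        src, dst = strided[:-1], strided[1:]
    elif mode in ("sliding", "sliding-effective"):
        # Every frame t with t + lagtime inside the trajectory starts one transition:
        # (0 -> tau), (1 -> tau + 1), ..., (T - tau - 1 -> T - 1)
        src, dst = dtraj[:-lagtime], dtraj[lagtime:]
    else:
        raise ValueError(f"mode {mode!r} is not covered by this sketch")
    counts = np.zeros((n_states, n_states), dtype=float)
    np.add.at(counts, (src, dst), 1.0)  # unbuffered add, so repeated pairs accumulate
    if mode == "sliding-effective":
        counts /= lagtime
    return counts


dtraj = np.array([0, 0, 1, 1, 0, 1, 1, 0])
tau = 2
sliding = sketch_counts(dtraj, tau, "sliding", n_states=2)
# Sum of "sample" counts over the tau subsamples starting at frames 0, ..., tau - 1.
shifted_sum = sum(sketch_counts(dtraj[s:], tau, "sample", n_states=2) for s in range(tau))
effective = sketch_counts(dtraj, tau, "sliding-effective", n_states=2)
```

Because the log-likelihood \(\sum_{ij} C_{ij} \log P_{ij}\) is linear in the counts, replacing the counts by their average over the \(\tau\) subsamples turns the likelihood into the geometric mean of the subsample likelihoods, which is the statement made for sliding-effective above.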

Parameters:
  • lagtime (int) – Distance between two frames in the discretized trajectories under which their potential change of state is considered a transition.

  • count_mode (str) –

    One of “sample”, “sliding”, “sliding-effective”, and “effective”.

    • “sample” strides the trajectory with lagtime \(\tau\) and uses the strided counts as transitions.

    • “sliding” uses a sliding window approach, yielding counts that are statistically correlated and too large by a factor of \(\tau\); in uncertainty estimation this yields wrong uncertainties.

    • “sliding-effective” takes the “sliding” counts and divides them by \(\tau\), which can be shown to provide a likelihood that is the geometric average over shifted subsamples of the trajectory, \((s_1, s_{\tau+1}, \ldots), (s_2, s_{\tau+2}, \ldots)\), etc. This geometric average converges to the correct likelihood in the statistical limit [1].

    • “effective” uses an estimate of the transition counts that are statistically uncorrelated. Recommended when estimating Bayesian MSMs.

  • n_states (int, optional, default=None) – Normally, the shape of the count matrix is a consequence of the number of states encountered in the given discrete trajectories. However, sometimes (for instance when scoring) only a portion of the discrete trajectories is passed, but the count matrix should still have the correct shape. In that case, this argument can be used to set the number of states explicitly to the correct value.

  • sparse (bool, optional, default=False) – Whether sparse matrices should be used for counting. This can make sense when the number of states is very large.
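The effect of n_states can be illustrated with a small sketch: when only part of the data is counted (for example a held-out split), states that never occur in that part would otherwise shrink the count matrix. The helper `padded_counts` below is hypothetical and uses sliding-window counting in plain NumPy; it is not deeptime's code and assumes every state index is smaller than n_states.

```python
import numpy as np


def padded_counts(dtrajs, lagtime: int, n_states=None) -> np.ndarray:
    """Hypothetical helper: sliding-window counts with an optional fixed number of states."""
    if n_states is None:
        # Infer the shape from the largest state index actually encountered.
        n_states = max(int(np.max(d)) for d in dtrajs) + 1
    counts = np.zeros((n_states, n_states), dtype=int)
    for d in dtrajs:
        d = np.asarray(d)
        np.add.at(counts, (d[:-lagtime], d[lagtime:]), 1)
    return counts


full = [np.array([0, 1, 2, 1, 0]), np.array([0, 0, 1])]
subset = full[1:]  # e.g., a held-out split that never visits state 2

inferred = padded_counts(subset, 1)             # shape follows the subset: (2, 2)
fixed = padded_counts(subset, 1, n_states=3)    # shape matches the full data: (3, 3)
```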

References

Attributes

  • count_mode – The currently selected count mode.

  • has_model – Property reporting whether this estimator contains an estimated model.

  • lagtime – The lagtime at which transitions are counted.

  • model – Shortcut to fetch_model().

  • n_states – The number of states in discrete trajectories.

  • sparse – Whether the resulting count matrix is stored in sparse or dense mode.

Methods

  • count(count_mode, dtrajs, lagtime[, sparse, ...]) – Computes a count matrix based on a counting mode, some discrete trajectories, a lagtime, and whether to use sparse matrices.

  • fetch_model() – Yields the latest estimated TransitionCountModel.

  • fit(data, *args, **kw) – Counts transitions at given lag time according to configuration of the estimator.

  • fit_fetch(data, **kwargs) – Fits the internal model on data and subsequently fetches it in one call.

  • fit_transform(data[, fit_options, ...]) – Fits a model which simultaneously functions as transformer and subsequently transforms the input data.

  • get_params([deep]) – Get the parameters.

  • set_params(**params) – Set the parameters of this estimator.

  • transform(data, **kwargs) – Transforms data with the encapsulated model.

  • __call__(*args, **kwargs) – Call self as a function.

static count(count_mode: str, dtrajs: List[ndarray], lagtime: int, sparse: bool = False, n_jobs=None)

Computes a count matrix based on a counting mode, some discrete trajectories, a lagtime, and whether to use sparse matrices.

Parameters:
  • count_mode (str) – The counting mode to use. One of “sample”, “sliding”, “sliding-effective”, and “effective”. See __init__() for a more detailed description.

  • dtrajs (array_like or list of array_like) – Discrete trajectories, i.e., a list of arrays which contain non-negative integer values. A single ndarray can also be passed, which is then treated as if it was a list with that one ndarray in it.

  • lagtime (int) – Distance between two frames in the discretized trajectories under which their potential change of state is considered a transition.

  • sparse (bool, default=False) – Whether to use sparse matrices or dense matrices. Sparse matrices can make sense when dealing with a lot of states.

  • n_jobs (int, optional, default=None) – Only has an effect with the “effective” count mode. Determines the number of cores to use for estimating statistical inefficiencies. The default resolves to the number of available cores.

Returns:

count_matrix – The computed count matrix: a dense ndarray if sparse is False, otherwise a sparse matrix. N is the number of encountered states, i.e., one plus the largest state index occurring in any of the dtrajs.

Return type:

(N, N) ndarray or sparse array

Example

>>> import numpy as np
>>> from deeptime.markov import TransitionCountEstimator
>>> dtrajs = [np.array([0,0,1,1]), np.array([0,0,1])]
>>> count_matrix = TransitionCountEstimator.count(
...     count_mode="sliding", dtrajs=dtrajs, lagtime=1, sparse=False
... )
>>> np.testing.assert_equal(count_matrix, np.array([[2, 2], [0, 1]]))
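The sliding-mode result of the example can be reproduced by hand, which also shows what the sparse option amounts to. The following is a sketch under the assumption that SciPy is installed; it is not deeptime's internal implementation, only an independent recount of the same (t, t + lagtime) pairs stored first as a sparse matrix.

```python
import numpy as np
from scipy.sparse import coo_matrix

dtrajs = [np.array([0, 0, 1, 1]), np.array([0, 0, 1])]
lagtime = 1
n_states = max(int(d.max()) for d in dtrajs) + 1  # N = largest state index + 1

# Collect every (t, t + lagtime) pair from every trajectory (sliding window).
rows = np.concatenate([d[:-lagtime] for d in dtrajs])
cols = np.concatenate([d[lagtime:] for d in dtrajs])

# The COO constructor keeps duplicate (row, col) entries; converting to CSR sums them.
sparse_counts = coo_matrix(
    (np.ones_like(rows), (rows, cols)), shape=(n_states, n_states)
).tocsr()
dense_matrix = sparse_counts.toarray()
```

Only nonzero entries are stored in the sparse form, which is why it pays off when the number of states is large but most state pairs are never observed.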

fetch_model() → Optional[TransitionCountModel]

Yields the latest estimated TransitionCountModel. Might be None if fetched before any data was fit.

Returns:

The latest TransitionCountModel or None.

fit(data, *args, **kw)

Counts transitions at given lag time according to configuration of the estimator.

Parameters:

data (array_like or list of array_like) – Discretized trajectories.

fit_fetch(data, **kwargs)

Fits the internal model on data and subsequently fetches it in one call.

Parameters:
  • data (array_like) – Data that is used to fit the model.

  • **kwargs – Additional arguments to fit().

Returns:

The estimated model.

Return type:

model
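The interplay of fit(), fetch_model(), fit_fetch(), and has_model can be summarized with a stripped-down sketch. `MiniCountEstimator` is a hypothetical class written only to illustrate the pattern described in this reference; it stores a plain NumPy count matrix instead of a TransitionCountModel and always uses sliding-window counting.

```python
from typing import Optional

import numpy as np


class MiniCountEstimator:
    """Hypothetical, minimal illustration of the fit / fetch_model pattern."""

    def __init__(self, lagtime: int):
        self.lagtime = lagtime
        self._model: Optional[np.ndarray] = None  # stays None until fit() has run

    @property
    def has_model(self) -> bool:
        return self._model is not None

    def fit(self, data, **kwargs):
        # A single ndarray is treated like a list containing that one trajectory.
        dtrajs = [data] if isinstance(data, np.ndarray) else [np.asarray(d) for d in data]
        n = max(int(d.max()) for d in dtrajs) + 1
        counts = np.zeros((n, n), dtype=int)
        for d in dtrajs:
            np.add.at(counts, (d[:-self.lagtime], d[self.lagtime:]), 1)
        self._model = counts
        return self  # returning self allows est.fit(data).fetch_model()

    def fetch_model(self) -> Optional[np.ndarray]:
        return self._model

    def fit_fetch(self, data, **kwargs):
        return self.fit(data, **kwargs).fetch_model()


est = MiniCountEstimator(lagtime=1)
before = est.fetch_model()  # None, nothing has been fit yet
counts = est.fit_fetch([np.array([0, 0, 1, 1]), np.array([0, 0, 1])])
```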

fit_transform(data, fit_options=None, transform_options=None)

Fits a model which simultaneously functions as transformer and subsequently transforms the input data. The estimated model can be accessed by calling fetch_model().

Parameters:
  • data (array_like) – The input data.

  • fit_options (dict, optional, default=None) – Optional keyword arguments passed on to the fit method.

  • transform_options (dict, optional, default=None) – Optional keyword arguments passed on to the transform method.

Returns:

output – Transformed data.

Return type:

array_like

get_params(deep=False)

Get the parameters.

Returns:

params – Parameter names mapped to their values.

Return type:

mapping of string to any

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

object
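The `<component>__<parameter>` convention (the same naming scheme scikit-learn uses for nested estimators) can be sketched as follows. `Component` is a hypothetical container used only for illustration; deeptime's own get_params/set_params are not reproduced here.

```python
from typing import Any


class Component:
    """Hypothetical estimator-like object holding flat keyword parameters."""

    def __init__(self, **params: Any):
        self.__dict__.update(params)

    def get_params(self) -> dict:
        # Parameter names mapped to their current values.
        return dict(self.__dict__)

    def set_params(self, **params: Any) -> "Component":
        for key, value in params.items():
            if "__" in key:
                # Route "<component>__<parameter>" to the nested object.
                name, sub_key = key.split("__", 1)
                getattr(self, name).set_params(**{sub_key: value})
            else:
                setattr(self, key, value)
        return self  # like the documented method, returns the instance


pipeline = Component(counter=Component(lagtime=1, count_mode="sliding"), scale=1.0)
pipeline.set_params(counter__lagtime=5, scale=2.0)
```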

transform(data, **kwargs)

Transforms data with the encapsulated model.

Parameters:
  • data (array_like) – Input data

  • **kwargs – Optional arguments.

Returns:

output – Transformed data.

Return type:

array_like

property count_mode

The currently selected count mode.

property has_model: bool

Property reporting whether this estimator contains an estimated model. This assumes that the model attribute is None until a model has been estimated.

Type:

bool

property lagtime: int

The lagtime at which transitions are counted.

property model

Shortcut to fetch_model().

property n_states: Optional[int]

The number of states in discrete trajectories. Can be used to override the effective shape of the resulting count matrix.

Getter:

Yields the currently set number of states or None.

Setter:

Sets the number of states to use or None.

Type:

int or None

property sparse: bool

Whether the resulting count matrix is stored in sparse or dense mode.

Getter:

Yields the currently configured sparsity setting.

Setter:

Sets whether to store count matrices sparsely.

Type:

bool