class TICA

class deeptime.decomposition.TICA(lagtime: Optional[int] = None, epsilon: float = 1e-06, dim: Optional[int] = None, var_cutoff: Optional[float] = None, scaling: Optional[str] = 'kinetic_map', observable_transform: Callable[[ndarray], ndarray] = <deeptime.basis._monomials.Identity object>)

Time-lagged independent component analysis (TICA).

TICA is a linear transformation method. In contrast to PCA, which finds coordinates of maximal variance, TICA finds coordinates of maximal autocorrelation at the given lag time. TICA is therefore useful for finding the slow components in a dataset, which makes it an excellent choice for transforming molecular dynamics data before clustering it for the construction of a Markov model. When the input data is the result of a Markov process (such as thermostatted molecular dynamics), TICA in fact finds an approximation to the eigenfunctions and eigenvalues of the underlying Markov operator [1].

It estimates a TICA transformation from data. The resulting model can be used to obtain eigenvalues, eigenvectors, or project input data onto the slowest TICA components.

Parameters:
  • lagtime (int or None, optional, default=None) – The lagtime under which covariances are estimated. This is only relevant when estimating from data. If covariances are provided, this should either be None or exactly the value that was used to estimate those covariances.

  • epsilon (float, optional, default=1e-6) – Eigenvalue norm cutoff. Eigenvalues of C0 with norms <= epsilon will be cut off. The number of remaining eigenvalues defines the size of the output.

  • dim (int, optional, default=None) –

    Number of dimensions to keep:

    • if dim is not set (None) all available ranks are kept: n_components == min(n_samples, n_uncorrelated_features)

    • if dim is an integer >= 1, this number specifies the number of dimensions to keep.

  • var_cutoff (float, optional, default=None) – Determines the number of output dimensions by including dimensions until their cumulative kinetic variance exceeds this fraction of the total kinetic variance. var_cutoff=1.0 means that all numerically available dimensions (see epsilon) will be used, unless the number is set by dim. Setting var_cutoff to a value smaller than 1.0 is mutually exclusive with setting dim.

  • scaling (str or None, default='kinetic_map') – Can be set to None, ‘kinetic_map’ ([2]), or ‘commute_map’ ([3]). For more details see scaling.

  • observable_transform (callable, optional, default=Identity) – A feature transformation on the raw data which is used to estimate the model.
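The effect of var_cutoff on the output dimension can be sketched with plain NumPy. This is an illustration only, not the library's implementation: it assumes that the kinetic variance of component i is the squared eigenvalue, and the helper name dims_for_var_cutoff is hypothetical.

```python
import numpy as np


def dims_for_var_cutoff(eigenvalues, var_cutoff):
    """Number of leading components needed to reach var_cutoff.

    Hypothetical helper; assumes kinetic variance_i = eigenvalue_i ** 2.
    """
    # Sort by magnitude so that the slowest components come first.
    kinetic_variance = np.sort(np.abs(eigenvalues))[::-1] ** 2
    # Cumulative fraction of the total kinetic variance.
    cumulative = np.cumsum(kinetic_variance)
    cumulative /= cumulative[-1]
    # First index at which the cumulative fraction reaches the cutoff.
    n_dims = int(np.searchsorted(cumulative, var_cutoff)) + 1
    return min(n_dims, len(cumulative))


eigenvalues = np.array([0.9, 0.5, 0.1])
n_dims_80 = dims_for_var_cutoff(eigenvalues, 0.8)   # partial cutoff
n_dims_all = dims_for_var_cutoff(eigenvalues, 1.0)  # keeps everything
```

With these example eigenvalues the first component alone carries roughly 76% of the kinetic variance, so a cutoff of 0.8 needs a second component.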

Notes

Given a sequence of multivariate data \(X_t\), it computes the mean-free covariance and time-lagged covariance matrix:

\[\begin{aligned} C_0 &= (X_t - \mu)^T \mathrm{diag}(w) (X_t - \mu) \\ C_{\tau} &= (X_t - \mu)^T \mathrm{diag}(w) (X_{t + \tau} - \mu) \end{aligned}\]

where \(w\) is a vector of weights for each time step. By default, these weights are all equal to one, but different weights are possible, like the re-weighting to equilibrium described in [4]. Subsequently, the eigenvalue problem

\[C_{\tau} r_i = C_0 \lambda_i r_i, \]

is solved, where \(r_i\) are the independent components and \(\lambda_i\) are their respective normalized time-autocorrelations. The eigenvalues are related to the relaxation timescale by

\[t_i = -\frac{\tau}{\ln |\lambda_i|}.\]
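As a small worked instance of this relation, the following sketch evaluates the timescale for two example eigenvalues. The function name implied_timescale and the numbers are illustrative assumptions, not part of the deeptime API.

```python
import math


def implied_timescale(eigenvalue, lagtime):
    """Relaxation timescale t = -lagtime / ln|eigenvalue| (illustrative)."""
    return -lagtime / math.log(abs(eigenvalue))


# An eigenvalue closer to 1 corresponds to a slower process.
t_slow = implied_timescale(0.9, lagtime=2)
t_fast = implied_timescale(0.5, lagtime=2)
```

The result is expressed in the same units as the lag time, here in time steps.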

When used as a dimension reduction method, the input data is projected onto the dominant independent components.

Under the assumption of reversible dynamics and the limit of good statistics, the time-lagged autocovariance \(C_\tau\) is symmetric. Due to finite data, this symmetry is explicitly enforced in the estimator.
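The steps above can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the estimator's actual code: weights are all equal to one, symmetry is enforced by averaging over the forward and time-reversed pairs, and the generalized eigenvalue problem is solved by whitening with \(C_0\). The function name tica_sketch is hypothetical.

```python
import numpy as np


def tica_sketch(data, lagtime, epsilon=1e-6):
    """Illustrative TICA on a single (T, n) trajectory with unit weights."""
    x, y = data[:-lagtime], data[lagtime:]
    # Assumption: one common mean over both blocks, so C_tau is symmetric.
    mu = 0.5 * (x.mean(axis=0) + y.mean(axis=0))
    x, y = x - mu, y - mu
    n_pairs = x.shape[0]
    c0 = (x.T @ x + y.T @ y) / (2 * n_pairs)
    ctau = (x.T @ y + y.T @ x) / (2 * n_pairs)
    # Whiten with C_0, discarding directions with eigenvalue <= epsilon.
    s, u = np.linalg.eigh(c0)
    keep = s > epsilon
    whitening = u[:, keep] / np.sqrt(s[keep])
    # Ordinary symmetric eigenproblem in the whitened space.
    evals, evecs = np.linalg.eigh(whitening.T @ ctau @ whitening)
    order = np.argsort(evals)[::-1]  # slowest component first
    return evals[order], whitening @ evecs[:, order], c0, ctau


# Toy data: one slow autoregressive coordinate and two white-noise ones.
rng = np.random.default_rng(0)
n_steps = 2000
slow = np.zeros(n_steps)
for t in range(1, n_steps):
    slow[t] = 0.95 * slow[t - 1] + rng.normal(scale=0.1)
fast = rng.normal(size=(n_steps, 2))
data = np.column_stack([slow, fast])

eigenvalues, components, c0, ctau = tica_sketch(data, lagtime=1)
```

The returned columns of components satisfy \(C_\tau r_i = C_0 \lambda_i r_i\) for the matrices computed inside the sketch, and the leading eigenvalue belongs to the slow coordinate.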

TICA was originally introduced for signal processing in [5]. It was introduced to molecular dynamics and as a method for the construction of Markov models in [1] and [6]. It was shown in [1] that when applied to molecular dynamics data, TICA is an approximation to the eigenvalues and eigenvectors of the true underlying dynamics.

Examples

Invoke TICA transformation with a given lag time and output dimension:

>>> import numpy as np
>>> from deeptime.decomposition import TICA
>>> data = np.random.random((100,3))
>>> # fixed output dimension
>>> estimator = TICA(dim=1, lagtime=2).fit(data)
>>> model_onedim = estimator.fetch_model()
>>> projected_data = model_onedim.transform(data)
>>> np.testing.assert_equal(projected_data.shape[1], 1)

or invoke it with the fraction of kinetic variance to capture (80% in this example):

>>> estimator = TICA(var_cutoff=0.8, lagtime=2).fit(data)
>>> model_var = estimator.fetch_model()
>>> projected_data = model_var.transform(data)

For a brief explanation of why TICA outperforms PCA at extracting a good reaction coordinate, have a look here.

See also

CovarianceKoopmanModel

TICA estimation output model

References

Attributes

dim

Dimension attribute.

epsilon

Eigenvalue norm cutoff.

has_model

Property reporting whether this estimator contains an estimated model.

lagtime

The lagtime under which covariances are estimated.

model

Shortcut to fetch_model().

scaling

Scaling parameter.

var_cutoff

Variance cutoff which can be used to further restrict the dimension.

Methods

covariance_estimator(lagtime[, ncov])

Yields a properly configured covariance estimator so that its model can be used as input for the VAMP estimator.

fetch_model()

Finalizes current model and yields new CovarianceKoopmanModel.

fit(data, *args, **kw)

Fits a new CovarianceKoopmanModel which can be obtained by a subsequent call to fetch_model().

fit_fetch(data, **kwargs)

Fits the internal model on data and subsequently fetches it in one call.

fit_from_covariances(covariances)

Fits a model based on provided symmetrized covariances.

fit_from_timeseries(data[, weights])

Estimates a CovarianceKoopmanModel directly from time-series data using the Covariance estimator.

fit_transform(data[, fit_options, ...])

Fits a model which simultaneously functions as transformer and subsequently transforms the input data.

get_params([deep])

Get the parameters.

partial_fit(data)

Updates the covariance estimates through a new batch of data.

set_params(**params)

Set the parameters of this estimator.

transform(data[, propagate])

Projects given timeseries onto dominant singular functions.

__call__(*args, **kwargs)

Call self as a function.

classmethod covariance_estimator(lagtime: int, ncov: int = inf)

Yields a properly configured covariance estimator so that its model can be used as input for the VAMP estimator.

Parameters:
  • lagtime (int) – Positive integer denoting the time shift which is considered for autocorrelations.

  • ncov (int or float('inf'), optional, default=float('inf')) – Limit the memory usage of the algorithm from [7] to an amount that corresponds to ncov additional copies of each correlation matrix.

Returns:

estimator – Covariance estimator.

Return type:

Covariance

fetch_model() CovarianceKoopmanModel

Finalizes current model and yields new CovarianceKoopmanModel.

Returns:

model – The estimated model.

Return type:

CovarianceKoopmanModel

fit(data, *args, **kw)

Fits a new CovarianceKoopmanModel which can be obtained by a subsequent call to fetch_model().

Parameters:
  • data (CovarianceModel or Covariance or timeseries) – Covariance matrices \(C_{00}, C_{0t}, C_{tt}\) in form of a CovarianceModel instance. If the model should be fitted directly from data, please see from_data(). Optionally, this can also be timeseries data directly, in which case a ‘lagtime’ must be provided.

  • *args – Optional arguments

  • **kw – Ignored keyword arguments for scikit-learn compatibility.

Returns:

self – Reference to self.

Return type:

VAMP

Notes

If you are running into memory problems for potentially multiple trajectories you can decrease memory load by using partial_fit() on individual trajectories or in conjunction with timeshifted_split:

>>> import numpy as np
>>> from deeptime.decomposition import TICA
>>> from deeptime.util.data import timeshifted_split
>>> estimator = TICA(dim=1)
>>> for X, Y in timeshifted_split(np.ones(shape=(100, 5)), lagtime=1, chunksize=40):  
...     estimator.partial_fit((X, Y))
>>> joint_model = estimator.fetch_model()

fit_fetch(data, **kwargs)

Fits the internal model on data and subsequently fetches it in one call.

Parameters:
  • data (array_like) – Data that is used to fit the model.

  • **kwargs – Additional arguments to fit().

Returns:

The estimated model.

Return type:

model

fit_from_covariances(covariances: Union[Covariance, CovarianceModel])

Fits a model based on provided symmetrized covariances.

Parameters:

covariances (Covariance or CovarianceModel) – The covariances

Returns:

self – Reference to self.

Return type:

TICA

fit_from_timeseries(data, weights=None)

Estimates a CovarianceKoopmanModel directly from time-series data using the Covariance estimator. For the parameters dim, scaling, and epsilon, see the class-level parameter descriptions.

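Parameters:
  • data – Input time-series data.

  • weights (optional, default=None) – Optional weights for the time steps, as described in the class-level Notes.
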
Returns:

self – Reference to self.

Return type:

VAMP

fit_transform(data, fit_options=None, transform_options=None)

Fits a model which simultaneously functions as transformer and subsequently transforms the input data. The estimated model can be accessed by calling fetch_model().

Parameters:
  • data (array_like) – The input data.

  • fit_options (dict, optional, default=None) – Optional keyword arguments passed on to the fit method.

  • transform_options (dict, optional, default=None) – Optional keyword arguments passed on to the transform method.

Returns:

output – Transformed data.

Return type:

array_like

get_params(deep=False)

Get the parameters.

Returns:

params – Parameter names mapped to their values.

Return type:

mapping of string to any

partial_fit(data)

Updates the covariance estimates through a new batch of data.

Parameters:

data (tuple(ndarray, ndarray)) – A tuple of two ndarrays of the same shape, containing \(X_t\) and \(X_{t+\tau}\), respectively. Here, \(\tau\) denotes the lagtime.

Returns:

self – Reference to self.

Return type:

VAMP
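The expected tuple can be built from a single trajectory by slicing, as in the following NumPy-only sketch. The variable names are illustrative.

```python
import numpy as np

lagtime = 3
trajectory = np.arange(20, dtype=float).reshape(10, 2)  # (T, n) toy data

# Instantaneous and time-lagged views of the same trajectory.
x_t = trajectory[:-lagtime]
x_t_tau = trajectory[lagtime:]

# A tuple of this form is what partial_fit expects as its data argument.
batch = (x_t, x_t_tau)
```

Both arrays have T - lagtime rows, and row i of the second array is the frame lagtime steps after row i of the first.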

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

object

transform(data, propagate=False)

Projects given timeseries onto dominant singular functions. This method dispatches to CovarianceKoopmanModel.transform().

Parameters:
  • data ((T, n) ndarray) – Input timeseries data.

  • propagate (bool, default=False) – Whether to apply the Koopman operator after data was transformed into the whitened feature space.

Returns:

Y – The projected data. If right is True, projection will be on the right singular functions. Otherwise, projection will be on the left singular functions.

Return type:

(T, m) ndarray

property dim: Optional[int]

Dimension attribute. Can either be int or None. In case of

  • int it evaluates it as the actual dimension, which must be strictly greater than 0,

  • None all numerically available components are used.

Getter:

yields the dimension

Setter:

sets a new dimension

Type:

int or None

property epsilon: float

Eigenvalue norm cutoff.

Type:

float

property has_model: bool

Property reporting whether this estimator contains an estimated model. This assumes that the model is initialized with None otherwise.

Type:

bool

property lagtime: Optional[int]

The lagtime under which covariances are estimated. Can be None in case covariances are provided directly instead of estimating them inside this estimator.

Getter:

Yields the current lagtime.

Setter:

Sets a new lagtime, must be positive.

Type:

int or None

property model

Shortcut to fetch_model().

property scaling: Optional[str]

Scaling parameter. Can take the following values:

  • None: unscaled.

  • ‘kinetic_map’: Eigenvectors will be scaled by eigenvalues. As a result, Euclidean distances in the transformed data approximate kinetic distances [2]. This is a good choice when the data is further processed by clustering.

  • ‘commute_map’: Eigenvector i will be scaled by sqrt(timescale_i / 2). As a result, Euclidean distances in the transformed data will approximate commute distances [3].

Getter:

Yields the currently configured scaling.

Setter:

Sets a new scaling.

Type:

str or None
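The two scalings can be illustrated with NumPy on made-up numbers. The eigenvalues and the unscaled projection below are assumptions for the sake of the example and are not produced by the library.

```python
import numpy as np

lagtime = 2
eigenvalues = np.array([0.9, 0.5, 0.1])
unscaled = np.ones((4, 3))  # hypothetical unscaled projected coordinates

# 'kinetic_map': coordinate i is multiplied by eigenvalue i.
kinetic_map = unscaled * eigenvalues

# 'commute_map': coordinate i is multiplied by sqrt(timescale_i / 2).
timescales = -lagtime / np.log(np.abs(eigenvalues))
commute_map = unscaled * np.sqrt(timescales / 2)
```

In both cases slower components receive larger weights, so they dominate Euclidean distances in the transformed space.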

property var_cutoff: Optional[float]

Variance cutoff which can be used to further restrict the dimension. This takes precedence over the dim property.

Getter:

yields the currently set variance cutoff

Setter:

sets a new cutoff

Type:

float or None