class TICA
- class deeptime.decomposition.TICA(lagtime: Optional[int] = None, epsilon: float = 1e-06, dim: Optional[int] = None, var_cutoff: Optional[float] = None, scaling: Optional[str] = 'kinetic_map', observable_transform: Callable[[ndarray], ndarray] = <deeptime.basis._monomials.Identity object>)
Time-lagged independent component analysis (TICA).
TICA is a linear transformation method. In contrast to PCA, which finds coordinates of maximal variance, TICA finds coordinates of maximal autocorrelation at the given lag time. Therefore, TICA is useful in order to find the slow components in a dataset and thus an excellent choice to transform molecular dynamics data before clustering data for the construction of a Markov model. When the input data is the result of a Markov process (such as thermostatted molecular dynamics), TICA finds in fact an approximation to the eigenfunctions and eigenvalues of the underlying Markov operator [1].
It estimates a TICA transformation from data. The resulting model can be used to obtain eigenvalues, eigenvectors, or project input data onto the slowest TICA components.
- Parameters:
lagtime (int or None, optional, default=None) – The lagtime under which covariances are estimated. This is only relevant when estimating from data; if covariances are provided, this should be either None or exactly the value that was used to estimate them.
epsilon (float, optional, default=1e-6) – Eigenvalue norm cutoff. Eigenvalues of C0 with norms <= epsilon will be cut off. The number of remaining eigenvalues defines the size of the output.
dim (int, optional, default=None) –
Number of dimensions to keep:
if dim is not set (None) all available ranks are kept:
n_components == min(n_samples, n_uncorrelated_features)
if dim is an integer >= 1, this number specifies the number of dimensions to keep.
var_cutoff (float, optional, default=None) – Determines the number of output dimensions by including dimensions until their cumulative kinetic variance exceeds the fraction var_cutoff. var_cutoff=1.0 means all numerically available dimensions (see epsilon) will be used, unless set by dim. Setting var_cutoff smaller than 1.0 is exclusive with dim.
scaling (str or None, default='kinetic_map') – Can be set to None, ‘kinetic_map’ ([2]), or ‘commute_map’ ([3]). For more details see scaling.
observable_transform (callable, optional, default=Identity) – A feature transformation on the raw data which is used to estimate the model.
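As a rough illustration of how var_cutoff could translate into an output dimension, the sketch below assumes that the kinetic variance of a component is its squared eigenvalue and that dimensions are added until the cumulative fraction reaches var_cutoff. The helper name dim_from_var_cutoff is hypothetical and is not part of deeptime; the library's own selection logic may differ in detail.

```python
import numpy as np


def dim_from_var_cutoff(eigenvalues, var_cutoff):
    """Hypothetical helper: number of components needed to reach var_cutoff.

    Assumption: kinetic variance of component i is eigenvalue_i ** 2.
    """
    # Sort by magnitude, largest (slowest) first.
    lam = np.sort(np.abs(np.asarray(eigenvalues, dtype=float)))[::-1]
    # Cumulative fraction of the total kinetic variance.
    cumvar = np.cumsum(lam ** 2)
    cumvar /= cumvar[-1]
    # First index at which the cumulative fraction reaches the cutoff.
    dim = np.searchsorted(cumvar, var_cutoff) + 1
    return int(min(dim, len(lam)))


eigenvalues = [0.9, 0.5, 0.1]
dim_80 = dim_from_var_cutoff(eigenvalues, 0.8)
dim_all = dim_from_var_cutoff(eigenvalues, 1.0)
```

With var_cutoff=1.0 every numerically available component is kept, which matches the behavior described for the parameter above.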
Notes
Given a sequence of multivariate data \(X_t\), it computes the mean-free covariance and time-lagged covariance matrix:
\[\begin{aligned} C_0 &= (X_t - \mu)^T \mathrm{diag}(w) (X_t - \mu) \\ C_{\tau} &= (X_t - \mu)^T \mathrm{diag}(w) (X_{t + \tau} - \mu) \end{aligned}\]
where \(w\) is a vector of weights for each time step. By default, these weights are all equal to one, but different weights are possible, like the re-weighting to equilibrium described in [4]. Subsequently, the eigenvalue problem
\[C_{\tau} r_i = C_0 \lambda_i r_i\]
is solved, where \(r_i\) are the independent components and \(\lambda_i\) are their respective normalized time-autocorrelations. The eigenvalues are related to the relaxation timescale by
\[t_i = -\frac{\tau}{\ln |\lambda_i|}.\]
When used as a dimension reduction method, the input data is projected onto the dominant independent components.
Under the assumption of reversible dynamics and the limit of good statistics, the time-lagged autocovariance \(C_\tau\) is symmetric. Due to finite data, this symmetry is explicitly enforced in the estimator.
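The steps in these notes can be sketched in plain NumPy. This is a minimal illustration under stated assumptions (uniform weights, a full-rank \(C_0\), symmetrization by averaging forward and backward contributions), not deeptime's implementation; the function name tica_sketch and the toy data are hypothetical.

```python
import numpy as np


def tica_sketch(data, lagtime):
    """Minimal TICA sketch with uniform weights (not deeptime's estimator)."""
    x = data[:-lagtime]  # X_t
    y = data[lagtime:]   # X_{t + tau}
    # Assumption: one common mean over both blocks.
    mu = 0.5 * (x.mean(axis=0) + y.mean(axis=0))
    x = x - mu
    y = y - mu
    n = len(x)
    # Symmetrized instantaneous and time-lagged covariance matrices.
    c0 = (x.T @ x + y.T @ y) / (2 * n)
    ctau = (x.T @ y + y.T @ x) / (2 * n)
    # Whitening matrix U diag(s)^(-1/2); assumes C_0 has full rank.
    s, u = np.linalg.eigh(c0)
    whiten = u / np.sqrt(s)
    # Solve C_tau r = C_0 lambda r as a symmetric problem in whitened space.
    lam, v = np.linalg.eigh(whiten.T @ ctau @ whiten)
    order = np.argsort(lam)[::-1]  # slowest component first
    lam = lam[order]
    r = whiten @ v[:, order]
    # Relaxation timescales t_i = -tau / ln|lambda_i|.
    timescales = -lagtime / np.log(np.abs(lam))
    return lam, r, timescales


# Toy data: a slow, low-variance AR(1) signal next to fast, high-variance noise.
rng = np.random.default_rng(0)
n_steps = 5000
noise = rng.normal(size=n_steps)
slow = np.zeros(n_steps)
for t in range(1, n_steps):
    slow[t] = 0.95 * slow[t - 1] + noise[t]
fast = 10.0 * rng.normal(size=n_steps)
data = np.column_stack([slow, fast])

lam, r, timescales = tica_sketch(data, lagtime=1)
```

In this toy setup the leading independent component points almost entirely along the slow feature even though the fast feature carries far more variance, which is the contrast to PCA described above.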
TICA was originally introduced for signal processing in [5]. It was introduced to molecular dynamics and as a method for the construction of Markov models in [1] and [6]. It was shown in [1] that when applied to molecular dynamics data, TICA is an approximation to the eigenvalues and eigenvectors of the true underlying dynamics.
Examples
Invoke TICA transformation with a given lag time and output dimension:
>>> import numpy as np
>>> from deeptime.decomposition import TICA
>>> data = np.random.random((100,3))
>>> # fixed output dimension
>>> estimator = TICA(dim=1, lagtime=2).fit(data)
>>> model_onedim = estimator.fetch_model()
>>> projected_data = model_onedim.transform(data)
>>> np.testing.assert_equal(projected_data.shape[1], 1)
Or invoke it with the fraction of kinetic variance to capture (80% in this example):
>>> estimator = TICA(var_cutoff=0.8, lagtime=2).fit(data)
>>> model_var = estimator.fetch_model()
>>> projected_data = model_var.transform(data)
For a brief explanation of why TICA outperforms PCA at extracting a good reaction coordinate, have a look here.
See also
CovarianceKoopmanModel
TICA estimation output model
References
Attributes
dim – Dimension attribute.
epsilon – Eigenvalue norm cutoff.
has_model – Property reporting whether this estimator contains an estimated model.
lagtime – The lagtime under which covariances are estimated.
model – Shortcut to fetch_model().
scaling – Scaling parameter.
var_cutoff – Variational cutoff which can be used to further restrict the dimension.
Methods
covariance_estimator(lagtime[, ncov]) – Yields a properly configured covariance estimator so that its model can be used as input for the vamp estimator.
fetch_model() – Finalizes current model and yields new CovarianceKoopmanModel.
fit(data, *args, **kw) – Fits a new CovarianceKoopmanModel which can be obtained by a subsequent call to fetch_model().
fit_fetch(data, **kwargs) – Fits the internal model on data and subsequently fetches it in one call.
fit_from_covariances(covariances) – Fits a model based on provided symmetrized covariances.
fit_from_timeseries(data[, weights]) – Estimates a CovarianceKoopmanModel directly from time-series data using the Covariance estimator.
fit_transform(data[, fit_options, ...]) – Fits a model which simultaneously functions as transformer and subsequently transforms the input data.
get_params([deep]) – Get the parameters.
partial_fit(data) – Updates the covariance estimates through a new batch of data.
set_params(**params) – Set the parameters of this estimator.
transform(data[, propagate]) – Projects given timeseries onto dominant singular functions.
- __call__(*args, **kwargs)
Call self as a function.
- classmethod covariance_estimator(lagtime: int, ncov: int = inf)
Yields a properly configured covariance estimator so that its model can be used as input for the vamp estimator.
- Parameters:
lagtime (int) – Positive integer denoting the time shift which is considered for autocorrelations.
ncov (int or float('inf'), optional, default=float('inf')) – Limit the memory usage of the algorithm from [7] to an amount that corresponds to ncov additional copies of each correlation matrix.
- Returns:
estimator – Covariance estimator.
- Return type:
- fetch_model() → CovarianceKoopmanModel
Finalizes current model and yields new CovarianceKoopmanModel.
- Returns:
model – The estimated model.
- Return type:
- fit(data, *args, **kw)
Fits a new CovarianceKoopmanModel which can be obtained by a subsequent call to fetch_model().
- Parameters:
data (CovarianceModel or Covariance or timeseries) – Covariance matrices \(C_{00}, C_{0t}, C_{tt}\) in form of a CovarianceModel instance. If the model should be fitted directly from data, please see from_data(). Optionally, this can also be timeseries data directly, in which case a ‘lagtime’ must be provided.
*args – Optional arguments
**kw – Ignored keyword arguments for scikit-learn compatibility.
- Returns:
self – Reference to self.
- Return type:
Notes
If you run into memory problems, for example with multiple trajectories, you can reduce the memory load by calling partial_fit() on individual trajectories or by using it in conjunction with timeshifted_split:
>>> import numpy as np
>>> from deeptime.decomposition import TICA
>>> from deeptime.util.data import timeshifted_split
>>> estimator = TICA(dim=1)
>>> for X, Y in timeshifted_split(np.ones(shape=(100, 5)), lagtime=1, chunksize=40):
...     estimator.partial_fit((X, Y))
>>> joint_model = estimator.fetch_model()
- fit_fetch(data, **kwargs)
Fits the internal model on data and subsequently fetches it in one call.
- Parameters:
data (array_like) – Data that is used to fit the model.
**kwargs – Additional arguments to fit().
- Returns:
The estimated model.
- Return type:
model
- fit_from_covariances(covariances: Union[Covariance, CovarianceModel])
Fits a model based on provided symmetrized covariances.
- Parameters:
covariances (Covariance or CovarianceModel) – The covariances
- Returns:
self – Reference to self.
- Return type:
- fit_from_timeseries(data, weights=None)
Estimates a CovarianceKoopmanModel directly from time-series data using the Covariance estimator. For parameters dim, scaling, epsilon.
- Parameters:
data – Input data, see to_dataset for options.
weights – See the Covariance estimator.
- Returns:
self – Reference to self.
- Return type:
- fit_transform(data, fit_options=None, transform_options=None)
Fits a model which simultaneously functions as transformer and subsequently transforms the input data. The estimated model can be accessed by calling fetch_model().
- Parameters:
data (array_like) – The input data.
fit_options (dict, optional, default=None) – Optional keyword arguments passed on to the fit method.
transform_options (dict, optional, default=None) – Optional keyword arguments passed on to the transform method.
- Returns:
output – Transformed data.
- Return type:
array_like
- get_params(deep=False)
Get the parameters.
- Returns:
params – Parameter names mapped to their values.
- Return type:
mapping of string to any
- partial_fit(data)
Updates the covariance estimates through a new batch of data.
- Parameters:
data (tuple(ndarray, ndarray)) – A tuple of ndarrays which have to have same shape and are \(X_t\) and \(X_{t+\tau}\), respectively. Here, \(\tau\) denotes the lagtime.
- Returns:
self – Reference to self.
- Return type:
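To make the expected input of partial_fit concrete, the following sketch builds \((X_t, X_{t+\tau})\) tuples of equal shape from a single trajectory by slicing. The generator lagged_pairs is a hypothetical helper written for illustration; timeshifted_split, mentioned in the notes for fit, serves a similar purpose within deeptime.

```python
import numpy as np


def lagged_pairs(trajectory, lagtime, chunksize):
    """Hypothetical helper: yield (X_t, X_{t+tau}) chunks of equal shape."""
    x = trajectory[:-lagtime]  # frames 0 .. T - tau - 1
    y = trajectory[lagtime:]   # frames tau .. T - 1
    for start in range(0, len(x), chunksize):
        yield x[start:start + chunksize], y[start:start + chunksize]


# Ten frames with two features each.
traj = np.arange(20, dtype=float).reshape(10, 2)
pairs = list(lagged_pairs(traj, lagtime=2, chunksize=3))
```

Each tuple in pairs has the form that partial_fit expects, so an estimator could consume them one batch at a time.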
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
**params (dict) – Estimator parameters.
- Returns:
self – Estimator instance.
- Return type:
object
- transform(data, propagate=False)
Projects given timeseries onto dominant singular functions. This method dispatches to CovarianceKoopmanModel.transform().
- Parameters:
data ((T, n) ndarray) – Input timeseries data.
propagate (bool, default=False) – Whether to apply the Koopman operator after data was transformed into the whitened feature space.
- Returns:
Y – The projected data. If right is True, projection will be on the right singular functions. Otherwise, projection will be on the left singular functions.
- Return type:
(T, m) ndarray
- property dim: Optional[int]
Dimension attribute. Can be either int or None. If it is an int, it is interpreted as the actual dimension and must be strictly greater than 0. If it is None, all numerically available components are used.
- Getter:
yields the dimension
- Setter:
sets a new dimension
- Type:
int or None
- property epsilon: float
Eigenvalue norm cutoff.
- Type:
float
- property has_model: bool
Property reporting whether this estimator contains an estimated model. This assumes that the model is initialized with None otherwise.
- Type:
bool
- property lagtime: Optional[int]
The lagtime under which covariances are estimated. Can be None in case covariances are provided directly instead of estimating them inside this estimator.
- Getter:
Yields the current lagtime.
- Setter:
Sets a new lagtime, must be positive.
- Type:
int or None
- property model
Shortcut to fetch_model().
- property scaling: Optional[str]
Scaling parameter. Can take the following values:
None: unscaled.
‘kinetic_map’: Eigenvectors will be scaled by eigenvalues. As a result, Euclidean distances in the transformed data approximate kinetic distances [2]. This is a good choice when the data is further processed by clustering.
‘commute_map’: Eigenvector i will be scaled by sqrt(timescale_i / 2). As a result, Euclidean distances in the transformed data will approximate commute distances [3].
- Getter:
Yields the currently configured scaling.
- Setter:
Sets a new scaling.
- Type:
str or None
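The effect of the three scaling options can be sketched on already projected data. The function scale_components below is a hypothetical illustration that follows the descriptions above (multiply by eigenvalues for ‘kinetic_map’, by sqrt(timescale_i / 2) for ‘commute_map’); it assumes real eigenvalues with magnitude strictly between 0 and 1 and is not deeptime's own code path.

```python
import numpy as np


def scale_components(projection, eigenvalues, lagtime, scaling="kinetic_map"):
    """Hypothetical helper: apply a scaling option to projected data.

    projection has shape (T, m); eigenvalues has shape (m,).
    """
    lam = np.asarray(eigenvalues, dtype=float)
    if scaling is None:
        return projection
    if scaling == "kinetic_map":
        # Scale component i by its eigenvalue.
        return projection * lam
    if scaling == "commute_map":
        # Scale component i by sqrt(timescale_i / 2).
        timescales = -lagtime / np.log(np.abs(lam))
        return projection * np.sqrt(timescales / 2.0)
    raise ValueError(f"unknown scaling: {scaling!r}")


projection = np.ones((4, 2))
eigenvalues = np.array([0.9, 0.5])
kinetic = scale_components(projection, eigenvalues, lagtime=1)
commute = scale_components(projection, eigenvalues, lagtime=1, scaling="commute_map")
unscaled = scale_components(projection, eigenvalues, lagtime=1, scaling=None)
```

Both scalings stretch slow components relative to fast ones, so Euclidean distances in the scaled space are dominated by the slow coordinates.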