Dimension reduction
Here we introduce the dimension reduction / decomposition techniques implemented in the package.
Koopman operator methods
All methods contained in this sub-package relate to the Koopman operator \(\mathcal{K}_\tau\), defined as

\[ (\mathcal{K}_\tau g)(x) = \mathbb{E}\left[ g(x_{t+\tau}) \mid x_t = x \right] = \int p_\tau(x, y)\, g(y)\, \mathrm{d}y \]

for a process \(\{x_t\}_{t\geq 0}\) with transition density \(p_\tau(x, y)\). When projecting \(\mathcal{K}_\tau\) onto a finite basis, one seeks

\[ \mathbb{E}[g(x_{t+\tau})] \approx K^\top \mathbb{E}[f(x_t)], \]

where \(K\in\mathbb{R}^{n\times m}\) is a finite-dimensional Koopman matrix which propagates the observable \(f\) of the system’s state \(x_t\) to the observable \(g\) at state \(x_{t+\tau}\).
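To make this concrete, such a Koopman model can be estimated directly from time series data. Below is a minimal sketch using the VAMP estimator on synthetic data; the random feature trajectory and all parameter values are purely illustrative:

import numpy as np
from deeptime.decomposition import VAMP

# illustrative feature trajectory of shape (n_frames, n_features)
feature_trajectory = np.random.default_rng(42).normal(size=(1000, 5))

estimator = VAMP(lagtime=1, dim=2).fit(feature_trajectory)  # estimate from data
model = estimator.fetch_model()                              # fitted Koopman model
projection = model.transform(feature_trajectory)             # data in reduced coordinates

Here transform() applies the learned projection, so the data is expressed in the leading (here: two) coordinates of the estimated decomposition.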
When to use which method
All methods assume (approximate) Markovianity of the time series under lag-time \(\tau\).
Method | Assumptions | Notes
---|---|---
TICA | Time series should be stationary with symmetric covariances (equivalently: reversible with detailed balance) and compact Koopman operator (guaranteed to be compact in stochastic systems). |
VAMP | Compact Koopman operator (guaranteed to be compact in stochastic systems). | Generalization of TICA (does not require reversibility or stationarity).
It should be noted that scores, where available, are not comparable across different lag-times, since they relate to different operators.
What’s next?
While a dimensionality reduction is already useful in itself because it makes it easier to look at the data, one can take further steps.
A commonly performed pipeline is to cluster the projected data and then build a Markov state model on the resulting discretized state space, as sketched below.
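A rough sketch of such a pipeline, using deeptime's KMeans clustering and maximum-likelihood MSM estimation; the projected data and all parameter values are illustrative placeholders:

import numpy as np
from deeptime.clustering import KMeans
from deeptime.markov import TransitionCountEstimator
from deeptime.markov.msm import MaximumLikelihoodMSM

# illustrative stand-in for data that was projected with, e.g., TICA or VAMP
projection = np.random.default_rng(7).normal(size=(5000, 2))

# discretize the projected space into 50 states
clustering = KMeans(n_clusters=50).fit(projection).fetch_model()
dtraj = clustering.transform(projection)  # discrete trajectory of state indices

# count transitions at the chosen lag-time and estimate a Markov state model
counts = TransitionCountEstimator(lagtime=1, count_mode='sliding').fit(dtraj).fetch_model()
msm = MaximumLikelihoodMSM().fit(counts).fetch_model()

The resulting msm object then gives access to quantities such as the transition matrix of the discretized dynamics.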
Estimating covariances and how to deal with large amounts of data
While the implementations of TICA and its generalization VAMP can be fit directly on a time series that is kept in the computer’s memory, this might not always be possible.
The implementations are based on estimating covariance matrices, by default using the covariance estimator.
This estimator uses an online algorithm, so that it can be fit in a streaming fashion:
import deeptime.decomposition

tau = 1  # lag-time (illustrative value)
estimator = deeptime.decomposition.TICA(lagtime=tau)    # create a TICA estimator
# estimator = deeptime.decomposition.VAMP(lagtime=tau)  # alternatively, use VAMP
Since toy data usually fits into memory easily, loading data from, e.g., a database or over a network is simulated with the timeshifted_split() utility function. It splits the data into time-shifted blocks \(X_t\) and \(X_{t+\tau}\).
These blocks do not overlap trajectories, i.e., if two or more trajectories are provided, then each block is completely contained in exactly one of them.
Note that we provide both blocks, \(X_t\) and \(X_{t+\tau}\), as a tuple.
This is different from fit(), where the splitting and shifting are performed internally; in that case it suffices to provide the whole dataset as the argument.
for X, Y in deeptime.data.timeshifted_split(feature_trajectory, lagtime=tau, chunksize=100):
estimator.partial_fit((X, Y))
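For comparison, a minimal sketch of the in-memory variant, reusing estimator and feature_trajectory from above; as noted, fit() performs the splitting and time-shifting internally:

model = estimator.fit(feature_trajectory).fetch_model()  # fit on the full trajectory at once
projection = model.transform(feature_trajectory)         # project the data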
Furthermore, the online algorithm uses a tree-like moment storage with copies of intermediate covariance and mean estimates. During the learning procedure, these moment storages are combined so that the tree never exceeds a certain depth. This depth can be set by the ncov estimator parameter:
estimator = deeptime.decomposition.TICA(lagtime=1, ncov=50)
for X, Y in deeptime.data.timeshifted_split(feature_trajectory, lagtime=1, chunksize=10):
    estimator.partial_fit((X, Y))
Another factor to consider is numerical stability: larger values of ncov increase memory consumption but generally improve stability.