function vamp_score_cv

deeptime.decomposition.vamp_score_cv(fit_fetch: Union[Estimator, Callable], trajs, blocksize: Optional[int] = None, n=10, splitting_mode='sliding', r=2, dim: Optional[int] = None, blocksplit: bool = True, random_state=None, n_jobs=1, lagtime=None)

Scores the MSM using the variational approach for Markov processes and cross-validation.

Implementation and ideas following [1] [2] and cross-validation [3].

Divides the data into training and test data, fits an MSM on the training data using the parameters of this estimator, and scores it using the test data. Currently only one way of splitting is implemented: for each of the n repetitions, the data is randomly divided into two approximately equal-sized sets of discrete trajectory fragments, each at least as long as the lagtime.

Currently only implemented using dense matrices, so it will be slow for large state spaces.
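The splitting described above can be illustrated with a minimal, standard-library sketch. This is not deeptime's implementation: the helper names split_into_blocks, random_halves and cross_validate are hypothetical, the blocks here are non-overlapping and exactly blocksize frames long (a simplification of "at least"), and the fit_and_score callable stands in for fitting a model on the training half and scoring it against the test half.

```python
import random


def split_into_blocks(trajs, blocksize):
    """Cut each trajectory into consecutive, non-overlapping blocks.

    Hypothetical helper: every block has exactly `blocksize` frames and
    any leftover frames at the end of a trajectory are dropped.
    """
    blocks = []
    for traj in trajs:
        for start in range(0, len(traj) - blocksize + 1, blocksize):
            blocks.append(traj[start:start + blocksize])
    return blocks


def random_halves(blocks, rng):
    """Randomly divide the blocks into two approximately equal sets."""
    order = list(range(len(blocks)))
    rng.shuffle(order)
    half = len(order) // 2
    train = [blocks[i] for i in order[:half]]
    test = [blocks[i] for i in order[half:]]
    return train, test


def cross_validate(trajs, blocksize, n, fit_and_score, seed=None):
    """Repeat the random split n times and collect one score per repetition.

    `fit_and_score(train, test)` is an assumed user-supplied callable that
    fits a model on `train` and returns its score evaluated on `test`.
    """
    rng = random.Random(seed)
    blocks = split_into_blocks(trajs, blocksize)
    scores = []
    for _ in range(n):
        train, test = random_halves(blocks, rng)
        scores.append(fit_and_score(train, test))
    return scores


# Toy usage: two "trajectories" of integers and a dummy scorer that only
# reports how many test blocks were produced in each repetition.
toy_trajs = [list(range(10)), list(range(100, 107))]
toy_scores = cross_validate(toy_trajs, blocksize=3, n=4,
                            fit_and_score=lambda train, test: len(test),
                            seed=0)
```

Averaging the returned scores over a larger n reduces the noise introduced by the random division, which is the reason the n parameter exists.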

Parameters:
  • fit_fetch (callable or estimator) – Either a callable implementing a custom fit-and-fetch step, i.e. a function or lambda that takes a list of trajectories as input and returns a CovarianceKoopmanModel, or an estimator that yields this kind of model.

  • trajs (list of array_like) – Input data.

  • blocksize (int, optional, default=None) – Specifies the minimum length of temporally consecutive blocks to split the data into. Must be provided if blocksplitting is used (blocksplit=True); otherwise it can be left None.

  • splitting_mode (str, optional, default="sliding") – Can be one of "sliding" and "sample". In the former case the blocks may overlap; in the latter they do not.

  • n (int, default=10) – Number of repetitions of the cross-validation. Use a large n to obtain reliable mean scores.

  • r (float or str, default=2) – Available scores are based on the variational approach for Markov processes [1] [2], see deeptime.decomposition.vamp_score() for available options.

  • blocksplit (bool, optional, default=True) – Whether to perform blocksplitting (see blocksplit_dtrajs() ) before evaluating folds. If no blocksplitting is performed, the individual trajectories are used for training and validation. In that case at least two trajectories must be provided (len(trajs) >= 2), otherwise this method raises an exception.

  • dim (int or None, optional, default=None) – The maximum number of eigenvalues or singular values used in the score. If set to None, all available eigenvalues will be used.

  • random_state (None or int or np.random.RandomState) – Random seed to use.

  • n_jobs (int, optional, default=1) – Number of jobs used to evaluate folds. If n_jobs is 1, no parallelization is used.

  • lagtime (int, optional, default=None) –

    Same as blocksize.

    Deprecated since version 0.4.0: Use blocksize instead. Will be removed in 0.5.0

References