class MiniBatchKMeans

class deeptime.clustering.MiniBatchKMeans(n_clusters, batch_size=100, max_iter=5, metric='euclidean', tolerance=1e-05, init_strategy='kmeans++', n_jobs=None, initial_centers=None)

K-means clustering in a mini-batched fashion.

Parameters:

batch_size (int, optional, default=100) – The maximum sample size per batch when calling fit().

See also

KMeans

Superclass; see it for a description of the remaining parameters.

KMeansModel

Attributes

fixed_seed

Seed for the random choice of initial cluster centers.

has_model

Property reporting whether this estimator contains an estimated model.

init_strategy

Strategy to get an initial guess for the centers.

initial_centers

Yields initial centers which override the init_strategy().

max_iter

Maximum number of clustering iterations before stopping.

metric

The metric that is used for clustering.

model

Shortcut to fetch_model().

n_clusters

The number of cluster centers to use.

n_jobs

Number of threads to use during clustering and assignment of data.

tolerance

Stopping criterion for the k-means iteration.

Methods

fetch_model()

Fetches the current model.

fit(data[, initial_centers, ...])

Perform clustering on whole data.

fit_fetch(data, **kwargs)

Fits the internal model on data and subsequently fetches it in one call.

fit_transform(data[, fit_options, ...])

Fits a model which simultaneously functions as transformer and subsequently transforms the input data.

get_params([deep])

Get the parameters.

partial_fit(data[, n_jobs])

Updates the current model (or creates a new one) with data.

set_params(**params)

Set the parameters of this estimator.

transform(data, **kw)

Transforms a trajectory to a discrete trajectory by assigning each frame to its respective cluster center.

__call__(*args, **kwargs)

Call self as a function.

fetch_model() Optional[KMeansModel]

Fetches the current model. Can be None in case fit() was not called yet.

Returns:

model – the latest estimated model

Return type:

KMeansModel or None

fit(data, initial_centers=None, callback_init_centers=None, callback_loop=None, n_jobs=None)

Perform clustering on whole data.

fit_fetch(data, **kwargs)

Fits the internal model on data and subsequently fetches it in one call.

Parameters:
  • data (array_like) – Data that is used to fit the model.

  • **kwargs – Additional arguments to fit().

Returns:

The estimated model.

Return type:

model

fit_transform(data, fit_options=None, transform_options=None)

Fits a model which simultaneously functions as transformer and subsequently transforms the input data. The estimated model can be accessed by calling fetch_model().

Parameters:
  • data (array_like) – The input data.

  • fit_options (dict, optional, default=None) – Optional keyword arguments passed on to the fit method.

  • transform_options (dict, optional, default=None) – Optional keyword arguments passed on to the transform method.

Returns:

output – Transformed data.

Return type:

array_like

get_params(deep=False)

Get the parameters.

Returns:

params – Parameter names mapped to their values.

Return type:

mapping of string to any

partial_fit(data, n_jobs=None)

Updates the current model (or creates a new one) with data. This method can be called repeatedly and can thus be used to train a model in an on-line fashion. Note that multiple passes over the data are usually needed. This method should not be mixed with calls to fit(), because fit() overwrites the model with a new instance based only on the data passed to it.

Parameters:
  • data ((T, n) ndarray) – Data with which the model is updated and/or initialized.

  • n_jobs (int, optional, default=None) – Number of jobs to use when updating the model; supersedes the n_jobs attribute of the estimator.

Returns:

self – reference to self

Return type:

MiniBatchKMeans
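The on-line training described for partial_fit() can be illustrated with the classic mini-batch k-means update, in which each center keeps a count of the points assigned to it and moves towards every new point with a step size of 1/count. This is a plain-Python sketch of that general technique, not deeptime's implementation, whose internals are not described here; the helper name `partial_update` and the toy data are assumptions made for illustration.

```python
import math


def partial_update(centers, counts, batch):
    """Update `centers` in place with one batch (per-center running mean).

    Hypothetical helper for illustration; not part of deeptime.
    """
    for point in batch:
        # Index of the center closest to this point (Euclidean metric).
        j = min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))
        counts[j] += 1
        rate = 1.0 / counts[j]  # step size shrinks as the center sees more data
        centers[j] = [c + rate * (x - c) for c, x in zip(centers[j], point)]
    return centers, counts


centers = [[0.0, 0.0], [10.0, 10.0]]
counts = [0, 0]
stream = [
    [[1.0, 1.0], [9.0, 9.0]],   # first batch
    [[3.0, 1.0], [11.0, 9.0]],  # second batch
]
for batch in stream:
    partial_update(centers, counts, batch)

print(centers)  # each center is now the mean of the points assigned to it
```

Because the state (centers and counts) persists between calls, the batches can arrive one at a time, which mirrors calling partial_fit() repeatedly on the same estimator.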

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

object
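The `<component>__<parameter>` naming convention can be illustrated by splitting a flat parameter dictionary into the estimator's own parameters and those addressed to nested components. This is a minimal sketch of the convention only; the helper `split_nested_params` and the component name `clustering` are hypothetical and are not part of deeptime.

```python
def split_nested_params(params):
    """Separate own parameters from '<component>__<parameter>' entries.

    Hypothetical helper for illustration; not part of deeptime.
    """
    own, nested = {}, {}
    for key, value in params.items():
        component, sep, sub_key = key.partition("__")
        if sep:
            # 'clustering__max_iter' -> nested['clustering']['max_iter']
            nested.setdefault(component, {})[sub_key] = value
        else:
            own[key] = value
    return own, nested


own, nested = split_nested_params(
    {"n_clusters": 5, "clustering__max_iter": 10, "clustering__tolerance": 1e-4}
)
print(own)
print(nested)
```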

transform(data, **kw) ndarray

Transforms a trajectory to a discrete trajectory by assigning each frame to its respective cluster center.

Parameters:
  • data ((T, n) ndarray) – trajectory with T frames and data points in n dimensions.

  • **kw – ignored kwargs for scikit-learn compatibility

Returns:

discrete_trajectory – discrete trajectory

Return type:

(T, 1) ndarray

See also

ClusterModel.transform

transform method of cluster model, implicitly called.
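The assignment step that transform() describes can be sketched in plain Python: every frame is mapped to the index of its nearest cluster center. The sketch assumes the Euclidean metric and uses a hypothetical helper `assign_to_centers`; it illustrates the idea only and is not the library's implementation.

```python
import math


def assign_to_centers(frames, centers):
    """Return, for each frame, the index of its nearest center (Euclidean).

    Hypothetical helper for illustration; not part of deeptime.
    """
    labels = []
    for frame in frames:
        distances = [math.dist(frame, center) for center in centers]
        labels.append(distances.index(min(distances)))
    return labels


centers = [(0.0, 0.0), (5.0, 5.0)]
trajectory = [(0.1, -0.2), (4.8, 5.1), (0.3, 0.4), (6.0, 4.0)]
discrete_trajectory = assign_to_centers(trajectory, centers)
print(discrete_trajectory)  # [0, 1, 0, 1]
```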

property fixed_seed

Seed for the random choice of initial cluster centers.

Fix this in conjunction with n_jobs=0 to get reproducible results. The latter is needed because parallel execution reintroduces non-deterministic behaviour.

property has_model: bool

Property reporting whether this estimator contains an estimated model. This assumes that the model is None as long as nothing has been estimated.

Type:

bool

property init_strategy

Strategy to get an initial guess for the centers.

Getter:

Yields the strategy, can be one of “kmeans++” or “uniform”.

Setter:

Setter for the initialization strategy that is used when no initial centers are provided.

Type:

string
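The two strategy labels can be illustrated with a plain-Python sketch. It assumes that "uniform" means drawing distinct data points uniformly at random and that "kmeans++" follows the standard k-means++ seeding, in which each further center is drawn with probability proportional to the squared distance to the nearest center chosen so far. The function `pick_initial_centers` and its `seed` argument are hypothetical, and deeptime's own implementation may differ in detail.

```python
import math
import random


def pick_initial_centers(data, n_clusters, strategy="kmeans++", seed=None):
    """Pick `n_clusters` starting centers from `data` (a list of points).

    Hypothetical helper for illustration; not part of deeptime.
    Assumes `data` contains at least `n_clusters` distinct points.
    """
    rng = random.Random(seed)  # a fixed seed makes the choice reproducible
    if strategy == "uniform":
        return rng.sample(data, n_clusters)
    if strategy != "kmeans++":
        raise ValueError(f"unknown strategy: {strategy!r}")

    centers = [rng.choice(data)]  # first center: uniform draw
    while len(centers) < n_clusters:
        # Squared distance of every point to its nearest chosen center.
        weights = [min(math.dist(p, c) for c in centers) ** 2 for p in data]
        centers.append(rng.choices(data, weights=weights, k=1)[0])
    return centers


data = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
print(pick_initial_centers(data, 3, strategy="kmeans++", seed=42))
print(pick_initial_centers(data, 3, strategy="uniform", seed=42))
```

Points that are already centers have weight zero in the k-means++ branch, so they cannot be drawn a second time.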

property initial_centers: Optional[ndarray]

Yields initial centers which override the init_strategy(). Can be used to resume k-means iterations.

Getter:

The initial centers or None.

Setter:

Sets the initial centers. If not None, the array is expected to have length n_clusters.

Type:

(k, n) ndarray or None

property max_iter: int

Maximum number of clustering iterations before stopping.

Getter:

Yields the maximum number of clustering iterations.

Setter:

Sets the maximum number of clustering iterations.

Type:

int

property metric: str

The metric that is used for clustering.

See also

_clustering_bindings.Metric

The metric class, can be subclassed

metrics

Metrics registry which maps from metric label to actual implementation

property model

Shortcut to fetch_model().

property n_clusters: int

The number of cluster centers to use.

Getter:

Yields the number of cluster centers.

Setter:

Sets the number of cluster centers.

Type:

int

property n_jobs: int

Number of threads to use during clustering and assignment of data.

Getter:

Yields the number of threads. If -1, all available threads are used.

Setter:

Sets the number of threads to use. If -1, use all, if None, use 1.

Type:

int
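The rules stated for the setter (-1 means all available threads, None means a single thread) can be written as a small resolution function. This is a sketch of the documented behaviour only; the helper `resolve_n_jobs` is hypothetical, and any other value is assumed to be passed through unchanged.

```python
import os


def resolve_n_jobs(n_jobs):
    """Map the documented special values of n_jobs to a thread count.

    Hypothetical helper for illustration; not part of deeptime.
    """
    if n_jobs is None:
        return 1  # None: use a single thread
    if n_jobs == -1:
        # -1: use all available threads (fall back to 1 if undetectable)
        return os.cpu_count() or 1
    return n_jobs  # assumption: other values are used as given


print(resolve_n_jobs(None))
print(resolve_n_jobs(4))
print(resolve_n_jobs(-1))
```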

property tolerance: float

Stopping criterion for the k-means iteration. When the relative change of the cost function between two iterations is less than the tolerance, the algorithm is considered to be converged.

Getter:

Yields the currently set tolerance.

Setter:

Sets a new tolerance.

Type:

float
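The stopping criterion can be sketched as a comparison of the cost between two consecutive iterations. The normalization used below (absolute change divided by the previous cost) is an assumption, since the exact formula is not given here; the helper `has_converged` and the cost values are made up for illustration.

```python
def has_converged(previous_cost, current_cost, tolerance=1e-5):
    """Check whether the relative change of the cost is below `tolerance`.

    Hypothetical helper for illustration; not part of deeptime.
    """
    if previous_cost == 0.0:
        # Avoid dividing by zero: a zero cost cannot improve any further.
        return current_cost == 0.0
    relative_change = abs(current_cost - previous_cost) / abs(previous_cost)
    return relative_change < tolerance


costs = [120.0, 80.0, 79.5, 79.4999]  # made-up cost per iteration
converged_at = None
for iteration in range(1, len(costs)):
    if has_converged(costs[iteration - 1], costs[iteration]):
        converged_at = iteration
        break

print(converged_at)
```

A looser tolerance stops the iteration earlier at the price of less refined centers, while max_iter bounds the number of iterations regardless of the tolerance.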