class KMeans

class deeptime.clustering.KMeans(n_clusters: int, max_iter: int = 500, metric='euclidean', tolerance=1e-05, init_strategy: str = 'kmeans++', fixed_seed=False, n_jobs=None, initial_centers=None, progress=None)

Clusters the data in a way that minimizes the cost function

$$C(S) = \sum_{i=1}^{k} \sum_{\mathbf{x}_j \in S_i} \left\| \mathbf{x}_j - \boldsymbol\mu_i \right\|^2$$

where $S_i$ are the clusters, $\boldsymbol\mu_i$ is the center of mass of cluster $S_i$, and $\mathbf{x}_j$ are the data points assigned to that cluster.
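To make the cost function concrete, here is a minimal plain-Python sketch that evaluates $C(S)$ for a tiny, made-up data set. The helper names (`squared_distance`, `center_of_mass`, `cost`) are illustrative and not part of deeptime; the sketch assumes the Euclidean metric.

```python
def squared_distance(x, y):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def center_of_mass(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def cost(clusters):
    """C(S): sum over clusters of squared distances to the cluster's center of mass."""
    total = 0.0
    for points in clusters:
        mu = center_of_mass(points)
        total += sum(squared_distance(x, mu) for x in points)
    return total


# Two toy clusters S_1 and S_2 in two dimensions (illustrative data only).
clusters = [
    [(0.0, 0.0), (2.0, 0.0)],  # center of mass (1, 0)
    [(5.0, 5.0), (5.0, 7.0)],  # center of mass (5, 6)
]
print(cost(clusters))  # 4.0: every point is at distance 1 from its center
```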

The outcome depends strongly on the initialization. Two strategies are offered: “kmeans++” and “uniform”. “uniform” picks the initial centers uniformly at random from the provided data set. “kmeans++” tries to find an initialization that covers the spatial configuration of the data set more or less uniformly. For details see [1].
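The difference between the two strategies can be sketched in plain Python. This is not deeptime's implementation. The function names are hypothetical, the "uniform" variant is assumed to draw without replacement, and the "kmeans++" variant follows the usual seeding rule in which each new center is drawn with probability proportional to its squared distance from the nearest center chosen so far.

```python
import random


def squared_distance(x, y):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def init_uniform(data, k, rng):
    """Pick k distinct data points uniformly at random as initial centers."""
    return rng.sample(data, k)


def init_kmeans_pp(data, k, rng):
    """k-means++ style seeding (sketch).

    Assumes the data contain at least k distinct points; otherwise all
    weights can become zero and rng.choices raises a ValueError.
    """
    centers = [rng.choice(data)]
    while len(centers) < k:
        # Weight of a point: squared distance to its nearest chosen center.
        weights = [min(squared_distance(x, c) for c in centers) for x in data]
        centers.append(rng.choices(data, weights=weights, k=1)[0])
    return centers


rng = random.Random(0)
data = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (10.0, 10.0), (10.1, 9.9), (9.9, 10.2)]
print(init_uniform(data, 2, rng))
print(init_kmeans_pp(data, 2, rng))
```

Because points far from the already chosen centers carry most of the weight, the second center in the k-means++ sketch is very likely to land in the other group of points, whereas the uniform draw may place both centers in the same group.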

Parameters:

n_clusters (int) – number of cluster centers.
max_iter (int, default=500) – maximum number of iterations before stopping.
metric (str, default='euclidean') – Metric to use during clustering, default evaluates to euclidean metric. For a list of available metrics, see the metric registry.
tolerance (float, default=1e-5) –
Stop iteration when the relative change in the cost function (inertia)

$C(S) = \sum_{i=1}^{k} \sum_{\mathbf x \in S_i} \left\| \mathbf x - \boldsymbol\mu_i \right\|^2$

is smaller than tolerance.
init_strategy (str, default='kmeans++') – one of ‘kmeans++’, ‘uniform’; determines how the initial cluster centers are chosen.
fixed_seed (bool or int, default=False) – if True, the seed is set to 42; if False, time-based seeding is used. If an integer is given, it is used to seed the random generator.
n_jobs (int or None, default=None) – Number of threads to use during clustering and assignment of data. If None, one core will be used.
initial_centers (None or np.ndarray[k, dim], default=None) – This is used to resume the kmeans iteration. Note that if this is set, the init_strategy is ignored and the centers are passed directly to the kmeans iteration algorithm.
progress (object) – Progress bar object that KMeans will call to indicate progress to the user. Tested for a tqdm progress bar. The interface is checked via supports_progress_interface.
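The three cases described for fixed_seed can be sketched as a small helper. `resolve_seed` is a hypothetical name and not part of deeptime, and how the library derives its time-based seed is not stated here, so the use of `time.time_ns()` below is an assumption.

```python
import random
import time


def resolve_seed(fixed_seed=False):
    """Map a fixed_seed-style argument to an integer seed (hypothetical helper)."""
    # bool must be checked before int, because bool is a subclass of int in Python.
    if isinstance(fixed_seed, bool):
        # True -> 42; False -> time-based seed (the exact scheme is an assumption).
        return 42 if fixed_seed else time.time_ns() % (2 ** 32)
    if isinstance(fixed_seed, int):
        return fixed_seed
    raise TypeError("fixed_seed must be a bool or an int")


rng_a = random.Random(resolve_seed(True))
rng_b = random.Random(resolve_seed(42))
# Both generators are seeded with 42, so they produce the same sequence.
print(rng_a.random() == rng_b.random())
```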

References

See also

KMeansModel, MiniBatchKMeans

Attributes

`fixed_seed`	seed for random choice of initial cluster centers.
`has_model`	Property reporting whether this estimator contains an estimated model.
`init_strategy`	Strategy to get an initial guess for the centers.
`initial_centers`	Yields initial centers which override the `init_strategy()`.
`max_iter`	Maximum number of clustering iterations before stop.
`metric`	The metric that is used for clustering.
`model`	Shortcut to `fetch_model()`.
`n_clusters`	The number of cluster centers to use.
`n_jobs`	Number of threads to use during clustering and assignment of data.
`tolerance`	Stopping criterion for the k-means iteration.

Methods

`fetch_model`()	Fetches the current model.
`fit`(data[, initial_centers, ...])	Perform the clustering.
`fit_fetch`(data, **kwargs)	Fits the internal model on data and subsequently fetches it in one call.
`fit_transform`(data[, fit_options, ...])	Fits a model which simultaneously functions as transformer and subsequently transforms the input data.
`get_params`([deep])	Get the parameters.
`set_params`(**params)	Set the parameters of this estimator.
`transform`(data, **kw)	Transforms a trajectory to a discrete trajectory by assigning each frame to its respective cluster center.

__call__(*args, **kwargs)

Call self as a function.

fetch_model() → KMeansModel | None

Fetches the current model. Can be None in case fit() was not called yet.

Returns:

model – the latest estimated model

Return type:

KMeansModel or None

fit(data, initial_centers=None, callback_init_centers=None, callback_loop=None, n_jobs=None)

Perform the clustering.

Parameters:

data (np.ndarray or list(np.ndarray)) – data to be clustered. The shape of the ndarray(s) should be (N, D), where N is the number of data points and D the dimension. For one-dimensional data, a shape of (N,) also works. If data is a list of ndarrays, they are concatenated into one ndarray.
initial_centers (np.ndarray or None) – Optional cluster center initialization that supersedes the estimator’s initial_centers attribute.
callback_init_centers (function or None) – used for kmeans++ initialization to indicate progress, called once per assigned center.
callback_loop (function or None) – used to indicate progress on kmeans iterations, called once per iteration.
n_jobs (None or int) – if not None, supersedes the n_jobs attribute of the estimator instance; must be non-negative

Returns:

self – reference to self

Return type:

KMeans
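The interplay of max_iter, tolerance, and callback_loop during fitting can be illustrated with a plain-Python Lloyd-style iteration. This is a sketch under assumptions, not deeptime's implementation: the function names are hypothetical, the callback is assumed to take no arguments, and "relative change" is read as the absolute change in inertia divided by the previous inertia.

```python
def squared_distance(x, y):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def assign(data, centers):
    """Index of the nearest center for every data point (ties go to the lower index)."""
    return [min(range(len(centers)), key=lambda i: squared_distance(x, centers[i]))
            for x in data]


def lloyd(data, centers, max_iter=500, tolerance=1e-5, callback_loop=None):
    """Lloyd-style iteration; returns (centers, inertia, number of iterations run)."""
    centers = [tuple(c) for c in centers]
    previous = None
    inertia = 0.0
    for iteration in range(1, max_iter + 1):
        labels = assign(data, centers)
        # Move each center to the center of mass of its assigned points.
        for i in range(len(centers)):
            members = [x for x, label in zip(data, labels) if label == i]
            if members:  # an empty cluster keeps its previous center
                centers[i] = tuple(sum(c) / len(members) for c in zip(*members))
        inertia = sum(squared_distance(x, centers[label])
                      for x, label in zip(data, labels))
        if callback_loop is not None:
            callback_loop()  # called once per iteration
        # Relative-change test, written multiplicatively to avoid dividing by zero.
        if previous is not None and abs(previous - inertia) <= tolerance * previous:
            return centers, inertia, iteration
        previous = inertia
    return centers, inertia, max_iter


data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
ticks = []
centers, inertia, n_iter = lloyd(data, [(0.0, 0.0), (1.0, 1.0)],
                                 callback_loop=lambda: ticks.append(1))
print(centers, inertia, n_iter, len(ticks))
```

On this toy data the centers settle after the first update, so the second iteration sees no change in inertia and the loop stops well before max_iter; the callback has then been invoked exactly as many times as iterations were run.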

fit_fetch(data, **kwargs)

Fits the internal model on data and subsequently fetches it in one call.

Parameters:

data (array_like) – Data that is used to fit the model.
**kwargs – Additional arguments to fit().

Returns:

The estimated model.

Return type:

model

fit_transform(data, fit_options=None, transform_options=None)

Fits a model which simultaneously functions as transformer and subsequently transforms the input data. The estimated model can be accessed by calling fetch_model().

Parameters:

data (array_like) – The input data.
fit_options (dict, optional, default=None) – Optional keyword arguments passed on to the fit method.
transform_options (dict, optional, default=None) – Optional keyword arguments passed on to the transform method.

Returns:

output – Transformed data.

Return type:

array_like

get_params(deep=False)

Get the parameters.

Returns:

params – Parameter names mapped to their values.

Return type:

mapping of string to any

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

object
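The `<component>__<parameter>` naming scheme can be illustrated with a short sketch that groups parameter names by component. `split_nested_params` and the component name `clustering` are hypothetical and only show how such names can be split on the first double underscore; this is not the estimator's actual code.

```python
def split_nested_params(params):
    """Group {'<component>__<parameter>': value} entries by component.

    Keys without a double underscore are collected under the key None,
    meaning they belong to the estimator itself.
    """
    grouped = {}
    for name, value in params.items():
        component, delimiter, parameter = name.partition("__")
        if delimiter:
            grouped.setdefault(component, {})[parameter] = value
        else:
            grouped.setdefault(None, {})[name] = value
    return grouped


print(split_nested_params({"max_iter": 100, "clustering__n_clusters": 5}))
# {None: {'max_iter': 100}, 'clustering': {'n_clusters': 5}}
```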

transform(data, **kw) → ndarray

Transforms a trajectory to a discrete trajectory by assigning each frame to its respective cluster center.

Parameters:

data ((T, n) ndarray) – trajectory with T frames and data points in n dimensions.
**kw – ignored kwargs for scikit-learn compatibility

Returns:

discrete_trajectory – discrete trajectory

Return type:

(T, 1) ndarray
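The assignment step behind transform can be sketched in plain Python: each frame receives the index of its nearest cluster center. `assign_frames` is a hypothetical helper, the Euclidean metric is assumed, and the result is a plain list rather than an ndarray.

```python
def squared_distance(x, y):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def assign_frames(trajectory, centers):
    """Return, for every frame, the index of its nearest cluster center."""
    return [min(range(len(centers)), key=lambda i: squared_distance(frame, centers[i]))
            for frame in trajectory]


centers = [(0.0, 0.0), (10.0, 10.0)]
trajectory = [(0.5, -0.2), (9.0, 11.0), (1.0, 1.0), (8.0, 8.0)]
print(assign_frames(trajectory, centers))  # [0, 1, 0, 1]
```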