class RegularSpace

class deeptime.clustering.RegularSpace(dmin: float, max_centers: int = 1000, metric: str = 'euclidean', n_jobs=None)

Clusters data objects such that cluster centers are at least dmin apart from each other according to the given metric. Data objects are then assigned to cluster centers by Voronoi partitioning.

Regular space clustering [1] is very similar to Hartigan’s leader algorithm [2]. It consists of two passes through the data. In the first pass, the first data point is added to the list of centers, and every subsequent data point whose distance to every existing center is at least dmin becomes a center as well. In the second pass, a Voronoi discretization with the computed centers is used to partition the data.
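The two passes can be sketched in plain Python as follows. This is an illustrative sketch, not deeptime's implementation: it assumes the Euclidean metric, ignores max_centers, and treats a point at distance exactly dmin as a new center (following the "larger or equal" wording of fit()). The function name regular_space is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def regular_space(points: Sequence[Point], dmin: float) -> Tuple[List[Point], List[int]]:
    """Hypothetical two-pass regular space clustering sketch (Euclidean metric)."""
    if not points:
        return [], []
    # Pass 1: the first point is a center; every later point that is at least
    # dmin away from all current centers becomes a center as well.
    centers: List[Point] = [points[0]]
    for p in points[1:]:
        if all(math.dist(p, c) >= dmin for c in centers):
            centers.append(p)
    # Pass 2: Voronoi discretization, i.e. assign each point to its nearest center.
    labels = [
        min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
        for p in points
    ]
    return centers, labels


data = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (1.05, 0.0), (2.0, 0.0), (0.6, 0.0)]
centers, labels = regular_space(data, dmin=0.5)
print(centers)  # [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
print(labels)   # [0, 0, 1, 1, 2, 1]
```

Note that the last point (0.6, 0.0) is not a center because it lies within dmin of (1.0, 0.0), yet it is still assigned to that center in the second pass. The result also depends on the order of the data, since centers are chosen greedily.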

Parameters:
  • dmin (float) – Minimum distance between cluster centers; must be non-negative.

  • max_centers (int) – Maximum number of cluster centers. If this number is reached while searching for centers, the algorithm terminates. Must be positive.

  • metric (str, default='euclidean') – The metric to use during clustering. For a list of available metrics, see the metric registry.

  • n_jobs (int, optional, default=None) – Number of threads to use during estimation.

References

Attributes

dmin

Minimum distance between cluster centers.

has_model

Property reporting whether this estimator contains an estimated model.

max_centers

Cutoff during clustering.

metric

The metric that is used for clustering.

model

Shortcut to fetch_model().

n_clusters

Alias for max_centers.

n_jobs

The number of threads to use during estimation.

Methods

fetch_model()

Fetches the current model.

fit(data[, n_jobs])

Fits this estimator onto data.

fit_fetch(data, **kwargs)

Fits the internal model on data and subsequently fetches it in one call.

get_params([deep])

Get the parameters.

partial_fit(data[, n_jobs])

Fits data to an existing model.

set_params(**params)

Set the parameters of this estimator.

fetch_model() → ClusterModel

Fetches the current model. Returns None if fit() has not been called yet.

Returns:

model – The latest estimated model or None.

Return type:

ClusterModel or None
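The relationship between fetch_model(), the model shortcut and has_model can be illustrated with a small stand-in class. ToyEstimator and ToyModel are hypothetical and only mimic the documented behavior (no model before fitting, a model afterwards); they do not perform any clustering.

```python
from typing import Optional


class ToyModel:
    def __init__(self, cluster_centers):
        self.cluster_centers = cluster_centers


class ToyEstimator:
    """Hypothetical estimator mirroring the documented fetch_model/model/has_model trio."""

    def __init__(self):
        self._model: Optional[ToyModel] = None  # no model until fit() runs

    def fit(self, data):
        # Placeholder "estimation": use the first data point as the only center.
        self._model = ToyModel(cluster_centers=[data[0]])
        return self

    def fetch_model(self) -> Optional[ToyModel]:
        return self._model

    @property
    def model(self) -> Optional[ToyModel]:
        return self.fetch_model()  # shortcut, as documented

    @property
    def has_model(self) -> bool:
        return self._model is not None


est = ToyEstimator()
print(est.fetch_model(), est.has_model)  # None False
est.fit([(0.0, 0.0)])
print(est.has_model)  # True
```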

fit(data, n_jobs=None)

Fits this estimator onto data. The estimation is carried out by

  1. choosing the first data frame as a cluster center,

  2. computing, for every frame \(x\in X\), its distance to all current cluster centers, and

  3. adding \(x\) as a new cluster center if its minimal distance to the existing cluster centers is greater than or equal to dmin.
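Steps 1 to 3, together with the max_centers cutoff, can be sketched as below. This is a hedged illustration, not the library's code: it assumes the Euclidean metric, and the choice to stop (with a Python warning) as soon as a frame qualifies as a center while the cap has already been reached is an assumption about how the cutoff is signalled. The function name find_centers is hypothetical.

```python
import math
import warnings
from typing import List, Sequence

Point = Sequence[float]


def find_centers(frames: Sequence[Point], dmin: float, max_centers: int) -> List[Point]:
    """Hypothetical sketch of the center search in fit(), with a max_centers cutoff."""
    centers: List[Point] = []
    for x in frames:
        # Step 2: distance from the current frame to every existing center.
        dists = [math.dist(x, c) for c in centers]
        # Steps 1 and 3: the first frame always qualifies (no centers yet);
        # later frames qualify if their minimal distance is >= dmin.
        if not dists or min(dists) >= dmin:
            if len(centers) >= max_centers:
                # Assumed cutoff behavior: stop and ignore the remaining frames.
                warnings.warn("max_centers reached; remaining frames were not considered")
                break
            centers.append(x)
    return centers


frames = [(float(i),) for i in range(10)]  # ten frames spaced 1.0 apart on a line
print(len(find_centers(frames, dmin=0.5, max_centers=100)))  # 10
print(len(find_centers(frames, dmin=2.5, max_centers=100)))  # 4
```

The second call shows why a larger dmin is one way to stay below max_centers: on this line of frames, only 0, 3, 6 and 9 are kept as centers.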

Parameters:
  • data ((T, n) ndarray or list of ndarray) – the data to fit

  • n_jobs (int, optional, default=None) – Number of threads; supersedes the estimator's n_jobs attribute if set to an integer value.

Returns:

self – reference to self

Return type:

RegularSpace
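The override rule for the n_jobs argument of fit() amounts to a one-line fallback. ThreadedEstimator and effective_n_jobs are hypothetical names used only to illustrate the rule; the sketch assumes that an n_jobs of None passed to fit() simply defers to the estimator's attribute.

```python
from typing import Optional


class ThreadedEstimator:
    """Hypothetical stand-in showing how a per-call n_jobs overrides the attribute."""

    def __init__(self, n_jobs: Optional[int] = None):
        self.n_jobs = n_jobs

    def effective_n_jobs(self, n_jobs: Optional[int] = None) -> Optional[int]:
        # An integer passed to fit() wins; None falls back to the estimator's setting.
        return self.n_jobs if n_jobs is None else n_jobs


est = ThreadedEstimator(n_jobs=4)
print(est.effective_n_jobs())   # 4
print(est.effective_n_jobs(2))  # 2
```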

fit_fetch(data, **kwargs)

Fits the internal model on data and subsequently fetches it in one call.

Parameters:
  • data (array_like) – Data that is used to fit the model.

  • **kwargs – Additional arguments to fit().

Returns:

The estimated model.

Return type:

model
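fit_fetch() is a convenience that chains fit() and fetch_model(), forwarding keyword arguments to fit(). The toy class below is hypothetical and uses a plain dictionary as a placeholder model; it only demonstrates the call pattern, not deeptime's implementation.

```python
from typing import Optional


class ToyEstimator:
    """Hypothetical estimator illustrating fit_fetch(); not deeptime's implementation."""

    def __init__(self):
        self._model = None
        self.last_n_jobs: Optional[int] = None

    def fit(self, data, n_jobs: Optional[int] = None):
        self.last_n_jobs = n_jobs
        self._model = {"n_samples": len(data)}  # placeholder for a real model object
        return self

    def fetch_model(self):
        return self._model

    def fit_fetch(self, data, **kwargs):
        # Keyword arguments are forwarded to fit(); the fitted model is returned directly.
        return self.fit(data, **kwargs).fetch_model()


est = ToyEstimator()
print(est.fit_fetch([1.0, 2.0, 3.0], n_jobs=2))  # {'n_samples': 3}
print(est.last_n_jobs)                            # 2
```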

get_params(deep=False)

Get the parameters.

Returns:

params – Parameter names mapped to their values.

Return type:

mapping of string to any

partial_fit(data, n_jobs=None)

Fits data to an existing model. See fit().
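A sketch of what fitting "to an existing model" can mean for regular space clustering, assuming that partial_fit() keeps the centers found so far and lets them take part in the distance test for new data, while fit() starts from an empty set of centers. Both the class IncrementalRegSpace and these semantics are assumptions for illustration (Euclidean metric, no max_centers cutoff).

```python
import math
from typing import List, Sequence

Point = Sequence[float]


class IncrementalRegSpace:
    """Hypothetical sketch: partial_fit() extends the centers found so far."""

    def __init__(self, dmin: float):
        self.dmin = dmin
        self.centers: List[Point] = []

    def partial_fit(self, chunk: Sequence[Point]) -> "IncrementalRegSpace":
        for x in chunk:
            # Centers from earlier chunks take part in the distance test.
            if all(math.dist(x, c) >= self.dmin for c in self.centers):
                self.centers.append(x)
        return self

    def fit(self, data: Sequence[Point]) -> "IncrementalRegSpace":
        self.centers = []  # assumed: fit() starts from scratch
        return self.partial_fit(data)


data = [(0.0,), (0.2,), (1.0,), (1.1,), (2.5,), (2.4,)]
one_pass = IncrementalRegSpace(dmin=0.5).fit(data).centers
chunked = IncrementalRegSpace(dmin=0.5).partial_fit(data[:3]).partial_fit(data[3:]).centers
print(one_pass == chunked)  # True
```

Because the center search visits frames one at a time, feeding the data in chunks yields the same centers as a single pass over the concatenated data, which is what makes chunked estimation on large trajectories possible.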

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

object
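The <component>__<parameter> convention can be illustrated with a small recursive router. Component is a hypothetical class written only for this sketch; it splits each key at the first double underscore and forwards the remainder to the named sub-component, so deeper nesting such as a__b__c is handled by recursion.

```python
from typing import Any, Dict


class Component:
    """Hypothetical nested object illustrating <component>__<parameter> keys."""

    def __init__(self, **params: Any):
        self.params = dict(params)

    def set_params(self, **params: Any) -> "Component":
        nested: Dict[str, Dict[str, Any]] = {}
        for key, value in params.items():
            name, sep, sub_key = key.partition("__")
            if sep:
                # "<component>__<parameter>": route to the named sub-component.
                nested.setdefault(name, {})[sub_key] = value
            else:
                self.params[name] = value
        for name, sub_params in nested.items():
            self.params[name].set_params(**sub_params)
        return self


pipeline = Component(clustering=Component(dmin=0.5), verbose=False)
pipeline.set_params(verbose=True, clustering__dmin=1.0)
print(pipeline.params["verbose"], pipeline.params["clustering"].params["dmin"])  # True 1.0
```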

property dmin: float

Minimum distance between cluster centers.

Getter:

Yields the currently set minimum distance.

Setter:

Sets a new minimum distance, must be non-negative.

Type:

float
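The getter/setter contract for dmin (any non-negative float is accepted) can be expressed with a Python property. DminHolder is a hypothetical class, and raising ValueError for negative values is an assumed way of enforcing the documented constraint.

```python
class DminHolder:
    """Hypothetical class reproducing the documented dmin getter/setter contract."""

    def __init__(self, dmin: float):
        self.dmin = dmin  # routed through the setter, so validation applies here too

    @property
    def dmin(self) -> float:
        return self._dmin

    @dmin.setter
    def dmin(self, value: float) -> None:
        if value < 0:
            raise ValueError(f"dmin must be non-negative, got {value}")
        self._dmin = float(value)


holder = DminHolder(0.5)
holder.dmin = 0.0  # zero is allowed: the constraint is non-negativity
print(holder.dmin)  # 0.0
```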

property has_model: bool

Property reporting whether this estimator contains an estimated model. This assumes that the model attribute is initialized with None before any estimation has taken place.

Type:

bool

property max_centers: int

Cutoff during clustering. Once this many centers have been found, no further data is taken into account. In that case, consider a larger max_centers or a larger dmin.

Getter:

Current maximum number of cluster centers.

Setter:

Sets a new maximum number of cluster centers, must be non-negative.

Type:

int

property metric: str

The metric that is used for clustering.

Type:

str

property model

Shortcut to fetch_model().

property n_clusters: int

Alias for max_centers.

property n_jobs: int

The number of threads to use during estimation.

Getter:

Yields the number of threads to use; -1 means all available threads.

Setter:

Sets the number of threads to use. Can be None, in which case it defaults to 1 thread.

Type:

int
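The documented n_jobs values (None for a single thread, -1 for all available threads) can be resolved to a concrete thread count as sketched below. resolve_n_jobs is a hypothetical helper, and rejecting other non-positive values is an assumption rather than documented behavior.

```python
import os
from typing import Optional


def resolve_n_jobs(n_jobs: Optional[int]) -> int:
    """Hypothetical helper: turn an n_jobs setting into a concrete thread count."""
    if n_jobs is None:
        return 1  # documented default: a single thread
    if n_jobs == -1:
        return os.cpu_count() or 1  # all available threads; cpu_count() may return None
    if n_jobs < 1:
        # Assumption: other non-positive values are treated as errors.
        raise ValueError(f"n_jobs must be None, -1, or a positive integer, got {n_jobs}")
    return n_jobs


print(resolve_n_jobs(None))  # 1
print(resolve_n_jobs(8))     # 8
```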