class RegularSpace

class deeptime.clustering.RegularSpace(dmin: float, max_centers: int = 1000, metric: str = 'euclidean', n_jobs=None)

Clusters data objects such that cluster centers are at least a distance of dmin apart from each other according to the given metric. Data objects are assigned to cluster centers by Voronoi partitioning.

Regular space clustering [1] is very similar to Hartigan’s leader algorithm [2]. It consists of two passes through the data. Initially, the first data point is added to the list of centers. For every subsequent data point, if it has a greater distance than dmin from every center, it also becomes a center. In the second pass, a Voronoi discretization with the computed centers is used to partition the data.
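The two passes described above can be sketched in plain NumPy. This is a minimal illustration, not deeptime's implementation: the helper names `regular_space_centers` and `assign_voronoi` are hypothetical, the metric is fixed to Euclidean, and a point becomes a center only if its distance to every existing center is strictly greater than dmin, following the wording of the paragraph above.

```python
import numpy as np


def regular_space_centers(data, dmin):
    """First pass: greedily collect centers (hypothetical helper)."""
    centers = [data[0]]  # the first data point always becomes a center
    for x in data[1:]:
        # Euclidean distance from x to every center found so far
        dists = np.linalg.norm(np.asarray(centers) - x, axis=1)
        if dists.min() > dmin:  # assumption: strict inequality
            centers.append(x)
    return np.asarray(centers)


def assign_voronoi(data, centers):
    """Second pass: index of the nearest center for every point."""
    # pairwise distance matrix of shape (n_points, n_centers)
    dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)


# Six one-dimensional points, stored as an array of shape (T, n) = (6, 1)
data = np.array([[0.0], [0.1], [1.0], [1.2], [2.5], [0.9]])
centers = regular_space_centers(data, dmin=0.5)
labels = assign_voronoi(data, centers)
print(centers.ravel())
print(labels)
```

With this toy input the points 0.0, 1.0, and 2.5 become centers, and the labels are `[0 0 1 1 2 1]`. Because centers are picked in the order the data is visited, shuffling the input can produce a different set of centers.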

Parameters:

dmin (float) – Minimum distance between cluster centers, must be non-negative.
max_centers (int, default=1000) – Maximum number of cluster centers. If this number is reached while searching for centers, the algorithm terminates. Must be positive.
metric (str, default='euclidean') – The metric to use during clustering. For a list of available metrics, see the metric registry.
n_jobs (int, optional, default=None) – Number of threads to use during estimation.

References

Attributes

`dmin`	Minimum distance between cluster centers.
`has_model`	Property reporting whether this estimator contains an estimated model.
`max_centers`	Cutoff during clustering.
`metric`	The metric that is used for clustering.
`model`	Shortcut to `fetch_model()`.
`n_clusters`	Alias to `max_centers`.
`n_jobs`	The number of threads to use during estimation.

Methods

`fetch_model`()	Fetches the current model.
`fit`(data[, n_jobs])	Fits this estimator onto data.
`fit_fetch`(data, **kwargs)	Fits the internal model on data and subsequently fetches it in one call.
`get_params`([deep])	Get the parameters.
`partial_fit`(data[, n_jobs])	Fits data to an existing model.
`set_params`(**params)	Set the parameters of this estimator.

fetch_model() → ClusterModel

Fetches the current model. Can be None if fit() has not been called yet.

Returns: model – The latest estimated model or None.
Return type: ClusterModel or None

fit(data, n_jobs=None)

Fits this estimator onto data. The estimation is carried out by:

Choosing the first data frame as a centroid.
For all frames \(x\in X\): calculating the distance to all cluster centers.
Adding a new centroid if the minimal distance to all other cluster centers is larger than or equal to dmin.

Parameters:

data ((T, n) ndarray or list of ndarray) – The data to fit.
n_jobs (int, optional, default=None) – Number of jobs; supersedes the estimator's n_jobs setting if set to an integer value.

Returns:

self – reference to self

Return type:

RegularSpace

fit_fetch(data, **kwargs)

Fits the internal model on data and subsequently fetches it in one call.

Parameters:

data (array_like) – Data that is used to fit the model.
**kwargs – Additional arguments to fit().

Returns:

The estimated model.

Return type:

model

get_params(deep=False)

Get the parameters.

Returns: params – Parameter names mapped to their values.
Return type: mapping of string to any

partial_fit(data, n_jobs=None)

Fits data to an existing model. See fit().

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters: **params (dict) – Estimator parameters.
Returns: self – Estimator instance.
Return type: object
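To make the `<component>__<parameter>` naming concrete, the sketch below splits a flat parameter dictionary into the estimator's own parameters and per-component groups. The helper `split_nested_params` and the component name `clustering` are hypothetical and serve only as illustration; this is not how set_params() is implemented internally.

```python
def split_nested_params(params):
    """Separate plain keys from '<component>__<parameter>' keys.

    Hypothetical helper, shown only to illustrate the naming scheme.
    """
    own, nested = {}, {}
    for key, value in params.items():
        # partition() splits at the first double underscore, if any
        name, sep, sub_key = key.partition("__")
        if sep:
            # key addresses a parameter of a nested component
            nested.setdefault(name, {})[sub_key] = value
        else:
            # key addresses a parameter of the estimator itself
            own[key] = value
    return own, nested


own, nested = split_nested_params(
    {
        "dmin": 0.5,
        "clustering__max_centers": 200,  # 'clustering' is a made-up component
        "clustering__metric": "euclidean",
    }
)
print(own)
print(nested)
```

Here `own` holds `dmin`, while `nested` maps `clustering` to a dictionary with `max_centers` and `metric`, which would then be forwarded to that component.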

property dmin: float

Minimum distance between cluster centers.

Getter: Yields the currently set minimum distance.
Setter: Sets a new minimum distance, must be non-negative.
Type: float

property has_model: bool

Property reporting whether this estimator contains an estimated model. This assumes that the model is initialized with None otherwise.

Type: bool

property max_centers: int

Cutoff during clustering. Once this number of centers is reached, no further data is taken into account. In that case, consider a larger max_centers value or a larger dmin value.

Getter: Current maximum number of cluster centers.
Setter: Sets a new maximum number of cluster centers, must be non-negative.
Type: int
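The interplay between the cutoff and dmin can be illustrated with a small NumPy sketch. The function `select_centers` is a hypothetical stand-in for the center search, using Euclidean distances and a strict "greater than dmin" test; it only mimics the documented behavior that data is ignored once max_centers is reached.

```python
import numpy as np


def select_centers(data, dmin, max_centers):
    """Greedy center search that stops at max_centers (hypothetical helper).

    Returns the centers and a flag that is True if the cutoff was hit
    before every data point had been visited.
    """
    centers = [data[0]]
    for x in data[1:]:
        if len(centers) >= max_centers:
            # cutoff reached: the remaining points are never inspected
            return np.asarray(centers), True
        if np.linalg.norm(np.asarray(centers) - x, axis=1).min() > dmin:
            centers.append(x)
    return np.asarray(centers), False


data = np.arange(20.0).reshape(-1, 1)  # points 0, 1, ..., 19 on a line

# Small dmin with a tight cutoff: the search stops early
centers_a, cut_a = select_centers(data, dmin=1.5, max_centers=5)
# Larger dmin: fewer centers are needed, the cutoff is not reached
centers_b, cut_b = select_centers(data, dmin=4.5, max_centers=5)
# Larger max_centers: all data is covered at the original dmin
centers_c, cut_c = select_centers(data, dmin=1.5, max_centers=20)

print(len(centers_a), cut_a)
print(len(centers_b), cut_b)
print(len(centers_c), cut_c)
```

In the first call the five centers 0, 2, 4, 6, 8 exhaust the budget and everything beyond 8 is left uncovered. Raising dmin to 4.5 covers the whole range with four centers, and raising max_centers to 20 covers it with ten.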

property metric: str

The metric that is used for clustering.

Type: str

property model

Shortcut to fetch_model().

property n_clusters: int

Alias to max_centers.

property n_jobs: int

The number of threads to use during estimation.

Getter: Yields the number of threads to use, -1 is an allowed value for all available threads.
Setter: Sets the number of threads to use, can be None in which case it defaults to 1 thread.
Type: int
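The conventions for None and -1 can be summarized in a short sketch. The function `resolve_n_jobs` is hypothetical and only mirrors the rules stated above; rejecting other values below 1 is an additional assumption, not documented behavior.

```python
import os


def resolve_n_jobs(n_jobs):
    """Translate an n_jobs setting into a thread count (hypothetical helper)."""
    if n_jobs is None:
        return 1  # None defaults to a single thread
    if n_jobs == -1:
        # -1 requests all available threads; cpu_count() may return None
        return os.cpu_count() or 1
    if n_jobs < 1:
        # assumption: other non-positive values are treated as invalid
        raise ValueError(f"invalid n_jobs value: {n_jobs}")
    return n_jobs


print(resolve_n_jobs(None))
print(resolve_n_jobs(4))
print(resolve_n_jobs(-1))
```

The first two calls yield 1 and 4; the result of the last call depends on the machine.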