Mean Shift Clustering

Mean Shift is a non-parametric clustering algorithm that iteratively shifts each point toward the densest region in its neighborhood, converging to cluster modes without requiring K to be specified.

The Mean Shift Procedure

Each point is shifted toward the mean of its neighbors within a window of radius (bandwidth). This process repeats until convergence, with points converging to the same mode forming a cluster.

Algorithm Steps

For each point x_i, define a kernel window of radius h.
Compute the weighted mean of all points within the window.
Shift x_i to this mean.
Repeat until convergence (change < threshold).
Points with the same convergence mode are assigned to the same cluster.

Mean Shift in sklearn

scikit-learn provides MeanShift with automatic bandwidth estimation via estimate_bandwidth.

Training and Predicting

<pre><code class="language-python">from sklearn.cluster import MeanShift, estimate_bandwidth from sklearn.datasets import make_blobs import matplotlib.pyplot as plt X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=42) bandwidth = estimate_bandwidth(X, quantile=0.2, n_samples=200) print(f"Estimated bandwidth: {bandwidth:.3f}") ms = MeanShift(bandwidth=bandwidth, bin_seeding=True) ms.fit(X) labels = ms.labels_ centers = ms.cluster_centers_ print(f"Number of clusters found: {len(centers)}") plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='tab10', alpha=0.6) plt.scatter(centers[:, 0], centers[:, 1], marker='X', s=200, c='red') plt.title('Mean Shift Clustering'); plt.show()</pre>

Bandwidth Selection

estimate_bandwidth uses a quantile of pairwise distances to estimate the appropriate window size. Smaller quantile → smaller bandwidth → more clusters. Tune the quantile (0.1 to 0.5) as the main hyperparameter.

Strengths and Limitations

Mean Shift automatically determines cluster count, handles arbitrary cluster shapes, and is robust to outliers. Its main drawbacks are O(N\u00b2) complexity and sensitivity to bandwidth choice.

When to Use Mean Shift

Mean Shift is ideal for image segmentation and applications where the data density naturally defines clusters. For large datasets (>10K points), it becomes slow unless approximations like bin_seeding=True are used. DBSCAN or HDBSCAN are faster alternatives for large-scale density-based clustering.