The Silhouette Score for Cluster Quality
The Silhouette Score measures how similar each sample is to its own cluster compared with the nearest neighboring cluster. It needs no ground-truth labels and ranges from -1 (the sample is probably in the wrong cluster) to +1 (the sample sits in a compact, well-separated cluster).
Silhouette Score Formula
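For sample i: s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance from i to the other samples in its own cluster and b(i) is the mean distance from i to the samples of the nearest other cluster, that is, the smallest such mean over all clusters that i does not belong to.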
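The formula can be traced by hand on a tiny dataset. The following is a minimal sketch, not scikit-learn's implementation: the helper name silhouette_sample, the four toy points, and the choice of Euclidean distance are assumptions made for illustration.

```python
import math

def silhouette_sample(points, labels, i):
    """Silhouette coefficient s(i) for the sample at index i (Euclidean distance)."""
    own = labels[i]
    # Collect distances from sample i to every other sample, grouped by cluster label.
    dists = {}
    for j, (p, lab) in enumerate(zip(points, labels)):
        if j == i:
            continue
        dists.setdefault(lab, []).append(math.dist(points[i], p))
    if own not in dists:
        # Sample i is alone in its cluster; s(i) is taken to be 0 by convention.
        return 0.0
    # a(i): mean distance to the other members of the sample's own cluster.
    a = sum(dists[own]) / len(dists[own])
    # b(i): smallest mean distance to any other cluster (needs at least two clusters).
    b = min(sum(d) / len(d) for lab, d in dists.items() if lab != own)
    return (b - a) / max(a, b)

# Hypothetical toy data: two tight pairs of points, five units apart.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
labels = [0, 0, 1, 1]
s0 = silhouette_sample(points, labels, 0)
```

For the first point, a(0) = 1 and b(0) is roughly 5.05, so s(0) comes out near 0.8: the point is much closer to its own cluster than to the other one.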
Interpreting the Score
- s ≈ 1: Sample is well inside its cluster and far from neighboring clusters.
- s ≈ 0: Sample is near a cluster boundary.
- s ≈ -1: Sample may be assigned to the wrong cluster.
The overall Silhouette Score is the mean over all samples.
Computing Silhouette Score in sklearn
Use sklearn.metrics.silhouette_score for the global average or silhouette_samples for per-sample scores.
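A short sketch of both calls follows. The synthetic data from make_blobs and the K-Means settings (three clusters, fixed random_state) are assumptions chosen only to make the example self-contained.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_samples, silhouette_score

# Synthetic 2-D data with three blobs (illustrative assumption).
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Global score: mean silhouette coefficient over all samples.
overall = silhouette_score(X, labels)

# Per-sample scores: one coefficient per row of X.
per_sample = silhouette_samples(X, labels)
```

With default arguments, overall equals per_sample.mean(). Both functions also accept a metric argument if a distance other than Euclidean is wanted.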
Selecting K with Silhouette
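One common way to use the score for choosing K is to fit the model for several candidate values and keep the one with the highest mean silhouette. The sketch below assumes K-Means, a candidate range of 2 to 7, and synthetic data with four well-separated blobs; none of these choices come from the text above.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Four well-separated blobs at hypothetical, hand-picked centers.
centers = [(-10, -10), (-10, 10), (10, -10), (10, 10)]
X, _ = make_blobs(n_samples=400, centers=centers, cluster_std=1.0, random_state=0)

# Score each candidate K; the range starts at 2 because the score
# is undefined when there is only one cluster.
scores = {}
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# Candidate with the highest mean silhouette.
best_k = max(scores, key=scores.get)
```

The maximum is a heuristic rather than a guarantee, so it is worth inspecting the whole scores dictionary instead of only best_k, especially when several values of K score similarly.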
Silhouette Diagrams
Silhouette vs. Inertia
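Inertia measures only within-cluster compactness and keeps decreasing as K grows. The Silhouette Score also accounts for separation between clusters, so poorly separated clusters lower it, which makes it a more balanced quality measure. It works across different clustering algorithms, not just K-Means.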
Algorithm-Agnostic Application
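The Silhouette Score can be applied to any clustering algorithm that produces hard cluster labels, including DBSCAN, hierarchical clustering, and GMMs (using the predicted component for each sample), which makes it one of the more versatile cluster quality metrics in scikit-learn. Two caveats apply: the score is defined only when there are at least two clusters and fewer clusters than samples, and DBSCAN's noise label -1 is treated as an ordinary cluster unless those samples are filtered out first.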
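The sketch below scores labels from two non-K-Means algorithms. The helper silhouette_ignoring_noise is hypothetical (it is not part of scikit-learn), and the data, eps, and min_samples values are illustrative assumptions. The helper drops samples labeled -1 before scoring and returns None when the score would be undefined.

```python
import numpy as np
from sklearn.cluster import DBSCAN, AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

def silhouette_ignoring_noise(X, labels):
    """Mean silhouette over non-noise samples, or None if it is undefined."""
    labels = np.asarray(labels)
    mask = labels != -1  # DBSCAN marks noise points with the label -1
    n_clusters = len(set(labels[mask]))
    # silhouette_score requires between 2 and n_samples - 1 distinct labels.
    if n_clusters < 2 or n_clusters >= mask.sum():
        return None
    return silhouette_score(X[mask], labels[mask])

# Three blobs at hypothetical, hand-picked centers.
X, _ = make_blobs(
    n_samples=300,
    centers=[(-6, 0), (0, 6), (6, 0)],
    cluster_std=0.7,
    random_state=0,
)

agglo_labels = AgglomerativeClustering(n_clusters=3).fit_predict(X)
dbscan_labels = DBSCAN(eps=1.0, min_samples=5).fit_predict(X)

agglo_score = silhouette_ignoring_noise(X, agglo_labels)
dbscan_score = silhouette_ignoring_noise(X, dbscan_labels)
```

Dropping noise points changes what the score measures: it then describes only the samples DBSCAN chose to cluster, so the fraction of discarded samples should be reported alongside it.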