DBSCAN Epsilon and MinPts Parameters
DBSCAN has two critical parameters: eps (the neighborhood radius) and min_samples (the minimum number of points within that radius for a point to count as a core point). They must be tuned together to match the data's density structure.
Role of Each Parameter
eps defines the radius of the neighborhood used for density estimation. min_samples sets the density threshold: a point becomes a core point when at least min_samples points lie within distance eps of it. In scikit-learn this count includes the point itself.
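The core-point rule can be checked by hand. The sketch below is a minimal illustration on made-up toy data (the array X and the chosen eps and min_samples are assumptions for the example, not recommendations). It counts the points within eps of each point and compares the result with the core samples that scikit-learn's DBSCAN reports.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import NearestNeighbors

# Hypothetical toy data: a tight group of five points plus one isolated point.
X = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [0.05, 0.05],
    [3.0, 3.0],
])
eps, min_samples = 0.2, 4

# Count the points within eps of each point. Because X is passed explicitly
# as the query, every point is counted as a member of its own neighborhood.
nn = NearestNeighbors(radius=eps).fit(X)
neighborhoods = nn.radius_neighbors(X, return_distance=False)
neighbor_counts = np.array([len(idx) for idx in neighborhoods])
is_core_manual = neighbor_counts >= min_samples

# Compare with the core samples found by DBSCAN itself.
db = DBSCAN(eps=eps, min_samples=min_samples).fit(X)
is_core_dbscan = np.zeros(len(X), dtype=bool)
is_core_dbscan[db.core_sample_indices_] = True

print(neighbor_counts)   # neighborhood size per point
print(is_core_manual)    # manual core-point test
print(db.labels_)        # -1 marks noise
```

The five grouped points each see all five group members within eps, so they pass the threshold of 4. The isolated point sees only itself and is labeled noise.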
Effect of eps
If eps is too small, most points are labeled noise; if it is too large, neighboring clusters merge into one. The goal is an eps that matches the local density of the genuine clusters. A common rule of thumb is to fix min_samples first, starting from min_samples = 2 * n_features, and then tune eps using the k-distance plot.
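To make the two failure modes concrete, here is a small sketch on synthetic data. The two point grids, the three eps values, and the helper summarize are all hypothetical choices for illustration. min_samples follows the 2 * n_features starting point for 2D data.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical toy data: two 5x5 grids with 0.1 spacing, placed far apart.
axis = np.arange(5) * 0.1
grid = np.array([[x, y] for x in axis for y in axis])
X = np.vstack([grid, grid + 5.0])

def summarize(labels):
    """Return (number of clusters, number of noise points); label -1 is noise."""
    n_clusters = len(set(labels.tolist()) - {-1})
    n_noise = int(np.sum(labels == -1))
    return n_clusters, n_noise

results = {}
for eps in (0.01, 0.15, 10.0):
    labels = DBSCAN(eps=eps, min_samples=4).fit_predict(X)
    results[eps] = summarize(labels)
    print(f"eps={eps}: clusters={results[eps][0]}, noise={results[eps][1]}")
```

On this data the smallest eps leaves every point as noise, the middle value recovers the two grids, and the largest merges everything into a single cluster.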
Effect of min_samples
Higher min_samples requires denser regions to form clusters, suppressing small or weak clusters. Lower values admit more clusters, some of which may be spurious groupings of noise. For 2D data, a min_samples of 4 or 5 is a reasonable start (scikit-learn's default is 5); increase it roughly in proportion to the number of dimensions.
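The suppression of small clusters can be seen in a minimal sketch. The data is invented for the example: one 25-point grid and one 3-point group, with eps held fixed at an assumed value of 0.15 while min_samples changes.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical toy data: a 25-point grid and a separate 3-point group.
axis = np.arange(5) * 0.1
big = np.array([[x, y] for x in axis for y in axis])
small = np.array([[5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
X = np.vstack([big, small])

def count_clusters_and_noise(labels):
    """Return (number of clusters, number of noise points); label -1 is noise."""
    n_clusters = len(set(labels.tolist()) - {-1})
    n_noise = int(np.sum(labels == -1))
    return n_clusters, n_noise

results = {}
for min_samples in (3, 5):
    labels = DBSCAN(eps=0.15, min_samples=min_samples).fit_predict(X)
    results[min_samples] = count_clusters_and_noise(labels)
    print(f"min_samples={min_samples}: clusters={results[min_samples][0]}, "
          f"noise={results[min_samples][1]}")
```

With min_samples=3 the three-point group is dense enough to form its own cluster. With min_samples=5 it has no core point, so its three points become noise while the large grid remains a cluster.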
K-Distance Plot for Tuning eps
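The k-distance plot shows every point's distance to its k-th nearest neighbor, with the points sorted by that distance and k usually set to min_samples. A sharp knee in the curve suggests a good eps value: points whose k-distance lies below the knee value sit in dense regions and will be clustered, while points whose k-distance lies above it are likely noise.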
Generating the K-Distance Plot
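One possible way to produce the plot with scikit-learn and matplotlib is sketched below. The function names sorted_k_distances and plot_k_distances, the synthetic data, and k=4 are assumptions made for the example. When the fitted data is passed back to kneighbors, each point is returned as its own first neighbor at distance 0. This matches scikit-learn's convention that min_samples includes the point itself, so k = min_samples is used here.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def sorted_k_distances(X, k):
    """Return each point's distance to its k-th nearest neighbor, ascending.

    The query point counts as its own first neighbor (distance 0).
    """
    nn = NearestNeighbors(n_neighbors=k).fit(X)
    distances, _ = nn.kneighbors(X)
    return np.sort(distances[:, -1])

def plot_k_distances(k_dist, k):
    """Draw the sorted k-distance curve; the knee is read off by eye."""
    import matplotlib.pyplot as plt  # imported lazily so the rest runs without it
    plt.plot(k_dist)
    plt.xlabel("Points sorted by k-distance")
    plt.ylabel(f"Distance to {k}-th nearest neighbor")
    plt.title("K-distance plot")
    plt.show()

# Hypothetical synthetic data: two dense blobs plus a few scattered points.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.1, size=(100, 2)),
    rng.normal(loc=5.0, scale=0.1, size=(100, 2)),
    rng.uniform(low=-2.0, high=7.0, size=(10, 2)),
])

k_dist = sorted_k_distances(X, k=4)
# plot_k_distances(k_dist, k=4)  # uncomment to display the curve
```

The y-value where the curve bends sharply upward is a candidate for eps. Reading the knee by eye is a judgment call, so it is worth trying a few values around it.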
Validating Parameter Choices
After setting parameters, validate with the silhouette score (excluding noise points) and by inspecting the ratio of noise to clustered points.
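A minimal sketch of both checks follows. It assumes scikit-learn's silhouette_score and uses a hypothetical helper, validate_dbscan, on invented toy data. Noise points (label -1) are dropped before scoring, and the silhouette is skipped when fewer than two clusters remain, since it is undefined in that case.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import silhouette_score

def validate_dbscan(X, labels):
    """Return (silhouette score without noise, fraction of points labeled noise).

    The score is None when fewer than two clusters remain after removing noise.
    """
    clustered = labels != -1
    noise_ratio = float(np.mean(~clustered))
    n_clusters = len(set(labels[clustered].tolist()))
    if 2 <= n_clusters < int(clustered.sum()):
        score = float(silhouette_score(X[clustered], labels[clustered]))
    else:
        score = None
    return score, noise_ratio

# Hypothetical toy data: two 5x5 grids plus two isolated points.
axis = np.arange(5) * 0.1
grid = np.array([[x, y] for x in axis for y in axis])
X = np.vstack([grid, grid + 5.0, [[2.5, 2.5], [-3.0, 8.0]]])

labels = DBSCAN(eps=0.15, min_samples=4).fit_predict(X)
score, noise_ratio = validate_dbscan(X, labels)
print(f"silhouette (noise excluded): {score}")
print(f"noise ratio: {noise_ratio:.3f}")
```

A high silhouette combined with a very large noise ratio can mean that eps is too small and only the densest cores were kept, so the two numbers are best read together.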