Unsupervised Learning Paradigms

Unsupervised learning discovers hidden structure in unlabeled data. Because no target variable guides the algorithm, the approach applies wherever labels are unavailable, but its results are harder to evaluate than those of supervised models.

Core Paradigms

Unsupervised methods can be grouped into four broad categories, each addressing a different type of structure discovery.

Overview of Paradigms

Clustering: Groups similar samples together (K-Means, DBSCAN, Hierarchical).
Dimensionality Reduction: Projects high-dimensional data to fewer dimensions while preserving structure (PCA, t-SNE, UMAP).
Density Estimation: Models the probability distribution of data (GMM, KDE).
Anomaly Detection: Identifies samples that don't conform to learned patterns (Isolation Forest, One-Class SVM).
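
To make the four categories concrete, the sketch below runs one representative scikit-learn estimator from each on the same data. The synthetic dataset from make_blobs, the choice of three clusters and components, and the variable names are assumptions made for illustration only.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest
from sklearn.mixture import GaussianMixture

# Synthetic data (an assumption for illustration): 300 samples, 5 features, 3 blobs.
X, _ = make_blobs(n_samples=300, n_features=5, centers=3, random_state=0)

# Clustering: assign each sample to one of 3 groups.
cluster_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Dimensionality reduction: project the 5 features onto 2 principal components.
X_2d = PCA(n_components=2).fit_transform(X)

# Density estimation: fit a 3-component Gaussian mixture and
# evaluate the log-density of each sample under the fitted model.
gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
log_density = gmm.score_samples(X)

# Anomaly detection: predict() returns +1 for inliers and -1 for outliers.
iso = IsolationForest(random_state=0).fit(X)
outlier_flags = iso.predict(X)

print(cluster_labels.shape, X_2d.shape, log_density.shape, outlier_flags.shape)
```

None of the four calls uses a label: each estimator is fitted on X alone, and the outputs differ only in what kind of structure they describe.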

Evaluation Without Labels

Without ground-truth labels, there is no single objective measure of model quality. Internal metrics, which score a result using only the data and the assigned clusters, are combined with domain knowledge to guide model selection.

Internal Evaluation Metrics

Silhouette Score: Measures cluster compactness and separation (range: -1 to 1); higher is better.
Davies-Bouldin Index: Average, over clusters, of the worst-case ratio of within-cluster scatter to between-centroid distance; lower is better (minimum 0).
Calinski-Harabasz Score: Ratio of between-cluster to within-cluster dispersion; higher is better.
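
The three metrics listed above are available in sklearn.metrics. The following is a minimal sketch of computing them for a single K-Means result; the Iris dataset, the standardization step, and n_clusters=3 are assumptions chosen for illustration, and the labels are deliberately discarded.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)
from sklearn.preprocessing import StandardScaler

# Load features only; the true labels are discarded on purpose.
X, _ = load_iris(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X_scaled)

# Each metric takes the data and the predicted cluster labels, nothing else.
sil = silhouette_score(X_scaled, labels)          # higher is better, in [-1, 1]
dbi = davies_bouldin_score(X_scaled, labels)      # lower is better, >= 0
chi = calinski_harabasz_score(X_scaled, labels)   # higher is better

print(f"Silhouette:        {sil:.3f}")
print(f"Davies-Bouldin:    {dbi:.3f}")
print(f"Calinski-Harabasz: {chi:.1f}")
```

The absolute values are hard to interpret in isolation; these scores are most useful for comparing alternative clusterings of the same data.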

External Evaluation (When Labels Exist)

<pre><code class="language-python">from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

# Load the features and the known species labels
X, y_true = load_iris(return_X_y=True)

# Cluster without using the labels
km = KMeans(n_clusters=3, random_state=42, n_init=10)
y_pred = km.fit_predict(X)

# Compare the cluster assignments against the known labels
print(f"ARI: {adjusted_rand_score(y_true, y_pred):.3f}")
print(f"NMI: {normalized_mutual_info_score(y_true, y_pred):.3f}")</code></pre>

Unsupervised Learning Workflow

A typical unsupervised workflow involves preprocessing, algorithm selection, hyperparameter tuning via internal metrics, and result interpretation with domain expertise.
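
The tuning step of this workflow can be sketched as a loop over candidate values of a hyperparameter, scored with an internal metric. The example below assumes K-Means, the silhouette score, a candidate range of 2 to 8 clusters, and synthetic data from make_blobs; all of these are illustrative choices, not a prescribed recipe.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for real data (an assumption for illustration).
X, _ = make_blobs(n_samples=400, centers=4, cluster_std=0.6, random_state=1)

# Preprocessing: put all features on a comparable scale.
X_scaled = StandardScaler().fit_transform(X)

# Hyperparameter tuning: score each candidate k with the silhouette score.
scores = {}
for k in range(2, 9):
    labels = KMeans(n_clusters=k, n_init=10, random_state=1).fit_predict(X_scaled)
    scores[k] = silhouette_score(X_scaled, labels)

# Pick the k with the highest silhouette score.
best_k = max(scores, key=scores.get)

for k, score in scores.items():
    print(f"k={k}: silhouette={score:.3f}")
print(f"Selected k: {best_k}")
```

The selected value is only a candidate. Whether the resulting clusters are meaningful still has to be judged with domain expertise during the interpretation step.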

Preprocessing Matters

Most clustering and dimensionality reduction algorithms are sensitive to feature scale. Standardize features (for example with StandardScaler) before applying distance-based or variance-based methods, unless the features already share a comparable scale. On very high-dimensional data, consider PCA as an initial dimensionality reduction step before clustering.
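
One way to chain these preprocessing steps is a scikit-learn pipeline, so that scaling, reduction, and clustering are always applied in the same order. The sketch below assumes the digits dataset (64 features), 10 principal components, and 10 clusters; these values are illustrative and would normally be tuned.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# 1797 samples with 64 pixel features each; labels are discarded.
X, _ = load_digits(return_X_y=True)

# Scale -> reduce to 10 components -> cluster (component and cluster
# counts are assumptions for this example).
pipeline = make_pipeline(
    StandardScaler(),
    PCA(n_components=10, random_state=0),
    KMeans(n_clusters=10, n_init=10, random_state=0),
)

# fit_predict fits every step in order and returns the K-Means labels.
labels = pipeline.fit_predict(X)

# Slicing off the last step gives the preprocessing stages alone,
# which exposes the reduced representation that K-Means actually saw.
X_reduced = pipeline[:-1].transform(X)

print(X.shape, "->", X_reduced.shape)
print("Clusters found:", len(set(labels.tolist())))
```

Keeping the steps in one pipeline object also means new samples can be assigned to clusters with pipeline.predict, using the scaling and projection learned from the original data.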