Agglomerative vs. Divisive Clustering

Hierarchical clustering can proceed bottom-up (agglomerative) by merging individual points, or top-down (divisive) by recursively splitting the full dataset into smaller clusters.

Agglomerative Clustering (Bottom-Up)

Starting from N singleton clusters, agglomerative clustering greedily merges the two most similar clusters at each step until a single cluster remains. The merge sequence defines the dendrogram.

Advantages and Disadvantages

Pros: Computationally tractable (O(N\u00b3) naive, better with optimizations), widely supported in libraries, effective for small-to-medium datasets.
Cons: Early merges are irreversible; cannot undo poor decisions made at the start. Sensitive to noise and outliers depending on linkage.

Divisive Clustering (Top-Down)

Starting from one cluster containing all points, divisive clustering recursively splits clusters into two subclusters. DIANA (Divisive Analysis) is the classical algorithm.

Advantages and Disadvantages

Pros: Considers the global structure at each split; can produce better top-level clusters. Early splits can be more meaningful if the global structure is important.
Cons: O(2^N) naive complexity; much more computationally expensive. Less commonly implemented in mainstream ML libraries.

Bisecting K-Means: A Practical Divisive Approach

<pre><code class="language-python">from sklearn.cluster import BisectingKMeans from sklearn.datasets import make_blobs X, _ = make_blobs(n_samples=500, centers=5, random_state=42) # Divisive: repeatedly bisects the largest cluster bkm = BisectingKMeans(n_clusters=5, init='k-means++', bisecting_strategy='largest_cluster', random_state=42) labels = bkm.fit_predict(X) print(f"Cluster distribution: {[sum(labels==c) for c in range(5)]}")</pre>

Practical Choice

In practice, agglomerative clustering (especially Ward linkage) is preferred for most tasks due to library support and reasonable performance. Divisive methods are used when top-level cluster structure is the primary interest.

When to Use Each

Use agglomerative when you want a full hierarchy and have < 10,000 samples. Use divisive (Bisecting K-Means) when you have a large dataset and want scalable hierarchical clustering that avoids the O(N\u00b2) distance matrix. For very large datasets, mini-batch K-Means or BIRCH may be more appropriate than either.