t-SNE (t-Distributed Stochastic Neighbor Embedding)

t-SNE is a nonlinear dimensionality reduction technique that preserves local neighborhood structure in low-dimensional visualizations, often revealing clusters and patterns that a linear projection such as PCA can miss.


How t-SNE Works

t-SNE models pairwise similarities in high dimensions as Gaussian probabilities and in low dimensions as Student-t probabilities, then minimizes the KL divergence between the two distributions via gradient descent.
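The objective described above can be sketched in a few lines of NumPy. This is a simplified illustration, not scikit-learn's implementation: it assumes a single shared Gaussian bandwidth `sigma` (real t-SNE fits one bandwidth per point from the perplexity), uses plain gradient descent without momentum or early exaggeration, and all function names are hypothetical.

```python
import numpy as np

def squared_distances(X):
    """Pairwise squared Euclidean distances between the rows of X."""
    sq_norms = np.sum(X ** 2, axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return np.maximum(d2, 0.0)  # clip tiny negatives caused by round-off

def gaussian_affinities(X, sigma):
    """High-dimensional similarities P (assumption: one shared sigma)."""
    P = np.exp(-squared_distances(X) / (2.0 * sigma ** 2))
    np.fill_diagonal(P, 0.0)    # a point is not its own neighbor
    return P / P.sum()          # normalize over all pairs

def student_t_affinities(Y):
    """Low-dimensional similarities Q and the unnormalized Student-t kernel W."""
    W = 1.0 / (1.0 + squared_distances(Y))
    np.fill_diagonal(W, 0.0)
    return W / W.sum(), W

def kl_divergence(P, Q, eps=1e-12):
    """KL(P || Q), the cost that t-SNE minimizes."""
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / (Q[mask] + eps))))

def kl_gradient(P, Y):
    """Gradient of KL(P || Q) with respect to the embedding Y."""
    Q, W = student_t_affinities(Y)
    PQW = (P - Q) * W
    # grad_i = 4 * sum_j (p_ij - q_ij) * w_ij * (y_i - y_j)
    return 4.0 * (PQW.sum(axis=1)[:, None] * Y - PQW @ Y)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))   # toy high-dimensional data
Y = rng.normal(size=(50, 2))    # random initial 2-D layout
P = gaussian_affinities(X, sigma=2.0)

kl_before = kl_divergence(P, student_t_affinities(Y)[0])
for _ in range(200):            # plain gradient descent, small fixed step
    Y = Y - 10.0 * kl_gradient(P, Y)
kl_after = kl_divergence(P, student_t_affinities(Y)[0])

print(f"KL before: {kl_before:.4f}, KL after: {kl_after:.4f}")
```

The step size and iteration count are arbitrary choices for this toy problem; production implementations add momentum, adaptive gains, and early exaggeration to converge faster.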

The t-Distribution in Low Dimensions

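The heavy tails of the Student-t distribution in the low-dimensional space address the "crowding problem": a two-dimensional map does not have enough room to place all moderately distant points at faithful distances, so with a Gaussian kernel they would be crushed together near the center of the map. Because the t-distribution still assigns non-negligible similarity to distant pairs, moderately dissimilar points can sit further apart in the embedding, which creates the characteristic cluster separation seen in t-SNE plots.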
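A small numeric comparison makes the heavy-tail effect concrete. The sketch below uses unnormalized kernels and assumes a Gaussian bandwidth of 1 purely for illustration; it asks how far apart two points must be for each kernel to assign them the same small similarity.

```python
import numpy as np

distances = np.array([0.5, 1.0, 2.0, 4.0, 8.0])

# Unnormalized kernels (assumption: Gaussian sigma = 1, Student-t with 1 degree of freedom)
gaussian = np.exp(-distances ** 2 / 2.0)
student_t = 1.0 / (1.0 + distances ** 2)

for d, g, t in zip(distances, gaussian, student_t):
    print(f"d={d:4.1f}  gaussian={g:.2e}  student_t={t:.2e}")

# Distance at which each kernel assigns the same small similarity
target = 0.01
d_gaussian = np.sqrt(-2.0 * np.log(target))
d_student_t = np.sqrt(1.0 / target - 1.0)
print(f"similarity {target}: gaussian d={d_gaussian:.2f}, student_t d={d_student_t:.2f}")
```

For the same target similarity, the Student-t kernel requires a noticeably larger distance, which is the extra room the embedding uses to push dissimilar points apart.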

t-SNE with scikit-learn

scikit-learn's TSNE is easy to use but computationally expensive. The exact algorithm (method='exact') scales as O(N²); the default, method='barnes_hut', uses an approximation that scales as O(N log N). For large datasets, keep the default or use the openTSNE library.
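As a rough illustration of the two settings, the sketch below runs both methods on a small subset of the digits data. The subset size of 300 is an arbitrary choice to keep the exact method quick; at this scale the timings may be similar or even favor the exact method, and the gap is expected to show up only as N grows.

```python
import time

from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, _ = load_digits(return_X_y=True)
X_small = X[:300]  # small subset so the exact method finishes quickly

embeddings = {}
for method in ("exact", "barnes_hut"):
    start = time.perf_counter()
    tsne = TSNE(n_components=2, method=method, random_state=0)
    embeddings[method] = tsne.fit_transform(X_small)
    elapsed = time.perf_counter() - start
    print(f"{method:>10}: {elapsed:.1f} s, output shape {embeddings[method].shape}")
```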

Basic Usage

<pre><code class="language-python">from sklearn.manifold import TSNE
from sklearn.datasets import load_digits
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt

# Load the 8x8 handwritten digits dataset bundled with scikit-learn
X, y = load_digits(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

# max_iter was named n_iter before scikit-learn 1.5
tsne = TSNE(n_components=2, perplexity=30, max_iter=1000,
            learning_rate='auto', init='pca', random_state=42)
X_2d = tsne.fit_transform(X_scaled)

# Color each embedded point by its digit label
plt.figure(figsize=(10, 7))
scatter = plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y, cmap='tab10', alpha=0.6, s=10)
plt.colorbar(scatter, label='Digit')
plt.title('t-SNE of the scikit-learn Digits Dataset')
plt.show()</code></pre>

Perplexity Tuning

perplexity controls the effective number of neighbors each point considers (typically 5–50, and it must be smaller than the number of samples). Low perplexity emphasizes local structure; high perplexity reveals more global structure. Always try multiple perplexity values and run to convergence (n_iter ≥ 1000; the parameter is named max_iter in scikit-learn 1.5 and later). t-SNE results are stochastic, so set random_state for reproducibility.
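To see why perplexity reads as an "effective number of neighbors", it helps to compute it directly. The sketch below assumes the usual definition, 2 raised to the Shannon entropy (in bits) of a point's neighbor distribution, and uses a simple bisection to find the Gaussian bandwidth that matches a target perplexity. The helper names and the synthetic distances are hypothetical, and this is not scikit-learn's internal routine.

```python
import numpy as np

def perplexity(p):
    """Perplexity 2**H(p) of a discrete distribution p, with H in bits."""
    p = p[p > 0]
    return float(2.0 ** (-np.sum(p * np.log2(p))))

def conditional_probabilities(sq_dists, sigma):
    """Gaussian neighbor probabilities for one point, given squared distances to the others."""
    logits = -sq_dists / (2.0 * sigma ** 2)
    logits = logits - logits.max()   # stabilize the exponentials
    p = np.exp(logits)
    return p / p.sum()

def sigma_for_perplexity(sq_dists, target, lo=1e-3, hi=1e3, n_steps=60):
    """Bisection on a log scale for the bandwidth whose perplexity matches target."""
    for _ in range(n_steps):
        mid = np.sqrt(lo * hi)
        if perplexity(conditional_probabilities(sq_dists, mid)) > target:
            hi = mid   # too many effective neighbors: shrink the bandwidth
        else:
            lo = mid   # too few effective neighbors: widen the bandwidth
    return float(np.sqrt(lo * hi))

# A uniform distribution over 8 neighbors has perplexity 8
uniform = np.full(8, 1.0 / 8)
print(perplexity(uniform))

# Synthetic squared distances from one point to 200 others
rng = np.random.default_rng(0)
sq_dists = rng.uniform(0.1, 25.0, size=200)
sigma = sigma_for_perplexity(sq_dists, target=30.0)
achieved = perplexity(conditional_probabilities(sq_dists, sigma))
print(f"sigma={sigma:.3f}, perplexity={achieved:.3f}")
```

A larger target perplexity forces a wider bandwidth, so each point spreads its probability mass over more neighbors.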

Interpreting t-SNE Correctly

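t-SNE is primarily a visualization tool rather than a general-purpose dimensionality reduction technique. scikit-learn's TSNE learns no reusable mapping (it has no transform method for new data), so its output is generally a poor choice as features for downstream ML tasks, and distances between clusters are not reliably meaningful.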
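One practical reason the embedding is awkward for downstream tasks can be checked directly: unlike PCA, scikit-learn's TSNE estimator exposes fit_transform but no transform method, so unseen samples cannot be projected into an existing embedding. A minimal check, assuming a standard scikit-learn installation:

```python
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# PCA learns a reusable linear map; TSNE only embeds the data it was fitted on
pca_can_project_new_data = hasattr(PCA(), "transform")
tsne_can_project_new_data = hasattr(TSNE(), "transform")

print(f"PCA has transform:  {pca_can_project_new_data}")
print(f"TSNE has transform: {tsne_can_project_new_data}")
```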

Common Misinterpretations

  • Cluster sizes: Do not reflect actual cluster sizes in high dimensions; t-SNE tends to even out differences in density.
  • Distances between clusters: Not reliably meaningful; only local neighborhood structure is preserved.
  • Global structure: t-SNE prioritizes local structure; UMAP is often suggested when global structure matters more.
  • Reproducibility: Different random seeds produce different layouts; always set random_state.
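The reproducibility point can be checked with a short experiment. The sketch below makes two assumptions for the sake of a small, clear demo: init='random' so that the seed visibly drives the layout, and method='exact' on a 150-sample subset so the runs stay quick.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, _ = load_digits(return_X_y=True)
X_small = X[:150]  # small subset to keep the three runs fast

def embed(seed):
    """Run t-SNE with a random initialization controlled by seed."""
    tsne = TSNE(n_components=2, init="random", method="exact", random_state=seed)
    return tsne.fit_transform(X_small)

same_a, same_b, other = embed(0), embed(0), embed(1)

print("same seed, same layout:     ", np.allclose(same_a, same_b))
print("different seed, same layout:", np.allclose(same_a, other))
```

Layouts from different seeds can still show the same neighborhoods; what changes is the placement, rotation, and arrangement of the clusters, which is another reason not to read meaning into absolute positions.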