Expectation-Maximization (EM) Algorithm

The Expectation-Maximization (EM) algorithm iterates between computing expected assignments (E-step) and maximizing parameters given those assignments (M-step) to fit models with latent variables.

The Two EM Steps

EM alternates between two steps until convergence: the E-step computes posterior probabilities of latent variables (soft cluster assignments), and the M-step updates model parameters to maximize the expected log-likelihood.

E-Step: Compute Responsibilities

For GMMs, the E-step computes the responsibility r_{ik} = \u03c0_k \u2115(x_i; \u03bc_k, \u03a3_k) / p(x_i) — the probability that component k generated sample i. This is a Bayes' theorem application using current parameter estimates.

M-Step: Update Parameters

Given responsibilities from the E-step, the M-step updates the parameters analytically: \u03bc_k = \u03a3_i r_{ik} x_i / N_k, \u03a3_k = \u03a3_i r_{ik} (x_i - \u03bc_k)(x_i - \u03bc_k)^T / N_k, and \u03c0_k = N_k / N, where N_k = \u03a3_i r_{ik}.

Convergence Properties

EM is guaranteed to monotonically increase the log-likelihood at every iteration, converging to a local maximum. It never decreases the objective, but convergence to the global maximum is not guaranteed.

Monitoring Convergence

<pre><code class="language-python">from sklearn.mixture import GaussianMixture from sklearn.datasets import make_blobs import numpy as np X, _ = make_blobs(n_samples=300, centers=3, random_state=42) gmm = GaussianMixture(n_components=3, n_init=1, max_iter=200, tol=1e-4, verbose=0, random_state=42) gmm.fit(X) print(f"Converged: {gmm.converged_}") print(f"Iterations to converge: {gmm.n_iter_}") print(f"Log-likelihood: {gmm.lower_bound_:.4f}")</pre>

Multiple Restarts

Since EM converges to local optima, running with n_init > 1 starts from multiple random initializations and returns the result with the highest log-likelihood. For GMMs, K-Means initialization (init_params='kmeans') often provides better starting points than random initialization.

EM Beyond GMMs

EM is a general-purpose algorithm for any model with latent variables: Hidden Markov Models (Baum-Welch algorithm), topic models (LDA), and missing data imputation all use EM principles.

EM vs. K-Means

K-Means can be seen as a special case of EM with hard assignments (each point belongs to exactly one cluster) and spherical, equal-variance Gaussians. GMM with EM generalizes this to soft assignments and arbitrary covariance structures, at the cost of more parameters and greater computational expense per iteration.