GloVe (Global Vectors for Word Representation)

Word2Vec learns from local sliding context windows and never directly uses corpus-wide co-occurrence statistics. GloVe addresses this limitation. Developed by Pennington et al. at Stanford, it combines the strengths of local context-window methods such as Word2Vec with those of global matrix factorization techniques.

Global Co-occurrence Matrix

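GloVe first constructs a global word co-occurrence matrix X from the entire corpus, where X_{ij} counts how many times word j appears in the context of word i. Normalizing each row gives the co-occurrence probability P_{ij} = P(j | i) = X_{ij} / X_i, where X_i = \sum_k X_{ik} is the total number of context words observed for word i.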
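A minimal sketch of building X from a pre-tokenized corpus follows. The helper name `build_cooccurrence` and the `window_size` parameter are hypothetical. The sketch follows the GloVe paper's choice of weighting a pair that is d words apart by 1/d rather than counting every pair as 1. It stores X sparsely as a dictionary, because most word pairs never co-occur.

```python
from collections import defaultdict


def build_cooccurrence(tokens, window_size=2):
    """Build a sparse, symmetric co-occurrence matrix X as {(i, j): X_ij}.

    A pair of words d positions apart contributes 1/d to X_ij
    (distance weighting as described in the GloVe paper).
    """
    # Assign each distinct token an integer id in order of first appearance.
    vocab = {}
    for tok in tokens:
        vocab.setdefault(tok, len(vocab))

    counts = defaultdict(float)
    for pos, center in enumerate(tokens):
        i = vocab[center]
        # Look only to the right and record both directions, which is
        # equivalent to a symmetric window of the same size.
        for offset in range(1, window_size + 1):
            ctx_pos = pos + offset
            if ctx_pos >= len(tokens):
                break
            j = vocab[tokens[ctx_pos]]
            weight = 1.0 / offset
            counts[(i, j)] += weight
            counts[(j, i)] += weight
    return vocab, dict(counts)


vocab, X = build_cooccurrence("the cat sat on the mat".split(), window_size=2)
```

Pairs that never appear within the window, such as ("cat", "mat") here, are simply absent from the dictionary. This corresponds to X_{ij} = 0.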

Ratios of Co-occurrence Probabilities

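The key insight of GloVe is that ratios of co-occurrence probabilities, taken with respect to probe words k, capture meaning more cleanly than the raw probabilities do. For example, P(solid | ice) / P(solid | steam) is large because "solid" relates to ice but not to steam. By contrast, P(gas | ice) / P(gas | steam) is small because "gas" relates to steam but not to ice. For probe words related to both (such as "water") or to neither (such as "fashion"), the ratio is close to 1. The ratio therefore isolates the aspect of meaning that distinguishes ice from steam.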
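The ratio test can be sketched on a hand-made count table. The counts below are invented for illustration only and are not real corpus statistics. The function names `cond_prob` and `prob_ratio` are hypothetical.

```python
def cond_prob(counts, i, k):
    """P(k | i) = X_ik / X_i, where X_i is the sum of row i of the counts."""
    row_total = sum(c for (a, _), c in counts.items() if a == i)
    return counts.get((i, k), 0.0) / row_total


def prob_ratio(counts, i, j, k):
    """Ratio P(k | i) / P(k | j) for a probe word k (k must co-occur with j)."""
    return cond_prob(counts, i, k) / cond_prob(counts, j, k)


# Toy counts invented for illustration (NOT measured from a corpus).
toy = {
    ("ice", "solid"): 8, ("ice", "gas"): 1, ("ice", "water"): 10, ("ice", "fashion"): 1,
    ("steam", "solid"): 1, ("steam", "gas"): 8, ("steam", "water"): 10, ("steam", "fashion"): 1,
}

for probe in ["solid", "gas", "water", "fashion"]:
    print(probe, prob_ratio(toy, "ice", "steam", probe))
```

With these toy counts, the ratio is about 8 for "solid", about 1/8 for "gas", and 1 for both "water" and "fashion". This mirrors the qualitative pattern described above.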

The GloVe Objective Function

The GloVe model minimizes a weighted least-squares objective. This objective fits the dot product of a word vector and a context vector, plus two bias terms, to the logarithm of the corresponding co-occurrence count.

The Loss Equation

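The loss function is defined as:

J = \sum_{i,j=1}^{V} f(X_{ij}) \left( w_i^\top \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2

Here, V is the vocabulary size, w_i is the vector for word i, \tilde{w}_j is a separate context vector for word j, and b_i and \tilde{b}_j are scalar biases. The weighting function

f(X_{ij}) = \min\left(1, (X_{ij}/x_{max})^{\alpha}\right)

has three effects:
- It caps the weight of very frequent co-occurrences (such as pairs involving "the" or "and") at 1, so that they do not dominate the loss.
- It down-weights rare, noisy co-occurrences.
- It assigns zero weight to pairs that never co-occur, for which \log X_{ij} would be undefined.

Pennington et al. report using x_{max} = 100 and \alpha = 3/4.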
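The weighting function and loss can be sketched in NumPy, evaluated only over the nonzero entries of X, since f(0) = 0 removes all other terms. The function names, the argument layout (parallel `rows`, `cols`, `counts` arrays holding the (i, j, X_ij) triples), and the defaults x_max = 100 and alpha = 0.75 (the values used in the GloVe paper) are assumptions made for this illustration.

```python
import numpy as np


def weight(x, x_max=100.0, alpha=0.75):
    """GloVe weighting f(x) = min(1, (x / x_max) ** alpha); note f(0) = 0."""
    return np.minimum(1.0, (x / x_max) ** alpha)


def glove_loss(W, W_ctx, b, b_ctx, rows, cols, counts, x_max=100.0, alpha=0.75):
    """Weighted least-squares loss J summed over the nonzero entries of X.

    W, W_ctx: (V, d) word and context vectors; b, b_ctx: (V,) biases.
    rows, cols, counts: parallel arrays of (i, j, X_ij) with X_ij > 0.
    """
    dots = np.sum(W[rows] * W_ctx[cols], axis=1)            # w_i^T w~_j
    residual = dots + b[rows] + b_ctx[cols] - np.log(counts)
    return float(np.sum(weight(counts, x_max, alpha) * residual ** 2))


print(weight(np.array([0.0, 50.0, 100.0, 1000.0])))
```

Because f is capped at 1, a pair seen 1,000 times contributes with the same weight as one seen exactly x_max = 100 times.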

GloVe vs. Word2Vec

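Word2Vec streams through the corpus window by window, so each training epoch costs time proportional to the size of the corpus. GloVe instead makes a single pass over the corpus to build X and then trains only on the nonzero entries of that matrix. In a large corpus, these entries are typically far fewer than the number of tokens. This makes GloVe efficient to train on large corpora while still capturing both global semantic structure and local syntactic regularities.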
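The point that GloVe trains only on the nonzero entries of X can be illustrated with a small training loop. This is a minimal sketch, not the reference implementation. It follows the GloVe paper in using AdaGrad with an initial learning rate of 0.05, sampling nonzero entries in random order, and returning W + W̃ as the final word vectors. The function name `train_glove`, the initialization scheme, and the other defaults are assumptions.

```python
import numpy as np


def train_glove(rows, cols, counts, vocab_size, dim=50, epochs=25,
                lr=0.05, x_max=100.0, alpha=0.75, seed=0):
    """Fit GloVe vectors with per-parameter AdaGrad over nonzero X_ij only.

    rows, cols, counts: parallel arrays of (i, j, X_ij) with X_ij > 0.
    Returns (W + W_ctx, (W, W_ctx, b, b_ctx)).
    """
    rng = np.random.default_rng(seed)
    # Small random initial vectors (assumed scheme); biases start at zero.
    W = (rng.random((vocab_size, dim)) - 0.5) / dim
    W_ctx = (rng.random((vocab_size, dim)) - 0.5) / dim
    b = np.zeros(vocab_size)
    b_ctx = np.zeros(vocab_size)
    # AdaGrad accumulators of squared gradients, started at 1 to avoid dividing by 0.
    gW, gW_ctx = np.ones_like(W), np.ones_like(W_ctx)
    gb, gb_ctx = np.ones_like(b), np.ones_like(b_ctx)

    log_x = np.log(counts)
    f = np.minimum(1.0, (counts / x_max) ** alpha)

    for _ in range(epochs):
        # Visit each nonzero entry once per epoch, in random order.
        for n in rng.permutation(len(counts)):
            i, j = rows[n], cols[n]
            diff = W[i] @ W_ctx[j] + b[i] + b_ctx[j] - log_x[n]
            g = f[n] * diff  # the constant factor 2 is folded into lr
            # Compute both vector gradients before updating either vector.
            grad_w = g * W_ctx[j]
            grad_c = g * W[i]
            W[i] -= lr * grad_w / np.sqrt(gW[i])
            W_ctx[j] -= lr * grad_c / np.sqrt(gW_ctx[j])
            b[i] -= lr * g / np.sqrt(gb[i])
            b_ctx[j] -= lr * g / np.sqrt(gb_ctx[j])
            gW[i] += grad_w ** 2
            gW_ctx[j] += grad_c ** 2
            gb[i] += g ** 2
            gb_ctx[j] += g ** 2

    # The GloVe paper uses the sum of word and context vectors as the output.
    return W + W_ctx, (W, W_ctx, b, b_ctx)
```

Each epoch costs time proportional to `len(counts)`, the number of nonzero co-occurrence entries, rather than to the number of tokens in the corpus. That difference is the efficiency argument made above.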