Word Embeddings: Dense Vector Representations

Word embeddings replace sparse one-hot vectors with compact, dense vectors of continuous values. By mapping words to a lower-dimensional space, embeddings allow neural networks to learn and utilize semantic relationships through geometric distances.


Dense Vector Spaces

Word embeddings project a vocabulary V onto a continuous vector space of dimension D (typically 100 to 768), where every coordinate contains a real-valued number.

Dimensionality Reduction

By reducing dimensions from 50,000 (one-hot) to 300 (embedding), we compress the vocabulary representation, saving memory and speeding up training while preventing the curse of dimensionality.

Semantic Geometry

Embeddings place semantically related words near each other in the vector space. The cosine similarity of 'cat' and 'kitten' will be close to 1, while 'cat' and 'refrigerator' will be close to 0, allowing the model to generalize patterns across similar words.

PyTorch Embedding Layer

PyTorch provides an nn.Embedding layer that acts as a lookup table that maps integer word indices to dense embedding vectors.

Using nn.Embedding

The weights of the embedding layer are initialized randomly and updated via backpropagation during training, learning task-specific word representations.

<pre><code class="language-python">import torch import torch.nn as nn # Initialize embedding table: vocabulary of size 10, embedding dimension of 5 embed = nn.Embedding(num_embeddings=10, embedding_dim=5) # Input: tensor of word indices word_indices = torch.tensor([1, 4, 9]) # Forward pass extracts the corresponding embedding vectors word_vectors = embed(word_indices) print(word_vectors.shape) # torch.Size([3, 5]) print(word_vectors) # Dense floating point values</pre>

Embedding Arithmetic

Because embeddings capture semantic relationships, the vectors exhibit arithmetic properties: \text{King} - \text{Man} + \text{Woman} \approx \text{Queen}, showing that relationships like gender are encoded as stable directions in the vector space.