Text Generation with Character-Level RNNs

Character-level language models generate text by predicting the next character in a sequence, one character at a time. Working at the character level keeps the vocabulary small and lets the network generate novel words it never saw during training. The cost is that it needs longer sequences to capture the same amount of context as a word-level model.

The Character-Level Language Model

A character language model treats text as a sequence of character tokens. Given a history of characters, the network predicts a probability distribution over the entire character vocabulary for the next character.
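The interface can be made concrete with a minimal PyTorch sketch. The text does not fix an architecture, so this assumes an embedding layer, a single-layer GRU, and a linear projection to vocabulary-sized logits. The class name CharRNN and all layer sizes are hypothetical choices, and a vanilla nn.RNN or an nn.LSTM could take the GRU's place.

<pre><code class="language-python">import torch
import torch.nn as nn

class CharRNN(nn.Module):
    """Hypothetical character-level language model: embedding, GRU, linear head."""

    def __init__(self, vocab_size, embed_dim=32, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)            # character index to vector
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)  # carries context forward
        self.head = nn.Linear(hidden_dim, vocab_size)               # hidden state to logits

    def forward(self, x, hidden=None):
        # x: (batch, seq_len) tensor of character indices
        emb = self.embed(x)                  # (batch, seq_len, embed_dim)
        out, hidden = self.rnn(emb, hidden)  # (batch, seq_len, hidden_dim)
        logits = self.head(out)              # (batch, seq_len, vocab_size)
        return logits, hidden

vocab_size = 80  # hypothetical vocabulary size
model = CharRNN(vocab_size)
x = torch.randint(0, vocab_size, (2, 10))  # two random sequences of 10 characters
logits, hidden = model(x)

# Distribution over the next character after the last position of each sequence
probs = torch.softmax(logits[:, -1], dim=-1)
print(logits.shape)  # torch.Size([2, 10, 80])
</code></pre>

Each position produces one row of logits, so the model predicts a next-character distribution at every step of the sequence in a single forward pass.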

Vocabulary and Tokenization

The character vocabulary consists of every unique letter, digit, punctuation mark, and whitespace character in the corpus, typically on the order of 100 symbols for English text. This is far smaller than a word vocabulary, which can exceed 50,000 tokens, so the embedding table and the output layer, whose sizes both scale with the vocabulary, stay small.
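Building such a vocabulary takes only a few lines of Python: collect the unique characters, sort them so each gets a stable index, and keep lookup tables in both directions. The toy corpus and the names stoi, itos, encode, and decode are illustrative assumptions, not part of any library.

<pre><code class="language-python">text = "hello world! hello char-rnn."

# Vocabulary: every unique character in the corpus, sorted for a stable index order
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}  # character to index
itos = {i: ch for ch, i in stoi.items()}      # index to character

def encode(s):
    # Map a string to a list of integer indices
    return [stoi[ch] for ch in s]

def decode(ids):
    # Map a list of integer indices back to a string
    return "".join(itos[i] for i in ids)

print(len(chars))               # 14
print(decode(encode("hello")))  # hello
</code></pre>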

Training Objective

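The model is trained with cross-entropy loss. At each position, the network reads the current character (together with its hidden state, which summarizes the preceding characters) and predicts a distribution over the next character; the loss is the negative log-probability it assigns to the character that actually comes next in the text. Minimizing this loss is equivalent to maximizing the likelihood of the training text.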
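The shift-by-one target construction and the loss can be sketched in PyTorch. To keep the example self-contained, random logits stand in for the model's output (an assumption), and the toy indices encode "hello" as h=0, e=1, l=2, o=3. The manual computation confirms that F.cross_entropy is the mean negative log-probability of the true next characters.

<pre><code class="language-python">import torch
import torch.nn.functional as F

torch.manual_seed(0)
vocab_size = 4
ids = torch.tensor([0, 1, 2, 2, 3])  # "hello" encoded with a toy 4-character vocabulary

# Inputs are every character except the last; targets are the same text shifted left by one
inputs, targets = ids[:-1], ids[1:]

# Stand-in for model output: one row of logits per input position
logits = torch.randn(len(inputs), vocab_size, requires_grad=True)

loss = F.cross_entropy(logits, targets)

# Same value by hand: mean negative log-probability of each true next character
log_probs = F.log_softmax(logits, dim=-1)
nll = -log_probs[torch.arange(len(targets)), targets].mean()
print(loss.item(), nll.item())  # equal up to floating-point rounding

loss.backward()  # gradients reach the logits (and, in a real model, its weights)
</code></pre>

With a batched model output of shape (batch, seq_len, vocab_size), one common approach is to flatten it to (batch * seq_len, vocab_size), and the targets to (batch * seq_len,), before calling F.cross_entropy.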

Sampling and Softmax Temperature

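During text generation, the model outputs a vector of logits at each step; a character is sampled from the resulting distribution and fed back in as the next input. We control how random (or "creative") the generated text is by scaling these logits with a temperature before sampling.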
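The generation loop can be sketched as follows. To keep it self-contained, the trained RNN is replaced by a hypothetical table of random next-character logits (effectively a bigram model), so the generated indices are meaningless. What the sketch shows is the loop structure: predict logits, scale by temperature, sample, and feed the sample back in. The names next_logits and generate are illustrative.

<pre><code class="language-python">import torch
import torch.nn.functional as F

torch.manual_seed(0)
vocab_size = 10

# Hypothetical stand-in for a trained model: one row of next-character logits per
# current character (a real char-RNN would also carry a hidden state between steps)
next_logits = torch.randn(vocab_size, vocab_size)

def generate(start_idx, length, temp=1.0):
    out = [start_idx]
    current = start_idx
    for _ in range(length):
        logits = next_logits[current]                             # logits for the next character
        probs = F.softmax(logits / temp, dim=-1)                  # temperature-scaled distribution
        current = torch.multinomial(probs, num_samples=1).item()  # sample one index
        out.append(current)                                       # feed the sample back in
    return out

print(generate(start_idx=0, length=20, temp=0.8))
</code></pre>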

Temperature Scaling Math

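To adjust creativity, we divide the logits $z_i$ by a temperature $T$ before the softmax:

$$p_i = \frac{\exp(z_i / T)}{\sum_j \exp(z_j / T)}$$

$T = 1$ recovers the model's unscaled distribution. As $T \to 0$, the distribution collapses onto the highest-scoring character, which is equivalent to greedy (argmax) decoding and tends to produce repetitive text. As $T \to \infty$, the distribution approaches uniform, producing random, chaotic text.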
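A quick numeric check of the formula in plain Python, using three hypothetical logits, shows both limits. The helper name softmax_with_temperature is illustrative; subtracting the maximum scaled logit before exponentiating leaves the result unchanged but avoids overflow when T is small.

<pre><code class="language-python">import math

def softmax_with_temperature(logits, T):
    # p_i = exp(z_i / T) / sum_j exp(z_j / T), shifted by the max for numerical stability
    scaled = [z / T for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]
for T in (0.1, 1.0, 10.0):
    probs = softmax_with_temperature(logits, T)
    print(T, [round(p, 3) for p in probs])
# 0.1 [1.0, 0.0, 0.0]
# 1.0 [0.665, 0.245, 0.09]
# 10.0 [0.367, 0.332, 0.301]
</code></pre>

At T = 0.1 almost all of the mass sits on the largest logit, while at T = 10 the three probabilities are already close to the uniform value of 1/3.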

PyTorch Sampling Implementation

We use torch.multinomial to sample characters from the adjusted probability distribution.

<pre><code class="language-python">import torch
import torch.nn.functional as F

# Simulated logits over a character vocabulary of size 80
logits = torch.tensor([2.0, -1.0, 0.5, 4.0] + [-5.0] * 76)  # Index 3 is most likely

def sample_with_temp(logits, temp=0.7):
    # Scale logits by temperature
    scaled_logits = logits / temp
    probs = F.softmax(scaled_logits, dim=-1)
    # Sample index
    return torch.multinomial(probs, num_samples=1).item()

print(sample_with_temp(logits, temp=0.1))  # Almost always returns 3
print(sample_with_temp(logits, temp=2.0))  # Shows higher variability
</code></pre>