Hidden States and Temporal Memory

The hidden state of a Recurrent Neural Network serves as its internal memory, compressing information about all past inputs into a single fixed-size vector. As the sequence grows, this vector becomes an information bottleneck, and context from early in the sequence is gradually lost.

Representing Context in Hidden States

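At any step t, the hidden state h_t is computed from the current input x_t and the previous hidden state h_{t-1}. Because h_{t-1} was itself computed from x_{t-1} and h_{t-2}, and so on back to the start of the sequence, h_t depends on the entire prefix [x_1, ..., x_t] and acts as a running summary of it.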
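The recurrence can be made concrete with a minimal sketch. It assumes the common Elman-style update h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h); the function names, sizes, and random weights below are hypothetical and chosen only for illustration. Each step sees only x_t and h_{t-1}, yet the final state still depends on the very first input through the chain of updates.

```python
import math
import random

def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rnn_step(h_prev, x, W_xh, W_hh, b_h):
    """One Elman-style update: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)."""
    pre = [a + b + c for a, b, c in zip(matvec(W_xh, x), matvec(W_hh, h_prev), b_h)]
    return [math.tanh(p) for p in pre]

def run_rnn(xs, W_xh, W_hh, b_h, h0):
    """Apply rnn_step across a sequence and return [h_1, ..., h_T]."""
    h = h0
    states = []
    for x in xs:
        h = rnn_step(h, x, W_xh, W_hh, b_h)  # only x_t and h_{t-1} are used
        states.append(h)
    return states

# Hypothetical sizes and random weights, for illustration only.
rng = random.Random(0)
H, D = 3, 2  # hidden size, input size
W_xh = [[rng.gauss(0, 0.5) for _ in range(D)] for _ in range(H)]
W_hh = [[rng.gauss(0, 0.5) for _ in range(H)] for _ in range(H)]
b_h = [0.0] * H
xs = [[1.0, 0.0], [0.0, 1.0], [0.5, -0.5]]

states = run_rnn(xs, W_xh, W_hh, b_h, h0=[0.0] * H)
```

Replacing xs[0] with a different vector and rerunning changes states[-1], even though the last update never reads xs[0] directly.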

Vector Dynamics

The hidden state can be viewed as a point in a high-dimensional space. As the network processes inputs, this point traces a trajectory that reflects the changing context. Through training, the network tends to map histories that are similar, in ways that matter for the task, to nearby regions of this space.
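With only two hidden units the trajectory becomes a path in the plane, which makes the geometric picture easy to inspect. The sketch below assumes a tanh update with scalar inputs and hand-picked, untrained weights (W_hh is a rotation scaled by about 0.72); all names are hypothetical. It records each h_t as a point, measures how far the state moves per step, and compares endpoints. Here "similar" simply means sharing the most recent inputs; a trained network would shape which distinctions are preserved.

```python
import math

def rnn_step(h, x, w_x, W_hh):
    """h_t = tanh(w_x * x_t + W_hh h_{t-1}) for a 2-unit state and scalar input."""
    return [math.tanh(w_x[i] * x + W_hh[i][0] * h[0] + W_hh[i][1] * h[1])
            for i in range(2)]

def trajectory(xs, w_x, W_hh, h0=(0.0, 0.0)):
    """Return the path [h_0, h_1, ..., h_T] through the 2-D state space."""
    points = [list(h0)]
    for x in xs:
        points.append(rnn_step(points[-1], x, w_x, W_hh))
    return points

# Hypothetical hand-picked weights: W_hh is a rotation scaled by about 0.72.
w_x = [1.0, 0.5]
W_hh = [[0.6, -0.4], [0.4, 0.6]]

xs = [1.0, 1.0, 0.0, 0.0, -1.0]
path = trajectory(xs, w_x, W_hh)
# Euclidean distance moved through state space at each step.
step_lengths = [math.dist(p, q) for p, q in zip(path, path[1:])]

# Histories that differ only in their first input end up close together;
# changing the most recent input moves the endpoint much further.
end_a = path[-1]
end_b = trajectory([-1.0, 1.0, 0.0, 0.0, -1.0], w_x, W_hh)[-1]
end_c = trajectory([1.0, 1.0, 0.0, 0.0, 1.0], w_x, W_hh)[-1]
```

Plotting path would show the state sweeping outward under the positive inputs, relaxing toward the origin during the zero inputs, and then jumping with the final input.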

Initializing Hidden States

At step t=1, the previous hidden state h_0 is typically initialized to a tensor of all zeros. Alternatively, h_0 can be defined as a learnable parameter initialized with random values and optimized during training.
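To illustrate the difference between the two options, here is a toy sketch in which h_0 is the only trained quantity. It assumes a hypothetical scalar RNN (hidden size 1) with fixed weights W_X and W_H, a squared-error loss on the final state, and hand-written backpropagation through time; in a framework such as PyTorch one would instead register h_0 as a trainable tensor (e.g. via torch.nn.Parameter) and let autograd compute the gradient.

```python
import math
import random

# Hypothetical scalar RNN with fixed weights: h_t = tanh(W_X * x_t + W_H * h_{t-1}).
W_X, W_H = 0.5, 0.8

def forward(h0, inputs):
    """Return [h_0, h_1, ..., h_T]."""
    hs = [h0]
    for x in inputs:
        hs.append(math.tanh(W_X * x + W_H * hs[-1]))
    return hs

def loss(h0, inputs, target):
    """Squared error between the final hidden state and a target value."""
    return (forward(h0, inputs)[-1] - target) ** 2

def learn_initial_state(inputs, target, lr=0.5, steps=200, seed=0):
    """Treat h_0 as a parameter: small random init, then gradient descent."""
    h0 = random.Random(seed).uniform(-0.1, 0.1)
    for _ in range(steps):
        hs = forward(h0, inputs)
        grad = 2.0 * (hs[-1] - target)        # dL/dh_T
        for t in range(len(inputs), 0, -1):   # backpropagate through time
            grad *= (1.0 - hs[t] ** 2) * W_H  # dh_t/dh_{t-1}
        h0 -= lr * grad                       # now grad = dL/dh_0
    return h0

inputs, target = [1.0, -0.5], 0.3
learned_h0 = learn_initial_state(inputs, target)
```

For this toy target, the zero initial state leaves a loss of about 0.033, while the learned h_0 drives it close to zero. Zero initialization is the simpler default; a learned h_0 only helps when a non-neutral starting context is useful for the task.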

The Memory Bottleneck

Because the hidden state has a fixed dimension, it cannot store an unboundedly long history losslessly. As new inputs are folded in, details from early steps are progressively overwritten.

Information Bottleneck Concept

A hidden state of size H must compress a sequence of arbitrary length into H finite-precision numbers. This compression is necessarily lossy, and in practice the model tends to retain recent inputs at the expense of distant history, which limits its effective memory span.
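A counting (pigeonhole) argument shows why the compression must eventually lose information. If each of the H units is stored in b bits, the state can take at most 2^(H·b) distinct values, while a vocabulary of V tokens produces V^T distinct sequences of length T. Once V^T exceeds 2^(H·b), two different sequences must share a hidden state. The configuration below (128 float32 units, a 10,000-token vocabulary) is a hypothetical example, and the result is only a generous upper bound: real networks retain far less than this.

```python
import math

def max_distinguishable_length(hidden_size, bits_per_unit, vocab_size):
    """Largest T for which all vocab_size**T sequences could still map to
    distinct hidden states, i.e. T * log2(V) <= H * bits_per_unit."""
    state_bits = hidden_size * bits_per_unit
    return math.floor(state_bits / math.log2(vocab_size))

# Hypothetical configuration: 128 float32 units, 10,000-token vocabulary.
print(max_distinguishable_length(128, 32, 10_000))  # 308
```

Beyond roughly 308 tokens in this setting, no encoding, however clever, could keep every possible history distinct.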

Tracking Decay

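By perturbing an early input and tracking how much the subsequent hidden states change, we can measure how long that input continues to influence the network. In networks whose recurrent dynamics are contractive, this influence decays exponentially as new inputs are processed, demonstrating the network's short-term focus.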
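The measurement can be sketched as follows: run two copies of the same RNN on sequences that differ only in x_1 and record ||h_t - h'_t|| at each step. This sketch assumes a tanh update with random weights, with W_hh rescaled to Frobenius norm 0.9; all names and sizes are hypothetical. Because tanh is 1-Lipschitz and the Frobenius norm bounds the spectral norm, each step shrinks the gap by at least a factor of 0.9, so the influence of x_1 decays at least exponentially. With larger recurrent weights this guarantee no longer holds.

```python
import math
import random

def rnn_step(h, x, W_xh, W_hh):
    """h_t = tanh(W_xh x_t + W_hh h_{t-1}) with scalar input x_t."""
    return [math.tanh(W_xh[i] * x + sum(W_hh[i][j] * h[j] for j in range(len(h))))
            for i in range(len(h))]

def contractive_matrix(n, target_norm, rng):
    """Random n x n matrix rescaled to Frobenius norm target_norm.
    The Frobenius norm upper-bounds the spectral norm, so the linear map
    is a contraction whenever target_norm < 1."""
    W = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n)]
    fro = math.sqrt(sum(w * w for row in W for w in row))
    return [[w * target_norm / fro for w in row] for row in W]

def influence_of_first_input(xs, x1_alt, H=8, norm=0.9, seed=0):
    """Return ||h_t - h'_t|| for each step when only x_1 is changed."""
    rng = random.Random(seed)
    W_xh = [rng.gauss(0, 1) for _ in range(H)]
    W_hh = contractive_matrix(H, norm, rng)
    h, h_alt = [0.0] * H, [0.0] * H
    dists = []
    for t, x in enumerate(xs):
        h = rnn_step(h, x, W_xh, W_hh)
        h_alt = rnn_step(h_alt, x1_alt if t == 0 else x, W_xh, W_hh)
        dists.append(math.dist(h, h_alt))
    return dists

xs = [1.0] + [0.3] * 20
dists = influence_of_first_input(xs, x1_alt=-1.0)
# Per-step shrink factors; each is at most 0.9 by the argument above.
ratios = [b / a if a > 0 else 0.0 for a, b in zip(dists, dists[1:])]
```

Plotting log(dists) against t would show a curve lying on or below a straight line of slope log(0.9), the signature of exponential decay.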