Recurrent Neural Networks (RNNs) Unrolled
Recurrent Neural Networks (RNNs) model sequences by introducing a feedback loop: the hidden state computed at one step is fed back in at the next. Because this internal state is updated at every step with the same parameters, an RNN can process variable-length inputs.
The Recurrence Formula
An RNN processes a sequence one step at a time. At each step t, it combines the current input x_t with the previous hidden state h_{t-1} to compute the new hidden state h_t.
The Hidden State Math
The formula for updating the hidden state is: h_t = \tanh(W_{hh} h_{t-1} + W_{xh} x_t + b_h). The output at step t is: y_t = W_{hy} h_t + b_y. The weights W_{hh}, W_{xh}, and W_{hy} are shared across all time steps.
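The two formulas above can be sketched as a single step function in NumPy. This is a minimal illustration, not a reference implementation: the function name rnn_step, the layer sizes, the random initialization, and the zero initial hidden state are all assumptions made for the example.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, W_hy, b_h, b_y):
    """One recurrence step: returns (h_t, y_t)."""
    # h_t = tanh(W_hh h_{t-1} + W_xh x_t + b_h)
    h_t = np.tanh(W_hh @ h_prev + W_xh @ x_t + b_h)
    # y_t = W_hy h_t + b_y
    y_t = W_hy @ h_t + b_y
    return h_t, y_t

# Illustrative sizes and initialization (assumptions, not from the text).
input_size, hidden_size, output_size = 4, 3, 2
rng = np.random.default_rng(0)
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
W_hy = rng.normal(scale=0.1, size=(output_size, hidden_size))
b_h = np.zeros(hidden_size)
b_y = np.zeros(output_size)

x_t = rng.normal(size=input_size)
h_prev = np.zeros(hidden_size)  # assumed zero initial hidden state

h_t, y_t = rnn_step(x_t, h_prev, W_xh, W_hh, W_hy, b_h, b_y)
print(h_t.shape, y_t.shape)
```

The shapes follow directly from the formulas: W_{xh} maps the input to the hidden dimension, W_{hh} maps hidden to hidden, and W_{hy} maps hidden to the output dimension.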
Parameter Sharing Advantage
Because the same weight matrices are used at every step, the number of parameters does not depend on sequence length, so one model can process sequences of any length. Sharing also lets the network detect a pattern regardless of where it appears in the sequence.
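As a rough illustration of parameter sharing, the sketch below runs one set of weights over a short and a long sequence. The helper name rnn_forward, the sizes, and the zero initial state are assumptions chosen for the example.

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, b_h):
    """Run the recurrence over xs of shape (T, input_size); return all hidden states."""
    h = np.zeros(W_hh.shape[0])  # assumed zero initial hidden state
    states = []
    for x_t in xs:
        # The same W_xh, W_hh, b_h are reused at every step.
        h = np.tanh(W_hh @ h + W_xh @ x_t + b_h)
        states.append(h)
    return np.stack(states)

# Illustrative sizes (assumptions).
input_size, hidden_size = 4, 3
rng = np.random.default_rng(0)
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
b_h = np.zeros(hidden_size)

short_seq = rng.normal(size=(5, input_size))
long_seq = rng.normal(size=(50, input_size))

states_short = rnn_forward(short_seq, W_xh, W_hh, b_h)
states_long = rnn_forward(long_seq, W_xh, W_hh, b_h)

# The parameter count is fixed by the layer sizes, not by sequence length.
num_params = W_xh.size + W_hh.size + b_h.size
print(states_short.shape, states_long.shape, num_params)
```

Both calls use exactly the same arrays; only the number of loop iterations changes with the input length.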
Unrolling Through Time
To understand how gradients flow during training, we visualize the RNN recurrence as an unrolled computational graph spanning the entire sequence length.
Unrolled Representation
Unrolling shows that an RNN can be viewed as a deep feedforward network with one layer per time step, where every layer shares the same parameters and receives its own input x_t. The hidden state is the link that carries information forward from one layer to the next.
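To make the unrolled view concrete, the sketch below writes out a three-step sequence as three explicit "layers" and checks that the result matches the loop form of the recurrence. The function name layer, the sizes, and the zero initial state are assumptions for illustration only.

```python
import numpy as np

# Illustrative sizes and weights (assumptions).
input_size, hidden_size = 4, 3
rng = np.random.default_rng(0)
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
b_h = np.zeros(hidden_size)

def layer(h_prev, x_t):
    """One unrolled 'layer': the same function and weights at every time step."""
    return np.tanh(W_hh @ h_prev + W_xh @ x_t + b_h)

xs = rng.normal(size=(3, input_size))
h0 = np.zeros(hidden_size)  # assumed zero initial hidden state

# Unrolled form: one explicit layer per time step.
h1 = layer(h0, xs[0])
h2 = layer(h1, xs[1])
h3 = layer(h2, xs[2])

# Loop form: the recurrence as usually written.
h = h0
for x_t in xs:
    h = layer(h, x_t)

print(np.allclose(h, h3))
```

Written this way, the depth of the feedforward graph equals the sequence length, which is why gradients for early time steps must pass back through every later layer.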
Implementing a Simple RNN in PyTorch
PyTorch's nn.RNN module implements this recurrence and iterates over the time dimension internally, so no explicit loop over steps is needed.
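A minimal usage sketch follows. The batch size, sequence length, and layer sizes are arbitrary assumptions, and the nn.Linear readout is one possible way to realize the output equation y_t = W_{hy} h_t + b_y, since nn.RNN itself returns only hidden states.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative sizes (assumptions).
batch_size, seq_len = 2, 5
input_size, hidden_size, output_size = 4, 3, 2

# Single-layer RNN; the default nonlinearity is tanh.
# batch_first=True means inputs have shape (batch, seq_len, input_size).
rnn = nn.RNN(input_size=input_size, hidden_size=hidden_size, batch_first=True)

# Assumed readout layer playing the role of y_t = W_hy h_t + b_y.
readout = nn.Linear(hidden_size, output_size)

x = torch.randn(batch_size, seq_len, input_size)

# When no initial hidden state is passed, it defaults to zeros.
hidden_states, h_last = rnn(x)

# Apply the readout to the hidden state at every time step.
y = readout(hidden_states)

print(hidden_states.shape, h_last.shape, y.shape)
```

Here hidden_states holds h_t for every step, while h_last holds only the final hidden state with a leading dimension for the number of layers. Note that nn.RNN keeps two bias vectors (one for the input term, one for the recurrent term); their sum corresponds to the single b_h in the formula.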