Recurrent Neural Networks (RNNs) Unrolled

Recurrent Neural Networks (RNNs) model sequences by introducing a feedback loop: an internal hidden state is updated at each step and fed back in at the next. This lets an RNN process variable-length inputs while sharing one set of parameters across time.

The Recurrence Formula

An RNN processes a sequence step-by-step. At each step t, it takes the current input x_t and the previous hidden state h_{t-1} to calculate the new hidden state h_t.

The Hidden State Math

The formula for updating the hidden state is: h_t = \tanh(W_{hh} h_{t-1} + W_{xh} x_t + b_h). The output at step t is: y_t = W_{hy} h_t + b_y. The weights W_{hh}, W_{xh}, and W_{hy} are shared across all time steps.
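The two formulas above can be sketched by hand in a few lines of PyTorch. This is a minimal illustration, not a reference implementation: the sizes, the random weight values, the zero initial state h_0, and the helper name `rnn_step` are all assumptions made for the example.

```python
import torch

torch.manual_seed(0)

# Illustrative sizes (assumptions, not taken from the text)
input_size, hidden_size, output_size = 4, 3, 2

# Shared parameters, filled with random values for demonstration only
W_xh = torch.randn(hidden_size, input_size)
W_hh = torch.randn(hidden_size, hidden_size)
b_h = torch.zeros(hidden_size)
W_hy = torch.randn(output_size, hidden_size)
b_y = torch.zeros(output_size)

def rnn_step(x_t, h_prev):
    """Hypothetical helper: one application of the recurrence."""
    h_t = torch.tanh(W_hh @ h_prev + W_xh @ x_t + b_h)  # hidden state update
    y_t = W_hy @ h_t + b_y                              # output at step t
    return h_t, y_t

xs = torch.randn(6, input_size)  # a sequence of 6 input vectors
h = torch.zeros(hidden_size)     # h_0, assumed to be all zeros

ys = []
for x_t in xs:
    # The same W_hh, W_xh, W_hy are reused at every step
    h, y_t = rnn_step(x_t, h)
    ys.append(y_t)
ys = torch.stack(ys)

print(ys.shape)  # torch.Size([6, 2])
```

Because of the tanh, every entry of the hidden state stays in the range [-1, 1], whatever the inputs are.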

Parameter Sharing Advantage

Because the same weight matrices are used at every step, the number of parameters does not grow with sequence length, and the model can process sequences of any length. Sharing also lets the network detect a pattern regardless of where it appears in the sequence.
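A quick way to see this is to count the parameters of a recurrent layer and then feed it sequences of different lengths. The sketch below uses PyTorch's `nn.RNN` with illustrative sizes (10 inputs, 20 hidden units); note that `nn.RNN` keeps two bias vectors where the formula above has a single b_h, and it does not include the output projection W_{hy}.

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=10, hidden_size=20, batch_first=True)

# The parameter count depends only on input_size and hidden_size:
# W_xh (20x10) + W_hh (20x20) + two bias vectors (20 each)
n_params = sum(p.numel() for p in rnn.parameters())
print(n_params)  # 640

# The same module, with the same weights, handles short and long sequences
for seq_len in (5, 50):
    x = torch.randn(3, seq_len, 10)  # [batch_size, sequence_length, input_size]
    output, h_n = rnn(x)
    print(seq_len, tuple(output.shape), tuple(h_n.shape))
```

Only the shape of `output` changes with the sequence length; the weights and the shape of the final hidden state do not.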

Unrolling Through Time

To understand how gradients flow during training, we visualize the RNN recurrence as an unrolled computational graph spanning the entire sequence length.

Unrolled Representation

Unrolling shows that an RNN is equivalent to a deep feedforward network with one layer per time step, where every layer shares the same parameters and receives its own input x_t. The hidden state is the link that carries information forward from one layer to the next.
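One consequence of this view is that the gradient of a shared weight is the sum of the gradients it receives at each "layer" of the unrolled graph. The sketch below checks this with autograd. The sizes, the toy loss on the final hidden state, and the trick of giving each time step its own clone of W_hh (so that per-step gradients can be inspected) are assumptions made for illustration.

```python
import torch

torch.manual_seed(0)
T, input_size, hidden_size = 4, 3, 2  # illustrative sizes

W_xh = torch.randn(hidden_size, input_size)
W_hh = torch.randn(hidden_size, hidden_size, requires_grad=True)
b_h = torch.zeros(hidden_size)
xs = torch.randn(T, input_size)

# One clone of W_hh per unrolled layer. The clones hold identical values,
# but keeping them separate lets us read the gradient arriving at each step.
W_per_step = [W_hh.clone() for _ in range(T)]
for W in W_per_step:
    W.retain_grad()

# Explicitly unrolled forward pass: one "layer" per time step
h = torch.zeros(hidden_size)
for t in range(T):
    h = torch.tanh(W_per_step[t] @ h + W_xh @ xs[t] + b_h)

loss = h.sum()   # toy loss on the final hidden state
loss.backward()  # backpropagation through the unrolled graph

per_step = torch.stack([W.grad for W in W_per_step])  # shape [T, hidden, hidden]
shared_matches_sum = torch.allclose(W_hh.grad, per_step.sum(dim=0))
print(shared_matches_sum)  # True
```

In ordinary training code there is no need for the clones; using `W_hh` directly in the loop gives the same accumulated gradient.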

Implementing a Simple RNN in PyTorch

PyTorch provides a simple nn.RNN module that handles this recurrence automatically.

<pre><code class="language-python">import torch
import torch.nn as nn

# Input size=10, hidden size=20, batch_first=True
rnn = nn.RNN(input_size=10, hidden_size=20, num_layers=1, batch_first=True)

# Input tensor: [batch_size, sequence_length, input_size]
x = torch.randn(3, 5, 10)

# Forward pass returns outputs and the final hidden state
output, h_n = rnn(x)

print(output.shape)  # torch.Size([3, 5, 20]) -> outputs for all time steps
print(h_n.shape)     # torch.Size([1, 3, 20]) -> final hidden state
</code></pre>
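To connect the module back to the recurrence formula, the following sketch recomputes the result of `nn.RNN` with an explicit loop over time steps. It relies on the layer's `weight_ih_l0`, `weight_hh_l0`, `bias_ih_l0`, and `bias_hh_l0` attributes and on the default tanh nonlinearity and zero initial hidden state. The tolerance passed to `allclose` is an arbitrary choice for float32 arithmetic.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
rnn = nn.RNN(input_size=10, hidden_size=20, num_layers=1, batch_first=True)
x = torch.randn(3, 5, 10)  # [batch_size, sequence_length, input_size]

with torch.no_grad():
    output, h_n = rnn(x)

    # weight_ih_l0 plays the role of W_xh, weight_hh_l0 the role of W_hh.
    # PyTorch keeps two bias vectors; their sum corresponds to b_h.
    W_xh, W_hh = rnn.weight_ih_l0, rnn.weight_hh_l0
    b_h = rnn.bias_ih_l0 + rnn.bias_hh_l0

    h = torch.zeros(3, 20)  # default initial hidden state is all zeros
    steps = []
    for t in range(x.shape[1]):
        # Batched form of h_t = tanh(W_hh h_{t-1} + W_xh x_t + b_h)
        h = torch.tanh(x[:, t] @ W_xh.T + h @ W_hh.T + b_h)
        steps.append(h)
    manual = torch.stack(steps, dim=1)  # [batch_size, sequence_length, hidden_size]

outputs_match = torch.allclose(output, manual, atol=1e-5)
final_matches = torch.allclose(h_n[0], manual[:, -1], atol=1e-5)
print(outputs_match, final_matches)
```

Note that `output` holds the hidden states h_t, not y_t: `nn.RNN` does not apply the output projection W_{hy}, so a separate layer such as `nn.Linear` is typically placed on top of it to produce y_t.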