Bidirectional LSTMs/GRUs

Standard recurrent networks process a sequence in one direction only, so the state at step t cannot draw on context from later steps. Bidirectional recurrent networks address this by running two separate layers, one forward and one backward, and concatenating their outputs at each step.


Bidirectional Architecture

A bidirectional network processes the input sequence in both directions simultaneously, yielding two independent hidden state sequences.

Forward and Backward Passes

The forward layer processes inputs from t=1 to T, producing states h_t^f that summarize inputs 1 through t. The backward layer processes inputs from t=T down to 1, producing states h_t^b that summarize inputs t through T. At step t, these states are concatenated: h_t = [h_t^f, h_t^b], doubling the representation size.
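The two passes can be sketched by hand with two ordinary unidirectional layers. This is only an illustration of the idea, not how PyTorch implements it internally; the names fwd, bwd, h_f, and h_b are assumptions introduced for this example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Two independent unidirectional layers (illustrative names).
fwd = nn.LSTM(input_size=10, hidden_size=20, batch_first=True)
bwd = nn.LSTM(input_size=10, hidden_size=20, batch_first=True)

x = torch.randn(3, 5, 10)  # [batch, seq, features]

# Forward layer reads the sequence in its original order (t = 1..T).
h_f, _ = fwd(x)

# Backward layer reads t = T..1: reverse the time axis, run the layer,
# then reverse the outputs so index t lines up with the forward states.
h_b_rev, _ = bwd(torch.flip(x, dims=[1]))
h_b = torch.flip(h_b_rev, dims=[1])

# Concatenate along the feature axis: h_t = [h_t^f, h_t^b].
h = torch.cat([h_f, h_b], dim=-1)
print(h.shape)  # torch.Size([3, 5, 40])
```

The second flip matters: without it, position t of the backward output would hold the state for step T-t+1 rather than step t.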

PyTorch Bidirectional LSTM

Setting bidirectional=True in PyTorch automatically instantiates both layers and handles concatenation.

<pre><code class="language-python">import torch
import torch.nn as nn

# Bidirectional LSTM with hidden size of 20 (output will be 40)
bilstm = nn.LSTM(input_size=10, hidden_size=20, batch_first=True, bidirectional=True)

x = torch.randn(3, 5, 10)  # [batch, seq, features]
output, (h_n, c_n) = bilstm(x)

print(output.shape)  # torch.Size([3, 5, 40]) -> [batch, seq, 2 * hidden]
print(h_n.shape)     # torch.Size([2, 3, 20]) -> [2 * num_layers, batch, hidden]</code></pre>
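It can help to see how output and h_n relate. The following sketch assumes a single-layer bidirectional LSTM with batch_first=True; the variable names (out_fwd, out_bwd, sentence_vec) are illustrative only. The first half of the feature axis holds the forward states and the second half the backward states, so the final forward state sits at the last time step and the final backward state at the first.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

hidden = 20
bilstm = nn.LSTM(input_size=10, hidden_size=hidden,
                 batch_first=True, bidirectional=True)

x = torch.randn(3, 5, 10)
output, (h_n, c_n) = bilstm(x)

# Split the concatenated features back into the two directions.
out_fwd = output[:, :, :hidden]   # forward states,  [batch, seq, hidden]
out_bwd = output[:, :, hidden:]   # backward states, [batch, seq, hidden]

# h_n[0] is the forward layer's final state (after reading step T);
# h_n[1] is the backward layer's final state (after reading step 1).
fwd_matches = torch.allclose(out_fwd[:, -1, :], h_n[0])
bwd_matches = torch.allclose(out_bwd[:, 0, :], h_n[1])

# One common way to build a fixed-size summary of the whole sequence.
sentence_vec = torch.cat([h_n[0], h_n[1]], dim=-1)  # [batch, 2 * hidden]
print(sentence_vec.shape)  # torch.Size([3, 40])
```

Taking output[:, -1, :] as a sequence summary is a common slip: its backward half has only seen the last input, which is why the summary above is built from h_n instead.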

Use Case Constraints

Bidirectional networks are effective when the full sequence is available for analysis, but they cannot be used for autoregressive tasks, where future inputs do not exist at prediction time.

When to Use

Use bidirectional networks for offline tasks where the entire sequence is available at once (e.g., named entity recognition, sentiment analysis, translation encoding).
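As a rough sketch of an offline use case, a per-token tagger of the kind used for named entity recognition might look like the following. The class name BiLSTMTagger and all sizes are hypothetical choices for illustration, not a reference implementation.

```python
import torch
import torch.nn as nn


class BiLSTMTagger(nn.Module):
    """Hypothetical per-token classifier built on a bidirectional LSTM."""

    def __init__(self, vocab_size, embed_dim, hidden_size, num_tags):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden_size,
                           batch_first=True, bidirectional=True)
        # The classifier sees 2 * hidden_size features per token
        # because forward and backward states are concatenated.
        self.classify = nn.Linear(2 * hidden_size, num_tags)

    def forward(self, token_ids):
        states, _ = self.rnn(self.embed(token_ids))  # [batch, seq, 2 * hidden]
        return self.classify(states)                 # [batch, seq, num_tags]


tagger = BiLSTMTagger(vocab_size=100, embed_dim=16, hidden_size=20, num_tags=5)
token_ids = torch.randint(0, 100, (3, 7))  # 3 sentences of 7 token ids each
logits = tagger(token_ids)
print(logits.shape)  # torch.Size([3, 7, 5])
```

Each token's tag scores depend on the words on both sides of it, which is the property that makes this setup attractive when the whole sentence is available up front.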

Causal Restrictions

Do not use bidirectional networks for real-time forecasting or autoregressive text generation. The backward state at step t is computed from inputs t through T, so predicting step t+1 from h_t would require already knowing step t+1. During training this leaks the target into the input; at inference the future tokens do not exist yet.
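The dependence on future inputs can be checked directly. In this sketch (the names uni, bi, and x_future are illustrative), only the last time step of the input is changed, and the outputs at all earlier steps are compared for a unidirectional and a bidirectional LSTM.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

uni = nn.LSTM(input_size=10, hidden_size=20, batch_first=True)
bi = nn.LSTM(input_size=10, hidden_size=20, batch_first=True,
             bidirectional=True)

x = torch.randn(1, 5, 10)
x_future = x.clone()
x_future[:, -1, :] = torch.randn(10)  # alter only the final time step

with torch.no_grad():
    uni_a, _ = uni(x)
    uni_b, _ = uni(x_future)
    bi_a, _ = bi(x)
    bi_b, _ = bi(x_future)

# Compare outputs at every step before the one that was altered.
uni_changed = not torch.allclose(uni_a[:, :-1], uni_b[:, :-1])
bi_changed = not torch.allclose(bi_a[:, :-1], bi_b[:, :-1])
print("unidirectional earlier steps changed:", uni_changed)
print("bidirectional earlier steps changed:", bi_changed)
```

The unidirectional outputs before the altered step should be unaffected, while the bidirectional ones shift because their backward half has already read the changed input. A model trained to predict the next step from such states would be reading its own target.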