LSTM Gates: Forget, Input, and Output

An LSTM cell regulates information flow using three sigmoid-activated gates. At each time step these gates decide what to discard from the cell state (memory), what to write into it from the new input, and how much of it to expose as the hidden state.

Mathematical Formulation of Gates

Each gate is a linear layer applied to the concatenation of the previous hidden state and the current input, followed by an elementwise sigmoid activation. Its outputs lie between 0 (completely closed) and 1 (completely open).

Forget Gate, Input Gate, and Candidate State

1. Forget Gate: f_t = \sigma(W_f [h_{t-1}, x_t] + b_f) determines what to discard from the previous cell state.
2. Input Gate: i_t = \sigma(W_i [h_{t-1}, x_t] + b_i) determines which values to update.
3. Candidate State: \tilde{C}_t = \tanh(W_c [h_{t-1}, x_t] + b_c) creates candidate values to add to the state.
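
The three formulas above can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation: the helper names (sigmoid, affine, gates), the toy sizes (hidden size 2, input size 1), and the weight values are all assumptions chosen for readability, and the same weights are reused for every gate only to keep the example short.

```python
import math

def sigmoid(z):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, v, b):
    """Return W v + b, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + b_k for row, b_k in zip(W, b)]

def gates(h_prev, x_t, W_f, b_f, W_i, b_i, W_c, b_c):
    """Compute f_t, i_t and the candidate state from [h_{t-1}, x_t]."""
    concat = h_prev + x_t  # list concatenation plays the role of [h_{t-1}, x_t]
    f_t = [sigmoid(z) for z in affine(W_f, concat, b_f)]
    i_t = [sigmoid(z) for z in affine(W_i, concat, b_i)]
    c_tilde = [math.tanh(z) for z in affine(W_c, concat, b_c)]
    return f_t, i_t, c_tilde

# Toy setup (assumed values): hidden size 2, input size 1, so each W is 2 x 3.
h_prev, x_t = [0.0, 0.0], [1.0]
W = [[0.0, 0.0, 1.0], [0.0, 0.0, -1.0]]
b = [0.0, 0.0]
f_t, i_t, c_tilde = gates(h_prev, x_t, W, b, W, b, W, b)
```

Note that the candidate state uses tanh rather than a sigmoid, so its entries lie in (-1, 1) and can both increase and decrease the stored values.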

Updating States

The cell state update is a blend of past memory and new candidates: C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t.

The output gate o_t = \sigma(W_o [h_{t-1}, x_t] + b_o) filters the cell state to yield the new hidden state: h_t = o_t \odot \tanh(C_t).
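
As a rough sketch of the two update equations, the function below takes already-computed gate activations and applies the elementwise products directly. The function name lstm_state_update and the example numbers are hypothetical, picked so that each coordinate shows one extreme case.

```python
import math

def lstm_state_update(f_t, i_t, o_t, c_tilde, c_prev):
    """Apply C_t = f_t * C_{t-1} + i_t * C~_t and h_t = o_t * tanh(C_t), elementwise."""
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return c_t, h_t

# Coordinate 0: forget gate open, input gate closed, so the old memory is kept.
# Coordinate 1: forget gate closed, input gate open, so the candidate overwrites it.
c_t, h_t = lstm_state_update(
    f_t=[1.0, 0.0],
    i_t=[0.0, 1.0],
    o_t=[1.0, 0.5],
    c_tilde=[0.9, -0.5],
    c_prev=[2.0, 2.0],
)
```

Here c_t comes out as [2.0, -0.5]: the first coordinate is the untouched previous state and the second is the candidate value. The hidden state is then a squashed, gated view of that cell state.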

Gating Dynamics in Practice

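Gating allows the cell to maintain memories over many steps. If the forget gate is close to 1 and the input gate is close to 0, the update reduces to C_t \approx C_{t-1}, so the memory is carried down the sequence essentially unchanged. Conversely, a forget gate near 0 with an input gate near 1 replaces the old memory with the new candidate.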
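
To make this concrete, here is a small scalar simulation under simplifying assumptions: the gate values and the candidate are held constant over time (in a real LSTM they change at every step), and the helper name run_cell_state is made up for this illustration.

```python
def run_cell_state(c0, f, i, c_tilde, steps):
    """Iterate C_t = f * C_{t-1} + i * c_tilde with constant scalar gates."""
    c = c0
    for _ in range(steps):
        c = f * c + i * c_tilde
    return c

# Forget gate fully open, input gate closed: the initial memory survives 100 steps.
kept = run_cell_state(c0=1.0, f=1.0, i=0.0, c_tilde=-1.0, steps=100)

# Forget gate at 0.9: the memory decays geometrically, as 0.9 ** 100.
leaky = run_cell_state(c0=1.0, f=0.9, i=0.0, c_tilde=-1.0, steps=100)

# Forget gate closed, input gate open: the candidate replaces the memory.
overwritten = run_cell_state(c0=1.0, f=0.0, i=1.0, c_tilde=-1.0, steps=100)
```

Even a forget gate of 0.9, which looks nearly open, erases almost all of the initial value within 100 steps, which is why long-range memory depends on gates staying very close to 1.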

Controlling the Gradient Highway

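Because the cell state update C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t is additive, the derivative of C_t with respect to C_{t-1} along the direct cell-state path is simply f_t (the gates also depend on h_{t-1}, which contributes further, indirect terms). Over many steps, the gradient along this path is therefore scaled by the product of the forget gates rather than by repeated multiplication with a weight matrix. If the model learns to keep the forget gate open (f_t \approx 1), that product stays close to 1 and the gradient can flow backward across many time steps with little attenuation. If the forget gates are well below 1, the gradient still decays.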
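
The claim about the direct path can be checked numerically with a short sketch. It assumes the forget gates and the written values i_t \odot \tilde{C}_t are fixed numbers, which ignores their dependence on h_{t-1}, so it isolates only the direct cell-state path. The function names and the constants (50 steps, gates of 0.99 and 0.5) are illustrative assumptions.

```python
def final_cell_state(c0, forget_gates, writes):
    """Roll C_t = f_t * C_{t-1} + write_t forward with fixed scalar gates."""
    c = c0
    for f, w in zip(forget_gates, writes):
        c = f * c + w
    return c

def direct_path_gradient(forget_gates):
    """dC_T/dC_0 along the direct path: the product of all forget gates."""
    g = 1.0
    for f in forget_gates:
        g *= f
    return g

T = 50
writes = [0.1] * T          # stands in for i_t * C~_t at each step
open_gates = [0.99] * T     # forget gate kept nearly open
half_gates = [0.5] * T      # forget gate half closed

# Central finite difference of the final state with respect to the initial state.
eps = 1e-6
numeric = (final_cell_state(1.0 + eps, open_gates, writes)
           - final_cell_state(1.0 - eps, open_gates, writes)) / (2 * eps)

analytic_open = direct_path_gradient(open_gates)  # 0.99 ** 50, roughly 0.6
analytic_half = direct_path_gradient(half_gates)  # 0.5 ** 50, below 1e-15
```

The finite-difference estimate agrees with the product of forget gates, and the comparison between the two gate settings shows how sharply the gradient depends on keeping f_t near 1.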