The Limitations of CNNs/MLPs for Sequences

Multi-Layer Perceptrons (MLPs) and Convolutional Neural Networks (CNNs) are the workhorses of tabular and image classification, respectively, but both struggle with sequential data. An MLP depends on fixed input dimensions, a CNN sees only a limited window of context, and neither carries state from one time step to the next. These properties limit their performance on sequences.

Multi-Layer Perceptron (MLP) Limitations

MLPs treat every input position as a separate feature, with no built-in notion of order, and they expect an input of fixed size. This makes them ill-suited for variable-length sentences or time-series data.

Fixed Input Size Constraint

An MLP requires a fixed-size input vector because the weight matrix of its first layer has fixed dimensions. To process text, each sentence must therefore be truncated or padded to a common length: truncation discards context, while padding spends computation on positions that carry no information.
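The trade-off can be seen in a minimal sketch. The helper name `pad_or_truncate`, the padding id `0`, and the token ids are all hypothetical and chosen only for illustration; they do not come from any particular library.

```python
import numpy as np

def pad_or_truncate(token_ids, max_len, pad_id=0):
    """Force a variable-length sequence of token ids to exactly max_len entries."""
    clipped = list(token_ids)[:max_len]             # anything past max_len is lost
    padding = [pad_id] * (max_len - len(clipped))   # filler positions carry no information
    return np.array(clipped + padding, dtype=np.int64)

# Hypothetical token ids for a short and a long sentence.
short = pad_or_truncate([5, 9, 2], max_len=6)
long = pad_or_truncate([5, 9, 2, 7, 4, 1, 8, 3], max_len=6)

print(short)  # [5 9 2 0 0 0]  -> three padded positions
print(long)   # [5 9 2 7 4 1]  -> the last two tokens were dropped
```

Both outputs have the length the MLP expects, but the short sentence is half filler and the long one has lost its ending.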

Lack of Parameter Sharing Over Time

MLPs assign separate weights to each input feature position. If a pattern (like a word) shifts from the beginning of a sentence to the end, it meets an entirely different set of weights, so what the MLP learned about the pattern at one temporal location does not transfer to another.
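As a rough illustration, the sketch below places the same three-value pattern early and late in a short sequence and compares a dense layer with a single filter whose weights are reused at every position. The weights are random rather than trained, and all names and sizes are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len = 8
pattern = np.array([1.0, -1.0, 2.0])

def place(pattern, start, length):
    """Return a zero sequence with `pattern` written starting at index `start`."""
    x = np.zeros(length)
    x[start:start + len(pattern)] = pattern
    return x

x_early = place(pattern, 0, seq_len)   # pattern at positions 0..2
x_late = place(pattern, 5, seq_len)    # same pattern at positions 5..7

# Dense layer: one weight per (hidden unit, input position).
W = rng.normal(size=(4, seq_len))
dense_early = W @ x_early              # uses columns 0..2 of W
dense_late = W @ x_late                # uses columns 5..7 of W

# Shared filter: the same three weights applied at every position.
kernel = rng.normal(size=3)
shared_early = np.correlate(x_early, kernel, mode="valid")
shared_late = np.correlate(x_late, kernel, mode="valid")

print("dense outputs match:", np.allclose(dense_early, dense_late))
print("shared response at the pattern matches:",
      np.isclose(shared_early[0], shared_late[5]))
print("dense weights:", W.size, "shared weights:", kernel.size)
```

The dense layer responds differently to the two inputs because each position has its own column of weights, whereas the shared filter produces the same response wherever the pattern occurs, only at a shifted index.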

Convolutional Neural Network (CNN) Limitations

While 1D CNNs can process sequential data, they are constrained by a finite receptive field and, in forecasting settings, by the risk of leaking future information into a prediction.

Bounded Receptive Field

CNNs extract local patterns using a sliding window. The range of temporal context a CNN can capture, its receptive field, is bounded by its kernel size and depth. Capturing long-term dependencies therefore requires stacking many layers or widening the kernels, both of which increase parameter count and computation.
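To make the bound concrete, the sketch below assumes the simplest case: a stack of 1D convolutions that all use the same kernel size, with stride 1 and no dilation or pooling. Under those assumptions each layer extends the receptive field by `kernel_size - 1` steps, giving `1 + num_layers * (kernel_size - 1)`. The function names are hypothetical.

```python
def receptive_field(kernel_size, num_layers):
    """Receptive field of stacked stride-1, undilated 1D convolutions."""
    return 1 + num_layers * (kernel_size - 1)

def layers_needed(kernel_size, context):
    """Smallest number of layers whose receptive field covers `context` steps."""
    # Ceiling division of (context - 1) by (kernel_size - 1).
    return -(-(context - 1) // (kernel_size - 1))

for depth in (1, 2, 4, 8):
    print(depth, "layers of kernel size 3 see", receptive_field(3, depth), "steps")

print("layers needed to cover 100 steps:", layers_needed(3, 100))
```

With a kernel of size 3, covering a 100-step context under these assumptions takes 50 layers, which shows how quickly depth grows when the receptive field expands only linearly.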

Causal Convolutions Requirement

A standard convolution with a centered kernel looks both forward and backward in time. In forecasting, the prediction for a given step must not depend on values from later steps, otherwise future information leaks into the model. Causal convolutions prevent this by restricting the kernel so that each output is computed only from the current and past steps.
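One common way to obtain this behaviour is to pad the input on the left only, so that the kernel never reaches past the current step. The NumPy sketch below is a minimal illustration under that assumption, not a reference implementation; `causal_conv1d` and the example values are hypothetical.

```python
import numpy as np

def causal_conv1d(x, kernel):
    """y[t] = sum_j kernel[j] * x[t - j], treating x as zero before t = 0."""
    k = len(kernel)
    padded = np.concatenate([np.zeros(k - 1), x])   # pad on the left only
    flipped = kernel[::-1]                          # so kernel[0] multiplies x[t]
    return np.array([padded[t:t + k] @ flipped for t in range(len(x))])

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
kernel = np.array([0.5, 0.3, 0.2])

# Change a single "future" value at t = 3 and compare outputs for t < 3.
x_changed = x.copy()
x_changed[3] = 100.0

causal_before = causal_conv1d(x, kernel)[:3]
causal_after = causal_conv1d(x_changed, kernel)[:3]

centered_before = np.convolve(x, kernel, mode="same")[:3]
centered_after = np.convolve(x_changed, kernel, mode="same")[:3]

print("causal outputs unchanged:", np.allclose(causal_before, causal_after))
print("centered outputs unchanged:", np.allclose(centered_before, centered_after))
```

The causal outputs for the first three steps are unaffected by the change at step 3, while the centered convolution's output at step 2 shifts because its window extends one step into the future.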