1D Convolutions for Time-Series Data
1D convolutions slide a kernel along a single temporal dimension, making them highly efficient for extracting local patterns in time-series and sequential data.
Mechanics of 1D Convolutions
1D convolutions capture patterns along a single axis, mapping an input sequence to an output sequence of feature vectors (a feature map).
Sliding Window in Time
Unlike 2D convolutions, which slide across the spatial dimensions \\((H, W)\\), 1D convolutions slide a kernel along a single dimension (usually time or sequence position). For an input tensor of shape \\((N, C, L)\\), where \\(N\\) is the batch size, \\(C\\) is the number of features (channels), and \\(L\\) is the sequence length, a 1D convolution slides a kernel of size \\(K\\) along the length. Each kernel spans all \\(C\\) input channels, so every output channel has its own weights of shape \\((C, K)\\).
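To make the shape bookkeeping concrete, the sketch below checks the output-length formula given in the PyTorch nn.Conv1d documentation, \\(L_{out} = \lfloor (L + 2p - d(K - 1) - 1) / s \rfloor + 1\\), where \\(p\\) is the padding, \\(d\\) the dilation and \\(s\\) the stride. The helper name conv1d_out_len and the example sizes are arbitrary choices for illustration.

<pre><code class="language-python">import torch
import torch.nn as nn


def conv1d_out_len(L, K, stride=1, padding=0, dilation=1):
    # Output-length formula from the nn.Conv1d documentation (hypothetical helper name)
    return (L + 2 * padding - dilation * (K - 1) - 1) // stride + 1


# Example input: N=4 sequences, C=3 channels (features), L=50 time steps
x = torch.randn(4, 3, 50)

# Each of the 8 output channels owns a kernel of shape (C=3, K=5)
conv = nn.Conv1d(in_channels=3, out_channels=8, kernel_size=5, stride=2, padding=1)
y = conv(x)

print(conv.weight.shape)                             # torch.Size([8, 3, 5])
print(y.shape)                                       # torch.Size([4, 8, 24])
print(conv1d_out_len(50, K=5, stride=2, padding=1))  # 24
</code></pre>

With stride 1, kernel_size=3 and padding=1, the formula returns \\(L\\) unchanged, which is why length-preserving layers often pair those two settings.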
The convolution computes dot products between the kernel weights and the input features within the sliding temporal window. This allows the model to capture local temporal correlations, such as sudden spikes in stock prices or specific phonemes in audio signals.
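The sliding dot product can be written out by hand and compared with torch.nn.functional.conv1d. This is a minimal sketch assuming one output channel, no padding and stride 1; note that, like PyTorch, it computes a cross-correlation (the kernel is not flipped).

<pre><code class="language-python">import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(1, 2, 10)  # (N=1, C=2 channels, L=10 steps)
w = torch.randn(1, 2, 3)   # one output channel; the kernel spans C=2 channels, K=3

# Reference result from PyTorch (no padding, stride 1)
y_ref = F.conv1d(x, w)     # shape (1, 1, 8)

# Manual version: at each step t, take the window of K steps across all channels
# and sum its elementwise product with the kernel (no kernel flip)
K = w.shape[-1]
L_out = x.shape[-1] - K + 1
y_manual = torch.stack(
    [(x[0, :, t:t + K] * w[0]).sum() for t in range(L_out)]
).view(1, 1, L_out)

print(torch.allclose(y_ref, y_manual, atol=1e-5))  # True
</code></pre>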
Temporal Equivariance and Receptive Fields
1D convolutions are equivariant to temporal translation (exactly so for stride 1, away from the padded boundaries). If a temporal pattern occurs later in the sequence, its activation in the output feature map shifts later by the same amount. Combined with global pooling over time, this lets the model detect an event regardless of when it occurs.
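A quick numerical check of this property, as a sketch: the event is placed away from the sequence edges and the convolution has no padding or bias, so torch.roll's wraparound only moves zeros. Global max pooling is used here for the invariance step; any global pooling over time would behave the same way.

<pre><code class="language-python">import torch
import torch.nn as nn

torch.manual_seed(0)
conv = nn.Conv1d(1, 4, kernel_size=3, bias=False)  # stride 1, no padding

# A short "event" at steps 5-7, and the same event shifted 7 steps later
x = torch.zeros(1, 1, 30)
x[0, 0, 5:8] = torch.tensor([1.0, 3.0, -2.0])
x_shifted = torch.roll(x, shifts=7, dims=-1)

y = conv(x)
y_shifted = conv(x_shifted)

# Equivariance: the response moves by the same 7 steps
print(torch.allclose(torch.roll(y, shifts=7, dims=-1), y_shifted))  # True

# Invariance after global pooling: the pooled response ignores the timing
print(torch.allclose(y.amax(dim=-1), y_shifted.amax(dim=-1)))  # True
</code></pre>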
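As layers are stacked, the receptive field expands temporally: with stride 1 and no dilation, each layer with kernel size \\(K\\) adds \\(K - 1\\) steps, so the receptive field grows linearly with depth. Deep enough stacks can therefore capture long-range dependencies, provided the receptive field covers the relevant time span; unlike a recurrent network, however, the model never sees context beyond that window.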
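The receptive field of a stack can be computed layer by layer. The sketch below uses a hypothetical helper, receptive_field, that takes (kernel_size, stride, dilation) tuples and tracks how many input steps separate adjacent units after each stride. The second example mirrors the Conv1d, MaxPool1d(2), Conv1d feature extractor of the TimeSeriesCNN classifier later in this section.

<pre><code class="language-python">def receptive_field(layers):
    """layers: list of (kernel_size, stride, dilation) tuples, ordered input to output."""
    rf, jump = 1, 1  # jump = spacing, in input steps, between adjacent units
    for k, s, d in layers:
        rf += (k - 1) * d * jump
        jump *= s
    return rf


# Four stacked layers with K=3, stride 1: linear growth, 1 + 4 * (3 - 1)
print(receptive_field([(3, 1, 1)] * 4))  # 9

# Conv(K=3) -> MaxPool(2, stride 2) -> Conv(K=3): the stride makes the
# second convolution cover twice as many input steps per kernel tap
print(receptive_field([(3, 1, 1), (2, 2, 1), (3, 1, 1)]))  # 8
</code></pre>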
Comparison with Recurrent Networks (RNNs)
Using 1D convolutions for sequential data offers distinct computational advantages over recurrent structures.
Parallel Execution and Speed
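A major limitation of RNNs is their sequential nature: each hidden state depends on the previous one, so activations must be computed step by step, which prevents parallelization across time steps on GPUs (only the batch and hidden dimensions can be parallelized). 1D convolutions compute all sequence positions in parallel, which typically leads to significantly faster training.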
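The difference in data dependencies is visible in code. In this sketch, nn.RNNCell is used only to make the step-by-step recurrence explicit, and the sizes are arbitrary: the recurrent loop must run its 100 steps in order, while the convolution produces all 100 output positions in one call.

<pre><code class="language-python">import torch
import torch.nn as nn

torch.manual_seed(0)
N, C, L, H = 2, 5, 100, 16
x = torch.randn(N, C, L)

# Recurrent update: h_t depends on h_{t-1}, so the L steps run one after another
cell = nn.RNNCell(input_size=C, hidden_size=H)
h = torch.zeros(N, H)
for t in range(L):
    h = cell(x[:, :, t], h)  # step t cannot start until step t-1 has finished

# Convolution: each output position depends only on the input,
# so all L positions are computed in a single batched call
conv = nn.Conv1d(C, H, kernel_size=3, padding=1)
y = conv(x)

print(h.shape)  # torch.Size([2, 16])
print(y.shape)  # torch.Size([2, 16, 100])
</code></pre>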
This speed advantage makes 1D CNNs popular for processing long sequences (such as raw audio or long-term sensor data), where RNNs would train slowly and, in their vanilla form, suffer from vanishing gradients.
Memory Footprint and Context Length
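RNNs maintain a hidden state that is updated at each time step. While this allows them, in theory, to model arbitrarily long dependencies, in practice they struggle to retain information over long spans. 1D CNNs have a fixed receptive field, which limits their context length; in exchange, gradients only travel through the network's depth rather than across every time step, which tends to make training more stable. The trade-off also shows up in memory: at inference, an RNN only needs to carry its fixed-size hidden state from step to step, whereas a convolutional model needs access to all past inputs within its receptive field.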
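To capture very long dependencies, 1D CNNs can use dilated convolutions, whose kernel taps are spaced a fixed number of steps apart (the dilation \\(d\\)). Temporal Convolutional Networks (TCNs) stack dilated causal convolutions with the dilation doubling at each layer (1, 2, 4, ...), so the receptive field grows exponentially with depth while the parameter count grows only linearly, since dilation does not change a layer's kernel size. Because gradients flow through the network's depth rather than through every time step, TCNs largely sidestep the vanishing-gradient problem of vanilla RNNs.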
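The sketch below compares two four-layer stacks of single-channel convolutions with \\(K = 3\\): one with dilations 1, 2, 4, 8 (a common TCN choice) and one without dilation. It uses symmetric padding for simplicity, so it is not causal; the point is only the parameter count and the receptive field, which is also measured empirically by backpropagating from one output position and counting the input steps that receive a nonzero gradient.

<pre><code class="language-python">import torch
import torch.nn as nn

torch.manual_seed(0)
K = 3
dilations = [1, 2, 4, 8]


def build_stack(dils):
    # padding = d * (K - 1) // 2 keeps the sequence length unchanged for odd K
    return nn.Sequential(*[
        nn.Conv1d(1, 1, K, dilation=d, padding=d * (K - 1) // 2) for d in dils
    ])


def n_params(module):
    return sum(p.numel() for p in module.parameters())


dilated = build_stack(dilations)
plain = build_stack([1] * len(dilations))

# Dilation does not change the weight shape: both stacks have 4 * (3 weights + 1 bias)
print(n_params(dilated), n_params(plain))  # 16 16

# Analytic receptive field: 1 + sum over layers of (K - 1) * d
print(1 + sum((K - 1) * d for d in dilations))  # 31 (plain stack: 9)

# Empirical check: which input steps influence output position 32?
x = torch.randn(1, 1, 64, requires_grad=True)
dilated(x)[0, 0, 32].backward()
print(int((x.grad != 0).sum()))  # 31
</code></pre>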
PyTorch nn.Conv1d Implementation
Let's implement a 1D CNN classifier in PyTorch and review how its layers transform the input shape.
Building a Time-Series Classifier
The code below shows how to build a 1D CNN model in PyTorch for classifying multi-channel time-series data.
<pre><code class="language-python">import torch
import torch.nn as nn


class TimeSeriesCNN(nn.Module):
    def __init__(self, in_features, num_classes):
        super().__init__()
        self.features = nn.Sequential(
            # Input shape: [batch, in_features, seq_len]
            nn.Conv1d(in_channels=in_features, out_channels=32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool1d(2),  # Halves sequence length
            nn.Conv1d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # Collapses sequence length to 1
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x):
        x = self.features(x)       # Shape: [batch, 64, 1]
        x = torch.flatten(x, 1)    # Shape: [batch, 64]
        return self.classifier(x)  # Shape: [batch, num_classes]


# Test model: batch=2, features=5 (sensors), seq_len=100
x = torch.randn(2, 5, 100)
model = TimeSeriesCNN(in_features=5, num_classes=3)
out = model(x)
print("Output logits shape:", out.shape)  # torch.Size([2, 3])
</code></pre>
In this architecture, AdaptiveAvgPool1d(1) averages over the remaining temporal dimension, so the feature output always has length 64 regardless of the input sequence length. This lets the same model handle time series of different durations, as long as the sequences within a single batch share the same length.
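The length-independence of the pooling step can be checked in isolation. This sketch feeds two batches with 64 channels but different (arbitrary) sequence lengths through nn.AdaptiveAvgPool1d(1) and confirms that, with an output size of 1, the layer reduces to a mean over the time axis.

<pre><code class="language-python">import torch
import torch.nn as nn

torch.manual_seed(0)
pool = nn.AdaptiveAvgPool1d(1)

# Two batches with the same 64 channels but different sequence lengths
x_short = torch.randn(2, 64, 25)
x_long = torch.randn(2, 64, 400)

print(pool(x_short).shape)  # torch.Size([2, 64, 1])
print(pool(x_long).shape)   # torch.Size([2, 64, 1])

# With output size 1, the layer is a plain mean over the time axis
print(torch.allclose(pool(x_long).squeeze(-1), x_long.mean(dim=-1), atol=1e-5))  # True
</code></pre>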
Causal Convolutions for Autoregressive Tasks
When using 1D convolutions for autoregressive tasks (like text generation or forecasting), we must prevent the model from looking into the future. This is done by using causal convolutions, where the output at step \\(t\\) only depends on inputs from step \\(t\\) and earlier.
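PyTorch's nn.Conv1d has no built-in causal mode, and its integer padding argument pads both ends of the sequence equally. A common approach is therefore to pad only the left side of the sequence by \\((K - 1) \cdot d\\) steps (for kernel size \\(K\\) and dilation \\(d\\)) with torch.nn.functional.pad and then apply a convolution without padding. An equivalent alternative, used in many TCN implementations, pads both sides by \\((K - 1) \cdot d\\) and trims the same number of trailing outputs, so that no output depends on future elements. Stacks of such dilated causal convolutions form the backbone of TCNs.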
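Both padding strategies can be sketched and checked against each other. CausalConv1d below is a hypothetical helper, not a PyTorch class; the sizes, kernel size 3 and dilation 2 are arbitrary. The check perturbs inputs from step 10 onward and confirms that outputs before step 10 are unchanged.

<pre><code class="language-python">import torch
import torch.nn as nn
import torch.nn.functional as F


class CausalConv1d(nn.Module):
    """Hypothetical helper: left-pads by (K - 1) * dilation, then applies a plain Conv1d."""

    def __init__(self, in_channels, out_channels, kernel_size, dilation=1):
        super().__init__()
        self.left_pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(in_channels, out_channels, kernel_size, dilation=dilation)

    def forward(self, x):
        # F.pad pads the last dimension with (left, right) = (left_pad, 0)
        return self.conv(F.pad(x, (self.left_pad, 0)))


torch.manual_seed(0)
layer = CausalConv1d(2, 4, kernel_size=3, dilation=2)
x = torch.randn(1, 2, 20)
y = layer(x)
print(y.shape)  # torch.Size([1, 4, 20]): same length as the input

# Causality check: changing inputs from step 10 onward leaves earlier outputs untouched
x_future = x.clone()
x_future[:, :, 10:] = torch.randn(1, 2, 10)
y_future = layer(x_future)
print(torch.allclose(y[:, :, :10], y_future[:, :, :10]))  # True

# Alternative: symmetric padding of (K - 1) * d = 4, then trim the last 4 outputs
chomp = nn.Conv1d(2, 4, 3, dilation=2, padding=4)
chomp.load_state_dict(layer.conv.state_dict())  # share weights for the comparison
y_chomp = chomp(x)[:, :, :-4]
print(torch.allclose(y, y_chomp, atol=1e-5))  # True
</code></pre>

Left padding avoids computing outputs that are immediately discarded, while the trim-based version keeps the layer a single nn.Conv1d; both produce the same values here.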