Forecasting Multivariate Time Series
Multivariate time-series forecasting uses multiple correlated features to predict future states. By inputting multi-dimensional vectors at each step, recurrent networks can learn lead-lag relationships and cross-feature dynamics.
Handling Multiple Input Features
In multivariate forecasting, the input gains a feature dimension: each time step carries several values (e.g., the target variable, weather metrics, marketing spend, and temporal encodings) instead of a single one.
Input Representation
At each time step, the input is a feature vector of size M. The overall batch tensor has shape [batch_size, seq_len, M], representing a sequence of vectors rather than scalar values.
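As a minimal sketch of how such a tensor might be built, the snippet below slices a [T, M] series into overlapping windows. The helper name `make_windows`, the choice of column 0 as the target, and the one-step-ahead horizon are illustrative assumptions, not something prescribed above.

```python
import torch


def make_windows(series: torch.Tensor, seq_len: int, target_col: int = 0):
    """Slice a [T, M] series into inputs [N, seq_len, M] and next-step targets [N, 1]."""
    num_windows = series.shape[0] - seq_len
    # Each window holds seq_len consecutive feature vectors of size M.
    x = torch.stack([series[i:i + seq_len] for i in range(num_windows)])
    # The target is the value of target_col one step after each window ends.
    y = series[seq_len:, target_col].unsqueeze(1)
    return x, y


series = torch.randn(100, 5)  # T=100 time steps, M=5 features (synthetic data)
x, y = make_windows(series, seq_len=30)
print(x.shape, y.shape)  # torch.Size([70, 30, 5]) torch.Size([70, 1])
```

Taking a batch from `x` along the first dimension gives the [batch_size, seq_len, M] layout described above.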
Lead-Lag Relationships
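Unlike univariate models, multivariate networks can learn predictive relationships across variables, such as how a change in advertising spend today predicts a change in product sales three days later. These are statistical associations the model exploits for forecasting, not proof of causation.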
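To make the idea of a lead-lag relationship concrete, here is a small sketch on synthetic data. It assumes, purely for illustration, that sales echo advertising spend from three days earlier, and it uses a plain lagged Pearson correlation (the helper `lagged_corr` is hypothetical) rather than a neural network to locate the lag.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
ad_spend = rng.normal(size=n)

# Synthetic assumption: sales echo ad spend from 3 days earlier, plus small noise.
sales = 0.1 * rng.normal(size=n)
sales[3:] += ad_spend[:-3]


def lagged_corr(lead: np.ndarray, follow: np.ndarray, lag: int) -> float:
    """Pearson correlation between lead[t] and follow[t + lag]."""
    return float(np.corrcoef(lead[:len(lead) - lag], follow[lag:])[0, 1])


# Compare lags 0..6 and keep the one with the strongest correlation.
corrs = {lag: lagged_corr(ad_spend, sales, lag) for lag in range(7)}
best_lag = max(corrs, key=corrs.get)
print(best_lag)  # 3
```

A check like this is only a diagnostic. A recurrent model given both series as input features can pick up the same delayed dependence without the lag being specified in advance.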
Multivariate LSTM in PyTorch
A multivariate LSTM sets its input size to the number of features and maps the hidden state at the final time step to a prediction of the target variable.
Model Implementation
Scale each input feature independently before training; features with very different ranges can otherwise destabilize gradient updates.
<pre><code class="language-python">import torch
import torch.nn as nn


class MultivariateLSTM(nn.Module):
    def __init__(self, num_features=5, hidden_size=64):
        super().__init__()
        # input_size matches the number of features
        self.lstm = nn.LSTM(input_size=num_features, hidden_size=hidden_size, batch_first=True)
        self.linear = nn.Linear(hidden_size, 1)  # Predict only the target variable

    def forward(self, x):
        # x shape: [batch, seq_len, num_features]
        out, (h_n, c_n) = self.lstm(x)
        return self.linear(out[:, -1, :])  # Shape: [batch, 1]


model = MultivariateLSTM(num_features=5)
x_sample = torch.randn(8, 30, 5)  # batch=8, sequence=30 steps, features=5
print(model(x_sample).shape)  # torch.Size([8, 1])
</code></pre>
Feature Scaling & Inverse Transformation
When scaling multivariate datasets, scale each column independently using a dedicated scaler. This matters most for the target variable: the model's predictions come out in scaled units, so they must be inverse-transformed with the target's own scaling parameters to recover values on the original scale.
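The sketch below illustrates the per-column approach with NumPy. `ColumnScaler` is a hypothetical, hand-rolled standardizer written for this example; a library scaler exposing the same `fit` / `transform` / `inverse_transform` methods (such as scikit-learn's `StandardScaler`) could be substituted. The data is synthetic, column 0 is assumed to be the target, and in practice the scalers would normally be fit on the training split only.

```python
import numpy as np


class ColumnScaler:
    """Hypothetical minimal standardizer for a single column (mean 0, std 1)."""

    def fit(self, values: np.ndarray) -> "ColumnScaler":
        self.mean_ = float(values.mean())
        # Fall back to 1.0 so a constant column does not cause division by zero.
        self.std_ = float(values.std()) or 1.0
        return self

    def transform(self, values: np.ndarray) -> np.ndarray:
        return (values - self.mean_) / self.std_

    def inverse_transform(self, values: np.ndarray) -> np.ndarray:
        return values * self.std_ + self.mean_


rng = np.random.default_rng(0)
# Synthetic data: column 0 is the assumed target, columns 1-2 are other features.
data = rng.normal(loc=[200.0, 15.0, 1000.0], scale=[50.0, 5.0, 300.0], size=(100, 3))

# Fit one dedicated scaler per column, then scale each column with its own scaler.
scalers = [ColumnScaler().fit(data[:, j]) for j in range(data.shape[1])]
scaled = np.column_stack([s.transform(data[:, j]) for j, s in enumerate(scalers)])

# Stand-in for model output: predictions expressed in the target's scaled units.
scaled_preds = scaled[:5, 0]

# Only the target's scaler is needed to return to the original scale.
preds = scalers[0].inverse_transform(scaled_preds)
print(np.allclose(preds, data[:5, 0]))  # True
```

Keeping the target's scaler as a separate object means predictions of shape [batch, 1] can be inverted directly, without padding them back out to the full feature width.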