Forecasting Multivariate Time Series

Multivariate time-series forecasting uses multiple correlated features to predict future states. By inputting multi-dimensional vectors at each step, recurrent networks can learn lead-lag relationships and cross-feature dynamics.

Handling Multiple Input Features

In multivariate forecasting, each time step carries several features instead of a single value (e.g., target variable, weather metrics, marketing spend, and temporal encodings).

Input Representation

At each time step, the input is a feature vector of size M. The overall batch tensor has shape [batch_size, seq_len, M], representing a sequence of vectors rather than scalar values.
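As a rough illustration of this layout, the sketch below slices a synthetic [T, M] array into overlapping windows. The helper name `make_windows`, the choice of column 0 as the target, and the one-step-ahead horizon are assumptions made for the example, not part of the original text.

```python
import numpy as np

def make_windows(data, seq_len, target_col=0):
    """Slice a [T, M] array into overlapping windows and next-step targets."""
    X, y = [], []
    for start in range(len(data) - seq_len):
        X.append(data[start:start + seq_len])        # one window: [seq_len, M]
        y.append(data[start + seq_len, target_col])  # target value right after the window
    return np.stack(X), np.array(y)

rng = np.random.default_rng(0)
series = rng.normal(size=(100, 5))  # synthetic data: T=100 steps, M=5 features

X, y = make_windows(series, seq_len=30)
print(X.shape, y.shape)  # (70, 30, 5) (70,)
```

Each row of `X` is a sequence of 30 feature vectors, so a batch drawn from it has the [batch_size, seq_len, M] shape described above.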

Lead-Lag Relationships

Unlike univariate models, multivariate networks can learn predictive relationships across variables, such as how changes in advertising spend today precede changes in product sales three days later. These are statistical associations and do not by themselves establish causation.
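One simple way to look for such a relationship before training a network is a lagged correlation check. The sketch below uses synthetic data in which sales are constructed to follow ad spend by three steps; the helper `lagged_corr`, the variable names, and the coefficients are all hypothetical and chosen only for illustration.

```python
import numpy as np

def lagged_corr(x, y, lag):
    """Pearson correlation between x[t] and y[t + lag], for lag >= 0."""
    if lag == 0:
        return np.corrcoef(x, y)[0, 1]
    return np.corrcoef(x[:-lag], y[lag:])[0, 1]

rng = np.random.default_rng(0)
ad_spend = rng.normal(size=500)

# Synthetic sales that follow ad spend with a 3-step delay plus noise
sales = np.empty(500)
sales[:3] = rng.normal(size=3)
sales[3:] = 0.8 * ad_spend[:-3] + 0.2 * rng.normal(size=497)

corrs = {lag: lagged_corr(ad_spend, sales, lag) for lag in range(7)}
best_lag = max(corrs, key=corrs.get)
print(best_lag)  # 3
```

A recurrent network is not given this lag explicitly; it has to pick it up from the windowed sequences, which is why the window length should be at least as long as the lags you expect to matter.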

Multivariate LSTM in PyTorch

A multivariate LSTM sets its input size to the number of features and maps the final hidden state to the target variable.

Model Implementation

Scale each input feature independently before training; features on very different scales can otherwise destabilize gradient updates.

<pre><code class="language-python">import torch
import torch.nn as nn

class MultivariateLSTM(nn.Module):
    def __init__(self, num_features=5, hidden_size=64):
        super().__init__()
        # input_size matches the number of features
        self.lstm = nn.LSTM(input_size=num_features, hidden_size=hidden_size, batch_first=True)
        self.linear = nn.Linear(hidden_size, 1)  # Predict only the target variable

    def forward(self, x):
        # x shape: [batch, seq_len, num_features]
        out, (h_n, c_n) = self.lstm(x)
        return self.linear(out[:, -1, :])  # Shape: [batch, 1]

model = MultivariateLSTM(num_features=5)
x_sample = torch.randn(8, 30, 5)  # batch=8, sequence=30 steps, features=5
print(model(x_sample).shape)  # torch.Size([8, 1])</code></pre>

Feature Scaling & Inverse Transformation

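When scaling multivariate datasets, scale each column with its own statistics and keep the target's scaling parameters accessible, for example by giving the target a dedicated scaler. This matters because the model's predictions come out in scaled units and must be inverse-transformed with the target's specific scaling factors to return to the original scale. Fit the scalers on the training split only, so that statistics from validation or test data do not leak into training.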
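The round trip can be sketched in plain NumPy as a stand-in for a library scaler. The synthetic data, the 160-row training split, the use of standardization (mean and standard deviation), and the helper `inverse_target` are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic [T, M] data with very different column scales; column 0 is the target
data = rng.normal(loc=[100.0, 20.0, 5000.0], scale=[15.0, 5.0, 800.0], size=(200, 3))
train = data[:160]  # statistics come from the training rows only

mean = train.mean(axis=0)  # one mean per column
std = train.std(axis=0)    # one standard deviation per column
scaled = (data - mean) / std

TARGET_COL = 0

def inverse_target(pred_scaled):
    """Map scaled target predictions back to original units."""
    return pred_scaled * std[TARGET_COL] + mean[TARGET_COL]

# Stand-in for model output: the scaled target values themselves
restored = inverse_target(scaled[:, TARGET_COL])
print(np.allclose(restored, data[:, TARGET_COL]))  # True
```

Only the target column's mean and standard deviation are used in the inverse step, because the model outputs a single scaled value per sample rather than a full feature vector.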