Correlation Matrices and Heatmaps
A correlation matrix holds the Pearson correlation coefficient for every pair of numeric features, and a heatmap renders it visually. Together they make it quick to spot strongly correlated (potentially redundant) feature pairs and features that correlate with the target.
Computing and Visualizing Correlations
The Pearson correlation coefficient ranges from −1 (perfect negative) to +1 (perfect positive), with 0 indicating no linear relationship. Values above 0.8 or below −0.8 between two features often indicate redundancy.
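To make the range and the "linear only" caveat concrete, here is a minimal sketch that computes the coefficient by hand as the covariance divided by the product of the standard deviations. The helper name `pearson_r` and the toy arrays are hypothetical and chosen only for illustration.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation: covariance over the product of standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

# Hypothetical toy data, used only to illustrate the three regimes
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_linear = 2.0 * x + 1.0      # exact increasing linear relationship
y_inverse = -3.0 * x + 10.0   # exact decreasing linear relationship
y_parabola = (x - 3.0) ** 2   # y depends on x, but not linearly

r_linear = pearson_r(x, y_linear)
r_inverse = pearson_r(x, y_inverse)
r_parabola = pearson_r(x, y_parabola)
print(r_linear, r_inverse, r_parabola)
```

The parabola case gives a coefficient of zero even though `y_parabola` is fully determined by `x`, which is why a value near 0 rules out only a linear relationship.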
Seaborn Heatmap
<pre><code class="language-python">import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.read_csv("data.csv")

# Pairwise Pearson correlations between numeric columns only
corr = df.corr(numeric_only=True)

plt.figure(figsize=(10, 8))
sns.heatmap(
    corr,
    annot=True, fmt=".2f",        # print each coefficient with two decimals
    cmap="coolwarm", center=0,    # diverging palette centered on zero
    square=True, linewidths=0.5,
)
plt.title("Feature Correlation Matrix")
plt.tight_layout()
plt.show()</code></pre>
Using Correlations for Feature Selection
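Feature pairs with very high mutual correlation (absolute value above roughly 0.9) are candidates for pruning, since one member of the pair adds little information beyond the other. The exact cutoff is a judgment call and depends on the dataset and model. Features with high absolute correlation to the target are strong candidates for inclusion.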
Identifying Highly Correlated Pairs
<pre><code class="language-python">import numpy as np

# Mask the upper triangle and the diagonal so each pair appears only once
mask = np.triu(np.ones_like(corr, dtype=bool))
corr_unstacked = corr.mask(mask).stack()

# Keep pairs whose absolute correlation exceeds the threshold
high_corr = corr_unstacked[corr_unstacked.abs() > 0.85]
print("Highly correlated pairs:")
print(high_corr.sort_values(ascending=False))</code></pre>
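Once such pairs are identified, one common follow-up is to drop one feature from each pair. The sketch below shows one simple, order-dependent way to do that. The helper `drop_highly_correlated`, its default `threshold`, and the synthetic columns `a`, `b`, `c` are all assumptions made for illustration, not part of any library.

```python
import numpy as np
import pandas as pd

def drop_highly_correlated(df, threshold=0.9):
    """Return (reduced_df, dropped_columns).

    Hypothetical helper: drops the later column of each pair whose
    absolute Pearson correlation exceeds the assumed `threshold`.
    """
    corr_abs = df.corr(numeric_only=True).abs()
    # Keep only the strict upper triangle so each pair is examined once
    keep = np.triu(np.ones(corr_abs.shape, dtype=bool), k=1)
    upper = corr_abs.where(keep)
    # A column is dropped if it correlates strongly with any earlier column
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop), to_drop

# Synthetic data: "b" is nearly a copy of "a", "c" is independent noise
rng = np.random.default_rng(0)
a = rng.normal(size=200)
demo = pd.DataFrame({
    "a": a,
    "b": a + rng.normal(scale=0.01, size=200),
    "c": rng.normal(size=200),
})
reduced, dropped = drop_highly_correlated(demo, threshold=0.9)
print(dropped)
print(list(reduced.columns))
```

Which member of a pair gets dropped depends on column order here. In practice it may be preferable to keep whichever feature correlates more strongly with the target, or whichever is cheaper to collect.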
Target Correlation
<pre><code class="language-python"># Absolute correlation of each feature with the target variable
# (assumes the target column is named "target")
target_corr = corr["target"].drop("target").abs().sort_values(ascending=False)
print(target_corr)
# High values → likely predictive features
# Near zero → possibly uninformative (but check non-linear relationships too)</code></pre>
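One inexpensive way to check for non-linear relationships is to compare Pearson with Spearman rank correlation, which pandas exposes through `DataFrame.corr(method="spearman")`. The sketch below uses synthetic data; the column names `x_exp` and `x_noise` and the exponential target are assumptions made for illustration. Spearman detects only monotonic relationships, so it is a partial check rather than a complete one.

```python
import numpy as np
import pandas as pd

# Synthetic data (hypothetical): the target grows monotonically but not
# linearly with "x_exp", and is unrelated to "x_noise"
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=500)
demo = pd.DataFrame({
    "x_exp": x,
    "x_noise": rng.normal(size=500),
    "target": np.exp(x),
})

# Absolute correlation of each feature with the target under both methods
pearson = demo.corr(method="pearson")["target"].drop("target").abs()
spearman = demo.corr(method="spearman")["target"].drop("target").abs()

comparison = pd.DataFrame({"pearson": pearson, "spearman": spearman})
print(comparison)
```

A feature whose Spearman value is clearly higher than its Pearson value, as `x_exp` is in this sketch, is a hint that the relationship is monotonic but curved, and that a transformation (for example a logarithm) or a non-linear model may use it better.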