Multivariate Analysis: Pairplots

A pairplot displays scatter plots for every pair of numeric features in a grid, with distributions along the diagonal — giving a comprehensive view of all pairwise relationships and class separability in a single figure.


Creating Pairplots with seaborn

Seaborn's pairplot() function generates the full pairwise grid automatically. The hue parameter colors the points by class and splits each diagonal distribution by class. This lets you judge visually whether the classes are linearly separable in each feature pair.
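seaborn documents pairplot() as a high-level interface to PairGrid. The sketch below approximately rebuilds the grid by hand, which shows what pairplot() assembles for you. It assumes a seaborn version with the `x=`/`hue=` plotting API (0.11 or later). The `species` column is added here for illustration and is not part of the raw sklearn frame.

<pre><code class="language-python">import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris

# Replace the integer target with species names so it acts as a categorical hue
iris = load_iris(as_frame=True)
df = iris.frame.drop(columns="target")
df["species"] = iris.target_names[iris.target]

# Roughly what sns.pairplot(df, hue="species", diag_kind="kde") builds
g = sns.PairGrid(df, hue="species")   # one row and one column per numeric feature
g.map_diag(sns.kdeplot)               # per-class distribution on the diagonal
g.map_offdiag(sns.scatterplot)        # class-colored scatter in every off-diagonal cell
g.add_legend()
plt.show()</code></pre>

Working with PairGrid directly is mainly useful when each region of the grid needs a different plotting function.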

Basic and Colored Pairplots

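<pre><code class="language-python">import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

# Load iris and replace the integer target with species names, so the label
# is left out of the numeric grid and can serve as a categorical hue
iris = load_iris(as_frame=True)
df = iris.frame.drop(columns="target")
df["species"] = iris.target_names[iris.target]

# Basic pairplot: KDE curves on the diagonal, scatter plots elsewhere
sns.pairplot(df, diag_kind="kde")
plt.suptitle("Iris Feature Pairplot", y=1.02)
plt.show()

# Colored by species
sns.pairplot(df, hue="species", diag_kind="kde", plot_kws={"alpha": 0.5})
plt.show()</code></pre>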

Interpreting and Limiting Pairplots

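Pairplots scale quadratically with the number of features. With n features, the grid has n × n cells and n(n − 1) off-diagonal scatter plots, so 10 features already produce 90 scatter plots. Grids of that size are slow to render and hard to read. As a rule of thumb, use pairplots only for datasets with fewer than about 15 features, and pre-select the features of interest.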
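The arithmetic behind that scaling can be sketched with a small helper. `pairplot_panel_counts` is a hypothetical name used only for this illustration. It counts the n diagonal cells and the n(n − 1) off-diagonal scatter plots, and the `corner` flag halves the scatter count to mirror seaborn's `corner=True` option.

<pre><code class="language-python">def pairplot_panel_counts(n_features: int, corner: bool = False) -> dict:
    """Count the panels in an n-by-n pairplot grid (illustrative helper)."""
    scatter = n_features * (n_features - 1)   # off-diagonal cells
    if corner:
        scatter //= 2                          # keep only the lower triangle
    return {"diagonal": n_features, "scatter": scatter, "total": n_features + scatter}


for n in (4, 10, 15):
    full = pairplot_panel_counts(n)
    half = pairplot_panel_counts(n, corner=True)
    print(f"{n} features: {full['scatter']} scatter plots, {half['scatter']} with corner=True")</code></pre>

Going from 10 to 15 features more than doubles the scatter count (90 to 210). This is why pre-selecting features helps more than any rendering option.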

Selecting Features for Pairplot

<pre><code class="language-python">import seaborn as sns
import matplotlib.pyplot as plt

# df is assumed to be a customer table containing these columns (hypothetical example)
cols = ["age", "income", "spend", "tenure", "churn"]

# corner=True draws only the lower triangle, halving the number of scatter plots
sns.pairplot(df[cols], hue="churn", corner=True, diag_kind="hist",
             plot_kws={"alpha": 0.4})
plt.show()</code></pre>
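The customer columns in the snippet above are not defined in this section, so here is a self-contained variant that uses randomly generated data instead. The distributions and the "no"/"yes" churn labels are invented purely for illustration. The sketch relies on seaborn's `corner=True` behavior of removing the upper-triangle axes and storing them as `None` in `g.axes`, which lets us count the panels that remain.

<pre><code class="language-python">import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Synthetic stand-in for the hypothetical customer table (random values only)
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "age": rng.integers(18, 80, n),
    "income": rng.normal(50_000, 15_000, n),
    "spend": rng.gamma(2.0, 500.0, n),
    "tenure": rng.integers(0, 120, n),
    "churn": rng.choice(["no", "yes"], n, p=[0.8, 0.2]),
})

cols = ["age", "income", "spend", "tenure", "churn"]
g = sns.pairplot(df[cols], hue="churn", corner=True, diag_kind="hist",
                 plot_kws={"alpha": 0.4})

# Upper-triangle axes are removed and set to None when corner=True
visible = sum(ax is not None for ax in g.axes.flat)
print(visible)  # 4 diagonal + 6 lower-triangle panels = 10
plt.show()</code></pre>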

Reading Pairplot Patterns

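Elliptical clouds in a scatter cell indicate linear correlation, and the narrower the ellipse, the stronger the correlation. Curved (banana-shaped) clouds indicate non-linear relationships. Distinct class clusters within a cell indicate that the feature pair is useful for classification. Class distributions that overlap completely, whether on the diagonal or in a scatter cell, indicate that those features are not discriminative on their own for that task, though they may still help in combination with other features.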
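Two of these patterns can be cross-checked numerically. The sketch below is a complement to the visual reading, not part of pairplot itself. First, Pearson's r captures only the linear component of a relationship: a perfectly deterministic U-shaped relation over a symmetric range scores close to zero. A curved cloud in the grid can therefore carry structure that a correlation matrix hides. Second, comparing per-class feature ranges on iris confirms the petal-length separation that the colored pairplot shows. The `species` column is the same illustrative relabeling used earlier.

<pre><code class="language-python">import numpy as np
from sklearn.datasets import load_iris

# 1) Elliptical vs curved clouds: Pearson r measures only the linear part
x = np.linspace(-3, 3, 201)        # symmetric around 0
linear = 2 * x + 1                 # plots as a straight line (a degenerate ellipse)
curved = x ** 2                    # U-shaped, "banana"-like cloud
r_linear = np.corrcoef(x, linear)[0, 1]
r_curved = np.corrcoef(x, curved)[0, 1]
print(round(r_linear, 3), round(abs(r_curved), 3))  # 1.0 0.0

# 2) Class clusters: compare per-class ranges of each iris feature
iris = load_iris(as_frame=True)
df = iris.frame.drop(columns="target")
df["species"] = iris.target_names[iris.target]
ranges = df.groupby("species").agg(["min", "max"])
print(ranges["petal length (cm)"])

# setosa's petal-length range does not overlap the other two species
setosa_max = ranges.loc["setosa", ("petal length (cm)", "max")]
others_min = ranges.drop(index="setosa")[("petal length (cm)", "min")].min()
print(others_min > setosa_max)  # True</code></pre>

A range check like this only detects separation along a single axis. Classes that separate only along a diagonal direction in a scatter cell still need the plot, or a multivariate model, to reveal them.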