
Bivariate Analysis: Scatter Plots and Hexbins

Bivariate analysis examines the relationship between two variables: are they correlated, do they form clusters, are there non-linear patterns? Scatter plots reveal these patterns for small datasets; hexbin plots handle large datasets where point overlap obscures the structure.
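Before plotting, a quick numeric check of the "are they correlated" question can be useful. The sketch below is only an illustration: it builds a synthetic `age`/`income` table (the values are made up and stand in for whatever your real data contains) and computes Pearson's r alongside a rank-based Spearman coefficient, obtained here as Pearson's r on the ranked columns.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (hypothetical values, not from a real dataset).
rng = np.random.default_rng(0)
age = rng.uniform(18, 70, size=500)
income = 1_000 * age + rng.normal(0, 8_000, size=500)
df = pd.DataFrame({"age": age, "income": income})

# Pearson's r measures the strength of a linear relationship.
pearson = df["age"].corr(df["income"])

# Spearman's rho is Pearson's r computed on ranks, so it captures
# any monotonic relationship, not just a straight-line one.
spearman = df["age"].rank().corr(df["income"].rank())

print(f"Pearson r = {pearson:.2f}, Spearman rho = {spearman:.2f}")
```

A large gap between the two coefficients is often a hint that the relationship is monotonic but not linear, which is worth confirming visually with the plots that follow.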


Scatter Plots

A scatter plot places each observation as a point in 2D space defined by two feature values. Adding a third dimension via color (hue) or size turns it into a trivariate display.

Scatter Plots with seaborn

<pre><code class="language-python">import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv("data.csv")

# Basic scatter
sns.scatterplot(data=df, x="age", y="income", alpha=0.4)
plt.title("Age vs Income")
plt.show()

# Colored by class
sns.scatterplot(data=df, x="age", y="income", hue="churn", palette="Set1", alpha=0.5)
plt.show()</code></pre>
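Color is one way to bring in a third variable; marker size is the other. The following sketch assumes a numeric column named `tenure`, which is hypothetical and generated synthetically here, and maps it to marker area through seaborn's `size` and `sizes` parameters.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Synthetic stand-in data; "tenure" is a hypothetical third variable.
rng = np.random.default_rng(1)
n = 300
df = pd.DataFrame({
    "age": rng.uniform(18, 70, size=n),
    "tenure": rng.integers(1, 120, size=n),
})
df["income"] = 1_000 * df["age"] + rng.normal(0, 8_000, size=n)

fig, ax = plt.subplots(figsize=(7, 5))
# size= picks the column to encode; sizes= sets the (min, max) marker area.
sns.scatterplot(data=df, x="age", y="income",
                size="tenure", sizes=(10, 150), alpha=0.5, ax=ax)
ax.set_title("Age vs Income, sized by tenure")
fig.tight_layout()
```

Call `plt.show()` or `fig.savefig(...)` afterwards to view the figure. Size differences are harder to read precisely than color differences, so this encoding tends to work best for rough comparisons.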

Hexbin Plots for Large Datasets

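When a dataset has thousands of points, scatter plots become unreadable due to overplotting: markers stack on top of each other, so a dense region looks the same whether it holds a hundred points or ten thousand. Hexbin plots divide the 2D plane into hexagonal bins and color each bin by the number of points it contains, which preserves the overall structure.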

Hexbin with matplotlib and seaborn

<pre><code class="language-python"># Reuses df, plt, and sns from the scatter plot example above

# matplotlib hexbin
plt.hexbin(df["age"], df["income"], gridsize=30, cmap="YlOrRd")
plt.colorbar(label="Count")
plt.xlabel("Age")
plt.ylabel("Income")
plt.show()

# seaborn jointplot with hexbin
sns.jointplot(data=df, x="age", y="income", kind="hex", cmap="Blues")
plt.show()</code></pre>
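When a few bins hold most of the points, a linear color scale can wash out everything else. One possible refinement, sketched below on synthetic data (the values are made up), is to pass `bins="log"` so the colors follow a logarithmic scale, and `mincnt=1` so that empty hexagons are left blank instead of being drawn in the lowest color.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in data, large enough to cause heavy overplotting.
rng = np.random.default_rng(2)
n = 50_000
age = rng.normal(40, 12, size=n)
income = 1_000 * age + rng.normal(0, 10_000, size=n)

fig, ax = plt.subplots(figsize=(7, 5))
# bins="log" colors hexagons on a log scale; mincnt=1 hides empty ones.
hb = ax.hexbin(age, income, gridsize=30, cmap="YlOrRd",
               bins="log", mincnt=1)
fig.colorbar(hb, ax=ax, label="Count (log scale)")
ax.set_xlabel("Age")
ax.set_ylabel("Income")

# With mincnt=1, the collection holds one value per non-empty hexagon.
n_bins = len(hb.get_array())
print(f"{n_bins} non-empty hexagons drawn")
```

The `gridsize` value is a trade-off: larger values give finer hexagons but noisier counts, so it is worth trying a few settings on your own data.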

Adding a Regression Line

<pre><code class="language-python"># Reuses df, plt, and sns from the scatter plot example above

# Scatter with linear regression overlay
sns.regplot(data=df, x="age", y="income",
            scatter_kws={"alpha": 0.3}, line_kws={"color": "red"})
plt.title("Age vs Income with Regression Line")
plt.show()</code></pre>
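The regression overlay shows the trend but does not report its numbers. Assuming the default straight-line fit, a least-squares fit with `np.polyfit` gives a comparable slope and intercept that you can quote alongside the plot. The data below is synthetic, generated with a known slope of 1,000 so the estimate can be sanity-checked.

```python
import numpy as np

# Synthetic stand-in data with a known linear trend plus noise.
rng = np.random.default_rng(3)
age = rng.uniform(18, 70, size=1_000)
income = 20_000 + 1_000 * age + rng.normal(0, 8_000, size=1_000)

# Degree-1 polynomial fit: returns [slope, intercept] of the least-squares line.
slope, intercept = np.polyfit(age, income, deg=1)

# Pearson correlation as a rough measure of how tightly points follow the line.
r = np.corrcoef(age, income)[0, 1]

print(f"income ~ {intercept:,.0f} + {slope:,.0f} * age  (r = {r:.2f})")
```

On real data, a slope that looks convincing on the plot can still hide curvature or outliers, so the fitted numbers are best read together with the scatter or hexbin view.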