Univariate Analysis: Histograms and KDEs

Univariate analysis examines one variable at a time to understand its distribution — shape, center, spread, and outliers. Histograms and KDE plots are the primary tools, revealing whether a feature is normally distributed, skewed, bimodal, or discrete.


Histograms

A histogram divides the feature range into bins and counts how many values fall in each bin. The number of bins is a key parameter: too few bins hide the shape of the distribution, while too many turn sampling noise into spurious peaks.
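To make the bin-count trade-off concrete, the sketch below asks NumPy how many bins two common rules of thumb would suggest for the same data. The sample is synthetic and the variable names are illustrative assumptions; `np.histogram_bin_edges` with the `"sturges"` and `"fd"` (Freedman-Diaconis) rules is standard NumPy, but treat the suggested counts as starting points rather than definitive answers.

```python
import numpy as np

# Synthetic right-skewed sample standing in for a real feature (assumption).
rng = np.random.default_rng(0)
values = rng.lognormal(mean=10.0, sigma=0.5, size=1000)

# Ask each rule for bin edges on the same data and count the resulting bins.
bin_counts = {}
for rule in ("sturges", "fd"):
    edges = np.histogram_bin_edges(values, bins=rule)
    bin_counts[rule] = len(edges) - 1  # n edges define n - 1 bins

for rule, n_bins in bin_counts.items():
    print(f"{rule:>8}: {n_bins} bins")
```

Sturges depends only on the sample size, while Freedman-Diaconis also uses the interquartile range, so the two can disagree noticeably on skewed data. Plotting with a couple of different bin counts is a cheap way to check that a feature's apparent shape is not an artifact of the binning.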

Plotting Histograms with matplotlib and pandas

<pre><code class="language-python">import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv("data.csv")

# Single column
df["income"].hist(bins=30, edgecolor="white", figsize=(8, 4))
plt.title("Income Distribution")
plt.xlabel("Income")
plt.ylabel("Count")
plt.show()

# All numeric columns
df.hist(bins=30, figsize=(14, 10), edgecolor="white")
plt.tight_layout()
plt.show()</code></pre>

Kernel Density Estimation (KDE)

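A KDE places a smooth kernel (typically a Gaussian) on each observation and sums them into a continuous probability density curve. This makes it easier to compare distributions and spot multimodality without depending on bin edges. The result is still sensitive to the bandwidth, which plays the role that bin width plays in a histogram: too small and the curve is noisy, too large and real peaks are smoothed away.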
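The idea behind a KDE can be sketched in a few lines of NumPy. This is a minimal illustration with a Gaussian kernel and a hand-picked bandwidth, not how seaborn or SciPy implement it; the function name `gaussian_kde_1d`, the synthetic two-group sample, and the bandwidth values are all assumptions chosen for the example.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Average of Gaussian kernels centered on each sample, evaluated on grid."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # z[i, j] = standardized distance from grid point i to sample j
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    return kernels.mean(axis=1) / bandwidth

rng = np.random.default_rng(0)
# Hypothetical bimodal sample: two groups centered at 0 and 5.
samples = np.concatenate([rng.normal(0.0, 1.0, 300), rng.normal(5.0, 1.0, 300)])
grid = np.linspace(-5.0, 10.0, 601)  # grid[200] = 0, grid[300] = 2.5, grid[400] = 5

narrow = gaussian_kde_1d(samples, grid, bandwidth=0.4)  # small bandwidth
wide = gaussian_kde_1d(samples, grid, bandwidth=3.0)    # heavy smoothing

step = grid[1] - grid[0]
print("area under narrow curve:", narrow.sum() * step)
for name, density in (("narrow", narrow), ("wide", wide)):
    print(name, "density at x = 0, 2.5, 5:", density[[200, 300, 400]])
```

With the small bandwidth the density dips between the two group centers, so both peaks remain visible. With the large bandwidth the curve is highest in the middle and the two groups merge into a single hump, which is the KDE counterpart of using too few histogram bins.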

KDE Plots with seaborn

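<pre><code class="language-python">import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.read_csv("data.csv")

# KDE only (bw_adjust below 1 narrows the bandwidth, giving less smoothing)
sns.kdeplot(data=df, x="income", fill=True, bw_adjust=0.8)
plt.show()  # close this figure so the next plot is not drawn on top of it

# Histogram + KDE combined
sns.histplot(data=df, x="income", kde=True, bins=30)
plt.title("Income Distribution with KDE")
plt.show()

# Compare distributions by class
# By default densities are scaled by class size; pass common_norm=False to compare shapes
sns.kdeplot(data=df, x="income", hue="churn", fill=True, alpha=0.5)
plt.show()</code></pre>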

Reading the Distribution Shape

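Right skew (long right tail): common for income and house prices; consider a log transform.

Left skew (long left tail): for example, test scores clustered near the maximum.

Bimodal (two peaks): suggests a hidden grouping variable (e.g., heights of men and women recorded without a gender column).

Uniform: values are spread evenly across the range, so the feature may not be informative (an ID-like column often looks this way).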
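Skew can also be checked numerically rather than by eye. The sketch below uses synthetic log-normal values as a stand-in for an income column (an assumption, not real data) and compares the sample skewness before and after a log transform using pandas' `Series.skew()` and NumPy's `log1p`.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
# Synthetic right-skewed "income" values (assumption, not real data).
income = pd.Series(rng.lognormal(mean=10.5, sigma=0.8, size=5000), name="income")

raw_skew = income.skew()             # clearly positive for a long right tail
log_skew = np.log1p(income).skew()   # near zero if the log scale is roughly symmetric

print(f"mean = {income.mean():.0f}, median = {income.median():.0f}")
print(f"skew before log transform: {raw_skew:.2f}")
print(f"skew after log transform:  {log_skew:.2f}")
```

A mean well above the median and a clearly positive skewness both point to a right-skewed feature. A skewness near zero after the transform suggests the log scale is a more symmetric representation, though whether to transform still depends on the model and the question being asked.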