Univariate Analysis: Histograms and KDEs
Univariate analysis examines one variable at a time to understand its distribution: shape, center, spread, and outliers. Histograms and kernel density estimate (KDE) plots are the primary tools; they reveal whether a feature is roughly normal, skewed, bimodal, or discrete.
Histograms
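A histogram divides the feature's range into bins and counts how many values fall in each one. The number of bins is the key parameter: with too few bins, structure such as multiple peaks is merged away; with too many, sampling noise dominates and most bins hold only a handful of values.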
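To make the effect of the bin count concrete, here is a minimal sketch using NumPy. The data is synthetic and purely illustrative (two overlapping normal groups), and the bin counts 3, 30, and 300 are arbitrary choices, not recommendations.

```python
import numpy as np

# Synthetic, illustrative data (an assumption, not from the text):
# two overlapping groups of 500 values each.
rng = np.random.default_rng(0)
values = np.concatenate([rng.normal(0, 1, 500), rng.normal(4, 1, 500)])

for bins in (3, 30, 300):
    # np.histogram returns the per-bin counts and the bin edges.
    counts, edges = np.histogram(values, bins=bins)
    width = edges[1] - edges[0]
    empty = int((counts == 0).sum())
    print(f"bins={bins:>3}  width={width:.3f}  "
          f"max count={counts.max()}  empty bins={empty}")

# bins="auto" asks NumPy to pick the bin count from the data
# instead of hard-coding a number.
auto_edges = np.histogram_bin_edges(values, bins="auto")
print("auto bin count:", len(auto_edges) - 1)
```

With very few bins the two groups tend to blur together, while with very many bins the counts per bin become small and erratic. A data-driven rule such as `bins="auto"` is a reasonable starting point before tuning by eye.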
Plotting Histograms with matplotlib and pandas
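The following is a minimal sketch of both plotting routes, assuming a hypothetical `income` column filled with synthetic log-normal values; the column name, sample size, and `bins=40` are illustrative choices only.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical DataFrame with one right-skewed numeric column.
rng = np.random.default_rng(42)
df = pd.DataFrame({"income": rng.lognormal(mean=10.5, sigma=0.6, size=2000)})

fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(10, 4))

# Route 1: pandas plotting accessor (draws with matplotlib underneath).
df["income"].plot.hist(bins=40, ax=ax_left, edgecolor="white")
ax_left.set_title("pandas: Series.plot.hist")
ax_left.set_xlabel("income")

# Route 2: matplotlib directly; also returns the counts and bin edges.
counts, edges, _ = ax_right.hist(df["income"], bins=40, edgecolor="white")
ax_right.set_title("matplotlib: Axes.hist")
ax_right.set_xlabel("income")
ax_right.set_ylabel("count")

fig.tight_layout()
# In an interactive session, call plt.show() or fig.savefig(...) here.
```

The pandas accessor is convenient for quick looks at a DataFrame column, while calling `Axes.hist` directly gives access to the computed counts and edges for further inspection.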
Kernel Density Estimation (KDE)
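A KDE places a smooth kernel (typically Gaussian) on each observation and sums the kernels into a continuous probability density curve. This makes it easier to compare distributions and spot multimodality without depending on where bin edges fall, although the curve is still sensitive to the kernel bandwidth, which plays a role similar to bin width.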
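As a rough illustration of what a KDE computes, the sketch below builds a Gaussian KDE by hand with NumPy. The function name `gaussian_kde_1d`, the synthetic two-group sample, and the three bandwidth values are all assumptions made for this example; library implementations choose the bandwidth automatically and are more efficient.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Average of Gaussian bumps centred on each sample, evaluated on grid."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Standardised distance from every grid point to every sample.
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    # Mean over samples, rescaled so the curve integrates to about 1.
    return kernels.mean(axis=1) / bandwidth

# Synthetic bimodal sample (illustrative only).
rng = np.random.default_rng(1)
samples = np.concatenate([rng.normal(-2, 0.5, 300), rng.normal(2, 0.5, 300)])
grid = np.linspace(-12, 12, 1201)
step = grid[1] - grid[0]

for h in (0.05, 0.4, 3.0):
    density = gaussian_kde_1d(samples, grid, h)
    area = density.sum() * step  # Riemann-sum check of the total area
    inner = density[1:-1]
    # Count strict local maxima as a crude measure of how bumpy the curve is.
    peaks = int(np.sum((inner > density[:-2]) & (inner > density[2:])))
    print(f"bandwidth={h:<4}  area={area:.3f}  local maxima={peaks}")
```

A small bandwidth tends to produce a spiky curve that follows individual points, and a large one can smooth real structure away. This is the same trade-off as choosing too many or too few histogram bins.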
KDE Plots with seaborn
Reading the Distribution Shape
Right skew (long right tail): typical of income and house prices; consider a log transform.
Left skew (long left tail): for example, test scores clustered near the maximum.
Bimodal (two peaks): suggests a hidden grouping variable (e.g., male and female heights pooled without a gender column).
Uniform (flat): values are spread evenly across the range; the feature may not be informative.
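Visual impressions of skew can be backed up with a number. The sketch below, using a synthetic and hypothetical `income` series, measures sample skewness with pandas before and after a log transform; the distribution parameters are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Synthetic right-skewed data standing in for an income column.
rng = np.random.default_rng(3)
income = pd.Series(rng.lognormal(mean=10.5, sigma=0.8, size=5000),
                   name="income")

# Series.skew() is positive for a long right tail, negative for a long
# left tail, and near zero for a roughly symmetric distribution.
print("skew before log transform:", round(income.skew(), 2))

# log1p computes log(1 + x), which also tolerates zeros in the data.
log_income = np.log1p(income)
print("skew after log transform: ", round(log_income.skew(), 2))
```

For data like this, the transformed values should be much closer to symmetric. A log transform only applies to non-negative values, so it is not a remedy for left skew or for features that take negative values.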