Box Plots and Violin Plots

Box plots compactly summarize a distribution through its median, quartiles, and outliers, while violin plots add a kernel density estimate (KDE) to reveal the full shape of the distribution. The two are complementary tools for comparing feature distributions across groups.


Box Plots

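A box plot shows the median (central line), the interquartile range or IQR (the box, spanning the first to third quartile), whiskers extending to the most extreme data points that lie within 1.5×IQR of the box edges, and individual points beyond the whiskers as outliers. Box plots are especially useful for comparing distributions across categorical groups.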
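To make these components concrete, the following minimal sketch computes them by hand with NumPy. The function name `box_plot_stats`, the `whis` parameter, and the sample values are illustrative assumptions rather than part of any plotting library, and the quartiles use NumPy's default linear interpolation, which may differ slightly from other quartile conventions.

```python
import numpy as np


def box_plot_stats(values, whis=1.5):
    """Return the quantities a standard (Tukey-style) box plot draws.

    Hypothetical helper for illustration only.
    """
    values = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1

    # Fences: observations outside these limits are drawn as outliers.
    lower_fence = q1 - whis * iqr
    upper_fence = q3 + whis * iqr
    inside = values[(values >= lower_fence) & (values <= upper_fence)]

    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        # Whiskers end at the most extreme observations inside the fences,
        # not at the fences themselves.
        "whisker_low": inside.min(),
        "whisker_high": inside.max(),
        "outliers": values[(values < lower_fence) | (values > upper_fence)],
    }


# Made-up sample with one obviously extreme value
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
stats = box_plot_stats(sample)
print(stats)
```

In this sample the upper whisker stops at 9, the largest value inside the upper fence, and 30 is reported as an outlier.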

Box Plots with seaborn

<pre><code class="language-python">import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv("data.csv")

# Distribution of income by churn status
# Note: seaborn 0.13+ warns when palette is passed without hue;
# there, add hue="churn" and legend=False for the same result.
sns.boxplot(data=df, x="churn", y="income", palette="Set2")
plt.title("Income Distribution by Churn")
plt.show()

# Multiple features at once (wide-form data: one box per column)
sns.boxplot(data=df[["age", "income", "spend"]])
plt.xticks(rotation=45)
plt.show()</code></pre>

Violin Plots

Violin plots combine a box plot with a KDE mirrored about a central axis, showing the full distribution shape. They are more informative than box plots when distributions are multimodal or heavily skewed, because a box plot can look much the same for a unimodal and a bimodal sample.
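The KDE behind each violin can be sketched in a few lines of NumPy. This is a simplified illustration, not seaborn's implementation: the function name `gaussian_kde_1d`, the Scott's-rule bandwidth, and the synthetic bimodal sample are all assumptions made for the example.

```python
import numpy as np


def gaussian_kde_1d(samples, grid, bandwidth=None):
    """Evaluate a Gaussian kernel density estimate on the points in `grid`.

    Hypothetical helper for illustration only.
    """
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = samples.size
    if bandwidth is None:
        # Scott's rule of thumb for one-dimensional data (an assumption here)
        bandwidth = samples.std(ddof=1) * n ** (-1 / 5)

    # One Gaussian bump per observation, averaged over all observations
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernel = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return kernel.sum(axis=1) / (n * bandwidth)


# Synthetic bimodal sample: two well-separated clusters
rng = np.random.default_rng(0)
bimodal = np.concatenate([rng.normal(-3, 1, 500), rng.normal(3, 1, 500)])

grid = np.linspace(-8, 8, 321)
density = gaussian_kde_1d(bimodal, grid)


def density_at(x):
    """Density at the grid point closest to x."""
    return density[np.argmin(np.abs(grid - x))]


print("median of sample:", np.median(bimodal))
print("density near -3, 0, 3:", density_at(-3), density_at(0), density_at(3))
```

The median, which a box plot would mark with its central line, falls near 0, where the estimated density is at its lowest between the two clusters. A violin drawn from `density` would show two bulges with a narrow waist there, which is exactly the structure a box alone does not reveal.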

Violin Plots with seaborn

<pre><code class="language-python"># Reuses sns, plt, and df from the box plot example
sns.violinplot(
    data=df,
    x="category",
    y="spend",
    inner="box",  # draws a miniature box plot inside each violin
    palette="muted",
)
plt.title("Spend Distribution by Category")
plt.show()

# Split violin for binary comparison (intended for a two-level hue variable)
sns.violinplot(data=df, x="product", y="revenue", hue="region",
               split=True, palette="Set1")
plt.show()</code></pre>

Choosing Between Box and Violin

Use box plots when comparing many groups side by side (they are more compact) or when outlier identification is the primary goal. Use violin plots when the shape of the distribution within each group is important, especially for detecting bimodality or asymmetry that a box plot hides.
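One way to see the trade-off is to draw both plot types for the same data side by side. The sketch below uses synthetic data; the helper name `compare_box_violin`, the column names `group` and `value`, and the distribution parameters are assumptions chosen for the demonstration.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt


def compare_box_violin(df, x, y):
    """Draw a box plot and a violin plot of the same data on a shared y-axis.

    Hypothetical helper for illustration only.
    """
    fig, (ax_box, ax_violin) = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
    sns.boxplot(data=df, x=x, y=y, ax=ax_box)
    ax_box.set_title("Box plot")
    sns.violinplot(data=df, x=x, y=y, inner="box", ax=ax_violin)
    ax_violin.set_title("Violin plot")
    fig.tight_layout()
    return fig


# Synthetic data: one unimodal group and one bimodal group with similar spread
rng = np.random.default_rng(42)
demo = pd.DataFrame({
    "group": ["unimodal"] * 400 + ["bimodal"] * 400,
    "value": np.concatenate([
        rng.normal(0, 3, 400),   # single wide cluster
        rng.normal(-3, 1, 200),  # lower cluster of the bimodal group
        rng.normal(3, 1, 200),   # upper cluster of the bimodal group
    ]),
})

fig = compare_box_violin(demo, x="group", y="value")
```

Call `plt.show()` or `fig.savefig(...)` to view the result. The box plot panel summarizes both groups with a box and whiskers only, while the violin panel should make the two clusters in the bimodal group visible.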