Pivot Tables for Data Summarization
Pivot tables reorganize data by grouping rows and columns and computing aggregate statistics — a powerful way to surface group-level patterns, compare segments, and detect interactions between categorical variables during EDA.
Creating Pivot Tables in Pandas
pd.pivot_table() accepts an index (row groups), columns (column groups), values (what to aggregate), and an aggregation function. Multiple aggregation functions can be passed simultaneously.
Basic Pivot Table
<pre><code class="language-python">import pandas as pd
df = pd.read_csv("sales.csv")
# Average revenue by region and product category
pivot = pd.pivot_table(
df,
values="revenue",
index="region",
columns="category",
aggfunc="mean",
fill_value=0
)
print(pivot.round(2))</pre>
Multiple Aggregation Functions
<pre><code class="language-python">pivot_multi = pd.pivot_table(
df,
values="revenue",
index="region",
columns="channel",
aggfunc=["mean", "sum", "count"],
margins=True # adds row/column totals
)
print(pivot_multi)</pre>
groupby as an Alternative
For more flexible multi-level aggregations, pandas groupby() offers the same power with a more composable API and is generally preferred in programmatic data pipelines.
groupby with agg
<pre><code class="language-python"># Summary stats by region and category
summary = (
df.groupby(["region", "category"])["revenue"]
.agg(["mean", "sum", "count", "std"])
.reset_index()
.rename(columns={"mean": "avg_revenue",
"count": "num_transactions"})
)
print(summary.head(10))</pre>