Pivot Tables for Data Summarization

Pivot tables reorganize data by grouping rows and columns and computing aggregate statistics — a powerful way to surface group-level patterns, compare segments, and detect interactions between categorical variables during EDA.


Creating Pivot Tables in Pandas

pd.pivot_table() accepts an index (row groups), columns (column groups), values (what to aggregate), and an aggregation function. Multiple aggregation functions can be passed simultaneously.

Basic Pivot Table

<pre><code class="language-python">import pandas as pd df = pd.read_csv("sales.csv") # Average revenue by region and product category pivot = pd.pivot_table( df, values="revenue", index="region", columns="category", aggfunc="mean", fill_value=0 ) print(pivot.round(2))</pre>

Multiple Aggregation Functions

<pre><code class="language-python">pivot_multi = pd.pivot_table( df, values="revenue", index="region", columns="channel", aggfunc=["mean", "sum", "count"], margins=True # adds row/column totals ) print(pivot_multi)</pre>

groupby as an Alternative

For more flexible multi-level aggregations, pandas groupby() offers the same power with a more composable API and is generally preferred in programmatic data pipelines.

groupby with agg

<pre><code class="language-python"># Summary stats by region and category summary = ( df.groupby(["region", "category"])["revenue"] .agg(["mean", "sum", "count", "std"]) .reset_index() .rename(columns={"mean": "avg_revenue", "count": "num_transactions"}) ) print(summary.head(10))</pre>