Summary Statistics in Pandas (describe)
Pandas' describe() method gives a quick, at-a-glance quantitative summary of a dataset. A single call reports central tendency, spread, and range for every numeric column.
Understanding describe() Output
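describe() returns count (the number of non-null values), mean, std (the sample standard deviation), min, 25th percentile (Q1), median (Q2, labelled 50% in the output), 75th percentile (Q3), and max for each numeric column. Comparing mean to median hints at skewness: a mean well above the median suggests a right-skewed column, and a mean well below it suggests a left-skewed one. A large std relative to the mean suggests high dispersion.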
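The mean-versus-median comparison can be sketched on a small example. The `income` column and its values are hypothetical, chosen so that one large value pulls the mean upward; this is an illustration, not a recommended workflow.

```python
import pandas as pd

# Hypothetical toy data: a single large value drags the mean above the median.
df = pd.DataFrame({"income": [30, 32, 35, 38, 40, 42, 45, 300]})

summary = df.describe()
print(summary)

mean = summary.loc["mean", "income"]
median = summary.loc["50%", "income"]  # describe() labels the median "50%"
std = summary.loc["std", "income"]

# A positive gap hints at right skew; a large ratio hints at high dispersion.
print(f"mean - median = {mean - median:.2f}")
print(f"std / mean    = {std / mean:.2f}")
```

Here the mean is 70.25 while the median is 39, so the gap points to a right-skewed column even before any plot is drawn.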
Numeric and Categorical Summaries
<pre><code class="language-python">import pandas as pd
df = pd.read_csv("data.csv")
# Numeric summary
print(df.describe())
# Categorical summary
print(df.describe(include="object"))
# Returns: count, unique, top (mode), freq (mode count)
# All columns
print(df.describe(include="all"))
# Specific percentiles
print(df.describe(percentiles=[0.1, 0.5, 0.9]))</code></pre>
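Because the snippet above depends on a data.csv file, a self-contained variant may be easier to experiment with. The sketch below assumes a hypothetical two-column frame (`price`, `city`) with one missing value in each column, to show how count ignores missing values and how include="all" fills statistics that do not apply to a column with NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical mixed-type frame with one missing value per column.
df = pd.DataFrame({
    "price": [10.0, 12.5, np.nan, 20.0],
    "city": ["Oslo", "Lima", "Oslo", None],
})

# include="all" stacks numeric and categorical statistics in one table;
# cells that do not apply to a column (e.g. the mean of "city") are NaN.
full = df.describe(include="all")
print(full)

# Custom percentiles replace the default 25%/50%/75% rows.
tails = df["price"].describe(percentiles=[0.1, 0.5, 0.9])
print(tails)
```

Both columns report a count of 3 rather than 4, which makes describe() a convenient first check for missing data.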
Going Beyond describe()
Supplementary methods provide richer statistics: skewness, kurtosis, value counts for categoricals, and cross-tabulations for relationships between two categorical columns.
Skewness, Kurtosis, and Value Counts
<pre><code class="language-python"># Skewness: positive = right-skewed, negative = left-skewed
# (an absolute value above 1 is a common rule of thumb for strong skew)
print(df.skew(numeric_only=True))
# Kurtosis: pandas reports excess kurtosis (normal distribution = 0),
# so values above 0 indicate heavy tails and values below 0 light tails
print(df.kurtosis(numeric_only=True))
# Categorical distribution
print(df["category"].value_counts())
print(df["category"].value_counts(normalize=True)) # proportions
# Crosstab between two categoricals
print(pd.crosstab(df["gender"], df["churn"], normalize="index"))  # each row sums to 1</code></pre>
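To see these methods on concrete numbers, here is a minimal sketch on a hypothetical eight-row frame. The column names `gender`, `churn`, and `spend` and all values are invented for illustration.

```python
import pandas as pd

# Hypothetical toy data; "spend" contains one large outlier (200).
df = pd.DataFrame({
    "gender": ["F", "F", "F", "M", "M", "M", "M", "F"],
    "churn": ["yes", "no", "no", "yes", "yes", "no", "yes", "no"],
    "spend": [20, 22, 25, 27, 30, 31, 35, 200],
})

# Positive skew: the outlier stretches the right tail.
skew = df["spend"].skew()
# Excess kurtosis (Fisher definition): 0 is the normal-distribution baseline.
kurt = df["spend"].kurt()
print(f"skew = {skew:.2f}, excess kurtosis = {kurt:.2f}")

# Share of each churn label across all rows.
shares = df["churn"].value_counts(normalize=True)
print(shares)

# normalize="index" turns each row into proportions, i.e. churn rate per gender.
rates = pd.crosstab(df["gender"], df["churn"], normalize="index")
print(rates)
```

Reading the crosstab row by row gives the churn rate within each group: in this toy data 25% of the F rows and 75% of the M rows are labelled yes.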