Summary Statistics in Pandas (describe)

The pandas describe() method is a quick way to get an at-a-glance quantitative summary of a dataset: in a single call it reports central tendency, spread, and range for every numeric column.

Understanding describe() Output

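By default, describe() returns these statistics for each numeric column: count (non-missing values only), mean, std (sample standard deviation), min, the 25th percentile (Q1, labeled 25%), the median (Q2, labeled 50%), the 75th percentile (Q3, labeled 75%), and max. Comparing the mean with the median hints at skewness. A mean well above the median suggests a right (positive) skew, and a mean well below it suggests a left skew. A large std relative to the mean (a high coefficient of variation, std / mean) suggests high dispersion, although this ratio is only meaningful for positive-valued data.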
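As a hedged illustration of reading the output, the sketch below runs describe() on a small hypothetical DataFrame (the columns income and age and their values are invented for this example). It then compares each column's mean with its median (the 50% row) and its standard deviation with its mean.

<pre><code class="language-python">import pandas as pd

# Hypothetical toy data: "income" has a few large values (right-skewed),
# "age" is evenly spaced (symmetric).
df = pd.DataFrame({
    "income": [20, 22, 25, 27, 30, 32, 35, 40, 150, 300],
    "age": [25, 30, 35, 40, 45, 50, 55, 60, 65, 70],
})

summary = df.describe()
print(summary)

# Mean minus median: a clearly positive gap hints at right skew.
gap = summary.loc["mean"] - summary.loc["50%"]
print(gap)

# Coefficient of variation (std / mean); only meaningful for positive data.
cv = summary.loc["std"] / summary.loc["mean"]
print(cv)</code></pre>

Here the two large income values pull its mean (68.1) well above its median (31), the right-skew pattern described above. The evenly spaced age column has identical mean and median (47.5) and a much lower coefficient of variation.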

Numeric and Categorical Summaries

<pre><code class="language-python">import pandas as pd

df = pd.read_csv("data.csv")

# Numeric summary
print(df.describe())

# Categorical summary
# Returns: count, unique, top (mode), freq (mode count)
print(df.describe(include="object"))

# All columns
print(df.describe(include="all"))

# Specific percentiles
print(df.describe(percentiles=[0.1, 0.5, 0.9]))</code></pre>
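Because data.csv is not available here, the following sketch substitutes a tiny hypothetical DataFrame (the columns price and city are invented) so the shape of each summary can be checked. Missing values are left out of count, and the categorical summary reports the most frequent value as top and its frequency as freq.

<pre><code class="language-python">import numpy as np
import pandas as pd

# Hypothetical stand-in for data.csv, with one missing value per column.
# dtype="object" keeps city as a plain object column for include="object".
df = pd.DataFrame({
    "price": [10.0, 12.0, np.nan, 15.0, 40.0],
    "city": pd.Series(["Oslo", "Bergen", "Oslo", None, "Oslo"], dtype="object"),
})

num = df.describe()                                # numeric columns only (price)
cat = df.describe(include="object")                # count, unique, top, freq
custom = df.describe(percentiles=[0.1, 0.5, 0.9])  # adds rows 10%, 50%, 90%

print(num)     # price count is 4: the NaN is not counted
print(cat)     # city: count 4, unique 2, top Oslo, freq 3
print(custom)</code></pre>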

Going Beyond describe()

Supplementary methods fill in statistics that describe() leaves out: skewness (skew()), kurtosis (kurtosis()), frequency counts for categorical columns (value_counts()), and cross-tabulations that show the relationship between two categorical columns (pd.crosstab()).

Skewness, Kurtosis, and Value Counts

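<pre><code class="language-python"># Skewness (bias-adjusted sample skewness): values above 1 or below -1
# are commonly read as strongly right- or left-skewed
print(df.skew(numeric_only=True))

# Kurtosis: pandas returns excess (Fisher) kurtosis, so a normal
# distribution scores about 0; positive = heavier tails, negative = lighter
print(df.kurtosis(numeric_only=True))

# Categorical distribution
print(df["category"].value_counts())
print(df["category"].value_counts(normalize=True))  # proportions

# Crosstab between two categoricals (normalize="index": each row sums to 1)
print(pd.crosstab(df["gender"], df["churn"], normalize="index"))</code></pre>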
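To check the excess-kurtosis convention and the row-normalized crosstab, the sketch below uses simulated data. The seed, sample size, column names, and toy categorical values are arbitrary choices for illustration and do not come from the original dataset. A normal sample should give skewness and kurtosis near 0. An exponential sample, whose theoretical skewness is 2 and excess kurtosis is 6, should land clearly above both rule-of-thumb thresholds.

<pre><code class="language-python">import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Simulated columns: symmetric normal vs. right-skewed exponential
sim = pd.DataFrame({
    "normal": rng.normal(size=100_000),
    "expon": rng.exponential(size=100_000),
})
skew = sim.skew()
kurt = sim.kurtosis()  # excess kurtosis: about 0 for normal, about 6 for exponential
print(skew.round(2))
print(kurt.round(2))

# Hypothetical toy categorical data for a row-normalized crosstab
cats = pd.DataFrame({
    "gender": ["F", "F", "M", "M", "M"],
    "churn": ["yes", "no", "no", "no", "yes"],
})
ct = pd.crosstab(cats["gender"], cats["churn"], normalize="index")
print(ct)  # each row holds churn proportions within one gender
print(cats["churn"].value_counts(normalize=True))  # proportions sum to 1</code></pre>

With normalize="index", each row of the crosstab is divided by its own total. It answers "what share of each gender churned?" rather than "what share of all rows fall in each cell?", which normalize="all" would give.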