Introduction to Pandas DataFrames

The pandas DataFrame is the workhorse data structure for ML preprocessing in Python — a labeled, two-dimensional table that supports powerful indexing, filtering, and aggregation. Mastering it is non-negotiable for any data practitioner.

Creating and Inspecting DataFrames

DataFrames can be created from dictionaries, CSV files, databases, or NumPy arrays. The first step with any new dataset is always inspection — shape, dtypes, and a few sample rows.
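As a minimal sketch of two of the in-memory construction routes mentioned above, the example below builds one DataFrame from a dictionary and another from a NumPy array, then runs the basic inspection calls. The names `people` and `scores` and all values are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# From a dictionary: keys become column names, values become columns.
# All values here are made up for illustration.
people = pd.DataFrame({
    "name": ["Ana", "Ben", "Cleo"],
    "age": [34, 27, 45],
    "income": [72000.0, 48000.0, 91000.0],
})

# From a 2-D NumPy array: the array carries no labels,
# so column names are supplied explicitly.
scores = pd.DataFrame(np.arange(6).reshape(3, 2), columns=["x", "y"])

# First-pass inspection: shape, dtypes, and a few sample rows.
print(people.shape)
print(people.dtypes)
print(scores.head())
```

The dictionary route tends to be convenient for small hand-built tables, while the array route suits numeric data that already lives in NumPy.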

Loading and Inspecting Data

<pre><code class="language-python">import pandas as pd

df = pd.read_csv("data.csv")

print(df.shape)       # (rows, cols)
print(df.dtypes)      # column types
print(df.head())      # first 5 rows
df.info()             # non-null counts and types (prints directly and returns None)
print(df.describe())  # summary statistics
</code></pre>

Selecting and Filtering

<pre><code class="language-python"># Column selection
df["age"]              # single column (Series)
df[["age", "income"]]  # multiple columns

# Row filtering
df[df["age"] > 30]
df.loc[df["income"] > 50000, ["name", "income"]]
df.iloc[0:5]           # first 5 rows by position
</code></pre>

Modifying DataFrames

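Most DataFrame methods return a new object and leave the original unchanged, while direct column assignment modifies the DataFrame in place. Knowing whether an indexing operation returned a view or a copy helps you avoid the common SettingWithCopyWarning, which pandas emits when an assignment may be landing on a temporary copy rather than on the original data.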
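The view-versus-copy distinction can be sketched as follows. This is a minimal illustration and not the only way to avoid the warning. The sample data and the name `older` are hypothetical, and the exact warning behavior depends on the pandas version and on whether Copy-on-Write is enabled.

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({"age": [25, 35, 45], "income": [40000, 60000, 80000]})

# Risky pattern (left commented out): chained indexing such as
#     df[df["age"] > 30]["income"] = 0
# may assign into a temporary copy and leave df unchanged.

# Safer: select rows and column in a single .loc call on the original.
df.loc[df["age"] > 30, "income"] = 0

# When a subset should be modified independently, take an explicit copy.
older = df[df["age"] > 30].copy()
older["income"] = 1  # changes only `older`, not `df`
```

A single `.loc` assignment makes the target unambiguous, and an explicit `.copy()` signals that later edits to the subset are not meant to reach the original DataFrame.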

Adding, Renaming, and Dropping Columns

<pre><code class="language-python">import numpy as np

# Add a derived column
df["age_squared"] = df["age"] ** 2

# Rename columns
df = df.rename(columns={"old_name": "new_name"})

# Drop columns
df = df.drop(columns=["unnecessary_col"])

# Apply a function element-wise (natural log; assumes income is strictly positive)
df["log_income"] = df["income"].apply(np.log)
</code></pre>