Handling Dates and Times in Datasets

Most ML algorithms cannot use raw datetime strings directly. Once decomposed into numeric features (year, month, day of week, hour, days since an event), however, they can become strong predictive signals. Pandas makes this extraction concise and efficient.


Parsing and Accessing Datetime Components

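Convert string columns to datetime64 with pd.to_datetime(), then access components through the .dt accessor. Parse datetimes as soon as the data is loaded, and pass an explicit format= when the layout is known, so that format errors surface early as exceptions rather than later as wrong features.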
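
A minimal sketch of strict versus lenient parsing, using a small in-memory column instead of transactions.csv. The values and the format string FMT are illustrative assumptions; the third value deliberately uses a different layout. With an explicit format= and the default errors="raise", the mismatch raises ValueError; errors="coerce" turns it into NaT instead, so bad rows can be counted and inspected.

<pre><code class="language-python">import pandas as pd

# Hypothetical raw data; the third value does not match the expected format.
raw = pd.Series(["2024-03-01 09:15:00", "2024-03-02 18:40:00", "03/02/2024 18:40"])

# Assumed layout of the timestamps; adjust to match your data.
FMT = "%Y-%m-%d %H:%M:%S"

# Strict parsing: the mismatching value raises immediately.
strict_failed = False
try:
    pd.to_datetime(raw, format=FMT)
except ValueError as exc:
    strict_failed = True
    print("strict parse failed:", exc)

# Lenient parsing: mismatches become NaT so they can be counted and inspected.
parsed = pd.to_datetime(raw, format=FMT, errors="coerce")
print(parsed.isna().sum(), "unparseable value(s)")

# Once the dtype is datetime64, the .dt accessor exposes the components.
print(parsed.dt.month.tolist())</code></pre>

Whether to raise or coerce is a judgment call: raising is safer during development, while coercing and then reporting the NaT count suits pipelines that must not stop on a few bad rows.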

Extracting Temporal Features

<pre><code class="language-python">import pandas as pd

df = pd.read_csv("transactions.csv")

# Parse the string column into datetime64 so the .dt accessor is available
df["timestamp"] = pd.to_datetime(df["timestamp"])

# Decompose each timestamp into numeric calendar features
df["year"] = df["timestamp"].dt.year
df["month"] = df["timestamp"].dt.month
df["day_of_week"] = df["timestamp"].dt.dayofweek  # 0=Monday, 6=Sunday
df["hour"] = df["timestamp"].dt.hour
df["is_weekend"] = df["timestamp"].dt.dayofweek >= 5  # Saturday or Sunday
df["quarter"] = df["timestamp"].dt.quarter

print(df.head())</code></pre>

Engineering Time-Difference Features

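The elapsed time since a reference event (account creation, last purchase) is often more predictive than the raw date. Pandas supports arithmetic on datetime columns natively: subtracting one datetime from another yields a timedelta64 column, which has its own .dt accessor with attributes such as .dt.days and methods such as .dt.total_seconds().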
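
When the reference event differs per entity, such as each customer's previous purchase or first purchase, it can be computed with groupby. This is a sketch with hypothetical column names (customer_id, timestamp) and toy data; the first purchase of a customer has no predecessor, so its gap comes out as NaN. The customer's first purchase stands in here for an account-creation date, which the toy data does not have.

<pre><code class="language-python">import pandas as pd

# Toy transactions; customer_id and the timestamps are illustrative only.
df = pd.DataFrame({
    "customer_id": ["a", "b", "a", "a", "b"],
    "timestamp": pd.to_datetime([
        "2024-01-01", "2024-01-03", "2024-01-05", "2024-01-06", "2024-01-10",
    ]),
})

# diff() compares each row with the previous one, so sort chronologically
# within each customer first.
df = df.sort_values(["customer_id", "timestamp"])

# Gap since the same customer's previous purchase, in whole days
# (NaT for each customer's first purchase, which .dt.days turns into NaN).
df["days_since_last_purchase"] = (
    df.groupby("customer_id")["timestamp"].diff().dt.days
)

# Elapsed time since a fixed per-customer event, here the first purchase.
first_seen = df.groupby("customer_id")["timestamp"].transform("min")
df["days_since_first_purchase"] = (df["timestamp"] - first_seen).dt.days

print(df)</code></pre>

Because both new columns are assigned by index label, the original row identities are preserved even though the frame was re-sorted.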

Days Since and Cyclic Encoding

<pre><code class="language-python">import numpy as np
import pandas as pd

# Days since a fixed reference date (negative for timestamps before it)
ref = pd.Timestamp("2024-01-01")
df["days_since_ref"] = (df["timestamp"] - ref).dt.days

# Cyclic encoding for month (prevents 1 being "far" from 12)
df["month_sin"] = np.sin(2 * np.pi * df["month"] / 12)
df["month_cos"] = np.cos(2 * np.pi * df["month"] / 12)</code></pre>
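
Note that .dt.days returns only the days component of each timedelta. It drops the remaining hours, and for negative differences it rounds toward negative infinity: a gap of -12 hours is stored as "-1 days +12:00:00", so its days component is -1. When sub-day resolution matters, dividing .dt.total_seconds() by 86,400 keeps the fraction. A small sketch with made-up timestamps:

<pre><code class="language-python">import pandas as pd

ref = pd.Timestamp("2024-01-01")
ts = pd.Series(pd.to_datetime(["2024-01-02 12:00", "2023-12-31 12:00"]))

delta = ts - ref  # timedelta64 Series: +1 day 12 hours, and -12 hours

# Days component only: 1 and -1 (the -12h delta is -1 days +12:00:00).
whole_days = delta.dt.days

# Fractional days: 1.5 and -0.5.
frac_days = delta.dt.total_seconds() / 86_400

print(pd.DataFrame({"whole_days": whole_days, "frac_days": frac_days}))</code></pre>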

Why Cyclic Encoding Matters

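Month 12 and month 1 are temporally adjacent, but a raw integer encoding puts them 11 units apart, the largest gap possible. Mapping a cyclical feature onto the unit circle as a (sin, cos) pair preserves this circularity: every pair of consecutive months, December and January included, ends up the same distance apart. Both components are needed, because the sine alone assigns some distinct months the same value (months 1 and 5 both map to 0.5). With this encoding, linear and distance-based models can pick up seasonal patterns that wrap around the year boundary.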
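
The same transform applies to any feature with a known period, such as hour (24) or day of week (7). The sketch below wraps it in a helper; the name add_cyclic_features is hypothetical, not a pandas function. The check at the end confirms that, after encoding, December and January are exactly as far apart as June and July (both 2·sin(π/12), about 0.518), whereas the raw integers put them 11 units apart.

<pre><code class="language-python">import numpy as np
import pandas as pd


def add_cyclic_features(df, col, period):
    """Add sin/cos columns that place `col` on a circle with the given period.

    Hypothetical helper; `period` is the number of steps in one cycle
    (12 for months, 24 for hours, 7 for day of week).
    """
    angle = 2 * np.pi * df[col] / period
    df[f"{col}_sin"] = np.sin(angle)
    df[f"{col}_cos"] = np.cos(angle)
    return df


months = add_cyclic_features(pd.DataFrame({"month": range(1, 13)}), "month", 12)


def encoded_distance(m1, m2):
    """Euclidean distance between two months in the (sin, cos) plane."""
    cols = ["month_sin", "month_cos"]
    a = months.loc[months["month"] == m1, cols].to_numpy()[0]
    b = months.loc[months["month"] == m2, cols].to_numpy()[0]
    return float(np.linalg.norm(a - b))


dec_jan = encoded_distance(12, 1)
jun_jul = encoded_distance(6, 7)
print("raw: |12 - 1| = 11, |7 - 6| = 1")
print(f"encoded: Dec-Jan {dec_jan:.4f}, Jun-Jul {jun_jul:.4f}")</code></pre>

For hours or weekdays, the call would be add_cyclic_features(df, "hour", 24) or add_cyclic_features(df, "day_of_week", 7), assuming those columns were created as in the feature-extraction example.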