Generating Probability Distributions in Python

Before collecting real data, you can simulate it. Generating samples from known distributions lets you test your models, verify your statistics code, and create training data for experiments. NumPy's random module and SciPy's stats module cover the distributions you are most likely to need.
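As a minimal sketch of what "verifying your statistics code" can look like: draw a large sample from a distribution whose parameters you chose yourself, then check that the computed statistics land near those parameters. The values of `true_mean` and `true_std` and the seed are illustrative assumptions, not taken from any real dataset.

```python
import numpy as np

# Assumed "true" parameters for the simulation (illustrative values only)
true_mean, true_std = 5.0, 2.0

np.random.seed(0)  # fix the seed so the check is repeatable
samples = np.random.normal(loc=true_mean, scale=true_std, size=100_000)

# The sample statistics should land close to the parameters we chose
sample_mean = samples.mean()
sample_std = samples.std(ddof=1)  # ddof=1 gives the usual sample estimate

print(f"mean: {sample_mean:.3f} (expected close to {true_mean})")
print(f"std:  {sample_std:.3f} (expected close to {true_std})")
```

If the estimates drift far from the chosen parameters even with a large sample, the bug is more likely in the statistics code than in the data.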


The Core Distributions in AI

Three distributions appear constantly in machine learning: the Normal (Gaussian), used for weight initialization, noise, and Bayesian priors; the Binomial, used for classification outcomes and dropout masks (its single-trial case, n=1, is the Bernoulli distribution); and the Uniform, used for random initialization ranges.

Sampling with NumPy

<pre><code class="language-python">import numpy as np

# Normal distribution: centre=0, std=1, 1000 samples
normal = np.random.normal(loc=0.0, scale=1.0, size=1000)

# Uniform distribution: range [0, 1)
uniform = np.random.uniform(low=0.0, high=1.0, size=500)

# Binomial: n=1 trial, p=0.5 each → like a coin flip
coin_flips = np.random.binomial(n=1, p=0.5, size=100)

# Poisson: λ=3 (average 3 events per interval)
poisson = np.random.poisson(lam=3, size=200)
</code></pre>
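The machine-learning uses mentioned earlier can be sketched with the same three sampling functions. This is an illustration under stated assumptions, not a recipe from any particular framework: the layer sizes, the scale `0.01`, the range `limit`, and `keep_prob` are all hypothetical values picked for the example.

```python
import numpy as np

# Hypothetical layer sizes, chosen only for illustration
n_in, n_out = 4, 3

# Normal: small random weights centred on zero
weights = np.random.normal(loc=0.0, scale=0.01, size=(n_in, n_out))

# Uniform: weights drawn from a symmetric range [-limit, limit)
limit = 0.05
uniform_weights = np.random.uniform(low=-limit, high=limit, size=(n_in, n_out))

# Binomial with n=1: a dropout-style mask that keeps each unit
# with probability keep_prob and zeroes it otherwise
keep_prob = 0.8
activations = np.ones((2, n_in))
mask = np.random.binomial(n=1, p=keep_prob, size=activations.shape)
dropped = activations * mask / keep_prob  # rescale so the expected value is unchanged

print(weights.shape, uniform_weights.shape)
print(mask)
```

Dividing by `keep_prob` is one common convention (often called inverted dropout); other scaling choices exist.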

Working with scipy.stats

SciPy's stats module goes further than sampling: for each distribution it supports, it can evaluate the PDF, the CDF, and quantiles (through the inverse CDF, `ppf`). These functions are the building blocks for p-value computation, confidence intervals, and probabilistic models.

PDF and CDF

<pre><code class="language-python">from scipy import stats
import numpy as np

x = np.linspace(-4, 4, 100)

# Standard Normal PDF and CDF
pdf = stats.norm.pdf(x, loc=0, scale=1)
cdf = stats.norm.cdf(x, loc=0, scale=1)

# Probability of getting a value between -1 and 1:
prob = stats.norm.cdf(1) - stats.norm.cdf(-1)
print(f"P(-1 < X < 1) = {prob:.4f}")  # 0.6827 (the 68% rule)
</code></pre>
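Quantiles and p-values follow the same pattern. The sketch below uses `ppf` (the inverse of `cdf`) and `sf` (the survival function, `1 - cdf`) on the standard Normal. The observed z-score `z_obs` is a made-up value for illustration.

```python
from scipy import stats

# Quantile function (inverse CDF): the point below which 97.5% of the mass lies
z = stats.norm.ppf(0.975)
print(f"z for a 95% two-sided interval = {z:.4f}")  # 1.9600

# ppf undoes cdf, so this recovers 0.975 up to floating-point error
print(stats.norm.cdf(z))

# Two-sided p-value for a hypothetical observed z-score
z_obs = 2.5
p_value = 2 * stats.norm.sf(abs(z_obs))  # sf(x) = 1 - cdf(x)
print(f"p-value = {p_value:.4f}")  # 0.0124
```

Using `sf` rather than `1 - cdf` tends to keep more precision far out in the tail, where `cdf` is very close to 1.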

Setting Seeds for Reproducibility

NumPy generates random numbers with a pseudorandom algorithm. Setting a seed makes the sequence deterministic: every run with the same seed produces the same numbers. This is essential for reproducible research and fair model comparisons.

Fixing the Random Seed

<pre><code class="language-python">import numpy as np

# Set seed at the start of your experiment
np.random.seed(42)
data1 = np.random.normal(0, 1, 5)
print(data1)  # Always the same values

# Re-setting produces the same sequence again
np.random.seed(42)
data2 = np.random.normal(0, 1, 5)
print(np.array_equal(data1, data2))  # True
</code></pre>
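`np.random.seed` sets one global state shared by every call to `np.random.*`. NumPy's documentation recommends the newer Generator interface, created with `np.random.default_rng`, for new code. A minimal sketch of the same reproducibility check with that interface, where the seeds 42 and 7 are arbitrary choices:

```python
import numpy as np

# Each generator carries its own state, separate from np.random.seed
rng1 = np.random.default_rng(42)
rng2 = np.random.default_rng(42)

a = rng1.normal(0, 1, 5)
b = rng2.normal(0, 1, 5)
print(np.array_equal(a, b))  # True: same seed, same sequence

# A different seed gives a different sequence
c = np.random.default_rng(7).normal(0, 1, 5)
print(np.array_equal(a, c))  # False
```

Passing a generator object into the functions that need randomness keeps each component's random stream explicit, which can make experiments easier to reproduce piece by piece.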