Setting up CI/CD for Machine Learning
CI/CD for machine learning automates code quality checks, model training, evaluation gating, and deployment, so that every change is validated and only models that meet a performance threshold reach production.
What CI/CD Means in ML
Standard software CI/CD validates code; ML CI/CD must also validate data and model quality. A failing model accuracy check should block deployment just as a failing unit test would.
ML CI/CD Pipeline Stages
- CI (on every PR): lint → unit tests → data validation → fast model smoke test
- CD (on merge to main): full training → model evaluation → performance gating → containerize → deploy to staging → integration tests → promote to production
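The stage ordering above amounts to a fail-fast chain: each stage runs only if every earlier stage passed. The following is a minimal sketch of that control flow in plain Python. The `run_pipeline` helper and the stand-in stage functions are hypothetical, chosen for illustration; a real CI system expresses the same idea through job dependencies and step exit codes.

```python
from typing import Callable, List, Tuple

# A stage is a (name, function) pair; the function returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order, stop at the first failure, return the stages that ran."""
    executed = []
    for name, stage_fn in stages:
        executed.append(name)
        if not stage_fn():
            print(f"Stage '{name}' failed; skipping remaining stages.")
            break
    return executed


# Hypothetical CI stages: data validation fails, so the smoke test never runs.
ci_stages = [
    ("lint", lambda: True),
    ("unit tests", lambda: True),
    ("data validation", lambda: False),
    ("model smoke test", lambda: True),
]
ran = run_pipeline(ci_stages)
```

Ordering cheap checks first means a lint error or a schema problem is reported in seconds, before any training time is spent.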
GitHub Actions Workflow
GitHub Actions defines workflows as YAML files in .github/workflows/ that trigger on repository events such as pushes and pull requests, and run jobs on GitHub-hosted or self-hosted runners.
Example CI Workflow
<pre><code class="language-python"># .github/workflows/ml-ci.yml
# name: ML CI
# on: [push, pull_request]
# jobs:
#   test:
#     runs-on: ubuntu-latest
#     steps:
#       - uses: actions/checkout@v4
#       - uses: actions/setup-python@v5
#         with:
#           python-version: "3.11"
#       - run: pip install -r requirements.txt
#       - run: pytest tests/ -v --tb=short
#       - run: python scripts/validate_data.py
#       - run: python scripts/train_and_evaluate.py
#         env:
#           MIN_ACCURACY: "0.90"
#   build:
#     needs: test
#     runs-on: ubuntu-latest
#     steps:
#       - uses: actions/checkout@v4
#       # Tag with the registry prefix so the push below finds the image;
#       # my-registry is a placeholder and a registry login step is assumed.
#       - run: docker build -t my-registry/my-ml-model:${{ github.sha }} .
#       - run: docker push my-registry/my-ml-model:${{ github.sha }}</code></pre>
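The workflow above calls scripts/validate_data.py without showing it. The following is a minimal sketch of what such a check might look like, using only the standard library. The column names, the binary label rule, and the `validate_rows` helper are assumptions made for illustration; a real project would encode its own schema, possibly with a dedicated validation library.

```python
import csv
import io

# Hypothetical schema: two numeric features and a binary label.
EXPECTED_COLUMNS = ["mean_radius", "mean_texture", "label"]
FEATURE_COLUMNS = EXPECTED_COLUMNS[:-1]


def validate_rows(csv_text: str) -> list:
    """Return a list of problems found; an empty list means the data passed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames != EXPECTED_COLUMNS:
        return [f"unexpected columns: {reader.fieldnames}"]

    problems = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        if None in row:  # DictReader stores surplus fields under the key None
            problems.append(f"line {line_no}: too many fields")
            continue
        if any(value is None or value.strip() == "" for value in row.values()):
            problems.append(f"line {line_no}: missing value")
            continue
        try:
            for column in FEATURE_COLUMNS:
                float(row[column])
        except ValueError:
            problems.append(f"line {line_no}: non-numeric feature value")
            continue
        if row["label"] not in {"0", "1"}:
            problems.append(f"line {line_no}: label must be 0 or 1, got {row['label']!r}")
    return problems


# Small inline sample: line 3 has a missing value, line 4 has an invalid label.
sample = "mean_radius,mean_texture,label\n14.2,20.1,1\n,18.3,0\n12.0,17.5,2\n"
problems = validate_rows(sample)
for problem in problems:
    print(problem)
```

In a CI step, the script would read the real dataset and call `sys.exit(1)` when the list is non-empty, so that bad data stops the job before training starts.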
Performance Gating Script
<pre><code class="language-python"># scripts/train_and_evaluate.py
import os
import sys

import joblib
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Threshold comes from the CI environment; defaults to 0.90 when unset.
MIN_ACCURACY = float(os.getenv("MIN_ACCURACY", "0.90"))

X, y = load_breast_cancer(return_X_y=True)
model = RandomForestClassifier(n_estimators=100, random_state=42)

# 5-fold cross-validated accuracy is the gating metric.
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
mean_acc = scores.mean()
print(f"CV Accuracy: {mean_acc:.4f}")

if mean_acc < MIN_ACCURACY:
    print(f"FAILED: accuracy {mean_acc:.4f} < threshold {MIN_ACCURACY}")
    sys.exit(1)  # non-zero exit code fails the CI step

# Gate passed: refit on the full dataset and save the artifact.
model.fit(X, y)
joblib.dump(model, "model.joblib")
print("Model saved successfully.")</code></pre>
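The script above gates on one absolute threshold. A pipeline may also want to check several metrics at once, or to block a candidate that is noticeably worse than the model already deployed. The sketch below shows one way to structure such a gate. The `gate` function, the metric names, and the 0.01 tolerance are hypothetical choices for illustration, not a standard API.

```python
from typing import Dict, List, Optional


def gate(
    candidate: Dict[str, float],
    minimums: Dict[str, float],
    baseline: Optional[Dict[str, float]] = None,
    tolerance: float = 0.01,
) -> List[str]:
    """Return the reasons to block the candidate; an empty list means it passes."""
    failures = []

    # Absolute floors: every listed metric must be present and high enough.
    for metric, floor in minimums.items():
        value = candidate.get(metric)
        if value is None:
            failures.append(f"{metric}: missing from candidate metrics")
        elif value < floor:
            failures.append(f"{metric}: {value:.4f} < minimum {floor:.4f}")

    # Relative check: allow small noise, block larger regressions.
    for metric, previous in (baseline or {}).items():
        value = candidate.get(metric)
        if value is not None and value < previous - tolerance:
            failures.append(f"{metric}: {value:.4f} regressed from baseline {previous:.4f}")

    return failures


# Made-up numbers: the candidate clears both floors but trails the baseline accuracy.
candidate = {"accuracy": 0.93, "recall": 0.88}
failures = gate(
    candidate,
    minimums={"accuracy": 0.90, "recall": 0.85},
    baseline={"accuracy": 0.95, "recall": 0.88},
)
for failure in failures:
    print(f"FAILED: {failure}")
```

Returning the full list of failures, instead of exiting at the first one, gives the CI log a complete picture of why a model was blocked; the calling script can still exit non-zero when the list is non-empty.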
Data Version Control in CI
Use DVC (Data Version Control) alongside Git to version datasets and pipeline outputs, enabling reproducible training runs in CI environments.
DVC in a CI Step
<pre><code class="language-python"># In your GitHub Actions workflow:
#   - run: pip install "dvc[s3]"  # S3 support; use "dvc[gs]" for Google Cloud Storage
#   - run: dvc pull   # downloads versioned data from the configured remote
#   - run: dvc repro  # re-runs only the pipeline stages whose inputs changed
#   - run: dvc push   # pushes any new outputs back to the remote
#
# DVC tracks data files with lightweight .dvc pointer files in Git,
# keeping large datasets out of the repository while maintaining full lineage.</code></pre>
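The pointer-file idea can be illustrated without DVC itself: a small record holding a content hash and a size stands in for the large file, and any change to the data changes the hash. The sketch below is a conceptual illustration using only the standard library. It is not DVC's implementation or file format, and `make_pointer` is a hypothetical helper.

```python
import hashlib
import os
import tempfile


def make_pointer(path: str) -> dict:
    """Build a small record that identifies a file by content hash and size."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large datasets do not need to fit in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return {
        "path": os.path.basename(path),
        "sha256": digest.hexdigest(),
        "size": os.path.getsize(path),
    }


with tempfile.TemporaryDirectory() as tmp:
    data_path = os.path.join(tmp, "train.csv")

    with open(data_path, "wb") as f:
        f.write(b"x,y\n1,0\n")
    before = make_pointer(data_path)

    # Appending one row changes the content, so the pointer changes too.
    with open(data_path, "ab") as f:
        f.write(b"2,1\n")
    after = make_pointer(data_path)

changed = before["sha256"] != after["sha256"]
print(f"pointer changed: {changed}")
```

Because the record is only a few lines long, it can be committed to Git alongside the code, which ties each commit to the exact data it was trained on.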