Setting up CI/CD for Machine Learning

CI/CD for machine learning automates code quality checks, model training, evaluation gating, and deployment, so that every change is validated and only models that meet a performance threshold reach production.


What CI/CD Means in ML

Standard software CI/CD validates code; ML CI/CD must also validate data and model quality. A failing model accuracy check should block deployment just as a failing unit test would.
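
To make the parallel with unit tests concrete, a quality gate can be written as an ordinary function whose result decides whether the build passes. The following is a minimal sketch, not a prescribed implementation: the function name `failed_gates`, the metric names, and the threshold values are all hypothetical.

```python
def failed_gates(metrics, thresholds):
    """Return one message per metric that is missing or below its minimum."""
    failures = []
    for name, minimum in thresholds.items():
        value = metrics.get(name)
        if value is None:
            failures.append(f"{name}: metric missing")
        elif value < minimum:
            failures.append(f"{name}: {value:.4f} < {minimum:.4f}")
    return failures


# Hypothetical numbers; in practice they would come from the evaluation step.
problems = failed_gates(
    {"accuracy": 0.93, "recall": 0.81},
    {"accuracy": 0.90, "recall": 0.85},
)
print(problems)  # ['recall: 0.8100 < 0.8500']
```

In a CI job, a wrapper script would exit with a non-zero status whenever the returned list is non-empty, which blocks the pipeline the same way a failing test does.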

ML CI/CD Pipeline Stages

  • CI (on every PR): lint → unit tests → data validation → fast model smoke test
  • CD (on merge to main): full training → model evaluation → performance gating → containerize → deploy to staging → integration tests → promote to production
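
The data validation stage in the list above could look roughly like the sketch below. It uses only the standard library and assumes a CSV input; the schema (`age`, `income`, `label`), the value ranges, and the function name `validate_rows` are hypothetical and would be replaced by the rules of the actual dataset.

```python
import csv
import io

REQUIRED_COLUMNS = {"age", "income", "label"}  # hypothetical schema


def validate_rows(csv_text):
    """Return a list of problems found in the CSV text; empty means valid."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        return [f"missing columns: {sorted(missing)}"]
    problems = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        # Short rows yield None, blank cells yield "".
        if any(row[col] in ("", None) for col in REQUIRED_COLUMNS):
            problems.append(f"line {line_no}: empty value")
            continue
        if row["label"] not in {"0", "1"}:
            problems.append(f"line {line_no}: label must be 0 or 1")
        try:
            age_ok = 0 <= float(row["age"]) <= 120
        except ValueError:
            age_ok = False
        if not age_ok:
            problems.append(f"line {line_no}: age out of range or non-numeric")
    return problems


sample = "age,income,label\n34,52000,1\n-5,41000,0\n51,,1\n"
print(validate_rows(sample))
```

Running cheap schema and range checks before any training starts means a malformed dataset fails the pull request in seconds rather than after a full training run.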

GitHub Actions Workflow

GitHub Actions defines workflows as YAML files in .github/workflows/ that trigger on code events and run jobs on managed or self-hosted runners.

Example CI Workflow

<pre><code class="language-python"># .github/workflows/ml-ci.yml
# name: ML CI
# on: [push, pull_request]
# jobs:
#   test:
#     runs-on: ubuntu-latest
#     steps:
#       - uses: actions/checkout@v4
#       - uses: actions/setup-python@v5
#         with:
#           python-version: "3.11"
#       - run: pip install -r requirements.txt
#       - run: pytest tests/ -v --tb=short
#       - run: python scripts/validate_data.py
#       - run: python scripts/train_and_evaluate.py
#         env:
#           MIN_ACCURACY: "0.90"
#   build:
#     needs: test
#     runs-on: ubuntu-latest
#     steps:
#       - uses: actions/checkout@v4
#       # Assumes the runner is already authenticated to my-registry.
#       # The build tag must match the name that is pushed.
#       - run: docker build -t my-registry/my-ml-model:${{ github.sha }} .
#       - run: docker push my-registry/my-ml-model:${{ github.sha }}</code></pre>

Performance Gating Script

<pre><code class="language-python"># scripts/train_and_evaluate.py
import os
import sys

import joblib
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Threshold is injected by the CI workflow; defaults to 0.90 for local runs.
MIN_ACCURACY = float(os.getenv("MIN_ACCURACY", "0.90"))

X, y = load_breast_cancer(return_X_y=True)
model = RandomForestClassifier(n_estimators=100, random_state=42)

# Estimate accuracy with 5-fold cross-validation.
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
mean_acc = scores.mean()
print(f"CV Accuracy: {mean_acc:.4f}")

if mean_acc < MIN_ACCURACY:
    print(f"FAILED: accuracy {mean_acc:.4f} < threshold {MIN_ACCURACY}")
    sys.exit(1)  # fails the CI step

# Gate passed: refit on the full dataset and save the artifact.
model.fit(X, y)
joblib.dump(model, "model.joblib")
print("Model saved successfully.")</code></pre>

Data Version Control in CI

Use DVC (Data Version Control) alongside Git to version datasets and pipeline outputs, enabling reproducible training runs in CI environments.

DVC in a CI Step

<pre><code class="language-python"># In your GitHub Actions workflow:
# - run: pip install "dvc[s3]"
# - run: dvc pull    # downloads versioned data from the remote (S3 here; GCS needs dvc[gs])
# - run: dvc repro   # re-runs only changed pipeline stages
# - run: dvc push    # pushes any new outputs back to remote

# DVC tracks data files with lightweight .dvc pointer files in Git,
# keeping large datasets out of the repository while maintaining full lineage.</code></pre>
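
The pointer-file idea can be illustrated without DVC itself. The sketch below is a conceptual stand-in, not DVC's implementation: it records a content hash and size for a data file, which is roughly the kind of information a pointer carries, and then checks whether the file on disk still matches. The helper names `make_pointer` and `matches_pointer` are hypothetical, and real `.dvc` files are YAML with additional fields.

```python
import hashlib
import os
import tempfile


def file_md5(path, chunk_size=1 << 20):
    """Hash a file in chunks so large datasets need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_pointer(path):
    """Build a small record that can be committed in place of the data."""
    return {
        "path": os.path.basename(path),
        "md5": file_md5(path),
        "size": os.path.getsize(path),
    }


def matches_pointer(path, pointer):
    """True when the file on disk is the exact version the pointer recorded."""
    return file_md5(path) == pointer["md5"]


with tempfile.TemporaryDirectory() as tmp:
    data_path = os.path.join(tmp, "train.csv")
    with open(data_path, "wb") as f:
        f.write(b"x,y\n1,2\n")
    pointer = make_pointer(data_path)
    print(matches_pointer(data_path, pointer))  # True: data unchanged
    with open(data_path, "ab") as f:
        f.write(b"3,4\n")
    print(matches_pointer(data_path, pointer))  # False: data has changed
```

Because only the small record lives in Git, a commit pins an exact dataset version while the bytes themselves stay in remote storage.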