Model Versioning with MLflow
MLflow is an open-source platform that tracks experiments, packages reproducible runs, and provides a model registry to version, stage, and serve ML models throughout their lifecycle.
Tracking Experiments
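Every MLflow run records parameters, metrics, and artifacts to a tracking backend, which is either a directory on the local file system or a remote tracking server addressed by URI. Because each run keeps its inputs and outputs together, runs can be compared directly and reproduced more easily.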
Logging a Scikit-Learn Run
<pre><code class="language-python">import mlflow
import mlflow.sklearn
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

mlflow.set_experiment("iris-classification")

with mlflow.start_run(run_name="rf-baseline") as run:
    n_est, depth = 100, 5
    model = RandomForestClassifier(n_estimators=n_est, max_depth=depth, random_state=42)
    model.fit(X_train, y_train)
    preds = model.predict(X_test)

    # Log hyperparameters
    mlflow.log_param("n_estimators", n_est)
    mlflow.log_param("max_depth", depth)

    # Log metrics
    mlflow.log_metric("accuracy", accuracy_score(y_test, preds))
    mlflow.log_metric("f1_macro", f1_score(y_test, preds, average="macro"))

    # Log model as artifact
    mlflow.sklearn.log_model(model, artifact_path="model")

# The run object stays usable after the block ends
print("Run ID:", run.info.run_id)</code></pre>
Launching the MLflow UI
<pre><code class="language-python"># Run in a terminal (this is a shell command, not Python):
#   mlflow ui
# Then open http://localhost:5000 in a browser.
# Experiments, runs, params, metrics, and artifacts are all listed there.
# Runs can be compared side by side and their artifacts downloaded.</code></pre>
The MLflow Model Registry
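The Model Registry provides a centralised hub for managing model versions. A newly registered version starts in the stage None and can then be moved to Staging, Production, or Archived. MLflow 2.9 and later deprecate stages in favour of model version aliases and tags.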
Registering and Transitioning Models
<pre><code class="language-python">import mlflow
import mlflow.sklearn
from mlflow.tracking import MlflowClient

client = MlflowClient()

# Register a logged model
run_id = "<your-run-id>"
model_uri = f"runs:/{run_id}/model"
registered = mlflow.register_model(model_uri, name="IrisClassifier")
print("Version:", registered.version)

# Transition to Staging
client.transition_model_version_stage(
    name="IrisClassifier",
    version=registered.version,
    stage="Staging",
)

# Load the model by stage from anywhere that can reach the registry.
# Use "Production" in the URI once a version has been promoted to that stage.
staging_model = mlflow.sklearn.load_model("models:/IrisClassifier/Staging")</code></pre>
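The runs:/ and models:/ URIs used above follow a small set of patterns. The helpers below are a hypothetical convenience (the function names are not part of MLflow) that only format those strings. The run ID "abc123" and the alias "champion" are placeholders, and the @alias form applies to the alias feature in newer MLflow releases.

```python
def run_model_uri(run_id: str, artifact_path: str = "model") -> str:
    """URI of a model logged under a run: runs:/<run_id>/<artifact_path>."""
    return f"runs:/{run_id}/{artifact_path}"


def registry_model_uri(name: str, version_or_stage) -> str:
    """URI of a registered model selected by version number or stage name."""
    return f"models:/{name}/{version_or_stage}"


def alias_model_uri(name: str, alias: str) -> str:
    """URI of a registered model selected by alias: models:/<name>@<alias>."""
    return f"models:/{name}@{alias}"


print(run_model_uri("abc123"))                          # runs:/abc123/model
print(registry_model_uri("IrisClassifier", 1))          # models:/IrisClassifier/1
print(registry_model_uri("IrisClassifier", "Staging"))  # models:/IrisClassifier/Staging
print(alias_model_uri("IrisClassifier", "champion"))    # models:/IrisClassifier@champion
```

Any of these strings can be passed to a loader such as mlflow.sklearn.load_model, provided the referenced run, version, stage, or alias exists.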
Autologging
MLflow's autolog feature automatically captures parameters, metrics, and the model artifact for supported libraries with a single line of code.
One-Line Autologging
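<pre><code class="language-python"># Assumes the imports and X_train, y_train from the earlier example
mlflow.sklearn.autolog()  # enable before fitting

with mlflow.start_run():
    model = RandomForestClassifier(n_estimators=200, random_state=42)
    model.fit(X_train, y_train)
    # Params, training metrics, and the fitted model are logged automatically.
    # Cross-validation results are captured only when fitting a search
    # estimator such as GridSearchCV.</code></pre>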