Model Versioning with MLflow

MLflow is an open-source platform that tracks experiments, packages reproducible runs, and provides a model registry to version, stage, and serve ML models throughout their lifecycle.

Tracking Experiments

Every MLflow run records parameters, metrics, and artifacts to a tracking store (a local directory by default, or a remote tracking server identified by its URI), which makes runs directly comparable and much easier to reproduce.

Logging a Scikit-Learn Run

<pre><code class="language-python">import mlflow
import mlflow.sklearn
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

mlflow.set_experiment("iris-classification")

with mlflow.start_run(run_name="rf-baseline"):
    n_est, depth = 100, 5
    model = RandomForestClassifier(n_estimators=n_est, max_depth=depth, random_state=42)
    model.fit(X_train, y_train)
    preds = model.predict(X_test)

    # Log hyperparameters
    mlflow.log_param("n_estimators", n_est)
    mlflow.log_param("max_depth", depth)

    # Log metrics
    mlflow.log_metric("accuracy", accuracy_score(y_test, preds))
    mlflow.log_metric("f1_macro", f1_score(y_test, preds, average="macro"))

    # Log model as artifact
    mlflow.sklearn.log_model(model, artifact_path="model")

    # Print the run ID while the run is still active
    print("Run ID:", mlflow.active_run().info.run_id)
</code></pre>

Launching the MLflow UI

<pre><code class="language-python"># Run in a terminal:
#   mlflow ui
# Then open: http://localhost:5000
# All experiments, runs, params, metrics, and artifacts are visible.
# You can compare runs side by side and download artifacts.
</code></pre>

The MLflow Model Registry

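The Model Registry provides a centralised hub for managing model versions. Each registered version carries a lifecycle stage: a new version starts in None and can be moved to Staging, Production, or Archived. Recent MLflow releases deprecate stages in favour of model version aliases, so check which mechanism your installed version recommends.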

Registering and Transitioning Models

<pre><code class="language-python">import mlflow
import mlflow.sklearn
from mlflow.tracking import MlflowClient

client = MlflowClient()

# Register a model that was logged under a run
run_id = "<your-run-id>"
model_uri = f"runs:/{run_id}/model"
registered = mlflow.register_model(model_uri, name="IrisClassifier")
print("Version:", registered.version)

# Transition to Staging
client.transition_model_version_stage(
    name="IrisClassifier",
    version=registered.version,
    stage="Staging",
)

# Load a model by stage from anywhere that can reach the registry.
# This resolves only once a version has been transitioned to "Production".
production_model = mlflow.sklearn.load_model("models:/IrisClassifier/Production")
</code></pre>
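The registration code relies on two URI schemes: `runs:/` points at an artifact logged under a run, and `models:/` points at a registered model by version number or stage. The helpers below are hypothetical convenience functions (they are not part of MLflow) that only show how those strings are put together; `abc123` is a placeholder run ID.

```python
def run_model_uri(run_id: str, artifact_path: str = "model") -> str:
    """Build the URI of a model artifact logged under a run."""
    return f"runs:/{run_id}/{artifact_path}"


def registry_model_uri(name: str, version_or_stage) -> str:
    """Build the URI of a registered model, by version number or stage name."""
    return f"models:/{name}/{version_or_stage}"


print(run_model_uri("abc123"))                             # runs:/abc123/model
print(registry_model_uri("IrisClassifier", 1))             # models:/IrisClassifier/1
print(registry_model_uri("IrisClassifier", "Production"))  # models:/IrisClassifier/Production
```

Pinning a version number gives a fixed model, while a stage name resolves to whichever version currently holds that stage.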

Autologging

MLflow's autolog feature automatically captures parameters, metrics, and the model artifact for supported libraries with a single line of code.

One-Line Autologging

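<pre><code class="language-python">import mlflow
import mlflow.sklearn
from sklearn.ensemble import RandomForestClassifier

mlflow.sklearn.autolog()  # enable before fitting

# X_train and y_train come from the earlier scikit-learn example
with mlflow.start_run():
    model = RandomForestClassifier(n_estimators=200, random_state=42)
    model.fit(X_train, y_train)
    # Estimator params, training-set metrics, and the fitted model are logged automatically
</code></pre>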