Saving Models with Pickle and Joblib
Once a model is trained, serializing it to disk lets you reload it for inference without retraining. The two standard tools in Python are pickle (part of the standard library) and joblib (better suited to objects that hold large NumPy arrays).
Saving and Loading with Pickle
pickle serializes any Python object to a byte stream. It's universally available but can be slow for models with large internal NumPy arrays.
Pickle Workflow
<pre><code class="language-python">import pickle
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X, y)
# Save the fitted model to disk
with open("rf_model.pkl", "wb") as f:
    pickle.dump(model, f)
# Load it back
with open("rf_model.pkl", "rb") as f:
    loaded_model = pickle.load(f)
print(loaded_model.predict(X[:3]))</code></pre>
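Because pickle produces a plain byte stream, the same round trip can also be done in memory, without touching the file system. The following is a minimal sketch that uses a dictionary of made-up parameters as a stand-in for a fitted model; the names and values are illustrative and not part of the example above.

```python
import pickle

# Stand-in for a fitted model: any picklable Python object behaves the same way.
params = {"weights": [0.25, -1.5, 3.0], "bias": 0.1}

# dumps() returns bytes instead of writing to a file.
# HIGHEST_PROTOCOL selects the newest pickle format this Python supports.
blob = pickle.dumps(params, protocol=pickle.HIGHEST_PROTOCOL)

# loads() rebuilds an equal but independent copy of the original object.
restored = pickle.loads(blob)

print(type(blob).__name__, len(blob) > 0)
print(restored == params, restored is params)
```

The restored object compares equal to the original but is a separate copy, which is the behavior you get when reloading a model from disk as well.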
Saving and Loading with Joblib
joblib is built on pickle but stores large NumPy arrays more efficiently. It supports optional compression when saving and can memory-map uncompressed files when loading, and the scikit-learn documentation has long suggested it for models that hold large arrays.
Joblib Workflow with Compression
<pre><code class="language-python">import joblib
import os
# Reuses model and X from the pickle example above
# Save (compress=3 is a good balance of speed vs. file size)
joblib.dump(model, "rf_model.joblib", compress=3)
# Load
loaded_model = joblib.load("rf_model.joblib")
print(loaded_model.predict(X[:3]))
# Check file size
print(f"File size: {os.path.getsize('rf_model.joblib') / 1024:.1f} KB")</code></pre>
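To get a feel for what the compress argument does, you can save the same object at several levels and compare file sizes. This is a rough sketch that assumes joblib and NumPy are installed and uses an artificial, highly repetitive array as the payload. A real model will usually compress far less dramatically, so treat the numbers as illustrative only.

```python
import os
import tempfile

import joblib
import numpy as np

# Illustrative payload: an all-zeros array, which compresses extremely well.
payload = np.zeros((1000, 100))

sizes = {}
with tempfile.TemporaryDirectory() as tmp:
    for level in (0, 3, 9):
        path = os.path.join(tmp, f"payload_c{level}.joblib")
        joblib.dump(payload, path, compress=level)
        sizes[level] = os.path.getsize(path)
    # Reload one compressed file to confirm the data survives the round trip.
    restored = joblib.load(os.path.join(tmp, "payload_c3.joblib"))

for level, size in sizes.items():
    print(f"compress={level}: {size / 1024:.1f} KB")
```

Higher levels trade longer save times for smaller files, so it is worth measuring both on your own model before settling on a value.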
Versioning Saved Models
<pre><code class="language-python">import datetime
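import joblib
# Reuses the fitted model from the examples above
# A timestamp in the filename keeps earlier versions from being overwritten
timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
filename = f"rf_model_v1_{timestamp}.joblib"
joblib.dump(model, filename)
print(f"Model saved as: {filename}")</code></pre>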
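One convenient property of the %Y%m%d_%H%M%S format is that filenames sort chronologically, so the newest version can be found with a plain sort. The helper below is a hypothetical sketch (latest_model is not a joblib or scikit-learn function), demonstrated on empty placeholder files rather than real models.

```python
import tempfile
from pathlib import Path

def latest_model(directory, prefix="rf_model_v1_"):
    """Return the newest file matching the naming scheme, or None if none exist."""
    # Zero-padded timestamps sort lexicographically in chronological order.
    candidates = sorted(Path(directory).glob(f"{prefix}*.joblib"))
    return candidates[-1] if candidates else None

# Demo with empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for stamp in ("20240101_120000", "20240315_093000", "20231231_235959"):
        (Path(tmp) / f"rf_model_v1_{stamp}.joblib").touch()
    newest = latest_model(tmp)
    newest_name = newest.name
    print(newest_name)
```

The path returned by the helper can then be passed to joblib.load in the usual way.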
Security and Compatibility Considerations
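Loading a pickle file can execute arbitrary code, so never unpickle files from untrusted sources. The same applies to joblib files, because joblib.load relies on pickle. Additionally, saved models are not guaranteed to load correctly, or to produce the same predictions, under a different scikit-learn or Python version.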
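The code-execution risk comes from the fact that an object can tell pickle which callable to invoke when it is rebuilt. Here is a minimal, deliberately harmless sketch; the Payload class is hypothetical, and a real attack would replace the arithmetic with something like a shell command.

```python
import pickle

class Payload:
    # __reduce__ tells pickle how to rebuild the object: here, by calling eval("6 * 7").
    def __reduce__(self):
        return (eval, ("6 * 7",))

blob = pickle.dumps(Payload())

# Loading runs eval() and returns its result instead of a Payload instance.
result = pickle.loads(blob)
print(result)
```

Nothing in the byte stream looks unusual to a casual reader, which is why the only safe policy is to load files whose origin you trust.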
Best Practices
- Always record the scikit-learn version alongside the saved file (sklearn.__version__)
- Use environment lock files (requirements.txt or conda env export) to reproduce the exact environment
- For production or cross-language serving, consider ONNX or PMML export instead of pickle
- Store model files in versioned storage (S3, GCS, DVC) rather than local disk
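One way to follow the first practice is to write a small JSON file next to each saved model. The sketch below makes a few assumptions: the write_metadata and package_version helpers and the field names are hypothetical, and library versions are looked up through importlib.metadata so that the code still runs (recording null) when a package is not installed.

```python
import json
import platform
import tempfile
from datetime import datetime, timezone
from importlib import metadata
from pathlib import Path

def package_version(name):
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None

def write_metadata(model_path):
    """Write <model file>.json next to the model and return the new path."""
    model_path = Path(model_path)
    info = {
        "model_file": model_path.name,
        "saved_at": datetime.now(timezone.utc).isoformat(),
        "python_version": platform.python_version(),
        "sklearn_version": package_version("scikit-learn"),
        "joblib_version": package_version("joblib"),
    }
    meta_path = model_path.with_name(model_path.name + ".json")
    meta_path.write_text(json.dumps(info, indent=2))
    return meta_path

# Demo with an empty placeholder instead of a real model file.
with tempfile.TemporaryDirectory() as tmp:
    model_file = Path(tmp) / "rf_model.joblib"
    model_file.touch()
    meta_file = write_metadata(model_file)
    recorded = json.loads(meta_file.read_text())
    print(recorded["model_file"], recorded["python_version"])
```

Before loading a model, the recorded versions can be compared against the current environment to catch mismatches early.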