Building MLOps Pipelines - From Training to Production
How to automate ML model deployment, monitoring, and retraining with modern MLOps tools.
By Sarah Chen
MLOps bridges the gap between ML development and production operations. It’s the practice of deploying, monitoring, and maintaining machine learning models at scale.
What is MLOps?
MLOps = ML + DevOps + Data Engineering
It focuses on:
- Reproducibility: Same code + data = same model
- Scalability: Deploy to millions of users
- Reliability: Models work consistently in production
- Automation: CI/CD for ML models
Core Components
1. Data Versioning
Why version data?
- Reproducible training
- Track data lineage
- Rollback capabilities
Tools:
- DVC: Data version control
- LakeFS: Git-like data versioning
- Delta Lake: ACID transactions on data lakes
```bash
# Track data versions with DVC
dvc add data/train.csv
git add data/train.csv.dvc .gitignore
git commit -m "Track training data with DVC"
```
2. Model Registry
Store and version models:
```python
import mlflow

# Train model
model = train_model(X_train, y_train)

# Log parameters and metrics
mlflow.log_params({"learning_rate": 0.01, "epochs": 100})
mlflow.log_metric("accuracy", 0.95)

# Log the fitted model as an artifact
mlflow.sklearn.log_model(model, "sklearn-model")
```
Popular Tools:
- MLflow: Open-source, widely adopted
- Weights & Biases: Experiment tracking + model registry
- Hugging Face Hub: Model sharing for NLP
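Logging a model only stores it as a run artifact; to version it in MLflow's model registry you also register it. A minimal sketch, assuming MLflow 2.x; the run ID and the "staging" alias are illustrative:

```python
import mlflow
from mlflow import MlflowClient

# run_id of the training run that logged "sklearn-model" (hypothetical value)
run_id = "abc123"
model_uri = f"runs:/{run_id}/sklearn-model"

# Create (or bump) a registered version under a model name
version = mlflow.register_model(model_uri, "sklearn-model")

# MLflow 2.x aliases can mark the new version for staged rollout
client = MlflowClient()
client.set_registered_model_alias("sklearn-model", "staging", version.version)
```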
3. CI/CD for ML
Traditional CI/CD vs. ML CI/CD:
| Traditional | ML |
|---|---|
| Code changes | Code + data changes |
| Unit tests | Unit + data + model tests |
| Deploy artifacts | Deploy models + predictions |
| Rollback code | Rollback model + data |
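To make the "Unit + data + model tests" row concrete, here is a sketch of data and model tests in pytest; the file paths, column names, and accuracy threshold are illustrative assumptions:

```python
import joblib
import pandas as pd
from sklearn.metrics import accuracy_score

# Illustrative paths, columns, and threshold -- adjust to your project
TRAIN_PATH = "data/train.csv"
HOLDOUT_PATH = "data/holdout.csv"
MODEL_PATH = "model.pkl"
MIN_ACCURACY = 0.85


def test_training_data_schema():
    # Data test: required columns exist and the label has no missing values
    df = pd.read_csv(TRAIN_PATH)
    assert {"feature_1", "feature_2", "label"} <= set(df.columns)
    assert df["label"].isna().sum() == 0


def test_model_meets_accuracy_floor():
    # Model test: the candidate model clears a minimum quality bar on holdout data
    holdout = pd.read_csv(HOLDOUT_PATH)
    model = joblib.load(MODEL_PATH)
    preds = model.predict(holdout[["feature_1", "feature_2"]])
    assert accuracy_score(holdout["label"], preds) >= MIN_ACCURACY
```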
Example ML Pipeline:
```yaml
# .github/workflows/train.yml
name: Train Model

on:
  push:
    branches: [main]

jobs:
  train:
    runs-on: [self-hosted, gpu]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Train model
        run: python train.py
      - name: Evaluate model
        run: python evaluate.py
      - name: Register model
        run: python register.py
      - name: Deploy if better
        run: python deploy.py
```
4. Feature Store
Centralized feature management:
Benefits:
- Share features across models
- Prevent training-serving skew
- Enable online/offline feature consistency
Example:
```python
from datetime import datetime
from feast import FeatureStore

# Point at a Feast feature repo (the directory containing feature_store.yaml)
fs = FeatureStore(repo_path=".")

# Feature definitions live in the repo, e.g. a "user_features" FeatureView
# with avg_transaction_value and account_age_days fields.

# Load the latest offline feature values into the online store for serving
fs.materialize_incremental(end_date=datetime.utcnow())

# Read features at serving time
features = fs.get_online_features(
    features=[
        "user_features:avg_transaction_value",
        "user_features:account_age_days",
    ],
    entity_rows=[{"user_id": 123}],
).to_dict()
```
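The offline side uses the same feature definitions, which is what prevents training-serving skew. Continuing from the snippet above, a sketch that assumes an entity_df DataFrame of user IDs and event timestamps:

```python
# Offline: build a point-in-time correct training set from the same definitions
training_df = fs.get_historical_features(
    entity_df=entity_df,  # DataFrame with user_id and event_timestamp columns
    features=[
        "user_features:avg_transaction_value",
        "user_features:account_age_days",
    ],
).to_df()
```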
5. Model Monitoring
What to monitor:
1. Performance Metrics:
- Accuracy, precision, recall
- F1 score, AUC-ROC
- Custom business metrics
2. Data Drift:
- Feature distribution changes
- Prediction distribution changes
- New categories in categorical features
3. Model Decay:
- Performance degradation over time
- Trigger retraining when accuracy drops below threshold
Implementation:
```python
from evidently.report import Report
from evidently.metrics import ColumnDriftMetric

# Compare the production feature distribution against the training reference
drift_report = Report(metrics=[ColumnDriftMetric(column_name="feature_1")])
drift_report.run(reference_data=train_data, current_data=production_data)

result = drift_report.as_dict()["metrics"][0]["result"]

# Alert if significant drift
if result["drift_detected"]:
    alert_team("Data drift detected!")
```
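For model decay, the same idea applies to live performance: once delayed ground-truth labels arrive, recompute the metrics and alert when they fall below the agreed thresholds. A minimal sketch; the helper that joins logged predictions with their eventual labels, and the thresholds, are hypothetical:

```python
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical DataFrame of logged predictions joined with delayed ground truth
# (columns: "prediction", "label")
recent = load_recent_predictions_with_labels()  # assumed helper

accuracy = accuracy_score(recent["label"], recent["prediction"])
f1 = f1_score(recent["label"], recent["prediction"])

# Page the team when either metric falls below its threshold
if accuracy < 0.85 or f1 < 0.80:
    alert_team(f"Model decay: accuracy={accuracy:.3f}, f1={f1:.3f}")
```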
Production Deployment Patterns
1. Batch Inference
Use case: Hourly/daily predictions
```python
# Daily model scoring job
def score_users():
    users = db.get_all_users()
    features = extract_features(users)
    predictions = model.predict(features)
    db.save_predictions(predictions)
```
Tools:
- Airflow
- Prefect
- AWS Glue
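As one way to schedule the scoring job above, here is a minimal Airflow DAG using the TaskFlow API (Airflow 2.4+ assumed; the DAG name and schedule are illustrative):

```python
from datetime import datetime
from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def daily_scoring():
    @task
    def score():
        score_users()  # the batch scoring function sketched above

    score()


daily_scoring()
```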
2. Real-Time API
Use case: Low-latency predictions
```python
from fastapi import FastAPI
import joblib

app = FastAPI()
model = joblib.load("model.pkl")

@app.post("/predict")
async def predict(features: dict):
    # Convert the feature dict into the row format the model expects
    row = [list(features.values())]
    prediction = model.predict(row)
    return {"prediction": prediction.tolist()[0]}
```
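Calling the endpoint is then a plain HTTP POST; the payload keys below are illustrative and must match the features the model was trained on:

```python
import requests

payload = {"avg_transaction_value": 120.5, "account_age_days": 365}
resp = requests.post("http://localhost:8000/predict", json=payload)
print(resp.json())  # e.g. {"prediction": 1}
```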
Tools:
- FastAPI
- Flask
- AWS SageMaker endpoints
3. Streaming Inference
Use case: High-throughput real-time
```python
import json
import joblib
import ray
from kafka import KafkaConsumer

ray.init()
model = joblib.load("model.pkl")

@ray.remote
def predict_batch(batch):
    # Runs asynchronously on a Ray worker; the model is shipped with the task
    return model.predict(batch)

consumer = KafkaConsumer("predictions", value_deserializer=json.loads)

batch = []
for message in consumer:
    batch.append(message.value)
    if len(batch) >= 100:
        # Process batch asynchronously
        predict_batch.remote(batch)
        batch = []
```
Tools:
- Kafka
- Kinesis
- Ray Serve
Automated Retraining
Trigger retraining when:
- Performance drops: Accuracy < threshold
- Data drift: Feature distribution changes
- Scheduled: Daily/weekly for stability
Pipeline:
```python
def retrain_if_needed():
    # Check performance metrics
    current_accuracy = monitor.get_accuracy()

    if current_accuracy < 0.85:
        # Get latest training data
        new_data = fetch_latest_data()

        # Train new model
        new_model = train(new_data)

        # Validate
        val_accuracy = validate(new_model)

        # Deploy if better
        if val_accuracy > current_accuracy:
            deploy(new_model)
            notify_team("Model deployed!")
```
Tools Stack
Open Source:
- MLflow: Experiment tracking + model registry
- Airflow/Prefect: Pipeline orchestration
- Prometheus + Grafana: Monitoring
- Feast: Feature store
- Seldon/KServe: Model serving
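To make the Prometheus + Grafana entry concrete, here is a sketch of instrumenting the real-time FastAPI service from earlier with prometheus_client; the metric names are illustrative:

```python
import joblib
from fastapi import FastAPI
from prometheus_client import Counter, Histogram, make_asgi_app

app = FastAPI()
model = joblib.load("model.pkl")

# Custom metrics scraped by Prometheus and charted in Grafana
PREDICTIONS = Counter("predictions_total", "Total prediction requests")
LATENCY = Histogram("prediction_latency_seconds", "Prediction latency in seconds")

# Expose /metrics for the Prometheus scraper
app.mount("/metrics", make_asgi_app())

@app.post("/predict")
async def predict(features: dict):
    PREDICTIONS.inc()
    with LATENCY.time():
        row = [list(features.values())]
        prediction = model.predict(row)
    return {"prediction": prediction.tolist()[0]}
```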
Managed Services:
- AWS SageMaker: End-to-end ML platform
- Google Vertex AI: MLOps on GCP
- Azure ML: MLOps on Azure
- Databricks: Unified analytics + ML
Best Practices
1. Reproducibility
```python
import numpy as np
import tensorflow as tf
import mlflow

# Set random seeds
np.random.seed(42)
tf.random.set_seed(42)
# Log everything
mlflow.log_params({"model": "RandomForest", "n_estimators": 100})
mlflow.log_artifact("preprocessor.pkl")
```
2. Experiment Tracking
```python
import mlflow

with mlflow.start_run():
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_metric("val_accuracy", 0.89)
    mlflow.sklearn.log_model(model, "model")
```
3. Model Governance
- Approval workflow: Staging → Production
- Documentation: Model cards explain model behavior
- Audit trail: Who trained what, when, and why
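One way to wire the approval workflow and audit trail together is through MLflow's model registry. A sketch assuming MLflow 2.x; the model name, tag key, and approver are hypothetical:

```python
from mlflow import MlflowClient

client = MlflowClient()


def promote_to_production(name: str, version: int, approver: str) -> None:
    # Audit trail: record who approved the promotion (hypothetical tag key)
    client.set_model_version_tag(name, str(version), "approved_by", approver)
    # Approval gate: only the aliased version is served in production
    client.set_registered_model_alias(name, "production", version)


promote_to_production("sklearn-model", 1, approver="sarah.chen")
```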
Common Pitfalls
❌ Training-serving skew: different feature distributions between training and production. ✅ Use a feature store to ensure consistency.
❌ No monitoring: silent model failures in production. ✅ Implement comprehensive monitoring and alerting.
❌ Manual deployments: error-prone and slow. ✅ Automate with CI/CD pipelines.
❌ Poor version control: can’t reproduce model behavior. ✅ Use DVC + MLflow for data and model versioning.
Production Checklist
✅ Model registry with versioning
✅ Automated deployment pipeline
✅ Monitoring dashboards
✅ Data drift detection
✅ Performance thresholds with alerts
✅ Rollback procedure
✅ Documentation (model cards, runbooks)
✅ Load testing completed
✅ Security scanning (model vulnerabilities)
✅ Disaster recovery plan
Conclusion
MLOps is essential for reliable ML in production. Start small: track experiments, automate deployment, add monitoring, then optimize incrementally. Good MLOps practices transform ML from prototypes into production systems.