Building MLOps Pipelines - From Training to Production

How to automate ML model deployment, monitoring, and retraining with modern MLOps tools.

MLOps pipeline diagram

By Sarah Chen

MLOps bridges the gap between ML development and production operations. It’s the practice of deploying, monitoring, and maintaining machine learning models at scale.

What is MLOps?

MLOps = ML + DevOps + Data Engineering

It focuses on:

  • Reproducibility: Same code + data = same model
  • Scalability: Deploy to millions of users
  • Reliability: Models work consistently in production
  • Automation: CI/CD for ML models

Core Components

1. Data Versioning

Why version data?

  • Reproducible training
  • Track data lineage
  • Rollback capabilities

Tools:

  • DVC: Data version control
  • LakeFS: Git-like data versioning
  • Delta Lake: ACID transactions on data lakes
# Track data versions with DVC
dvc add data/train.csv
git add data/train.csv.dvc .gitignore
git commit -m "Track training data with DVC"

2. Model Registry

Store and version models:

import mlflow
import mlflow.sklearn

# Train model (train_model is a placeholder for your own training code)
model = train_model(X_train, y_train)

# Log parameters and metrics
mlflow.log_params({"learning_rate": 0.01, "epochs": 100})
mlflow.log_metric("accuracy", 0.95)

# Log model
mlflow.sklearn.log_model(model, "sklearn-model")

Popular Tools:

  • MLflow: Open-source, widely adopted
  • Weights & Biases: Experiment tracking + model registry
  • Hugging Face Hub: Model sharing for NLP
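Logging a run only stores artifacts; registering the model gives it a named, versioned entry that deployment tooling can reference. A minimal sketch using MLflow's register_model API (the model name "churn-model" and the run_id variable are illustrative placeholders):

import mlflow

# run_id identifies the training run that logged "sklearn-model" above
model_version = mlflow.register_model(
    model_uri=f"runs:/{run_id}/sklearn-model",
    name="churn-model",
)
print(model_version.name, model_version.version)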

3. CI/CD for ML

Traditional CI/CD vs. ML CI/CD:

Traditional       | ML
Code changes      | Code + data changes
Unit tests        | Unit + data + model tests
Deploy artifacts  | Deploy models + predictions
Rollback code     | Rollback model + data

Example ML Pipeline:

# .github/workflows/train.yml
name: Train Model

on:
  push:
    branches: [main]

jobs:
  train:
    runs-on: [self-hosted, gpu]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Train model
        run: python train.py
      - name: Evaluate model
        run: python evaluate.py
      - name: Register model
        run: python register.py
      - name: Deploy if better
        run: python deploy.py

4. Feature Store

Centralized feature management:

Benefits:

  • Share features across models
  • Prevent training-serving skew
  • Enable online/offline feature consistency

Example:

from datetime import datetime

from feast import FeatureStore

# Feature definitions (e.g. avg_transaction_value, account_age_days) live in
# the feature repo as FeatureViews; the online store (e.g. Redis) is
# configured in feature_store.yaml
fs = FeatureStore(repo_path=".")

# Load the latest feature values into the online store
fs.materialize_incremental(end_date=datetime.now())

# Read features for low-latency serving
features = fs.get_online_features(
    features=[
        "user_features:avg_transaction_value",
        "user_features:account_age_days",
    ],
    entity_rows=[{"user_id": 123}],
).to_dict()

5. Model Monitoring

What to monitor:

1. Performance Metrics:

  • Accuracy, precision, recall
  • F1 score, AUC-ROC
  • Custom business metrics

2. Data Drift:

  • Feature distribution changes
  • Prediction distribution changes
  • New categories in categorical features

3. Model Decay:

  • Performance degradation over time
  • Trigger retraining when accuracy drops below threshold

Implementation:

from evidently.metrics import ColumnDriftMetric
from evidently.report import Report

# Calculate drift for a single feature using Evidently's Report API
report = Report(metrics=[ColumnDriftMetric(column_name="feature_1")])
report.run(reference_data=train_data, current_data=production_data)

result = report.as_dict()["metrics"][0]["result"]

# Alert if significant drift
if result["drift_detected"]:  # or compare result["drift_score"] to a threshold
    alert_team("Data drift detected!")
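Drift is only one signal; the performance metrics listed above also need to be exported somewhere a dashboard can read them. A minimal sketch using the prometheus_client library (metric names and the port are illustrative), which pairs with the Prometheus + Grafana stack mentioned in the tools section below:

from prometheus_client import Counter, Histogram, start_http_server

# Illustrative metrics, scraped by Prometheus from port 8000
PREDICTIONS = Counter("model_predictions_total", "Predictions served")
LATENCY = Histogram("model_prediction_latency_seconds", "Prediction latency")

start_http_server(8000)

def predict_with_metrics(model, features):
    # Record latency and prediction count for every request
    with LATENCY.time():
        prediction = model.predict([features])
    PREDICTIONS.inc()
    return prediction[0]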

Production Deployment Patterns

1. Batch Inference

Use case: Hourly/daily predictions

# Daily model scoring job (db and model are application-level handles)
def score_users():
    users = db.get_all_users()
    features = extract_features(users)
    predictions = model.predict(features)
    db.save_predictions(predictions)

Tools:

  • Airflow
  • Prefect
  • AWS Glue
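Any of these orchestrators can run the scoring job on a schedule. A minimal sketch of a daily Airflow DAG (assuming Airflow 2.4+ for the schedule argument, and that score_users from the snippet above lives in a hypothetical scoring module):

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

from scoring import score_users  # hypothetical module holding the function above

# Run the scoring job once a day
with DAG(
    dag_id="daily_user_scoring",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    PythonOperator(
        task_id="score_users",
        python_callable=score_users,
    )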

2. Real-Time API

Use case: Low-latency predictions

from fastapi import FastAPI
import joblib

app = FastAPI()
model = joblib.load("model.pkl")

@app.post("/predict")
async def predict(features: dict):
    # Feature values must arrive in the same order the model was trained on
    row = list(features.values())
    prediction = model.predict([row])
    # Convert the numpy scalar to a native Python type for JSON serialization
    return {"prediction": prediction[0].item()}

Tools:

  • FastAPI
  • Flask
  • AWS SageMaker endpoints

3. Streaming Inference

Use case: High-throughput, real-time predictions

import json

import joblib
import ray
from kafka import KafkaConsumer

ray.init()

# Put the model in the object store once so workers don't reload it per task
model_ref = ray.put(joblib.load("model.pkl"))

@ray.remote
def predict_batch(model, batch):
    return model.predict(batch)

# Consume JSON-encoded feature rows and score them in batches of 100
consumer = KafkaConsumer(
    "predictions",
    value_deserializer=lambda v: json.loads(v),
)
batch = []

for message in consumer:
    batch.append(message.value)
    if len(batch) >= 100:
        # Process batch asynchronously on a Ray worker
        predict_batch.remote(model_ref, batch)
        batch = []

Tools:

  • Kafka
  • Kinesis
  • Ray Serve

Automated Retraining

Trigger retraining when:

  1. Performance drops: Accuracy < threshold
  2. Data drift: Feature distribution changes
  3. Scheduled: Daily/weekly for stability

Pipeline:

def retrain_if_needed():
    # Check performance metrics
    current_accuracy = monitor.get_accuracy()

    if current_accuracy < 0.85:
        # Get latest training data
        new_data = fetch_latest_data()

        # Train new model
        new_model = train(new_data)

        # Validate
        val_accuracy = validate(new_model)

        # Deploy if better
        if val_accuracy > current_accuracy:
            deploy(new_model)
            notify_team("Model deployed!")

Tools Stack

Open Source:

  • MLflow: Experiment tracking + model registry
  • Airflow/Prefect: Pipeline orchestration
  • Prometheus + Grafana: Monitoring
  • Feast: Feature store
  • Seldon/Kserve: Model serving

Managed Services:

  • AWS SageMaker: End-to-end ML platform
  • Google Vertex AI: MLOps on GCP
  • Azure ML: MLOps on Azure
  • Databricks: Unified analytics + ML

Best Practices

1. Reproducibility

import numpy as np
import tensorflow as tf
import mlflow

# Set random seeds
np.random.seed(42)
tf.random.set_seed(42)

# Log everything
mlflow.log_params({"model": "RandomForest", "n_estimators": 100})
mlflow.log_artifact("preprocessor.pkl")

2. Experiment Tracking

import mlflow

with mlflow.start_run():
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_metric("val_accuracy", 0.89)
    mlflow.sklearn.log_model(model, "model")

3. Model Governance

  • Approval workflow: Staging → Production
  • Documentation: Model cards explain model behavior
  • Audit trail: Who trained what, when, and why
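
The Staging → Production promotion can be recorded in the model registry itself. A minimal sketch using MLflow's client API (the model name, version, and tag values are illustrative; newer MLflow releases favor model aliases over stages):

from mlflow.tracking import MlflowClient

client = MlflowClient()

# Promote version 3 of the model from Staging to Production after approval
client.transition_model_version_stage(
    name="churn-model",
    version="3",
    stage="Production",
)

# Record who approved the promotion, for the audit trail
client.set_model_version_tag("churn-model", "3", "approved_by", "ml-review-board")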

Common Pitfalls

❌ Training-serving skew: different feature distributions between training and production
✅ Use a feature store to ensure consistency

❌ No monitoring: silent model failures in production
✅ Implement comprehensive monitoring and alerting

❌ Manual deployments: error-prone and slow
✅ Automate with CI/CD pipelines

❌ Poor version control: can't reproduce model behavior
✅ Use DVC + MLflow for data and model versioning

Production Checklist

✅ Model registry with versioning
✅ Automated deployment pipeline
✅ Monitoring dashboards
✅ Data drift detection
✅ Performance thresholds with alerts
✅ Rollback procedure
✅ Documentation (model cards, runbooks)
✅ Load testing completed
✅ Security scanning (model vulnerabilities)
✅ Disaster recovery plan

Conclusion

MLOps is essential for reliable ML in production. Start small: track experiments, automate deployment, add monitoring, then optimize incrementally. Good MLOps practices transform ML from prototypes into production systems.
