Building MLOps Pipelines - From Training to Production
How to automate ML model deployment, monitoring, and retraining with modern MLOps tools.
By Sarah Chen
MLOps bridges the gap between ML development and production operations. It’s the practice of deploying, monitoring, and maintaining machine learning models at scale.
What is MLOps?
MLOps = ML + DevOps + Data Engineering
It focuses on:
- Reproducibility: Same code + data = same model
- Scalability: Deploy to millions of users
- Reliability: Models work consistently in production
- Automation: CI/CD for ML models
Core Components
1. Data Versioning
Why version data?
- Reproducible training
- Track data lineage
- Rollback capabilities
Tools:
- DVC: Data version control
- LakeFS: Git-like data versioning
- Delta Lake: ACID transactions on data lakes
```bash
# Track data versions with DVC
dvc add data/train.csv
git add data/train.csv.dvc .gitignore
git commit -m "Track training data with DVC"
```
2. Model Registry
Store and version models:
```python
import mlflow

# Train model
model = train_model(X_train, y_train)

# Log parameters and metrics
mlflow.log_params({"learning_rate": 0.01, "epochs": 100})
mlflow.log_metric("accuracy", 0.95)

# Log the fitted model as an artifact
mlflow.sklearn.log_model(model, "sklearn-model")
```
Popular Tools:
- MLflow: Open-source, widely adopted
- Weights & Biases: Experiment tracking + model registry
- Hugging Face Hub: Model sharing for NLP
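Logging a model only stores it as a run artifact; to version it in MLflow's model registry you also register it. A minimal sketch, assuming MLflow 2.x; the run ID and the "staging" alias are illustrative:

```python
import mlflow
from mlflow import MlflowClient

# run_id of the training run that logged "sklearn-model" (hypothetical value)
run_id = "abc123"
model_uri = f"runs:/{run_id}/sklearn-model"

# Create (or bump) a registered version under a model name
version = mlflow.register_model(model_uri, "sklearn-model")

# MLflow 2.x aliases can mark the new version for staged rollout
client = MlflowClient()
client.set_registered_model_alias("sklearn-model", "staging", version.version)
```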
3. CI/CD for ML
Traditional CI/CD vs. ML CI/CD:
| Traditional | ML |
|---|---|
| Code changes | Code + data changes |
| Unit tests | Unit + data + model tests |
| Deploy artifacts | Deploy models + predictions |
| Rollback code | Rollback model + data |
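To make the "Unit + data + model tests" row concrete, here is a sketch of data and model tests in pytest; the file paths, column names, and accuracy threshold are illustrative assumptions:

```python
import joblib
import pandas as pd
from sklearn.metrics import accuracy_score

# Illustrative paths, columns, and threshold -- adjust to your project
TRAIN_PATH = "data/train.csv"
HOLDOUT_PATH = "data/holdout.csv"
MODEL_PATH = "model.pkl"
MIN_ACCURACY = 0.85


def test_training_data_schema():
    # Data test: required columns exist and the label has no missing values
    df = pd.read_csv(TRAIN_PATH)
    assert {"feature_1", "feature_2", "label"} <= set(df.columns)
    assert df["label"].isna().sum() == 0


def test_model_meets_accuracy_floor():
    # Model test: the candidate model clears a minimum quality bar on holdout data
    holdout = pd.read_csv(HOLDOUT_PATH)
    model = joblib.load(MODEL_PATH)
    preds = model.predict(holdout[["feature_1", "feature_2"]])
    assert accuracy_score(holdout["label"], preds) >= MIN_ACCURACY
```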
Example ML Pipeline:
```yaml
# .github/workflows/train.yml
name: Train Model

on:
  push:
    branches: [main]

jobs:
  train:
    runs-on: [self-hosted, gpu]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Train model
        run: python train.py
      - name: Evaluate model
        run: python evaluate.py
      - name: Register model
        run: python register.py
      - name: Deploy if better
        run: python deploy.py
```
4. Feature Store
Centralized feature management:
Benefits:
- Share features across models
- Prevent training-serving skew
- Enable online/offline feature consistency
Example:
```python
from datetime import datetime
from feast import FeatureStore

# Point at a Feast feature repo (the directory containing feature_store.yaml)
fs = FeatureStore(repo_path=".")

# Feature definitions live in the repo, e.g. a "user_features" FeatureView
# with avg_transaction_value and account_age_days fields.

# Load the latest offline feature values into the online store for serving
fs.materialize_incremental(end_date=datetime.utcnow())

# Read features at serving time
features = fs.get_online_features(
    features=[
        "user_features:avg_transaction_value",
        "user_features:account_age_days",
    ],
    entity_rows=[{"user_id": 123}],
).to_dict()
```
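The offline side uses the same feature definitions, which is what prevents training-serving skew. Continuing from the snippet above, a sketch that assumes an entity_df DataFrame of user IDs and event timestamps:

```python
# Offline: build a point-in-time correct training set from the same definitions
training_df = fs.get_historical_features(
    entity_df=entity_df,  # DataFrame with user_id and event_timestamp columns
    features=[
        "user_features:avg_transaction_value",
        "user_features:account_age_days",
    ],
).to_df()
```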
5. Model Monitoring
What to monitor:
1. Performance Metrics:
- Accuracy, precision, recall
- F1 score, AUC-ROC
- Custom business metrics
2. Data Drift:
- Feature distribution changes
- Prediction distribution changes
- New categories in categorical features
3. Model Decay:
- Performance degradation over time
- Trigger retraining when accuracy drops below threshold
Implementation:
```python
from evidently.report import Report
from evidently.metrics import ColumnDriftMetric

# Compare the production feature distribution against the training reference
drift_report = Report(metrics=[ColumnDriftMetric(column_name="feature_1")])
drift_report.run(reference_data=train_data, current_data=production_data)

result = drift_report.as_dict()["metrics"][0]["result"]

# Alert if significant drift
if result["drift_detected"]:
    alert_team("Data drift detected!")
```
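For model decay, the same idea applies to live performance: once delayed ground-truth labels arrive, recompute the metrics and alert when they fall below the agreed thresholds. A minimal sketch; the helper that joins logged predictions with their eventual labels, and the thresholds, are hypothetical:

```python
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical DataFrame of logged predictions joined with delayed ground truth
# (columns: "prediction", "label")
recent = load_recent_predictions_with_labels()  # assumed helper

accuracy = accuracy_score(recent["label"], recent["prediction"])
f1 = f1_score(recent["label"], recent["prediction"])

# Page the team when either metric falls below its threshold
if accuracy < 0.85 or f1 < 0.80:
    alert_team(f"Model decay: accuracy={accuracy:.3f}, f1={f1:.3f}")
```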
Production Deployment Patterns
1. Batch Inference
Use case: Hourly/daily predictions
```python
# Daily model scoring job
def score_users():
    users = db.get_all_users()
    features = extract_features(users)
    predictions = model.predict(features)
    db.save_predictions(predictions)
```
Tools:
- Airflow
- Prefect
- AWS Glue
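As one way to schedule the scoring job above, here is a minimal Airflow DAG using the TaskFlow API (Airflow 2.4+ assumed; the DAG name and schedule are illustrative):

```python
from datetime import datetime
from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def daily_scoring():
    @task
    def score():
        score_users()  # the batch scoring function sketched above

    score()


daily_scoring()
```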
2. Real-Time API
Use case: Low-latency predictions
```python
from fastapi import FastAPI
import joblib

app = FastAPI()
model = joblib.load("model.pkl")

@app.post("/predict")
async def predict(features: dict):
    # Convert the feature dict into the row format the model expects
    row = [list(features.values())]
    prediction = model.predict(row)
    return {"prediction": prediction.tolist()[0]}
```
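Calling the endpoint is then a plain HTTP POST; the payload keys below are illustrative and must match the features the model was trained on:

```python
import requests

payload = {"avg_transaction_value": 120.5, "account_age_days": 365}
resp = requests.post("http://localhost:8000/predict", json=payload)
print(resp.json())  # e.g. {"prediction": 1}
```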
Tools:
- FastAPI
- Flask
- AWS SageMaker endpoints
3. Streaming Inference
Use case: High-throughput real-time
```python
import json
import joblib
import ray
from kafka import KafkaConsumer

ray.init()
model = joblib.load("model.pkl")

@ray.remote
def predict_batch(batch):
    # Runs asynchronously on a Ray worker; the model is shipped with the task
    return model.predict(batch)

consumer = KafkaConsumer("predictions", value_deserializer=json.loads)

batch = []
for message in consumer:
    batch.append(message.value)
    if len(batch) >= 100:
        # Process batch asynchronously
        predict_batch.remote(batch)
        batch = []
```
Tools:
- Kafka
- Kinesis
- Ray Serve
Automated Retraining
Trigger retraining when:
- Performance drops: Accuracy < threshold
- Data drift: Feature distribution changes
- Scheduled: Daily/weekly for stability
Pipeline:
```python
def retrain_if_needed():
    # Check performance metrics
    current_accuracy = monitor.get_accuracy()

    if current_accuracy < 0.85:
        # Get latest training data
        new_data = fetch_latest_data()

        # Train new model
        new_model = train(new_data)

        # Validate
        val_accuracy = validate(new_model)

        # Deploy if better
        if val_accuracy > current_accuracy:
            deploy(new_model)
            notify_team("Model deployed!")
```
Tools Stack
Open Source:
- MLflow: Experiment tracking + model registry
- Airflow/Prefect: Pipeline orchestration
- Prometheus + Grafana: Monitoring
- Feast: Feature store
- Seldon/KServe: Model serving
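To make the Prometheus + Grafana entry concrete, here is a sketch of instrumenting the real-time FastAPI service from earlier with prometheus_client; the metric names are illustrative:

```python
import joblib
from fastapi import FastAPI
from prometheus_client import Counter, Histogram, make_asgi_app

app = FastAPI()
model = joblib.load("model.pkl")

# Custom metrics scraped by Prometheus and charted in Grafana
PREDICTIONS = Counter("predictions_total", "Total prediction requests")
LATENCY = Histogram("prediction_latency_seconds", "Prediction latency in seconds")

# Expose /metrics for the Prometheus scraper
app.mount("/metrics", make_asgi_app())

@app.post("/predict")
async def predict(features: dict):
    PREDICTIONS.inc()
    with LATENCY.time():
        row = [list(features.values())]
        prediction = model.predict(row)
    return {"prediction": prediction.tolist()[0]}
```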
Managed Services:
- AWS SageMaker: End-to-end ML platform
- Google Vertex AI: MLOps on GCP
- Azure ML: MLOps on Azure
- Databricks: Unified analytics + ML
Best Practices
1. Reproducibility
```python
import numpy as np
import tensorflow as tf
import mlflow

# Set random seeds
np.random.seed(42)
tf.random.set_seed(42)
# Log everything
mlflow.log_params({"model": "RandomForest", "n_estimators": 100})
mlflow.log_artifact("preprocessor.pkl")
```
2. Experiment Tracking
```python
import mlflow

with mlflow.start_run():
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_metric("val_accuracy", 0.89)
    mlflow.sklearn.log_model(model, "model")
```
3. Model Governance
- Approval workflow: Staging → Production
- Documentation: Model cards explain model behavior
- Audit trail: Who trained what, when, and why
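One way to wire the approval workflow and audit trail together is through MLflow's model registry. A sketch assuming MLflow 2.x; the model name, tag key, and approver are hypothetical:

```python
from mlflow import MlflowClient

client = MlflowClient()


def promote_to_production(name: str, version: int, approver: str) -> None:
    # Audit trail: record who approved the promotion (hypothetical tag key)
    client.set_model_version_tag(name, str(version), "approved_by", approver)
    # Approval gate: only the aliased version is served in production
    client.set_registered_model_alias(name, "production", version)


promote_to_production("sklearn-model", 1, approver="sarah.chen")
```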
Common Pitfalls
❌ Training-serving skew: different feature distributions between training and production. ✅ Use a feature store to ensure consistency.
❌ No monitoring: silent model failures in production. ✅ Implement comprehensive monitoring and alerting.
❌ Manual deployments: error-prone and slow. ✅ Automate with CI/CD pipelines.
❌ Poor version control: can’t reproduce model behavior. ✅ Use DVC + MLflow for data and model versioning.
Production Checklist
✅ Model registry with versioning
✅ Automated deployment pipeline
✅ Monitoring dashboards
✅ Data drift detection
✅ Performance thresholds with alerts
✅ Rollback procedure
✅ Documentation (model cards, runbooks)
✅ Load testing completed
✅ Security scanning (model vulnerabilities)
✅ Disaster recovery plan
Conclusion
MLOps is essential for reliable ML in production. Start small: track experiments, automate deployment, add monitoring, then optimize incrementally. Good MLOps practices transform ML from prototypes into production systems.