Deployment Guide for AI Applications

Best practices and step-by-step instructions for deploying AI applications to AWS, GCP, and Azure.

Deploying AI applications to production requires careful planning around infrastructure, scaling, security, and cost optimization. This guide provides platform-specific deployment strategies and best practices.

Pre-Deployment Checklist

Application Readiness

  • [ ] Environment Variables: All secrets externalized (no hardcoded keys; see the startup check after this list)
  • [ ] Docker Containers: Application containerized and tested
  • [ ] Health Checks: /health endpoint implemented
  • [ ] Logging: Structured logging with appropriate levels
  • [ ] Metrics: Prometheus/OpenTelemetry metrics exposed
  • [ ] Configuration: 12-factor app principles followed
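
One way to enforce the first two items is a container entrypoint that refuses to start when a required secret is absent. A minimal sketch (the variable names are illustrative; adapt them to your app):

# entrypoint.sh -- fail fast if a required secret is missing
for var in OPENAI_API_KEY DATABASE_URL REDIS_HOST; do
  if [ -z "${!var:-}" ]; then
    echo "Missing required environment variable: $var" >&2
    exit 1
  fi
done

exec "$@"  # hand off to the real server process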

Infrastructure Planning

  • [ ] Cost Estimates: Monthly cloud costs projected
  • [ ] Scaling Strategy: Auto-scaling thresholds defined
  • [ ] Backup Plan: Disaster recovery procedure documented
  • [ ] Monitoring: Dashboards and alerts configured
  • [ ] Security: IAM roles, secrets, and network policies defined

AWS Deployment Guide

Architecture Overview

[Route 53 / CloudFront]
          ↓
[Application Load Balancer]
          ↓
[ECS Fargate Cluster]
          ↓
[AI Services Container] [Redis] [RDS PostgreSQL] [Pinecone via VPC Endpoint]

Step 1: Container Registry (ECR)

# Create repository
aws ecr create-repository --repository-name kurai-ai-api

# Login to ECR
aws ecr get-login-password --region us-east-1 | \
  docker login --username AWS --password-stdin \
  <account-id>.dkr.ecr.us-east-1.amazonaws.com

# Build and push image
docker build -t kurai-ai-api .
docker tag kurai-ai-api:latest <account-id>.dkr.ecr.us-east-1.amazonaws.com/kurai-ai-api:latest
docker push <account-id>.dkr.ecr.us-east-1.amazonaws.com/kurai-ai-api:latest

Step 2: ECS Fargate Deployment

Task Definition:

{
  "family": "kurai-ai-api",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "2048",
  "memory": "4096",
  "executionRoleArn": "arn:aws:iam::<account-id>:role/ecsTaskExecutionRole",
  "taskRoleArn": "arn:aws:iam::<account-id>:role/ecsTaskRole",
  "containerDefinitions": [
    {
      "name": "ai-api",
      "image": "<account-id>.dkr.ecr.us-east-1.amazonaws.com/kurai-ai-api:latest",
      "cpu": 2048,
      "memory": 4096,
      "essential": true,
      "portMappings": [
        {
          "containerPort": 8000,
          "protocol": "tcp"
        }
      ],
      "environment": [
        {
          "name": "ENVIRONMENT",
          "value": "production"
        },
        {
          "name": "REDIS_HOST",
          "value": "redis-cluster.xxxxxx.use1.cache.amazonaws.com"
        }
      ],
      "secrets": [
        {
          "name": "OPENAI_API_KEY",
          "valueFrom": "arn:aws:secretsmanager:us-east-1:<account-id>:secret:openai-key"
        },
        {
          "name": "DATABASE_URL",
          "valueFrom": "arn:aws:secretsmanager:us-east-1:<account-id>:secret:database-url"
        }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/kurai-ai-api",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "ecs"
        }
      },
      "healthCheck": {
        "command": ["CMD-SHELL", "curl -f http://localhost:8000/health || exit 1"],
        "interval": 30,
        "timeout": 5,
        "retries": 3,
        "startPeriod": 60
      }
    }
  ]
}
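
Assuming the JSON above is saved as task-definition.json, register it with:

aws ecs register-task-definition \
  --cli-input-json file://task-definition.json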

Step 3: Auto-Scaling Configuration

# Register the service as a scalable target
aws application-autoscaling register-scalable-target \
  --service-namespace ecs \
  --resource-id service/kurai-ai-cluster/kurai-ai-service \
  --scalable-dimension ecs:service:DesiredCount \
  --min-capacity 2 \
  --max-capacity 20

# Scale on CPU
aws application-autoscaling put-scaling-policy \
  --service-namespace ecs \
  --resource-id service/kurai-ai-cluster/kurai-ai-service \
  --scalable-dimension ecs:service:DesiredCount \
  --policy-name cpu-scale-policy \
  --policy-type TargetTrackingScaling \
  --target-tracking-scaling-policy-configuration file://scaling-policy.json

scaling-policy.json:

{
  "TargetValue": 70.0,
  "PredefinedMetricSpecification": {
    "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
  },
  "ScaleOutCooldown": 300,
  "ScaleInCooldown": 300
}

Step 4: Infrastructure with Terraform

# VPC
resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name = "kurai-ai-vpc"
  }
}

# ECS Cluster
resource "aws_ecs_cluster" "main" {
  name = "kurai-ai-cluster"

  setting {
    name  = "containerInsights"
    value = "enabled"
  }
}

# Application Load Balancer
resource "aws_lb" "main" {
  name               = "kurai-ai-alb"
  internal           = false
  load_balancer_type = "application"
  security_groups    = [aws_security_group.alb.id]
  subnets            = aws_subnet.public[*].id

  enable_deletion_protection = false
}

# ECS Service
resource "aws_ecs_service" "ai_api" {
  name            = "kurai-ai-service"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.ai_api.arn
  desired_count   = 3
  launch_type     = "FARGATE"

  network_configuration {
    subnets          = aws_subnet.private[*].id
    security_groups  = [aws_security_group.ecs_tasks.id]
    assign_public_ip = false
  }

  load_balancer {
    target_group_arn = aws_lb_target_group.ai_api.arn
    container_name   = "ai-api"
    container_port   = 8000
  }
}
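
The service above references subnets, security groups, a task definition, and a target group defined elsewhere in the module. Once those are in place, apply with the standard workflow:

terraform init
terraform plan -out=tfplan
terraform apply tfplan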

GCP Deployment Guide

Architecture Overview

[Cloud Load Balancing]
          ↓
[Cloud Run]
          ↓
[AI Services Container] [Memorystore] [Cloud SQL] [Vector DB]

Step 1: Container Build & Push

# Build and push with Cloud Build
gcloud builds submit --tag gcr.io/<project-id>/kurai-ai-api

# Or drive the build from a config file
gcloud builds submit --config cloudbuild.yaml

cloudbuild.yaml:

steps:
  - name: 'gcr.io/cloud-builders/docker'
    args: ['build', '-t', 'gcr.io/$PROJECT_ID/kurai-ai-api:$BUILD_ID', '.']
  - name: 'gcr.io/cloud-builders/docker'
    args: ['push', 'gcr.io/$PROJECT_ID/kurai-ai-api:$BUILD_ID']

Step 2: Deploy to Cloud Run

# Deploy
gcloud run deploy kurai-ai-api \
  --image gcr.io/<project-id>/kurai-ai-api \
  --platform managed \
  --region us-central1 \
  --allow-unauthenticated \
  --memory 4Gi \
  --cpu 2 \
  --min-instances 2 \
  --max-instances 100 \
  --timeout 300 \
  --concurrency 80 \
  --set-env-vars REDIS_HOST=redis-instance \
  --set-secrets OPENAI_API_KEY=openai-key:latest,DATABASE_URL=database-url:latest

Step 3: Cloud SQL Setup

# Create PostgreSQL instance
gcloud sql instances create kurai-ai-db \
  --database-version POSTGRES_15 \
  --tier db-custom-4-16384 \
  --region us-central1 \
  --storage-auto-increase

# Create database
gcloud sql databases create kurai_ai --instance=kurai-ai-db

# Switch to private IP connectivity (requires private services
# access to be configured on the network first)
gcloud sql instances patch kurai-ai-db \
  --network=projects/<project-id>/global/networks/default \
  --no-assign-ip
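
To verify connectivity from a workstation, one option is the Cloud SQL Auth Proxy (v2 syntax sketched below; it authenticates with your gcloud credentials):

# Tunnel the instance to localhost, then connect with psql
cloud-sql-proxy <project-id>:us-central1:kurai-ai-db --port 5432 &
psql "host=127.0.0.1 port=5432 dbname=kurai_ai user=postgres"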

Step 4: Memorystore (Redis)

# Create Redis instance
gcloud redis instances create kurai-ai-cache \
  --size=5 \
  --region=us-central1 \
  --redis-version=redis_7_0 \
  --tier=standard \
  --connect-mode=direct-peering
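
Memorystore assigns the endpoint at creation time; a quick way to fetch it and smoke-test the instance from a machine on the peered network:

# Look up the Redis host IP
gcloud redis instances describe kurai-ai-cache \
  --region=us-central1 --format="value(host)"

# Expect PONG
redis-cli -h <redis-host-ip> ping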

Azure Deployment Guide

Architecture Overview

[Azure Front Door]
          ↓
[Azure Container Instances]
          ↓
[AI Services Container] [Azure Cache] [Azure Database] [Vector DB]

Step 1: Container Registry

# Create registry (use a lowercase name: the login server is
# lowercased, and Docker image references reject uppercase)
az acr create --resource-group kurai-ai-rg \
  --name kuraiairegistry \
  --sku Standard

# Login
az acr login --name kuraiairegistry

# Build and push
az acr build --registry kuraiairegistry \
  --image kurai-ai-api:v1 .

Step 2: Container Instances

# Create container group
az container create \
  --resource-group kurai-ai-rg \
  --name kurai-ai-api \
  --image kuraiairegistry.azurecr.io/kurai-ai-api:v1 \
  --cpu 2 \
  --memory 4 \
  --registry-login-server kuraiairegistry.azurecr.io \
  --registry-username <username> \
  --registry-password <password> \
  --dns-name-label kurai-ai-api \
  --ports 8000 \
  --environment-variables \
    ENVIRONMENT=production \
    REDIS_HOST=redis-cache.redis.cache.windows.net \
  --secure-environment-variables \
    OPENAI_API_KEY=$OPENAI_API_KEY \
    DATABASE_URL=$DATABASE_URL
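
To confirm the container group came up and inspect its startup output:

# Check state and fetch logs
az container show --resource-group kurai-ai-rg \
  --name kurai-ai-api --query instanceView.state
az container logs --resource-group kurai-ai-rg \
  --name kurai-ai-api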

Step 3: Azure Database for PostgreSQL

# Create server with private VNET access (Flexible Server; the
# retired Single Server tier only supported PostgreSQL 11 and
# below, and VNET integration must be chosen at creation time)
az postgres flexible-server create \
  --resource-group kurai-ai-rg \
  --name kurai-ai-db \
  --location eastus \
  --admin-user dbadmin \
  --admin-password <secure-password> \
  --sku-name Standard_D4s_v3 \
  --tier GeneralPurpose \
  --version 13 \
  --vnet kurai-vnet \
  --subnet kurai-subnet

Deployment Best Practices

1. Blue-Green Deployment

Maintain two production environments:

  • Blue: Current production version
  • Green: New version to deploy (see the traffic-shift sketch below)

Benefits:

  • Zero-downtime deployments
  • Instant rollback capability
  • Easy A/B testing
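
As a concrete sketch, Cloud Run supports this pattern natively through tagged revisions (image and tag names below are illustrative):

# Deploy the green revision with no live traffic
gcloud run deploy kurai-ai-api \
  --image gcr.io/<project-id>/kurai-ai-api:<new-build> \
  --region us-central1 \
  --no-traffic \
  --tag green

# Canary 10% of traffic to green; raise to 100 to promote,
# or set back to 0 to roll back instantly
gcloud run services update-traffic kurai-ai-api \
  --region us-central1 \
  --to-tags green=10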

2. Health Checks

Implement comprehensive health checks:

from datetime import datetime, timezone

from fastapi import FastAPI
from fastapi.responses import JSONResponse

app = FastAPI()

# check_db_connection() and friends are the app's own probes,
# each returning True/False
@app.get("/health")
async def health_check():
    checks = {
        "status": "healthy",
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "checks": {
            "database": await check_db_connection(),
            "redis": await check_redis_connection(),
            "llm_provider": await check_llm_provider(),
            "disk_space": check_disk_usage()
        }
    }

    if all(checks["checks"].values()):
        return checks

    checks["status"] = "unhealthy"
    return JSONResponse(status_code=503, content=checks)

3. Secrets Management

AWS Secrets Manager:

# Store secret
aws secretsmanager create-secret \
  --name openai-api-key \
  --secret-string "sk-..."

# Rotate secret
aws secretsmanager rotate-secret \
  --secret-id openai-api-key \
  --rotation-lambda-arn arn:aws:lambda:...
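
At runtime, ECS injects these values via the secrets block shown earlier; for local debugging you can read one back directly:

# Fetch the current value (requires secretsmanager:GetSecretValue)
aws secretsmanager get-secret-value \
  --secret-id openai-api-key \
  --query SecretString \
  --output text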

Azure Key Vault:

# Store secret
az keyvault secret set \
  --vault-name kurai-key-vault \
  --name openai-api-key \
  --value "sk-..."
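
And to read it back:

# Fetch the current secret value
az keyvault secret show \
  --vault-name kurai-key-vault \
  --name openai-api-key \
  --query value -o tsv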

4. Logging Strategy

Centralized Logging with CloudWatch (AWS):

import logging

from watchtower import CloudWatchLogHandler

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)
logger.addHandler(CloudWatchLogHandler(
    log_group_name="/ecs/kurai-ai-api",  # matches the task definition
    log_stream_name="production"         # keyword name as of watchtower 3.x
))

logger.info("Processing query", extra={"query_id": "123", "user_id": "456"})

5. Cost Optimization

Rightsizing:

  • Start with 2 vCPUs, 4GB RAM per container
  • Scale based on actual usage metrics
  • Use spot instances for non-critical workloads

Cost Monitoring:

# AWS Cost Explorer
aws ce get-cost-and-usage \
  --time-period Start=2024-01-01,End=2024-01-31 \
  --granularity MONTHLY \
  --metrics BlendedCost \
  --group-by Type=DIMENSION,Key=SERVICE

6. Disaster Recovery

Backup Strategy:

  • RDS Automated Backups: 7-35 day retention
  • Point-in-Time Recovery: Restore to any second within retention period (see the example after this list)
  • Cross-Region Replication: For critical applications
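
A point-in-time restore creates a new instance alongside the original; on AWS the call looks like this (identifiers and timestamp are illustrative):

# Restore to a new instance at a specific timestamp
aws rds restore-db-instance-to-point-in-time \
  --source-db-instance-identifier kurai-ai-db \
  --target-db-instance-identifier kurai-ai-db-restored \
  --restore-time 2024-01-15T10:30:00Z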

Recovery Testing:

  • Test disaster recovery procedures quarterly
  • Document RTO (Recovery Time Objective) and RPO (Recovery Point Objective)
  • Target: RTO < 1 hour, RPO < 5 minutes

Related Articles:

  • Backend Architecture Checklist for AI Applications
  • Monitoring Best Practices for AI Systems
  • Cost Optimization Strategies for AI Projects