Deployment Guide for AI Applications
Best practices and step-by-step instructions for deploying AI applications to AWS, GCP, and Azure.

Deploying AI applications to production requires careful planning around infrastructure, scaling, security, and cost optimization. This guide provides platform-specific deployment strategies and best practices.
Pre-Deployment Checklist
Application Readiness
- [ ] Environment Variables: All secrets externalized (no hardcoded keys)
- [ ] Docker Containers: Application containerized and tested
- [ ] Health Checks: /health endpoint implemented
- [ ] Logging: Structured logging with appropriate levels
- [ ] Metrics: Prometheus/OpenTelemetry metrics exposed
- [ ] Configuration: 12-factor app principles followed
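A quick local smoke test catches most of these checklist items before anything touches the cloud. A minimal sketch, assuming the image is tagged kurai-ai-api, the app listens on port 8000 with a /health endpoint, and secrets live in a local .env.production file (all illustrative names):
# Build and run the container locally
docker build -t kurai-ai-api .
docker run -d --rm -p 8000:8000 --env-file .env.production --name ai-api-smoke kurai-ai-api
# Give the app a moment to boot, then hit the health endpoint
sleep 5
curl -f http://localhost:8000/health && echo "OK"
docker stop ai-api-smoke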
Infrastructure Planning
- [ ] Cost Estimates: Monthly cloud costs projected
- [ ] Scaling Strategy: Auto-scaling thresholds defined
- [ ] Backup Plan: Disaster recovery procedure documented
- [ ] Monitoring: Dashboards and alerts configured
- [ ] Security: IAM roles, secrets, and network policies defined
AWS Deployment Guide
Architecture Overview
[Route 53 / CloudFront]
↓
[Application Load Balancer]
↓
[ECS Fargate Cluster]
↓
[AI Services Container] [Redis] [RDS PostgreSQL] [Pinecone via VPC Endpoint]
Step 1: Container Registry (ECR)
# Create repository
aws ecr create-repository --repository-name kurai-ai-api
# Login to ECR
aws ecr get-login-password --region us-east-1 | \
docker login --username AWS --password-stdin \
<account-id>.dkr.ecr.us-east-1.amazonaws.com
# Build and push image
docker build -t kurai-ai-api .
docker tag kurai-ai-api:latest <account-id>.dkr.ecr.us-east-1.amazonaws.com/kurai-ai-api:latest
docker push <account-id>.dkr.ecr.us-east-1.amazonaws.com/kurai-ai-api:latest
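If you want vulnerability scanning and immutable tags from day one, both can be enabled when the repository is created; this is a variant of the create-repository command above using standard ECR flags:
# Create the repository with scan-on-push and immutable tags
aws ecr create-repository \
--repository-name kurai-ai-api \
--image-tag-mutability IMMUTABLE \
--image-scanning-configuration scanOnPush=true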
Step 2: ECS Fargate Deployment
Task Definition:
{
  "family": "kurai-ai-api",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "2048",
  "memory": "4096",
  "executionRoleArn": "arn:aws:iam::<account-id>:role/ecsTaskExecutionRole",
  "taskRoleArn": "arn:aws:iam::<account-id>:role/ecsTaskRole",
  "containerDefinitions": [
    {
      "name": "ai-api",
      "image": "<account-id>.dkr.ecr.us-east-1.amazonaws.com/kurai-ai-api:latest",
      "cpu": 2048,
      "memory": 4096,
      "essential": true,
      "portMappings": [
        {
          "containerPort": 8000,
          "protocol": "tcp"
        }
      ],
      "environment": [
        {
          "name": "ENVIRONMENT",
          "value": "production"
        },
        {
          "name": "REDIS_HOST",
          "value": "redis-cluster.xxxxxx.use1.cache.amazonaws.com"
        }
      ],
      "secrets": [
        {
          "name": "OPENAI_API_KEY",
          "valueFrom": "arn:aws:secretsmanager:us-east-1:<account-id>:secret:openai-key"
        },
        {
          "name": "DATABASE_URL",
          "valueFrom": "arn:aws:secretsmanager:us-east-1:<account-id>:secret:database-url"
        }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/kurai-ai-api",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "ecs"
        }
      },
      "healthCheck": {
        "command": ["CMD-SHELL", "curl -f http://localhost:8000/health || exit 1"],
        "interval": 30,
        "timeout": 5,
        "retries": 3,
        "startPeriod": 60
      }
    }
  ]
}
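With the JSON saved as task-definition.json, registering it and creating the service wires it to the cluster and service names used in the auto-scaling step below. The subnet and security group IDs are placeholders for values from your VPC:
# Register the task definition
aws ecs register-task-definition --cli-input-json file://task-definition.json
# Create the Fargate service
aws ecs create-service \
--cluster kurai-ai-cluster \
--service-name kurai-ai-service \
--task-definition kurai-ai-api \
--desired-count 3 \
--launch-type FARGATE \
--network-configuration "awsvpcConfiguration={subnets=[<subnet-ids>],securityGroups=[<sg-id>],assignPublicIp=DISABLED}"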
Step 3: Auto-Scaling Configuration
# Register the service as a scalable target
aws application-autoscaling register-scalable-target \
--service-namespace ecs \
--resource-id service/kurai-ai-cluster/kurai-ai-service \
--scalable-dimension ecs:service:DesiredCount \
--min-capacity 2 \
--max-capacity 20
# Scale on CPU
aws application-autoscaling put-scaling-policy \
--service-namespace ecs \
--resource-id service/kurai-ai-cluster/kurai-ai-service \
--scalable-dimension ecs:service:DesiredCount \
--policy-name cpu-scale-policy \
--policy-type TargetTrackingScaling \
--target-tracking-scaling-policy-configuration file://scaling-policy.json
scaling-policy.json:
{
  "TargetValue": 70.0,
  "PredefinedMetricSpecification": {
    "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
  },
  "ScaleOutCooldown": 300,
  "ScaleInCooldown": 300
}
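CPU is rarely the only bottleneck for AI workloads, so a second target-tracking policy on memory is often worth adding. Same pattern, using the predefined ECSServiceAverageMemoryUtilization metric (the 75% target is an illustrative starting point):
# Scale on memory
aws application-autoscaling put-scaling-policy \
--service-namespace ecs \
--resource-id service/kurai-ai-cluster/kurai-ai-service \
--scalable-dimension ecs:service:DesiredCount \
--policy-name memory-scale-policy \
--policy-type TargetTrackingScaling \
--target-tracking-scaling-policy-configuration '{"TargetValue": 75.0, "PredefinedMetricSpecification": {"PredefinedMetricType": "ECSServiceAverageMemoryUtilization"}}'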
Step 4: Infrastructure with Terraform
# VPC
resource "aws_vpc" "main" {
cidr_block = "10.0.0.0/16"
enable_dns_hostnames = true
enable_dns_support = true
tags = {
Name = "kurai-ai-vpc"
}
}
# ECS Cluster
resource "aws_ecs_cluster" "main" {
name = "kurai-ai-cluster"
setting {
name = "containerInsights"
value = "enabled"
}
}
# Application Load Balancer
resource "aws_lb" "main" {
name = "kurai-ai-alb"
internal = false
load_balancer_type = "application"
security_groups = [aws_security_group.alb.id]
subnets = aws_subnet.public[*].id
enable_deletion_protection = false
}
# ECS Service
resource "aws_ecs_service" "ai_api" {
name = "kurai-ai-service"
cluster = aws_ecs_cluster.main.id
task_definition = aws_ecs_task_definition.ai_api.arn
desired_count = 3
launch_type = "FARGATE"
network_configuration {
subnets = aws_subnet.private[*].id
security_groups = [aws_security_group.ecs_tasks.id]
assign_public_ip = false
}
load_balancer {
target_group_arn = aws_lb_target_group.ai_api.arn
container_name = "ai-api"
container_port = 8000
}
}
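These snippets reference resources not shown here (the public/private subnets, the ALB and task security groups, the target group, the task definition), so treat them as a skeleton rather than a complete module. Once those are defined, the standard workflow applies:
# Preview, then create the infrastructure
terraform init
terraform plan -out=tfplan
terraform apply tfplan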
GCP Deployment Guide
Architecture Overview
[Cloud Load Balancing]
↓
[Cloud Run]
↓
[AI Services Container] [Memorystore] [Cloud SQL] [Vector DB]
Step 1: Container Build & Push
# Build and push in one step with Cloud Build
gcloud builds submit --tag gcr.io/<project-id>/kurai-ai-api
# Or use a build config for multi-step builds
gcloud builds submit --config cloudbuild.yaml
cloudbuild.yaml:
steps:
  - name: 'gcr.io/cloud-builders/docker'
    args: ['build', '-t', 'gcr.io/$PROJECT_ID/kurai-ai-api:$BUILD_ID', '.']
  - name: 'gcr.io/cloud-builders/docker'
    args: ['push', 'gcr.io/$PROJECT_ID/kurai-ai-api:$BUILD_ID']
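To confirm the image landed before deploying:
# List pushed tags
gcloud container images list-tags gcr.io/<project-id>/kurai-ai-api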
Step 2: Deploy to Cloud Run
# Deploy
gcloud run deploy kurai-ai-api \
--image gcr.io/<project-id>/kurai-ai-api \
--platform managed \
--region us-central1 \
--allow-unauthenticated \
--memory 4Gi \
--cpu 2 \
--min-instances 2 \
--max-instances 100 \
--timeout 300 \
--concurrency 80 \
--set-env-vars REDIS_HOST=redis-instance \
--set-secrets OPENAI_API_KEY=openai-key:latest,DATABASE_URL=database-url:latest
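After the deploy completes, fetch the service URL and smoke-test the health endpoint:
# Get the service URL
gcloud run services describe kurai-ai-api \
--region us-central1 \
--format 'value(status.url)'
# Hit the health endpoint
curl -f "$(gcloud run services describe kurai-ai-api --region us-central1 --format 'value(status.url)')/health"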
Step 3: Cloud SQL Setup
# Create PostgreSQL instance
gcloud sql instances create kurai-ai-db \
--database-version POSTGRES_15 \
--tier db-custom-4-16384 \
--region us-central1 \
--storage-auto-increase
# Create database
gcloud sql databases create kurai_ai --instance=kurai-ai-db
# Switch to private IP (requires private services access configured on the network)
gcloud sql instances patch kurai-ai-db \
--network=projects/<project-id>/global/networks/default \
--no-assign-ip
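The instance is created with a default postgres superuser; a dedicated application user is a one-liner (kurai_app is an illustrative name):
# Create an application user
gcloud sql users create kurai_app \
--instance=kurai-ai-db \
--password=<secure-password>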
Step 4: Memorystore (Redis)
# Create Redis instance
gcloud redis instances create kurai-ai-cache \
--size=5 \
--region=us-central1 \
--redis-version=redis_7_0 \
--tier=standard_ha \
--connect-mode=private-service-access
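The host and port to plug into the REDIS_HOST configuration come from the instance description:
# Look up the Redis endpoint
gcloud redis instances describe kurai-ai-cache \
--region=us-central1 \
--format='value(host,port)'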
Azure Deployment Guide
Architecture Overview
[Azure Front Door]
↓
[Azure Container Instances]
↓
[AI Services Container] [Azure Cache] [Azure Database] [Vector DB]
Step 1: Container Registry
# Create registry
az acr create --resource-group kurai-ai-rg \
--name kuraiAIRegistry \
--sku Standard
# Login
az acr login --name kuraiAIRegistry
# Build and push
az acr build --registry kuraiAIRegistry \
--image kurai-ai-api:v1 .
Step 2: Container Instances
# Create container group
az container create \
--resource-group kurai-ai-rg \
--name kurai-ai-api \
--image kuraiairegistry.azurecr.io/kurai-ai-api:v1 \
--cpu 2 \
--memory 4 \
--registry-login-server kuraiairegistry.azurecr.io \
--registry-username <username> \
--registry-password <password> \
--dns-name-label kurai-ai-api \
--ports 8000 \
--environment-variables \
ENVIRONMENT=production \
REDIS_HOST=redis-cache.redis.cache.windows.net \
--secure-environment-variables \
OPENAI_API_KEY=$OPENAI_API_KEY \
DATABASE_URL=$DATABASE_URL
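Once the container group is running, verify it and tail its logs:
# Get the public FQDN
az container show \
--resource-group kurai-ai-rg \
--name kurai-ai-api \
--query ipAddress.fqdn --output tsv
# Stream container logs
az container logs --resource-group kurai-ai-rg --name kurai-ai-api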
Step 3: Azure Database for PostgreSQL
# Create server (Flexible Server; the retired Single Server only supports up to PostgreSQL 11)
az postgres flexible-server create \
--resource-group kurai-ai-rg \
--name kurai-ai-db \
--location eastus \
--admin-user dbadmin \
--admin-password <secure-password> \
--tier GeneralPurpose \
--sku-name Standard_D4s_v3 \
--version 13 \
--vnet kurai-vnet \
--subnet kurai-subnet
# VNET integration is configured at creation time via --vnet/--subnet; Flexible Server has no separate vnet-rule step
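As on GCP, create the application database once the server is up:
# Create database
az postgres flexible-server db create \
--resource-group kurai-ai-rg \
--server-name kurai-ai-db \
--database-name kurai_ai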
Deployment Best Practices
1. Blue-Green Deployment
Maintain two production environments:
- Blue: Current production version
- Green: New version to deploy
Benefits:
- Zero-downtime deployments
- Instant rollback capability
- Easy A/B testing
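On AWS, one lightweight way to implement this is two target groups (blue and green) behind the same ALB listener, shifting traffic weights to cut over. A sketch with placeholder ARNs; managed ECS blue/green via CodeDeploy is the heavier-weight alternative:
# Send 100% of traffic to the green target group (rollback = swap the weights back)
aws elbv2 modify-listener \
--listener-arn <listener-arn> \
--default-actions '[{"Type": "forward", "ForwardConfig": {"TargetGroups": [{"TargetGroupArn": "<blue-tg-arn>", "Weight": 0}, {"TargetGroupArn": "<green-tg-arn>", "Weight": 100}]}}]'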
2. Health Checks
Implement comprehensive health checks:
@app.get("/health")
async def health_check():
checks = {
"status": "healthy",
"timestamp": datetime.utcnow().isoformat(),
"checks": {
"database": await check_db_connection(),
"redis": await check_redis_connection(),
"llm_provider": await check_llm_provider(),
"disk_space": check_disk_usage()
}
}
if all(checks["checks"].values()):
return checks
else:
return JSONResponse(status_code=503, content=checks)
3. Secrets Management
AWS Secrets Manager:
# Store secret
aws secretsmanager create-secret \
--name openai-api-key \
--secret-string "sk-..."
# Rotate secret
aws secretsmanager rotate-secret \
--secret-id openai-api-key \
--rotation-lambda-arn arn:aws:lambda:...
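ECS injects these at runtime through the secrets block shown earlier; for local debugging, a value can be read back directly:
# Read a secret value
aws secretsmanager get-secret-value \
--secret-id openai-api-key \
--query SecretString \
--output text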
Azure Key Vault:
# Store secret
az keyvault secret set \
--vault-name kurai-key-vault \
--name openai-api-key \
--value "sk-..."
4. Logging Strategy
Centralized Logging with CloudWatch (AWS):
import logging

from watchtower import CloudWatchLogHandler

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)  # default WARNING level would drop info-level logs
logger.addHandler(CloudWatchLogHandler(
    log_group_name="/ecs/kurai-ai-api",
    log_stream_name="production"
))

logger.info("Processing query", extra={"query_id": "123", "user_id": "456"})
5. Cost Optimization
Rightsizing:
- Start with 2 vCPUs, 4GB RAM per container
- Scale based on actual usage metrics
- Use spot instances for non-critical workloads
Cost Monitoring:
# AWS Cost Explorer
aws ce get-cost-and-usage \
--time-period Start=2024-01-01,End=2024-01-31 \
--granularity MONTHLY \
--metrics BlendedCost \
--group-by Type=DIMENSION,Key=SERVICE
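A forward-looking complement to the historical query above; note the forecast window must start no earlier than the current date:
# Forecast next month's spend
aws ce get-cost-forecast \
--time-period Start=2024-02-01,End=2024-02-29 \
--metric BLENDED_COST \
--granularity MONTHLY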
6. Disaster Recovery
Backup Strategy:
- RDS Automated Backups: 7-35 day retention
- Point-in-Time Recovery: Restore to any second within retention period
- Cross-Region Replication: For critical applications
Recovery Testing:
- Test disaster recovery procedures quarterly
- Document RTO (Recovery Time Objective) and RPO (Recovery Point Objective)
- Target: RTO < 1 hour, RPO < 5 minutes
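Cross-region replication can be as simple as copying snapshots to a second region on a schedule. A sketch with placeholder identifiers, run against the destination region:
# Copy a snapshot from us-east-1 to us-west-2
aws rds copy-db-snapshot \
--source-db-snapshot-identifier arn:aws:rds:us-east-1:<account-id>:snapshot:<snapshot-name> \
--target-db-snapshot-identifier kurai-ai-db-dr-copy \
--region us-west-2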
Related Articles:
- Backend Architecture Checklist for AI Applications
- Monitoring Best Practices for AI Systems
- Cost Optimization Strategies for AI Projects