Unfil AI

How Unfil AI uses Kurai to power their newest projects.

Industry

Developer Tools & AI Infrastructure

Location

Jakarta, Indonesia

Employees

15

Project Size

$22K

Identity Provider

Custom

Workloads

Kubernetes, Edge Computing, Microservices, Event Streaming

About

Unfil AI is an API provider delivering production-ready RAG and LLM infrastructure for developers. Their platform processes 500M+ API requests monthly and serves 10,000+ customers across the AI/ML ecosystem.

Challenge

Unfil AI's API infrastructure was struggling with scaling issues. P95 latency had degraded to 3.2 seconds, error rates spiked to 4.7% during peak loads, and customers were experiencing rate limiting. Their monolithic architecture couldn't handle the 10x traffic growth from generative AI adoption.

Solution

Kurai rebuilt Unfil AI's API platform as a distributed event-driven system. We implemented microservices with Kubernetes, added multi-region edge caching with Cloudflare, optimized database queries with connection pooling, and built an auto-scaling pipeline that handles traffic spikes gracefully.
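The connection-pooling piece of the solution can be sketched in a few lines. This is an illustrative in-process pool, not the PgBouncer setup the team actually deployed; the `create_conn` factory is a hypothetical stand-in for a real database connection opener.

```python
import queue

class ConnectionPool:
    """Toy bounded pool: open N connections up front and reuse them,
    instead of opening one connection per request. (Production used
    PgBouncer; this only illustrates the idea.)"""

    def __init__(self, create_conn, size=10):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(create_conn())

    def acquire(self, timeout=5.0):
        # Blocks until a connection is free; bounding the total prevents
        # the connection-exhaustion failures seen during traffic spikes.
        return self._pool.get(timeout=timeout)

    def release(self, conn):
        self._pool.put(conn)

# Stand-in "connections" so the sketch runs without a database:
opened = []
pool = ConnectionPool(lambda: opened.append(object()) or opened[-1], size=3)
conn = pool.acquire()
pool.release(conn)
pool.acquire()            # reuses an existing connection
print(len(opened))        # still 3: no new connections were opened
```

The key design choice is that `acquire` blocks rather than opening extra connections, which caps load on the database at a known ceiling.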

Results

  • API latency improved from 3.2s to 180ms P95
  • Error rate reduced from 4.7% to 0.02%
  • Infrastructure costs reduced by 65%
  • Capacity increased to handle 1B+ requests/month

API Infrastructure at Scale: Building for Billions of Requests

The Problem

Unfil AI was exploding—but their infrastructure wasn’t keeping up. The generative AI boom sent API requests skyrocketing from 50M to 500M per month. Their monolithic architecture was buckling under the load.

CTO Sarah Kim said: “We were growing 10x quarter-over-quarter, but our API latency was degrading just as fast. Customers were abandoning us for competitors with faster response times.”

The Pain Points:

  • P95 latency at 3.2 seconds (unusable for real-time applications)
  • 4.7% error rate during peak hours
  • Frequent rate limiting angering developers
  • Single region deployment causing 600ms round-trips for EU/Asia customers
  • Database connection exhaustion during spikes

The Solution

Kurai completely reimagined Unfil AI’s infrastructure as a globally distributed, event-driven platform:

Architecture Transformation:

Before:

[Load Balancer] → [Monolithic API] → [Single DB]
                         ↓
                   [Redis Cache]

After:

[Edge Nodes] → [API Gateway] → [Microservices] → [Event Bus]
                     ↓               ↓               ↓
                [CDN Cache]   [Read Replicas]  [Worker Queue]
                                     ↓
                              [Write Master]

Key Components:

  1. Microservices Split:

    • Authentication service
    • Embedding generation service
    • Vector search service
    • Rate limiting service
    • Analytics service
  2. Edge Computing:

    • Cloudflare Workers in 300+ locations
    • Cache 80% of read requests at edge
    • DDoS protection and bot filtering
  3. Database Optimization:

    • PostgreSQL read replicas (1 master, 5 replicas)
    • PgBouncer connection pooling (10K connections)
    • Partitioned tables by customer_id
    • Optimized indexes reduced query time by 85%
  4. Event-Driven Architecture:

    • Apache Kafka for async processing
    • 50 partitions for parallelism
    • Dead letter queues for failed events
    • Exactly-once semantics
  5. Auto-Scaling:

    • Kubernetes Horizontal Pod Autoscaler
    • Scale based on requests per second (RPS)
    • Scale up: 30 seconds
    • Scale down: 5 minutes
    • Min 20 pods, max 500 pods
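The ordering guarantee behind the event-driven design comes from keyed partitioning: Kafka's default partitioner hashes each record key (murmur2 in the Java client) modulo the partition count, so all events for one customer land on the same one of the 50 partitions, in order. A rough sketch of that routing, with md5 standing in for murmur2:

```python
import hashlib

NUM_PARTITIONS = 50  # the topic's partition count from the case study

def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Kafka's Java client uses murmur2 here; md5 is a stand-in that keeps
    # the property that matters: the mapping is deterministic, so every
    # event with the same key goes to the same partition, preserving
    # per-customer ordering while spreading customers across partitions.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

p = partition_for("customer-42")
print(0 <= p < NUM_PARTITIONS)            # True
print(partition_for("customer-42") == p)  # True: stable routing
```

Because routing depends only on the key, consumers can be added or restarted without reshuffling which partition a customer's events live on.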
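The auto-scaling numbers above plug into the Horizontal Pod Autoscaler's documented formula: desired = ceil(current × currentMetric ÷ targetMetric), clamped to the configured min/max. The 100 RPS-per-pod target below is an assumed tuning value for illustration, not a figure from the engagement:

```python
import math

MIN_PODS, MAX_PODS = 20, 500   # bounds from the case study

def desired_replicas(current, rps_per_pod, target_rps_per_pod=100):
    # HPA core formula: scale replicas proportionally to how far the
    # observed metric is from its target, then clamp to the bounds.
    # target_rps_per_pod=100 is an assumed value, not from the source.
    desired = math.ceil(current * rps_per_pod / target_rps_per_pod)
    return max(MIN_PODS, min(MAX_PODS, desired))

print(desired_replicas(40, 200))   # spike doubles per-pod load -> 80
print(desired_replicas(40, 10))    # quiet period -> clamped to 20-pod floor
print(desired_replicas(400, 500))  # extreme burst -> capped at 500 pods
```

The min/max clamp is what keeps scale-downs from dropping below a safe baseline and runaway metrics from scaling without bound.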

The Results

Performance Improvements:

| Metric      | Before      | After         | Improvement     |
|-------------|-------------|---------------|-----------------|
| P95 Latency | 3,200ms     | 180ms         | 94% faster      |
| P99 Latency | 8,500ms     | 420ms         | 95% faster      |
| Error Rate  | 4.7%        | 0.02%         | 99.6% reduction |
| Throughput  | 200 req/sec | 5,000 req/sec | 25x capacity    |
| Uptime      | 99.5%       | 99.99%+       | 7x reliability  |
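The improvement column follows directly from the before/after figures:

```python
def pct_change(before, after):
    # Relative improvement, rounded to one decimal place.
    return round((before - after) / before * 100, 1)

print(pct_change(3200, 180))   # 94.4 -> the "94% faster" P95 latency
print(pct_change(4.7, 0.02))   # 99.6 -> the error-rate reduction
print(5000 // 200)             # 25   -> the "25x capacity" throughput gain
```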

Cost Savings:

  • Before: $120K/month (AWS over-provisioned)
  • After: $42K/month (auto-scaling + spot instances)
  • Savings: $78K/month (65% reduction)

Geographic Reach:

  • Before: Single region (us-east-1)
  • After: 5 regions + edge caching
  • Global latency: 600ms → 80ms average

Developer Experience:

“Unfil AI’s API went from a bottleneck to our fastest dependency. Integration took 10 minutes, and we haven’t seen a single timeout in 3 months.” — Alex Rivera, Lead Engineer at PromptCraft

What’s Next

Phase 2 initiatives:

  • GraphQL API for flexible queries
  • WebSocket support for real-time streaming
  • SDKs for Python, JavaScript, Go, and Rust
  • API analytics dashboard for customers
  • Custom fine-tuning endpoints

Technology Stack

  • Runtime: Kubernetes (EKS) + Docker
  • API Gateway: Kong Enterprise
  • Service Mesh: Istio
  • Message Queue: Apache Kafka (Confluent Cloud)
  • Database: PostgreSQL 15 (Amazon RDS)
  • Cache: Redis Cluster (ElastiCache)
  • Edge: Cloudflare Workers + KV
  • Monitoring: Datadog + OpenTelemetry
  • CI/CD: GitHub Actions + ArgoCD

Timeline

  • Month 1: Microservices design and Kafka setup
  • Month 2: Database migration and read replicas
  • Month 3: Kubernetes deployment and auto-scaling
  • Month 4: Edge caching and CDN integration
  • Month 5: Load testing and optimization
  • Month 6: Gradual traffic rollout (10% → 100%)

Lessons Learned

  1. Measure everything: We traced 100M requests to find 3 critical bottlenecks
  2. Cache aggressively: Edge caching reduced origin load by 80%
  3. Embrace async: Synchronous processing doesn’t scale; Kafka handles bursts gracefully
  4. Test at scale: Load testing with production traffic patterns caught 12 issues before launch
  5. Monitor costs: Spot instances saved 60% on compute with minimal impact
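Lesson 2 is easy to see in miniature: a read-through cache with a TTL turns repeated reads of a hot key into local hits, with only the first request (and each expiry) reaching the origin. A toy in-process version, not the Cloudflare Workers + KV setup actually used:

```python
import time

class TTLCache:
    """Minimal edge-style read cache: serve repeated reads locally and
    only fall through to the origin on a miss or expiry."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.store = {}          # key -> (value, expires_at)
        self.hits = self.misses = 0

    def get(self, key, fetch_from_origin):
        entry = self.store.get(key)
        if entry and entry[1] > self.clock():
            self.hits += 1
            return entry[0]
        self.misses += 1
        value = fetch_from_origin(key)
        self.store[key] = (value, self.clock() + self.ttl)
        return value

# 5 requests for one hot key: 1 origin fetch, 4 cache hits (80% hit rate).
cache = TTLCache(ttl_seconds=60)
origin_calls = []
for _ in range(5):
    cache.get("/v1/models", lambda k: origin_calls.append(k) or "response")
print(cache.hits, cache.misses, len(origin_calls))  # 4 1 1
```

With a hot, read-heavy workload like this, the hit rate climbs toward the 80% origin-load reduction described above; the TTL is the knob that trades freshness for offload.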

Unfil AI now processes 1B+ API requests monthly with sub-200ms latency globally. Their platform has become the go-to choice for developers building AI-powered applications.