Monolith to Microservices: 40% Cost Reduction
The Problem
TeamFlow had grown fast—and their monolithic architecture was showing cracks:
Symptoms:
- Deployments took 45 minutes (entire app rebuilt each time)
- Database locks caused 5-10 minute outages weekly
- Scaling the app meant scaling everything (even rarely used reporting features)
- Developer productivity: Teams waited days for other teams to finish features
- AWS bill: $85K/month and climbing
CTO Mike Johnson: “Every deployment was a white-knuckle experience. One bug in reporting could take down the entire app. We couldn’t scale teams or infrastructure independently.”
The Solution
Kurai executed a 6-month microservices migration using the strangler fig pattern:
Phase 1: Identify Boundaries (Month 1)
```python
# Domain-driven design to identify service boundaries
services = {
    "user_management": {
        "responsibility": "Auth, profiles, permissions",
        "database": "PostgreSQL",
        "apis": "/api/v1/users/*, /api/v1/auth/*",
    },
    "tasks": {
        "responsibility": "Task CRUD, assignments, due dates",
        "database": "PostgreSQL",
        "apis": "/api/v1/tasks/*",
    },
    "notifications": {
        "responsibility": "Email, push, in-app notifications",
        "database": "MongoDB (document store)",
        "apis": "/api/v1/notifications/*",
    },
    "reporting": {
        "responsibility": "Analytics, dashboards, exports",
        "database": "TimescaleDB (time-series)",
        "apis": "/api/v1/reports/*",
    },
}
```
Phase 2: Extract Services Incrementally (Months 2-5)
Extraction Order (Low Risk → High Risk):
- Notifications (Month 2) - Non-critical, read-heavy
- Reporting (Month 3) - Separate database, isolated workload
- Time Tracking (Month 4) - Moderate complexity
- Tasks (Month 5) - Core feature, high complexity
Migration Strategy for Each Service:
1. Create API Gateway (Kong)
   - Route traffic to old monolith OR new service
   - Feature flags for gradual traffic shift
   - Observability from day one
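   A minimal sketch of that routing decision, assuming a hypothetical in-code `ROLLOUT` flag store (in practice the flags live in the feature-flag service and Kong upstream weights, not in code):

   ```python
   import hashlib

   # Hypothetical flag store; real values would come from the
   # feature-flag service, not from code.
   ROLLOUT = {"tasks": {"enabled": True, "percent": 5}}  # 5% to new service

   def route_request(service: str, user_id: str) -> str:
       """Pick the upstream for a request: new service or monolith.

       Hashing the user ID gives sticky, deterministic buckets, so a
       given user consistently hits the same backend during the shift.
       """
       flag = ROLLOUT.get(service, {})
       if not flag.get("enabled"):
           return "monolith"
       bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
       return f"{service}-service" if bucket < flag["percent"] else "monolith"

   print(route_request("tasks", "user-42"))  # -> "tasks-service" or "monolith"
   ```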
2. Implement Data Synchronization
   - Change Data Capture (CDC) with Debezium
   - Dual-write pattern during transition
   - Event bus (Kafka) for async communication
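   A sketch of the dual-write step, assuming kafka-python and hypothetical `monolith_db` / `service_db` clients; once the Debezium connector is live, CDC replaces the explicit second write:

   ```python
   import json

   from kafka import KafkaProducer  # pip install kafka-python

   producer = KafkaProducer(
       bootstrap_servers="kafka:9092",  # illustrative broker address
       value_serializer=lambda v: json.dumps(v).encode("utf-8"),
   )

   def create_task(monolith_db, service_db, task: dict) -> None:
       """Dual-write: the monolith DB stays the source of truth while the
       new service's DB is kept in sync during the transition."""
       monolith_db.insert("tasks", task)  # 1. old store (source of truth)
       service_db.insert("tasks", task)   # 2. new store (kept in sync)
       # 3. publish an event so notifications/reporting react asynchronously
       producer.send("tasks.created", value=task)
       producer.flush()
   ```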
3. Deploy Service (Kubernetes)

   ```yaml
   # Kubernetes Deployment for the extracted tasks service; image name illustrative
   apiVersion: apps/v1
   kind: Deployment
   metadata:
     name: tasks-service
   spec:
     replicas: 3  # scale independently of the monolith
     selector:
       matchLabels:
         app: tasks-service
     template:
       metadata:
         labels:
           app: tasks-service
       spec:
         containers:
           - name: tasks-service
             image: teamflow/tasks-service:latest
             resources:
               requests:
                 cpu: "500m"
                 memory: "512Mi"
               limits:
                 cpu: "2000m"
                 memory: "2Gi"
   ```
4. Migrate Gradually
   - Week 1: Internal testing (1% traffic)
   - Week 2: Beta customers (5% traffic)
   - Weeks 3-4: 50% traffic split
   - Week 5: 100% to new service
   - Week 6: Remove old code from monolith
Phase 3: Database Decoupling (Months 5-6)
Approach: Database per Service + API Joins
- Each service owns its database
- Cross-service data via API calls or events (see the sketch after this list)
- No distributed transactions (eventual consistency)
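A sketch of such an "API join", assuming a hypothetical batch-lookup endpoint on the user service; the reporting service enriches its own rows over HTTP instead of joining across databases:

```python
import requests  # pip install requests

# Illustrative internal URL; the real user-service endpoint will differ.
USER_SERVICE = "http://user-service.internal/api/v1/users"

def enrich_report_rows(rows: list[dict]) -> list[dict]:
    """Replace a SQL join with an API call to the owning service.

    The returned names may be seconds stale (eventual consistency),
    an acceptable trade-off for reporting workloads.
    """
    user_ids = sorted({str(row["user_id"]) for row in rows})
    resp = requests.get(USER_SERVICE, params={"ids": ",".join(user_ids)}, timeout=2)
    resp.raise_for_status()
    names = {u["id"]: u["name"] for u in resp.json()}
    return [
        {**row, "user_name": names.get(row["user_id"], "unknown")}
        for row in rows
    ]
```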
Infrastructure:
- Orchestration: Kubernetes (EKS, 12 nodes, m5.2xlarge)
- Service Mesh: Istio for traffic management and observability
- Message Queue: Kafka (3 brokers, MSK)
- API Gateway: Kong with rate limiting per service
- Databases: 6x RDS PostgreSQL, 1x MongoDB Atlas, 1x TimescaleDB
- Monitoring: Prometheus + Grafana + Loki
- CI/CD: GitHub Actions + ArgoCD (GitOps)
The Results
Cost Impact:
- Previous: $85K/month (over-provisioned monolith)
- New: $51K/month (right-sized services)
- Savings: $408K/year (40% reduction)
Performance:
- Deployment time: 45 min → 8 min (82% faster)
- Uptime: 99.7% → 99.95% (4x fewer incidents)
- Database CPU: 70% → 35% (better isolation)
- P95 latency: 450ms → 180ms (60% faster)
Developer Productivity:
- Feature release cycle: 2 weeks → 3 days
- Team independence: 3x more parallel development
- Onboarding time: 4 weeks → 2 weeks
Scalability:
- Scale individual services independently:
  - Reporting service: 2 pods (rarely used)
  - Tasks service: 10 pods (heavy traffic)
  - Notifications service: 5 pods + auto-scaling
- Handle traffic spikes without over-provisioning
Service-Level Performance
| Service | P95 Latency | Throughput | Uptime |
|---|---|---|---|
| User Management | 45ms | 2.5K req/s | 99.98% |
| Tasks | 120ms | 5.8K req/s | 99.95% |
| Notifications | 180ms | 1.2K req/s | 99.97% |
| Reporting | 350ms | 0.8K req/s | 99.92% |
| Time Tracking | 95ms | 3.4K req/s | 99.96% |
Customer Feedback
“Our reporting features used to slow down the entire app. Now reporting runs on its own infrastructure and the core app is snappy even during heavy usage.”
— Sarah Lee, Engineering Lead at TeamFlow
What’s Next
Next on the roadmap:
- Multi-region deployment: US-East + EU-West to reduce latency for global users
- Service-level authorization: Fine-grained permissions per microservice
- Contract testing: Pact for consumer-driven contract testing
- Chaos engineering: Gremlin for failure injection testing
Technology Stack
- Orchestration: Kubernetes (EKS 1.27)
- Service Mesh: Istio 1.18
- API Gateway: Kong 3.3
- Message Broker: Kafka (MSK, 3 brokers)
- Databases: PostgreSQL 15, MongoDB 6, TimescaleDB 2.10
- CI/CD: GitHub Actions + ArgoCD 2.6
- Monitoring: Prometheus + Grafana + Tempo + Loki
- Languages: Python 3.11, Node.js 20
- Frameworks: FastAPI 0.100, Express 4.18
Key Metrics
| Metric | Before | After | Improvement |
|---|---|---|---|
| Monthly AWS cost | $85K | $51K | -40% |
| Deployment time | 45 min | 8 min | -82% |
| Uptime | 99.7% | 99.95% | 4x fewer incidents |
| P95 latency | 450ms | 180ms | -60% |
| Feature cycle | 2 weeks | 3 days | -79% |
Architecture Changes
Before (Monolith):
```
[Load Balancer] → [Monolithic App] → [Single Database]
                        ↓
               (All services coupled)
```
After (Microservices):
```
[API Gateway] → [Service Mesh] → [User Service]          → [PostgreSQL #1]
                               → [Tasks Service]         → [PostgreSQL #2]
                               → [Notifications Service] → [MongoDB]
                               → [Reporting Service]     → [TimescaleDB]
                               → [Kafka] ← [All Services]
```
Timeline
- Month 1: Service boundaries, API gateway setup
- Month 2: Notifications service extraction
- Month 3: Reporting service extraction
- Month 4: Time tracking service extraction
- Month 5: Tasks service extraction (most complex)
- Month 6: Optimization, monitoring, documentation
Lessons Learned
- Extract low-risk services first: Build confidence before touching core features
- Invest in observability early: Distributed tracing is non-negotiable
- Embrace eventual consistency: Distributed transactions will kill you
- Version APIs from day one: Breaking changes are inevitable (a minimal sketch follows this list)
- Service ownership matters: One team per service, no shared code
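A minimal FastAPI sketch of that versioning lesson (the stack above lists FastAPI 0.100); the route shapes are illustrative, not TeamFlow's actual contract:

```python
from fastapi import APIRouter, FastAPI

app = FastAPI()

# v1 keeps the original response shape for existing clients
v1 = APIRouter(prefix="/api/v1/tasks")

@v1.get("/{task_id}")
def get_task_v1(task_id: int) -> dict:
    return {"id": task_id, "assignee": "user-42"}

# v2 changes the contract without breaking v1 consumers
v2 = APIRouter(prefix="/api/v2/tasks")

@v2.get("/{task_id}")
def get_task_v2(task_id: int) -> dict:
    return {"id": task_id, "assignees": ["user-42"]}  # assignee is now a list

app.include_router(v1)
app.include_router(v2)
```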
TeamFlow now scales teams and infrastructure independently. They deploy 15x per day per service vs. once per week before, and their AWS bill is 40% lower despite 3x user growth.