TeamFlow

How TeamFlow uses Kurai to power their newest projects.

Industry

SaaS & Project Management

Location

Denver, CO

Employees

120

Identity Provider

TeamFlow

Workloads

Microservices, Kubernetes, Kafka, AWS, PostgreSQL

About

TeamFlow is a project management SaaS platform serving 5,000+ companies with 250,000+ active users. Their platform includes task management, time tracking, team collaboration, and reporting features used by remote teams worldwide.

Challenge

TeamFlow's monolithic application had pushed deployment times to 45 minutes, blocking teams from releasing features independently. AWS costs had climbed to $85K/month with 70% database CPU utilization, and different parts of the platform had conflicting scaling requirements.

Solution

Kurai executed a 6-month microservices migration using the strangler fig pattern. We decomposed the monolith into 12 services, implemented event-driven architecture with Kafka, and set up Kubernetes infrastructure. Result: 40% cost reduction and 8-minute deployments.

Results

40% reduction in AWS infrastructure costs ($85K → $51K/month)

Deployment time reduced from 45 minutes to 8 minutes

3x faster feature development per team

99.95% uptime (from 99.7%)

Monolith to Microservices: 40% Cost Reduction

The Problem

TeamFlow had grown fast—and their monolithic architecture was showing cracks:

Symptoms:

  • Deployments took 45 minutes (entire app rebuilt each time)
  • Database locks caused 5-10 minute outages weekly
  • Scaling the app meant scaling everything, even rarely used reporting features
  • Developer productivity: Teams waited days for other teams to finish features
  • AWS bill: $85K/month and climbing

CTO Mike Johnson: “Every deployment was a white-knuckle experience. One bug in reporting could take down the entire app. We couldn’t scale teams or infrastructure independently.”

The Solution

Kurai executed a 6-month microservices migration using the strangler fig pattern:

Phase 1: Identify Boundaries (Month 1)

# Domain-driven design to identify services
services = {
    "user_management": {
        "responsibility": "Auth, profiles, permissions",
        "database": "PostgreSQL",
        "apis": "/api/v1/users/*, /api/v1/auth/*"
    },
    "tasks": {
        "responsibility": "Task CRUD, assignments, due dates",
        "database": "PostgreSQL",
        "apis": "/api/v1/tasks/*"
    },
    "notifications": {
        "responsibility": "Email, push, in-app notifications",
        "database": "MongoDB (document store)",
        "apis": "/api/v1/notifications/*"
    },
    "reporting": {
        "responsibility": "Analytics, dashboards, exports",
        "database": "TimescaleDB (time-series)",
        "apis": "/api/v1/reports/*"
    }
}

Phase 2: Extract Services Incrementally (Months 2-5)

Extraction Order (Low Risk → High Risk):

  1. Notifications (Month 2) - Non-critical, read-heavy
  2. Reporting (Month 3) - Separate database, isolated workload
  3. Time Tracking (Month 4) - Moderate complexity
  4. Tasks (Month 5) - Core feature, high complexity

Migration Strategy for Each Service:

  1. Create API Gateway (Kong)

    • Route traffic to old monolith OR new service
    • Feature flags for gradual traffic shift
    • Observability from day one
  2. Implement Data Synchronization

    • Change Data Capture (CDC) with Debezium
    • Dual-write pattern during transition
    • Event bus (Kafka) for async communication
  3. Deploy Service (Kubernetes)

    # Kubernetes deployment (abridged; selector and labels omitted)
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: tasks-service
    spec:
      replicas: 3 # Scale independently of other services
      template:
        spec:
          containers:
            - name: tasks-service
              resources:
                requests:
                  cpu: "500m"
                  memory: "512Mi"
                limits:
                  cpu: "2000m"
                  memory: "2Gi"
  4. Migrate Gradually

    • Week 1: Internal testing (1% traffic)
    • Week 2: Beta customers (5% traffic)
    • Week 3-4: 50% traffic split
    • Week 5: 100% to new service
    • Week 6: Remove old code from monolith

Phase 3: Database Decoupling (Months 5-6)

Approach: Database per Service + API Joins

  • Each service owns its database
  • Cross-service data via API calls or events
  • No distributed transactions (eventual consistency)
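
Cross-service data via events can be sketched with an in-memory stand-in for Kafka. All names here are illustrative (the real pipeline uses Debezium CDC into MSK); the point is that each service keeps its own eventually consistent copy of the data it needs instead of doing cross-database joins:

```python
from collections import defaultdict

class EventBus:
    """In-memory stand-in for Kafka: publish fans out to subscribers."""
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic: str, handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        for handler in self._subscribers[topic]:
            handler(event)

# The tasks service keeps a local read-only copy of user names,
# updated via events rather than a join against the users database.
local_user_names = {}

def on_user_updated(event: dict) -> None:
    local_user_names[event["user_id"]] = event["name"]

bus = EventBus()
bus.subscribe("user.updated", on_user_updated)
bus.publish("user.updated", {"user_id": 7, "name": "Ada"})
assert local_user_names[7] == "Ada"
```

The trade-off is staleness: between the source-of-truth update and event delivery, the local copy can lag, which is exactly the eventual consistency the migration embraced.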

Infrastructure:

  • Orchestration: Kubernetes (EKS, 12 nodes, m5.2xlarge)
  • Service Mesh: Istio for traffic management and observability
  • Message Queue: Kafka (3 brokers, MSK)
  • API Gateway: Kong with rate limiting per service
  • Databases: 6x RDS PostgreSQL, 1x MongoDB Atlas, 1x TimescaleDB
  • Monitoring: Prometheus + Grafana + Loki
  • CI/CD: GitHub Actions + ArgoCD (GitOps)

The Results

Cost Impact:

  • Previous: $85K/month (monolithic over-provisioned)
  • New: $51K/month (right-sized services)
  • Savings: $408K/year (40% reduction)

Performance:

  • Deployment time: 45 min → 8 min (82% faster)
  • Uptime: 99.7% → 99.95% (4x fewer incidents)
  • Database CPU: 70% → 35% (better isolation)
  • P95 latency: 450ms → 180ms (60% faster)

Developer Productivity:

  • Feature release cycle: 2 weeks → 3 days
  • Team independence: 3x more parallel development
  • Onboarding time: 4 weeks → 2 weeks

Scalability:

  • Scale individual services:
    • Reporting service: 2 pods (rarely used)
    • Tasks service: 10 pods (heavy traffic)
    • Notifications service: 5 pods + auto-scaling
  • Handle traffic spikes without over-provisioning
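
The auto-scaling behavior follows Kubernetes' Horizontal Pod Autoscaler rule, desired = ceil(current × currentMetric / targetMetric). A quick sketch (the 70% CPU target is an assumed default, not a documented TeamFlow setting):

```python
import math

def desired_replicas(current: int, current_cpu_pct: float,
                     target_cpu_pct: float = 70) -> int:
    """Simplified HPA formula: desired = ceil(current * metric / target)."""
    return math.ceil(current * current_cpu_pct / target_cpu_pct)

assert desired_replicas(5, 140) == 10  # CPU spike: double the pods
assert desired_replicas(10, 35) == 5   # quiet period: scale back down
```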

Service-Level Performance

| Service | P95 Latency | Throughput | Uptime |
|---|---|---|---|
| User Management | 45ms | 2.5K req/s | 99.98% |
| Tasks | 120ms | 5.8K req/s | 99.95% |
| Notifications | 180ms | 1.2K req/s | 99.97% |
| Reporting | 350ms | 0.8K req/s | 99.92% |
| Time Tracking | 95ms | 3.4K req/s | 99.96% |

Customer Feedback

“Our reporting features used to slow down the entire app. Now reporting runs on its own infrastructure and the core app is snappy even during heavy usage.” — Sarah Lee, Engineering Lead at TechCorp

What’s Next

Post-migration roadmap:

  • Multi-region deployment: US-East + EU-West for global latency
  • Service-level authorization: Fine-grained permissions per microservice
  • Contract testing: Pact for consumer-driven contract testing
  • Chaos engineering: Gremlin for failure injection testing

Technology Stack

  • Orchestration: Kubernetes (EKS 1.27)
  • Service Mesh: Istio 1.18
  • API Gateway: Kong 3.3
  • Message Broker: Kafka (MSK, 3 brokers)
  • Databases: PostgreSQL 15, MongoDB 6, TimescaleDB 2.10
  • CI/CD: GitHub Actions + ArgoCD 2.6
  • Monitoring: Prometheus + Grafana + Tempo + Loki
  • Language: Python 3.11, Node.js 20
  • Framework: FastAPI 0.100, Express 4.18

Key Metrics

| Metric | Before | After | Improvement |
|---|---|---|---|
| Monthly AWS cost | $85K | $51K | -40% |
| Deployment time | 45 min | 8 min | -82% |
| Uptime | 99.7% | 99.95% | 4x fewer incidents |
| P95 latency | 450ms | 180ms | -60% |
| Feature cycle | 2 weeks | 3 days | -79% |

Architecture Changes

Before (Monolith):

[Load Balancer] → [Monolithic App] → [Single Database]

                (All services coupled)

After (Microservices):

[API Gateway] → [Service Mesh] → [User Service]          → [PostgreSQL #1]
                               → [Tasks Service]         → [PostgreSQL #2]
                               → [Notifications Service] → [MongoDB]
                               → [Reporting Service]     → [TimescaleDB]
                               → [Kafka] ← [All Services]

Timeline

  • Month 1: Service boundaries, API gateway setup
  • Month 2: Notifications service extraction
  • Month 3: Reporting service extraction
  • Month 4: Time tracking service extraction
  • Month 5: Tasks service extraction (most complex)
  • Month 6: Optimization, monitoring, documentation

Lessons Learned

  1. Extract low-risk services first: Build confidence before touching core features
  2. Invest in observability early: Distributed tracing is non-negotiable
  3. Embrace eventual consistency: Distributed transactions will kill you
  4. API version from day one: Breaking changes are inevitable
  5. Service ownership matters: One team per service, no shared code
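
Lesson 4 in practice: version the URL path so old and new response shapes coexist while clients migrate. A minimal dispatcher sketch; the routes and payload shapes below are hypothetical, not TeamFlow's actual API:

```python
# Each version keeps its own handler; a breaking response-shape
# change ships as /api/v2 while /api/v1 keeps working unchanged.
handlers = {
    ("v1", "tasks"): lambda: {"tasks": [], "total": 0},
    ("v2", "tasks"): lambda: {"data": [], "meta": {"total": 0}},
}

def dispatch(path: str) -> dict:
    _, version, resource = path.strip("/").split("/")
    return handlers[(version, resource)]()

assert dispatch("/api/v1/tasks") == {"tasks": [], "total": 0}
assert "meta" in dispatch("/api/v2/tasks")
```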

TeamFlow now scales teams and infrastructure independently. Each service deploys up to 15 times per day, compared with once a week for the old monolith, and the AWS bill is 40% lower despite 3x user growth.

Trusted by Industry Leaders

Empowering innovators, shaping the future

David Gutierrez

CTO at TechFlow AI

"The RAG system they built for us reduced our support tickets by 60%. Their expertise in LLM integration is unmatched."

Pierluigi Camomillo

VP Engineering at DataScale

"They migrated our monolith to microservices seamlessly. We saw a 40% cost reduction and significantly improved scalability."

Ella Svensson

Founder at MediSort Health

"Their ML-powered patient triage system transformed our operations. 70% faster triage with 94% accuracy. Incredible results."

Alexa Rios

Chief Product Officer at ShopMax

"The recommendation engine they built increased our AOV by 32%. Highly recommended for any e-commerce business looking to leverage AI!"