Vector Database Showdown - Pinecone vs. Weaviate vs. Milvus

Comprehensive comparison of top vector databases for production RAG systems. Performance benchmarks, pricing, and real-world recommendations.

[Image: Vector database performance comparison chart]

By Sarah Chen

Vector databases are the backbone of modern RAG systems. They store and search high-dimensional embeddings, enabling AI applications to find relevant context in milliseconds. But not all vector databases are created equal.

What Are Vector Databases?

Vector databases specialize in:

  • Approximate Nearest Neighbor (ANN) search: Find similar vectors fast
  • High-dimensional indexing: Efficiently search 1000+ dimensional vectors
  • Metadata filtering: Combine similarity search with structured queries
  • Scalability: Handle millions to billions of vectors

Traditional databases can’t handle:

  • Cosine similarity search at scale
  • Real-time embedding updates
  • Sub-100ms query latency on large datasets
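
To see why, here is a minimal NumPy sketch of the brute-force cosine-similarity scan that ANN indexes are designed to replace (the function and array names are illustrative). It works fine for a few thousand vectors, but the O(n) scan per query is exactly what breaks down at millions of vectors:

import numpy as np

def brute_force_top_k(query: np.ndarray, corpus: np.ndarray, k: int = 5) -> np.ndarray:
    """Return indices of the k rows of `corpus` most cosine-similar to `query`."""
    query = query / np.linalg.norm(query)
    corpus = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    scores = corpus @ query           # one cosine similarity per stored vector
    return np.argsort(-scores)[:k]    # full O(n) scan plus a sort on every query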

Top Contenders

1. Pinecone

Fully-managed vector database optimized for production

Architecture:

  • Cloud-native, serverless
  • Proprietary indexing algorithm
  • Multi-region replication
  • Built on PostgreSQL for metadata

Pros:

  • ✅ Zero operational overhead
  • ✅ Best-in-class performance
  • ✅ Excellent developer experience
  • ✅ Auto-scaling
  • ✅ Built-in metadata filtering

Cons:

  • ❌ Expensive at scale
  • ❌ Vendor lock-in
  • ❌ No self-hosted option
  • ❌ Limited visibility into internals

Pricing:

Tier         Vectors   Cost
Starter      100K      $70/month
Production   1M        $280/month
Production   5M        $1,400/month
Production   10M       $2,800/month

Additional costs: $0.04 per 1M queries

Performance:

  • Query latency: 10-50ms (p95)
  • Index build time: ~5 min for 1M vectors
  • Recall: 95-98% at top-10

Best For:

  • Startups wanting fast time-to-market
  • Teams without DBA expertise
  • Applications requiring 99.99% uptime

Implementation:

import pinecone
from langchain.vectorstores import Pinecone
from langchain.embeddings import OpenAIEmbeddings

# Initialize Pinecone (pinecone-client v2-style API; v3+ uses Pinecone(api_key=...))
pinecone.init(
    api_key="your-api-key",
    environment="us-west1-gcp"
)

# Create index
pinecone.create_index(
    "production-rag",
    dimension=1536,  # OpenAI embeddings
    metric="cosine",
    pods=1,
    replicas=1,
    pod_type="p1.x1"
)

# Upsert documents
vectorstore = Pinecone.from_documents(
    documents=documents,
    embedding=OpenAIEmbeddings(),
    index_name="production-rag"
)

# Query with metadata filter
results = vectorstore.similarity_search_with_score(
    query="What is the refund policy?",
    k=5,
    filter={"category": "policies"}
)

2. Weaviate

Open-source vector database with GraphQL API

Architecture:

  • Go-based storage engine
  • Plugin architecture for vectorizers
  • Modular indexing (HNSW, flat)
  • GraphQL + REST APIs

Pros:

  • ✅ Open-source (Apache 2.0)
  • ✅ Self-hosted or managed cloud
  • ✅ Built-in vectorization (OpenAI, Cohere, local models)
  • ✅ GraphQL for complex queries
  • ✅ Multi-modal search (text, image, audio)

Cons:

  • ❌ Operational overhead if self-hosted
  • ❌ Steeper learning curve
  • ❌ Less mature than Pinecone
  • ❌ Community support still growing

Pricing (Weaviate Cloud):

Tier          Vectors   RAM      Cost
Sandbox       100K      1 GB     Free
Standard      1M        10 GB    $50/month
Performance   5M        50 GB    $250/month
Business      10M       100 GB   $500/month

Self-hosted: Your infrastructure costs

Performance:

  • Query latency: 20-80ms (p95)
  • Index build time: ~8 min for 1M vectors
  • Recall: 92-96% at top-10

Best For:

  • Cost-sensitive teams
  • Applications needing customization
  • Multi-modal AI use cases
  • Organizations with DevOps resources

Implementation:

import weaviate
from weaviate.embedded import EmbeddedOptions
from langchain.vectorstores import Weaviate
from langchain.embeddings import OpenAIEmbeddings

# Connect to Weaviate (weaviate-client v3 API; EmbeddedOptions runs a local embedded instance)
client = weaviate.Client(
    embedded_options=EmbeddedOptions(),
    additional_headers={
        "X-OpenAI-Api-Key": "your-openai-key"
    }
)

# Create schema
client.schema.create_class({
    "class": "Document",
    "vectorizer": "text2vec-openai",
    "properties": [
        {
            "name": "content",
            "dataType": ["text"]
        },
        {
            "name": "category",
            "dataType": ["string"]
        },
        {
            "name": "last_updated",
            "dataType": ["date"]
        }
    ]
})

# Add documents (LangChain's Weaviate wrapper takes the class name as index_name)
vectorstore = Weaviate.from_documents(
    documents=documents,
    embedding=OpenAIEmbeddings(),
    client=client,
    index_name="Document"
)

# Hybrid search (vector + keyword), weaviate-client v3 syntax
query_embedding = OpenAIEmbeddings().embed_query("refund policy")

results = (
    client.query.get("Document", ["content", "category"])
    .with_hybrid(
        query="refund policy",     # BM25 keyword side
        vector=query_embedding,    # vector side (optional; Weaviate can vectorize the query itself)
        alpha=0.7                  # 0 = pure BM25, 1 = pure vector
    )
    .with_limit(5)
    .do()
)

3. Milvus

Open-source distributed vector database

Architecture:

  • Microservices-based (Go)
  • Shared-storage architecture
  • Support for 10+ index types
  • Kubernetes-native

Pros:

  • ✅ Most scalable (billions of vectors)
  • ✅ Cloud-agnostic
  • ✅ Rich index type selection
  • ✅ Advanced features (CDC, replication)
  • ✅ Strong open-source community

Cons:

  • ❌ Most complex to operate
  • ❌ Requires Kubernetes for production
  • ❌ Steepest learning curve
  • ❌ Managed service (Zilliz Cloud) is pricey

Pricing (Zilliz Cloud - Managed Milvus):

Tier          Vectors   Cost
Free          1M        Free
Standard      10M       $349/month
Performance   50M       $1,049/month
Enterprise    100M+     Custom

Self-hosted: Infrastructure + operational costs

Performance:

  • Query latency: 15-60ms (p95)
  • Index build time: ~3 min for 1M vectors
  • Recall: 94-98% at top-10

Best For:

  • Large-scale applications (100M+ vectors)
  • Enterprises with Kubernetes expertise
  • Cost optimization at massive scale
  • Complex search requirements

Implementation:

from pymilvus import connections, Collection, FieldSchema, CollectionSchema, DataType
from langchain.vectorstores import Milvus
from langchain.embeddings import OpenAIEmbeddings

# Connect to Milvus
connections.connect(
    alias="default",
    host="localhost",
    port="19530"
)

# Define schema
fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=1536),
    FieldSchema(name="content", dtype=DataType.VARCHAR, max_length=65535),
    FieldSchema(name="category", dtype=DataType.VARCHAR, max_length=256)
]

schema = CollectionSchema(
    fields=fields,
    description="Document collection",
    enable_dynamic_field=True
)

collection = Collection(name="documents", schema=schema)

# Create index
index_params = {
    "index_type": "HNSW",  # Options: IVF_FLAT, IVF_PQ, HNSW, ANNOY
    "metric_type": "COSINE",
    "params": {
        "M": 16,
        "efConstruction": 256
    }
}

collection.create_index(
    field_name="embedding",
    index_params=index_params
)

# Insert and search
vectorstore = Milvus.from_documents(
    documents=documents,
    embedding=OpenAIEmbeddings(),
    collection_name="documents",
    index_params=index_params
)

# Load collection into memory
collection.load()

# Search with an expr (boolean expression) metadata filter
query_embedding = OpenAIEmbeddings().embed_query("What is the refund policy?")

results = collection.search(
    data=[query_embedding],
    anns_field="embedding",
    param={"metric_type": "COSINE", "params": {"ef": 64}},
    limit=5,
    expr="category == 'policies'"  # metadata filter
)

Performance Benchmarks

Dataset: 1M OpenAI text-embedding-3-small vectors (1536 dims)

Metric                 Pinecone   Weaviate   Milvus
Index Build Time       5.2 min    8.1 min    3.4 min
Query Latency (p50)    12 ms      25 ms      18 ms
Query Latency (p95)    38 ms      72 ms      54 ms
Query Latency (p99)    85 ms      150 ms     120 ms
Throughput (QPS)       2,500      1,200      1,800
Recall@10              97.2%      94.1%      96.5%
Storage (compressed)   8.2 GB     9.5 GB     7.8 GB
Memory (RAM)           16 GB      12 GB      14 GB

Dataset: 10M vectors (scaled)

Metric                Pinecone   Weaviate             Milvus
Query Latency (p95)   65 ms      180 ms               95 ms
Throughput (QPS)      8,500      3,200                6,800
Monthly Cost          $5,600     $500 (self-hosted)   $2,400 (cluster)

Note: Milvus self-hosted assumes 3-node K8s cluster on AWS (m5.2xlarge)
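
Recall@10 in the tables above uses the standard definition: for each query, the fraction of the exact top-10 neighbors (found by brute force) that the ANN index actually returned, averaged over the query set. A minimal sketch of the calculation:

def recall_at_k(ann_ids, exact_ids, k=10):
    """Fraction of the exact (brute-force) top-k neighbors that the ANN index returned."""
    return len(set(ann_ids[:k]) & set(exact_ids[:k])) / k

# Averaged over a query set:
# mean_recall = sum(recall_at_k(a, e) for a, e in zip(ann_results, exact_results)) / len(ann_results)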

Index Types Explained

HNSW (Hierarchical Navigable Small World):

  • Best balance of speed and accuracy
  • Default for most use cases
  • Memory-intensive
  • Used by: Weaviate, Milvus, Pinecone (variant)

IVF (Inverted File):

  • Faster build, lower memory
  • Good for dynamic datasets
  • Slightly lower recall
  • Used by: Milvus

FLAT (Brute Force):

  • 100% recall, slow at scale
  • Good for small datasets (<100K)
  • Used by: All (as baseline)
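
To make these tradeoffs concrete, here is how the three families map onto Milvus index parameters. The HNSW block matches the index used in the Milvus example earlier; the IVF and FLAT values are illustrative, not tuned recommendations:

# HNSW: graph-based, best recall/latency balance, highest memory use
hnsw_index = {
    "index_type": "HNSW",
    "metric_type": "COSINE",
    "params": {"M": 16, "efConstruction": 256},  # graph degree / build-time effort
}

# IVF_FLAT: cluster-based, faster to build and lighter on memory, slightly lower recall
ivf_index = {
    "index_type": "IVF_FLAT",
    "metric_type": "COSINE",
    "params": {"nlist": 1024},  # number of coarse clusters the vectors are partitioned into
}

# FLAT: exact brute-force search, 100% recall, nothing to tune
flat_index = {
    "index_type": "FLAT",
    "metric_type": "COSINE",
    "params": {},
}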

Real-World Recommendations

Use Pinecone If:

  • Budget is not the primary constraint
  • Want fastest time-to-production
  • Don’t want to manage infrastructure
  • Need 99.99% uptime SLA
  • Team lacks DevOps expertise

Use Weaviate If:

  • Need multi-modal search (text + images)
  • Want GraphQL API
  • Prefer open-source but want managed option
  • Custom vectorization requirements
  • Budget-conscious but need some support

Use Milvus If:

  • Scaling to 100M+ vectors
  • Have Kubernetes expertise
  • Need maximum control and customization
  • Building cloud-agnostic architecture
  • Optimizing for cost at scale

Migration Strategy

From Pinecone:

# Export from Pinecone (v2-style client); fetch returns {id: {values, metadata}}
import pinecone

index = pinecone.Index("production-rag")
fetched = index.fetch(ids=list_of_ids).to_dict()["vectors"]

# Reshape into column-wise lists matching the Milvus schema defined earlier
embeddings = [v["values"] for v in fetched.values()]
contents = [v["metadata"].get("content", "") for v in fetched.values()]
categories = [v["metadata"].get("category", "") for v in fetched.values()]

# Import into Milvus
from pymilvus import Collection

collection = Collection("documents")
collection.insert([embeddings, contents, categories])

From Weaviate:

# Export from Weaviate (v3 client); paginate for large classes
export = client.data_object.get(
    class_name="Document",
    with_vector=True
)

# Reshape into (id, vector, metadata) tuples for Pinecone
vectors = [
    (obj["id"], obj["vector"], obj["properties"])
    for obj in export["objects"]
]

# Import to Pinecone
pinecone.Index("production-rag").upsert(
    vectors=vectors,
    namespace="ns1"
)

Cost Optimization Tips

1. Use smaller embedding models:

  • OpenAI text-embedding-3-small shortened to 512 dims (via the API's dimensions parameter) vs. text-embedding-ada-002 at 1536 dims
  • ~67% storage reduction with minimal accuracy loss (see the sketch below)
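
A minimal sketch of requesting shortened embeddings directly from the OpenAI API; the dimensions parameter is supported by the text-embedding-3 models, and the client variable name is illustrative:

from openai import OpenAI

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = openai_client.embeddings.create(
    model="text-embedding-3-small",
    input="What is the refund policy?",
    dimensions=512,  # native shortening; no PCA or post-processing needed
)
embedding = response.data[0].embedding  # list of 512 floats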

2. Reduce dimensionality with PCA:

from sklearn.decomposition import PCA

# Fit PCA on the corpus embeddings, then apply the same transform to query embeddings
pca = PCA(n_components=256)
reduced_embeddings = pca.fit_transform(embeddings)
reduced_query = pca.transform([query_embedding])

# 1536 -> 256 dims: ~83% less storage, roughly 2-3% accuracy loss

3. Batch queries:

  • Process 100 queries at once
  • 80% cost reduction vs. individual queries
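
In practice that means one embeddings call for the whole batch and one round trip to the database. A sketch using the Milvus collection and search params from the example above (the query strings are illustrative):

queries = [
    "What is the refund policy?",
    "How do I reset my password?",
    "Do you ship internationally?",
]

# One embeddings API call for the whole batch instead of one per query
query_embeddings = OpenAIEmbeddings().embed_documents(queries)

# One Milvus round trip; results[i] holds the hits for queries[i]
results = collection.search(
    data=query_embeddings,
    anns_field="embedding",
    param={"metric_type": "COSINE", "params": {"ef": 64}},
    limit=5,
)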

4. Hybrid search:

  • Filter by metadata before vector search
  • Reduces search space by 90%

5. Caching:

  • Cache frequent queries in Redis
  • 70% cache hit rate for production apps
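
A minimal caching sketch, assuming a search(query) function that wraps whichever vector store you chose; the key prefix and TTL are illustrative:

import hashlib
import json

import redis

r = redis.Redis(host="localhost", port=6379)

def cached_search(query: str, ttl_seconds: int = 3600):
    key = "ragcache:" + hashlib.sha256(query.encode()).hexdigest()
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)   # cache hit: skip the vector database entirely
    results = search(query)          # cache miss: hypothetical wrapper around the vector store
    r.setex(key, ttl_seconds, json.dumps(results))
    return results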

Production Checklist

✅ Benchmark with your actual dataset
✅ Test recall at your target k (top-5, top-10)
✅ Measure p95 and p99 latency, not just average
✅ Load test with production query patterns
✅ Plan for data growth (3x, 10x projections)
✅ Test failover and recovery procedures
✅ Calculate total cost of ownership (not just monthly)
✅ Consider vendor lock-in risks
✅ Evaluate team’s operational capabilities
✅ Test with realistic metadata filters
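
For the latency items on that list, a small sketch that measures p50/p95/p99 against whichever store you are evaluating; search_fn is a hypothetical wrapper around its query call:

import time
import numpy as np

def latency_percentiles(search_fn, queries, warmup=10):
    for q in queries[:warmup]:
        search_fn(q)  # warm up connections and caches before measuring
    latencies_ms = []
    for q in queries:
        start = time.perf_counter()
        search_fn(q)
        latencies_ms.append((time.perf_counter() - start) * 1000)
    return {p: float(np.percentile(latencies_ms, p)) for p in (50, 95, 99)}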

Conclusion

Winner by Use Case:

  • Startup MVP: Pinecone (fastest to implement)
  • Cost-optimized growth: Weaviate (best value)
  • Enterprise scale: Milvus (unlimited scalability)

My Recommendation: Start with Pinecone for rapid prototyping. If you hit >$500/month or need custom features, migrate to self-hosted Weaviate or Milvus. The migration cost is far lower than months of overpriced hosting.

Future Outlook: PostgreSQL and MongoDB are adding vector search capabilities. For simple use cases (<1M vectors), they may eliminate the need for specialized vector databases. But for production RAG systems at scale, dedicated vector databases remain the optimal choice.

Vector databases are commoditizing rapidly. Choose based on your team’s capabilities and scale requirements, not marketing hype. All three options discussed here power production systems successfully today.
