Kurai - AI & Backend Development Agency

Position Description:

We’re seeking an exceptional ML Engineer specializing in Large Language Models and Retrieval-Augmented Generation to join our AI Engineering team. You’ll architect and productionize RAG systems, fine-tune foundation models, and build intelligent AI features that directly impact client products. This is a technical, hands-on role where you’ll work with cutting-edge models (GPT-5, Claude 3, Llama 2) and deploy systems that process millions of queries daily.

Responsibilities:

Design and implement production RAG systems using vector databases (Pinecone, Weaviate) and LLM APIs
Fine-tune open-source models (Llama 2, Mistral) using LoRA/QLoRA for domain-specific applications
Optimize LLM performance through prompt engineering, context management, and evaluation frameworks
Build evaluation pipelines to measure retrieval quality, answer accuracy, and model hallucination rates
Collaborate with backend engineers to integrate AI features into FastAPI/Node.js services
Research and implement state-of-the-art techniques in document chunking, embedding strategies, and retrieval optimization

Qualifications:

MS or PhD in Computer Science, ML, AI, or a related field (or equivalent experience)
3+ years of hands-on experience deploying ML models to production
Strong Python skills; experience with PyTorch, TensorFlow, or JAX
Deep understanding of transformer architectures and LLM limitations
Production experience with vector databases and embedding models
Familiarity with LangChain, LlamaIndex, or similar RAG frameworks
Experience with MLOps tools: MLflow, Weights & Biases, DVC, or Kubeflow
Published research or open-source contributions in NLP/LLMs are a plus

ML Engineer - LLM & RAG
