AI & RAG
Multi-provider AI and Retrieval-Augmented Generation setup.
SaaS Starter ships a 7-provider AI factory and a complete pgvector RAG pipeline, with no extra packages needed.
AI Providers
Set AI_PROVIDER in your .env to switch providers at runtime:
| Value | Provider | Models |
|---|---|---|
| openai | OpenAI | gpt-4o, gpt-4o-mini, gpt-4-turbo |
| anthropic | Anthropic | claude-3-5-sonnet, claude-3-haiku |
| gemini | Google Gemini | gemini-2.0-flash, gemini-1.5-pro |
| groq | Groq | llama-3.3-70b, mixtral-8x7b |
| cerebras | Cerebras | llama-3.1-8b |
| mistral | Mistral | mistral-large-latest |
| together | Together | meta-llama/Llama-3.3-70B-Instruct-Turbo |
# .env
AI_PROVIDER=openai
AI_API_KEY=sk-...
AI_MODEL=gpt-4o-mini # optional override
AI_MAX_TOKENS=2048
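As a rough illustration of how these variables might be read and validated at startup, here is a minimal sketch. `loadAiConfig` and `AiConfig` are hypothetical names, not part of SaaS Starter; the provider list comes from the table above, and the 2048 fallback for `AI_MAX_TOKENS` is an assumption borrowed from the example value rather than a documented default.

```typescript
// Hypothetical helper for illustration; not SaaS Starter's actual loader.
const PROVIDERS = ["openai", "anthropic", "gemini", "groq", "cerebras", "mistral", "together"] as const;
type Provider = (typeof PROVIDERS)[number];

interface AiConfig {
  provider: Provider;
  apiKey: string;
  model: string | undefined; // AI_MODEL is an optional override
  maxTokens: number;
}

// Takes the environment as a plain record so it can be called with process.env or a test object.
function loadAiConfig(env: Record<string, string | undefined>): AiConfig {
  const provider = env.AI_PROVIDER;
  if (!provider || (PROVIDERS as readonly string[]).indexOf(provider) === -1) {
    throw new Error(`Unsupported AI_PROVIDER: ${provider ?? "(unset)"}`);
  }
  const apiKey = env.AI_API_KEY;
  if (!apiKey) throw new Error("AI_API_KEY is required");

  // Assumption: fall back to 2048 (the example value) when AI_MAX_TOKENS is unset.
  const maxTokens = Number(env.AI_MAX_TOKENS ?? "2048");
  if (!Number.isInteger(maxTokens) || maxTokens <= 0) {
    throw new Error("AI_MAX_TOKENS must be a positive integer");
  }
  return { provider: provider as Provider, apiKey, model: env.AI_MODEL || undefined, maxTokens };
}
```

Failing fast on an unknown provider or a missing key surfaces configuration mistakes at boot rather than on the first chat request.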
Chat API
POST /api/ai/chat
Authorization: Bearer <token>
{
"messages": [{ "role": "user", "content": "Hello!" }],
"systemPrompt": "You are a helpful assistant.",
"useRag": true
}
POST /api/ai/stream
Authorization: Bearer <token>
# Returns SSE events:
# event: delta data: {"delta":"Hi"}
# event: done data: {"totalTokens":42}
# event: error data: {"error":"..."}
Both endpoints accept a "useRag" flag (default: true) that automatically injects relevant context from your organization's knowledge base.
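On the client side, the stream events listed above could be consumed along these lines. This is a sketch rather than shipped client code: `parseSSE` and `collectText` are hypothetical helpers, and the parser assumes standard SSE framing (an `event:` line and a `data:` line per event, with a blank line between events). The event names and payload fields are the ones documented above.

```typescript
// Event shapes taken from the documented stream; the parser itself is illustrative.
type StreamEvent =
  | { event: "delta"; data: { delta: string } }
  | { event: "done"; data: { totalTokens: number } }
  | { event: "error"; data: { error: string } };

// Parse a complete SSE payload into typed events.
// Assumes standard framing: "event:" and "data:" lines, events separated by a blank line.
function parseSSE(raw: string): StreamEvent[] {
  const events: StreamEvent[] = [];
  for (const block of raw.split(/\r?\n\r?\n/)) {
    let name = "";
    const dataLines: string[] = [];
    for (const line of block.split(/\r?\n/)) {
      if (line.startsWith("event:")) name = line.slice(6).trim();
      else if (line.startsWith("data:")) dataLines.push(line.slice(5).trim());
    }
    if (!name || dataLines.length === 0) continue; // skip empty blocks and comments
    if (name === "delta" || name === "done" || name === "error") {
      events.push({ event: name, data: JSON.parse(dataLines.join("\n")) } as StreamEvent);
    }
  }
  return events;
}

// Concatenate the text carried by all delta events.
function collectText(events: StreamEvent[]): string {
  return events.map((e) => (e.event === "delta" ? e.data.delta : "")).join("");
}
```

For a live response you would buffer incoming chunks and only parse up to the last blank line, since a network chunk can end in the middle of an event.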
RAG Pipeline
Architecture
User query
↓
embedText() → pgvector similarity search
↓
Top-K chunks retrieved
↓
buildRAGContext() → injectRAGContext()
↓
Enriched system prompt → AI provider
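To make the last two stages of the diagram concrete, the sketch below formats retrieved chunks into a context string and appends it to a system prompt. `formatContext` and `injectContext` are illustrative stand-ins written for this page; the real `buildRAGContext()` and `injectRAGContext()` in `@app/ai` may format things differently, and the `RetrievedChunk` shape is an assumption.

```typescript
// Assumed shape of a retrieved chunk; the real pipeline's type may differ.
interface RetrievedChunk {
  sourceName: string;
  text: string;
  score: number; // similarity score from the vector search
}

// Format retrieved chunks into a single numbered context string, best match first.
function formatContext(chunks: RetrievedChunk[]): string {
  return [...chunks]
    .sort((a, b) => b.score - a.score)
    .map((c, i) => `[${i + 1}] (${c.sourceName}) ${c.text}`)
    .join("\n");
}

// Append the context to the system prompt; return the prompt unchanged when nothing was retrieved.
function injectContext(systemPrompt: string, context: string): string {
  if (context.trim() === "") return systemPrompt;
  return `${systemPrompt}\n\nUse the following context when it is relevant:\n${context}`;
}
```

Returning the original prompt when no chunks match keeps the model from being told to rely on an empty context section.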
Setup
Ensure pgvector is installed:
CREATE EXTENSION IF NOT EXISTS vector;
Then run migrations:
pnpm --filter "@app/db" run migrate
Ingest documents
POST /api/rag/ingest
Authorization: Bearer <token>
{
"sourceName": "Product FAQ",
"text": "...(up to 200,000 characters)...",
"sourceId": "faq-v1", // optional idempotency key
"chunkSize": 1000, // optional, default 1000
"overlap": 100 // optional, default 100
}
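To show how `chunkSize` and `overlap` interact, here is a minimal character-based chunker. It is a sketch under assumptions: the docs do not state how SaaS Starter splits text, so treating both values as character counts (suggested by the 200,000-character limit) is a guess, and `chunkText` is a hypothetical name.

```typescript
// Split text into chunks of at most chunkSize characters; each chunk after the
// first begins `overlap` characters before the previous chunk ended.
// Assumption: sizes are measured in characters, not tokens.
function chunkText(text: string, chunkSize = 1000, overlap = 100): string[] {
  if (chunkSize <= 0) throw new Error("chunkSize must be positive");
  if (overlap < 0 || overlap >= chunkSize) {
    throw new Error("overlap must be at least 0 and smaller than chunkSize");
  }
  const chunks: string[] = [];
  const step = chunkSize - overlap; // how far the window advances each iteration
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // this chunk reached the end of the text
  }
  return chunks;
}
```

With the defaults, a 2,500-character document yields three chunks starting at offsets 0, 900, and 1,800, so a sentence that straddles a boundary still appears whole in at least one chunk if it is shorter than the overlap.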
Semantic search
POST /api/rag/search
Authorization: Bearer <token>
{
"query": "How do I reset my password?",
"topK": 5,
"threshold": 0.7
}
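The effect of `topK` and `threshold` can be illustrated with an in-memory stand-in for the pgvector query. This sketch assumes `threshold` is a minimum cosine similarity, which the docs do not confirm; the `search` function and both interfaces are hypothetical, and the defaults of 5 and 0.7 simply mirror the example request.

```typescript
interface StoredChunk {
  id: string;
  embedding: number[];
}

interface Match {
  id: string;
  score: number;
}

// Cosine similarity of two equal-length vectors; returns 0 if either is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Score every chunk, drop those below the threshold, and keep the topK best.
function search(query: number[], chunks: StoredChunk[], topK = 5, threshold = 0.7): Match[] {
  return chunks
    .map((c) => ({ id: c.id, score: cosineSimilarity(query, c.embedding) }))
    .filter((m) => m.score >= threshold)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
```

Applying the threshold before the `topK` cut means a query with no good matches returns an empty list instead of padding the result with weak ones.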
List / delete sources
GET /api/rag/sources
DELETE /api/rag/sources/:sourceId
Rate limits
| Endpoint | Limit |
|---|---|
| POST /api/rag/ingest | 20 requests / hour |
| POST /api/rag/search | 60 requests / minute |
| POST /api/ai/chat | 30 requests / minute (global) |
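For a sense of how limits like these behave, here is a minimal fixed-window limiter. It is an illustration only: the docs do not say which algorithm SaaS Starter uses (fixed window, sliding window, token bucket) or how requests are keyed, and `FixedWindowLimiter` is a hypothetical class.

```typescript
// Fixed-window limiter: at most `limit` requests per `windowMs` for each key.
class FixedWindowLimiter {
  private readonly windows = new Map<string, { start: number; count: number }>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request is allowed. `now` is injectable so tests can control time.
  allow(key: string, now: number = Date.now()): boolean {
    const current = this.windows.get(key);
    if (!current || now - current.start >= this.windowMs) {
      // No window yet, or the previous one expired: start a fresh window.
      this.windows.set(key, { start: now, count: 1 });
      return true;
    }
    if (current.count >= this.limit) return false;
    current.count += 1;
    return true;
  }
}
```

Under these assumptions, the search limit would correspond to `new FixedWindowLimiter(60, 60_000)` keyed per caller, while a global limit would use one shared key for every request.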
RAG embeddings are generated using text-embedding-3-small (OpenAI). Even if you use
a different AI_PROVIDER for chat, AI_API_KEY must be a valid OpenAI key for
embedding-based RAG to work.
Using RAG in your own code
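import { buildContextForQuery } from '@app/api/services/rag'
import { injectRAGContext } from '@app/ai'

// Retrieve knowledge-base context for this organization and user message
const context = await buildContextForQuery(orgId, userMessage, 5)
// Merge the retrieved context into your own system prompt
const enrichedPrompt = injectRAGContext(mySystemPrompt, context)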