
AI & RAG

Multi-provider AI and Retrieval-Augmented Generation setup.

SaaS Starter ships with a 7-provider AI factory and a complete pgvector RAG pipeline; no extra packages are needed.

AI Providers

Set AI_PROVIDER in your .env file to switch providers without changing code:

| Value | Provider | Models |
|---|---|---|
| openai | OpenAI | gpt-4o, gpt-4o-mini, gpt-4-turbo |
| anthropic | Anthropic | claude-3-5-sonnet, claude-3-haiku |
| gemini | Google Gemini | gemini-2.0-flash, gemini-1.5-pro |
| groq | Groq | llama-3.3-70b, mixtral-8x7b |
| cerebras | Cerebras | llama-3.1-8b |
| mistral | Mistral | mistral-large-latest |
| together | Together | meta-llama/Llama-3.3-70B-Instruct-Turbo |

```bash
# .env
AI_PROVIDER=openai
AI_API_KEY=sk-...
AI_MODEL=gpt-4o-mini   # optional override
AI_MAX_TOKENS=2048
```
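As a rough illustration of how these variables fit together, the sketch below validates them before a provider client would be created. The function name resolveAIConfig and its validation rules are assumptions for illustration, not the starter's actual factory code; only the provider values come from the table above.

```typescript
// Hypothetical helper: validate the AI_* variables before creating a provider client.
// The provider values mirror the table above; resolveAIConfig is an illustrative name.
const PROVIDERS = ["openai", "anthropic", "gemini", "groq", "cerebras", "mistral", "together"] as const;

type Provider = (typeof PROVIDERS)[number];

interface AIConfig {
  provider: Provider;
  apiKey: string;
  model?: string; // undefined: let the provider factory pick its default model
  maxTokens?: number; // undefined: AI_MAX_TOKENS was not set
}

function isProvider(value: string): value is Provider {
  return (PROVIDERS as readonly string[]).includes(value);
}

function resolveAIConfig(env: Record<string, string | undefined>): AIConfig {
  const provider = env.AI_PROVIDER ?? "";
  if (!isProvider(provider)) {
    throw new Error(`Unsupported AI_PROVIDER "${provider}". Expected one of: ${PROVIDERS.join(", ")}`);
  }

  const apiKey = env.AI_API_KEY;
  if (!apiKey) throw new Error("AI_API_KEY is required");

  const rawMax = env.AI_MAX_TOKENS;
  const maxTokens = rawMax === undefined ? undefined : Number(rawMax);
  if (maxTokens !== undefined && (!Number.isInteger(maxTokens) || maxTokens <= 0)) {
    throw new Error("AI_MAX_TOKENS must be a positive integer");
  }

  // An empty AI_MODEL is treated the same as an unset one.
  return { provider, apiKey, model: env.AI_MODEL || undefined, maxTokens };
}
```

Calling something like resolveAIConfig(process.env) at startup fails fast on a typo such as AI_PROVIDER=gemeni instead of surfacing it on the first chat request.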

Chat API

```http
POST /api/ai/chat
Authorization: Bearer <token>

{
  "messages": [{ "role": "user", "content": "Hello!" }],
  "systemPrompt": "You are a helpful assistant.",
  "useRag": true
}
```

```http
POST /api/ai/stream
Authorization: Bearer <token>

# Returns SSE events:
# event: delta  data: {"delta":"Hi"}
# event: done   data: {"totalTokens":42}
# event: error  data: {"error":"..."}
```

Both endpoints accept a "useRag" flag (default: true). When it is enabled, relevant context from your organization's knowledge base is automatically injected into the system prompt.
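Because the stream endpoint is a POST, the browser's EventSource API (which only issues GET requests) cannot consume it directly; you read the response body yourself and decode the frames. A minimal parser sketch, assuming the three event types listed above and standard SSE framing (event: and data: lines, frames separated by a blank line). parseSSE and StreamEvent are hypothetical names.

```typescript
// Hypothetical decoder for the /api/ai/stream events (delta, done, error).
type StreamEvent =
  | { event: "delta"; data: { delta: string } }
  | { event: "done"; data: { totalTokens: number } }
  | { event: "error"; data: { error: string } };

const KNOWN_EVENTS = new Set(["delta", "done", "error"]);

// Parses every complete frame in `buffer`. Frames end with a blank line; the piece after
// the last blank line may be incomplete and is returned as `rest`.
function parseSSE(buffer: string): { events: StreamEvent[]; rest: string } {
  const frames = buffer.replace(/\r\n/g, "\n").split("\n\n");
  const rest = frames.pop() ?? "";
  const events: StreamEvent[] = [];
  for (const frame of frames) {
    let name = "message"; // SSE default event name when no `event:` line is present
    const dataLines: string[] = [];
    for (const line of frame.split("\n")) {
      if (line.startsWith("event:")) {
        name = line.slice("event:".length).trim();
      } else if (line.startsWith("data:")) {
        // Per the SSE format, strip a single leading space after the colon.
        dataLines.push(line.slice("data:".length).replace(/^ /, ""));
      }
    }
    // Skip unknown event types and frames without data (e.g. keep-alive comments).
    if (!KNOWN_EVENTS.has(name) || dataLines.length === 0) continue;
    const data: unknown = JSON.parse(dataLines.join("\n"));
    events.push({ event: name, data } as StreamEvent);
  }
  return { events, rest };
}
```

In a client, you would feed it text decoded from response.body.getReader() with a TextDecoder, prepending the returned rest to the next chunk so frames split across network reads are not lost.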

RAG Pipeline

Architecture

```text
User query
    ↓
embedText() → pgvector similarity search
    ↓
Top-K chunks retrieved
    ↓
buildRAGContext() → injectRAGContext()
    ↓
Enriched system prompt → AI provider
```
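The flow in this diagram can be sketched as plain orchestration code. In the sketch, embedding and search are passed in as functions so no database is needed; RagDeps, enrichSystemPrompt, and the context format are hypothetical and may differ from what buildRAGContext() and injectRAGContext() actually produce.

```typescript
// Hypothetical types and helpers; names and the context format are illustrative only.
interface RetrievedChunk {
  sourceName: string;
  content: string;
  similarity: number;
}

interface RagDeps {
  embed: (text: string) => Promise<number[]>; // plays the role of embedText()
  search: (embedding: number[], topK: number) => Promise<RetrievedChunk[]>; // plays the role of the pgvector query
}

// Number the chunks and tag each with its source so the model can tell them apart.
function buildContext(chunks: RetrievedChunk[]): string {
  return chunks.map((c, i) => `[${i + 1}] (${c.sourceName}) ${c.content}`).join("\n\n");
}

// Append the retrieved context to the system prompt; leave the prompt alone if nothing matched.
function injectContext(systemPrompt: string, context: string): string {
  if (context === "") return systemPrompt;
  return `${systemPrompt}\n\nRelevant context from the knowledge base:\n${context}`;
}

async function enrichSystemPrompt(deps: RagDeps, systemPrompt: string, query: string, topK: number): Promise<string> {
  const embedding = await deps.embed(query); // embed the user query
  const chunks = await deps.search(embedding, topK); // similarity search, keep the top-K chunks
  return injectContext(systemPrompt, buildContext(chunks)); // build context, enrich the prompt
}
```

Injecting the two I/O steps keeps the orchestration testable with fakes, which is useful when swapping the vector store or embedding model.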

Setup

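Make sure the pgvector extension is installed on your PostgreSQL server, then enable it in your database:

```sql
CREATE EXTENSION IF NOT EXISTS vector;
```

Then run the migrations:

```bash
pnpm --filter "@app/db" run migrate
```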

Ingest documents

```http
POST /api/rag/ingest
Authorization: Bearer <token>

{
  "sourceName": "Product FAQ",
  "text": "...(up to 200,000 characters)...",
  "sourceId": "faq-v1",
  "chunkSize": 1000,
  "overlap": 100
}
```

The sourceId (an idempotency key), chunkSize (default 1000) and overlap (default 100) fields are optional.
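To make chunkSize and overlap concrete, here is a sketch of fixed-size chunking with overlap. It assumes both values are measured in characters; the starter's actual splitter may count differently or respect sentence and paragraph boundaries. chunkText is a hypothetical name.

```typescript
// Hypothetical sketch: fixed-size character chunks where consecutive chunks share `overlap` characters.
function chunkText(text: string, chunkSize = 1000, overlap = 100): string[] {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new Error("chunkSize must be a positive integer");
  }
  if (!Number.isInteger(overlap) || overlap < 0 || overlap >= chunkSize) {
    throw new Error("overlap must be a non-negative integer smaller than chunkSize");
  }
  const step = chunkSize - overlap; // each chunk starts this many characters after the previous one
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // this chunk already reaches the end of the text
  }
  return chunks;
}
```

With the defaults, a 2,500-character text yields three chunks starting at offsets 0, 900 and 1800, so each pair of neighbors shares 100 characters and a sentence cut at a boundary still appears whole in one of them.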

Semantic search

```http
POST /api/rag/search
Authorization: Bearer <token>

{
  "query": "How do I reset my password?",
  "topK": 5,
  "threshold": 0.7
}
```
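One plausible reading of topK and threshold, assuming threshold is a minimum cosine similarity: keep the chunks that score at least threshold, then return the best topK. The in-memory sketch below shows that ranking with hypothetical names (StoredChunk, semanticSearch); in Postgres the same work is done by pgvector, whose <=> operator returns cosine distance, so similarity = 1 - distance.

```typescript
// Hypothetical in-memory version of threshold + top-K ranking by cosine similarity.
interface StoredChunk {
  id: string;
  embedding: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // a zero vector has no direction to compare
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function semanticSearch(
  query: number[],
  chunks: StoredChunk[],
  topK: number,
  threshold: number,
): { id: string; similarity: number }[] {
  return chunks
    .map((c) => ({ id: c.id, similarity: cosineSimilarity(query, c.embedding) }))
    .filter((r) => r.similarity >= threshold) // drop weak matches first
    .sort((x, y) => y.similarity - x.similarity) // most similar first
    .slice(0, topK);
}
```

Raising threshold trades recall for precision: with 0.7, a chunk at roughly 45 degrees from the query (similarity about 0.707) still passes, while anything less aligned is dropped even if fewer than topK chunks remain.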

List / delete sources

```http
GET    /api/rag/sources
DELETE /api/rag/sources/:sourceId
```

Rate limits

| Endpoint | Limit |
|---|---|
| POST /api/rag/ingest | 20 requests / hour |
| POST /api/rag/search | 60 requests / minute |
| POST /api/ai/chat | 30 requests / minute (global) |
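If a script ingests many documents, it can throttle itself to stay under these limits instead of running into them. Below is a sketch of a client-side sliding-window limiter, assuming you configure it with the numbers from the table; SlidingWindowLimiter is hypothetical and is not necessarily how the server enforces its own limits.

```typescript
// Hypothetical client-side limiter: at most `limit` calls in any rolling `windowMs` period.
class SlidingWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly now: () => number;
  private timestamps: number[] = []; // start times of recent requests, oldest first

  constructor(limit: number, windowMs: number, now: () => number = Date.now) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.now = now; // injectable clock, handy for tests
  }

  // Forget requests that have left the rolling window.
  private prune(t: number): void {
    this.timestamps = this.timestamps.filter((ts) => t - ts < this.windowMs);
  }

  // Records the request and returns true if it fits in the window; otherwise returns false.
  tryAcquire(): boolean {
    const t = this.now();
    this.prune(t);
    if (this.timestamps.length >= this.limit) return false;
    this.timestamps.push(t);
    return true;
  }

  // Milliseconds until the oldest recorded request leaves the window (0 if a slot is free now).
  msUntilNextSlot(): number {
    const t = this.now();
    this.prune(t);
    if (this.timestamps.length < this.limit) return 0;
    return this.timestamps[0] + this.windowMs - t;
  }
}
```

For ingestion, new SlidingWindowLimiter(20, 60 * 60 * 1000) allows at most 20 calls in any rolling hour; when tryAcquire() returns false, wait msUntilNextSlot() before retrying.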

RAG embeddings are generated using text-embedding-3-small (OpenAI). Even if you use a different AI_PROVIDER for chat, AI_API_KEY must be a valid OpenAI key for embedding-based RAG to work.

Using RAG in your own code

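```typescript
import { buildContextForQuery } from '@app/api/services/rag'
import { injectRAGContext } from '@app/ai'

// orgId, userMessage and mySystemPrompt come from your own request handler.
// The third argument (5) is the top-K: how many of the most similar chunks to include.
const context = await buildContextForQuery(orgId, userMessage, 5)
const enrichedPrompt = injectRAGContext(mySystemPrompt, context)
```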