RAG (Retrieval-Augmented Generation)
An LLM is given relevant retrieved documents as context before generating a response, grounding its output in your specific data rather than only in what the model learned during training.
RAG is the standard pattern for making LLMs useful with company-specific data. The flow: (1) the user asks a question; (2) the question is embedded as a vector; (3) the system retrieves the K most similar documents (typically document chunks) from a vector database; (4) the retrieved documents are inserted into the prompt as context; (5) the LLM generates a response grounded in that context.
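The five steps can be sketched end to end in plain Python. This is a minimal, illustrative sketch under loud assumptions: `embed` is a toy bag-of-words stand-in for a real embedding model, `InMemoryVectorStore` is a hypothetical brute-force substitute for a vector database, and step 5 (the actual LLM call) is left out because it depends on your model provider. All names here are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words term-frequency vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database: brute-force similarity search."""

    def __init__(self, docs: list[str]) -> None:
        self.docs = docs
        self.vectors = [embed(d) for d in docs]  # index time: embed every document once

    def top_k(self, query_vec: Counter, k: int) -> list[str]:
        ranked = sorted(
            range(len(self.docs)),
            key=lambda i: cosine(query_vec, self.vectors[i]),
            reverse=True,
        )
        return [self.docs[i] for i in ranked[:k]]


def build_prompt(question: str, context: list[str]) -> str:
    """Insert the retrieved documents into the prompt, numbered so they can be cited."""
    numbered = "\n".join(f"[{i}] {doc}" for i, doc in enumerate(context, start=1))
    return (
        "Answer using only the context below. "
        "If the answer is not in the context, say so.\n\n"
        f"Context:\n{numbered}\n\nQuestion: {question}"
    )


def rag_prompt(question: str, store: InMemoryVectorStore, k: int = 2) -> str:
    """Steps 2-4 of the flow; step 5 (sending the prompt to an LLM) is not shown."""
    query_vec = embed(question)             # step 2: embed the question
    context = store.top_k(query_vec, k)     # step 3: retrieve the K most similar documents
    return build_prompt(question, context)  # step 4: put them into the prompt


if __name__ == "__main__":
    store = InMemoryVectorStore([
        "Acme Corp opened a case about invoice errors in March.",
        "Our office is closed on public holidays.",
        "Acme Corp reported a login outage affecting 40 users.",
    ])
    print(rag_prompt("What issues has Acme Corp reported?", store, k=2))
```

In a real system you would swap `embed` for an embedding model and `InMemoryVectorStore` for a vector database; the shape of the flow (embed, retrieve the top K, build the prompt, generate) stays the same.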
For Salesforce, RAG makes it possible to answer questions like "What's the history of issues with Acme Corp?" The system retrieves the relevant Cases, Knowledge Articles, and email threads, and the LLM synthesizes a response from them. Without RAG, the LLM has no access to that org data, so it would either fabricate an answer or decline to respond.
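In the Salesforce example, records from several objects have to be combined into one context before the LLM sees them. A minimal sketch, assuming the Cases, Knowledge Articles, and emails have already been retrieved and sorted by relevance: each item is labeled with its source type and ID so the answer can cite it, and lower-ranked items are dropped once a character budget (a rough, assumed proxy for the model's context window) is reached. `RetrievedRecord` and `format_context` are hypothetical illustrations, not Salesforce or GPTfy APIs, and the record IDs are made up.

```python
from dataclasses import dataclass


@dataclass
class RetrievedRecord:
    """Hypothetical shape of one retrieved item; not a Salesforce API type."""
    object_type: str  # e.g. "Case", "Knowledge Article", "Email"
    record_id: str    # illustrative ID, not a real Salesforce record ID
    text: str


def format_context(records: list[RetrievedRecord], max_chars: int = 2000) -> str:
    """Label each record with its source and stop before the context exceeds max_chars.

    Assumes `records` is already sorted from most to least relevant, so the
    items dropped when the budget runs out are the least relevant ones.
    """
    blocks: list[str] = []
    used = 0
    for rec in records:
        block = f"[{rec.object_type} {rec.record_id}]\n{rec.text.strip()}"
        sep = 2 if blocks else 0  # length of the "\n\n" joiner placed before this block
        if used + sep + len(block) > max_chars:
            break  # drop the remaining records rather than cutting one mid-text
        blocks.append(block)
        used += sep + len(block)
    return "\n\n".join(blocks)


if __name__ == "__main__":
    records = [
        RetrievedRecord("Case", "case-001", "Acme Corp: invoice totals doubled after the March release."),
        RetrievedRecord("Knowledge Article", "ka-042", "How to correct duplicated invoice line items."),
        RetrievedRecord("Email", "email-317", "Acme's finance lead asked for a credit note."),
    ]
    print(format_context(records, max_chars=200))
```

Dropping whole records instead of truncating one keeps every source the LLM sees intact, which makes its citations easier to check.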
Modern RAG variations include hybrid retrieval (combining vector and keyword search), reranking (using a second model to re-score and reorder the retrieved candidates), and GraphRAG (using knowledge graphs alongside vectors). GPTfy's RAG-in-Salesforce feature implements production-grade RAG with PII masking and audit trails over Salesforce data.
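Hybrid retrieval needs a way to merge the ranked results of vector search and keyword search. One widely used method, shown here as an illustrative choice rather than what any particular product uses, is Reciprocal Rank Fusion (RRF): a document earns 1/(k + rank) from every list it appears in, the contributions are summed, and k = 60 is the conventional constant. The document IDs below are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document gets 1 / (k + rank) from every list it appears in
    (ranks are 1-based); the contributions are summed across lists.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    # Hypothetical result lists for the same query
    vector_hits = ["case-17", "kb-03", "case-42"]     # from semantic (embedding) search
    keyword_hits = ["case-42", "case-17", "email-9"]  # from keyword search, e.g. BM25
    print(reciprocal_rank_fusion([vector_hits, keyword_hits]))
    # ['case-17', 'case-42', 'kb-03', 'email-9']
```

Documents that rank well in both lists (here `case-17` and `case-42`) rise to the top, which is the point of hybrid retrieval. A reranking step, if used, would then re-score only the top few fused candidates with a second model before they go into the prompt.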
Related terms
- Embeddings: Numeric vector representations of text that capture semantic meaning. They are the foundation of semantic search, RAG, and most modern NLP applications.
- Vector Database: A database optimized for storing and querying high-dimensional vectors (embeddings). It is the storage layer that makes semantic search and RAG fast at scale.
See it in your Salesforce org
See RAG (Retrieval-Augmented Generation) running in GPTfy
Book 30 minutes with a GPTfy engineer to see how RAG (Retrieval-Augmented Generation) actually works inside a Salesforce org like yours.