In our previous article, “Retrieval-Augmented Generation (RAG): A Product Manager’s Guide”, we explored the fundamentals of RAG and how it empowers AI systems to combine external knowledge with language models. In this article, we take a step further and dive into 25 essential RAG techniques that every AI practitioner, product manager, or developer should know. From simple retrieval setups to advanced workflows like ReAct, HyDE, and RETRO, this article will help you understand how to design, optimize, and scale RAG systems for accuracy, efficiency, and personalization.
1. Naive / Simple RAG
Workflow:
User submits a query.
Retriever fetches a few relevant text chunks from the knowledge base.
LLM consumes these chunks and generates an answer.
Why it matters: It’s the simplest RAG setup, ideal for straightforward FAQs.
Example:
Query: “What is your refund policy?”
Retriever fetches the “Refund Policy” section.
LLM summarizes it.
Limitation: Fails for vague/multi-step queries or when context is spread across multiple documents.
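A minimal Python sketch of this loop, with `retrieve()` and `generate()` as toy stand-ins for a real vector store and LLM client:

```python
def retrieve(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Toy retriever: rank chunks by word overlap with the query."""
    q = set(query.lower().split())
    return sorted(chunks, key=lambda c: len(q & set(c.lower().split())), reverse=True)[:k]

def generate(prompt: str) -> str:
    """Placeholder for an LLM call (OpenAI client, local model, etc.)."""
    return f"[LLM answer based on a {len(prompt)}-character prompt]"

def naive_rag(query: str, chunks: list[str]) -> str:
    context = "\n\n".join(retrieve(query, chunks))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return generate(prompt)

docs = ["Refund Policy: refunds are issued within 14 days of purchase.",
        "Shipping Policy: orders ship within 2 business days."]
print(naive_rag("What is your refund policy?", docs))
```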
2. Basic Unstructured RAG
Workflow:
User query is sent.
Retriever searches raw unstructured data (PDFs, plain text).
LLM generates an answer directly from the retrieved chunks.
Why it matters: No preprocessing needed; fast to implement.
Example: Scanning annual reports for “total revenue.”
Limitation: Low-quality responses if data is poorly structured.
3. Contextual RAG
Workflow:
Capture query + user context (role, past interactions).
Retrieve contextually relevant documents.
LLM tailors answer based on both retrieved documents and user context.
Example: Premium user vs new user receiving different support answers.
Benefit: Personalizes answers for better relevance.
4. Hybrid RAG
Workflow:
Query processed by both:
Keyword search (exact matches)
Vector search (semantic similarity)
Combine results and pass to LLM.
Example: Searching for “machine learning” also finds “ML algorithms.”
Benefit: Balances precision (keywords) and recall (semantic understanding).
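A rough sketch of blending the two signals; in practice the semantic score would come from embedding similarity in a vector database, not the word-overlap proxy used here:

```python
def keyword_score(query: str, doc: str) -> float:
    """Fraction of query terms that appear verbatim in the document."""
    q = set(query.lower().split())
    return len(q & set(doc.lower().split())) / max(len(q), 1)

def semantic_score(query: str, doc: str) -> float:
    """Stand-in for embedding cosine similarity; Jaccard overlap as a crude proxy."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q | d), 1)

def hybrid_retrieve(query: str, docs: list[str], alpha: float = 0.5, k: int = 3) -> list[str]:
    scored = [(alpha * keyword_score(query, d) + (1 - alpha) * semantic_score(query, d), d)
              for d in docs]
    return [d for _, d in sorted(scored, reverse=True)[:k]]   # blend of both signals
```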
5. Fusion RAG
Workflow:
Generate multiple query variations.
Retrieve documents for each variation.
Fuse results using ranking (e.g., Reciprocal Rank Fusion).
LLM generates final answer from fused context.
Example: Query: “AI safety” → variations: “artificial intelligence safety,” “AI risk.”
Benefit: Improves coverage and reduces missing critical info.
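Reciprocal Rank Fusion itself is simple to sketch; `search()` below is a placeholder for whichever retriever produces the per-variation rankings:

```python
def search(query: str) -> list[str]:
    """Placeholder retriever: returns document IDs ranked for `query`."""
    corpus = {"doc_safety": "ai safety alignment risk",
              "doc_risk": "artificial intelligence risk assessment",
              "doc_ml": "machine learning basics"}
    q = set(query.lower().split())
    return sorted(corpus, key=lambda d: len(q & set(corpus[d].split())), reverse=True)

def reciprocal_rank_fusion(ranked_lists: list[list[str]], c: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (c + rank)   # RRF formula
    return sorted(scores, key=scores.get, reverse=True)

variations = ["AI safety", "artificial intelligence safety", "AI risk"]
fused = reciprocal_rank_fusion([search(v) for v in variations])
print(fused)   # documents ranked highly across several variations float to the top
```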
6. HyDE RAG (Hypothetical Document Embeddings)
Workflow:
LLM first generates a hypothetical answer from the query.
Retriever finds documents semantically close to this answer.
LLM synthesizes final answer using retrieved documents.
Example: Query: “Best way to boost model accuracy?” → LLM drafts a short hypothetical answer → retriever finds documents that match it.
Benefit: Helps when queries are vague or too short for normal retrieval.
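A compact sketch of the HyDE flow, with `generate()` and `retrieve()` as placeholders for the LLM and vector store:

```python
def generate(prompt: str) -> str:
    """LLM stand-in; the first call drafts a hypothetical answer, the second the real one."""
    return "Improve accuracy with more data, regularization, and hyperparameter tuning."

def retrieve(text: str, docs: list[str], k: int = 3) -> list[str]:
    words = set(text.lower().split())   # toy similarity; use embeddings in practice
    return sorted(docs, key=lambda d: len(words & set(d.lower().split())), reverse=True)[:k]

def hyde_rag(query: str, docs: list[str]) -> str:
    hypothetical = generate(f"Write a short plausible answer to: {query}")
    context = "\n".join(retrieve(hypothetical, docs))   # search with the hypothesis, not the query
    return generate(f"Context:\n{context}\n\nQuestion: {query}")
```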
7. Parent Document Retriever
Workflow:
Chunk documents and index them.
Retriever fetches relevant chunks.
Instead of sending those chunks, send the full parent document to the LLM.
Example: Retrieve a research paper chunk → LLM reads entire paper for better context.
Benefit: Reduces errors from missing context in fragmented chunks.
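One way to wire this up is to store a parent ID on every chunk; the scoring below is a toy stand-in for real chunk retrieval:

```python
# Every chunk keeps a pointer to the document it came from.
parents = {
    "paper_1": "Full text of research paper 1 ...",
    "paper_2": "Full text of research paper 2 ...",
}
chunks = [("paper_1", "section on transformer attention"),
          ("paper_1", "section on training data"),
          ("paper_2", "section on reinforcement learning")]

def best_chunk(query: str) -> tuple[str, str]:
    q = set(query.lower().split())
    return max(chunks, key=lambda c: len(q & set(c[1].split())))   # toy chunk scoring

parent_id, _ = best_chunk("How does transformer attention work?")
context = parents[parent_id]   # hand the LLM the whole parent, not just the matching chunk
```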
8. Rewrite-Retrieve RAG
Workflow:
LLM rewrites the user query to match KB terminology.
Retriever uses rewritten query for search.
LLM generates answer from retrieved documents.
Example: “How to train a bot?” → rewritten: “Procedures for training a conversational AI model.”
Benefit: Improves retrieval for informal or ambiguous queries.
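A sketch of the rewrite step; `rewrite()` here is a hard-coded stand-in for an LLM prompt such as “Rewrite this question using knowledge-base terminology”:

```python
def rewrite(query: str) -> str:
    """Stand-in for an LLM rewrite call; a real system would prompt the model here."""
    return query.replace("train a bot", "train a conversational AI model")

def retrieve(query: str) -> list[str]:
    return [f"doc matching '{query}'"]         # retriever stand-in

def rewrite_retrieve(user_query: str) -> list[str]:
    return retrieve(rewrite(user_query))       # search with the normalized query

print(rewrite_retrieve("How to train a bot?"))
```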
9. Conversational RAG
Workflow:
Maintain conversation history.
Retrieve documents relevant to both query and chat history.
LLM generates answer consistent with prior conversation.
Example: Follow-up question about a product remembers the initial context.
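A sketch of history-aware retrieval, assuming a `condense()` step that a real system would delegate to the LLM; `retrieve()` and `generate()` are stand-ins:

```python
def retrieve(query: str) -> list[str]:
    return [f"doc matching '{query[:40]}'"]    # retriever stand-in

def generate(prompt: str) -> str:
    return f"[LLM reply to: {prompt[:40]}...]" # LLM stand-in

history: list[tuple[str, str]] = []            # (user, assistant) turns

def condense(query: str) -> str:
    """Stand-in for an LLM 'rewrite as a standalone question' call."""
    recent = " ".join(user for user, _ in history[-3:])
    return f"{recent} {query}".strip()

def chat(query: str) -> str:
    docs = retrieve(condense(query))           # retrieval sees the history too
    reply = generate(f"Context: {docs}\nHistory: {history[-3:]}\nQuestion: {query}")
    history.append((query, reply))
    return reply

chat("Tell me about the Acme X200 laptop.")
print(chat("Does it come with a warranty?"))   # follow-up query now carries the X200 context
```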
10. Memo RAG
Workflow:
Store persistent memory about the user.
Retrieve relevant memory along with documents for each query.
LLM incorporates both the knowledge base and the user’s memory.
Example: Remember a user’s preferred language or favorite topic.
11. Context Cache RAG
Workflow:
Cache frequently retrieved documents/contexts.
Next time the same query appears, retrieve from cache.
LLM uses cached context to answer.
Example: FAQs like “How to reset password?”
Benefit: Reduces latency and computation costs.
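Python’s `functools.lru_cache` is enough to sketch the idea, assuming queries are normalized into stable cache keys; `retrieve()` and `generate()` are stand-ins:

```python
from functools import lru_cache

def retrieve(query: str) -> list[str]:
    return [f"doc for '{query}'"]                    # stand-in for a real (slow) search

def generate(prompt: str) -> str:
    return f"[LLM answer: {prompt[:50]}...]"         # LLM stand-in

@lru_cache(maxsize=1024)
def cached_retrieve(normalized_query: str) -> tuple[str, ...]:
    return tuple(retrieve(normalized_query))         # tuple so the result is hashable/cacheable

def answer(query: str) -> str:
    context = "\n".join(cached_retrieve(query.strip().lower()))   # normalize the cache key
    return generate(f"Context:\n{context}\n\nQuestion: {query}")
```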
12. Iterative RAG
Workflow:
Retrieve documents.
LLM generates an initial answer.
Identify missing info → retrieve more → refine answer.
Repeat until satisfactory.
Example: “How to optimize a neural network?” → basic techniques → advanced strategies → final best practices.
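A sketch of the retrieve-draft-refine loop with a fixed round budget; `retrieve()` and `generate()` are stand-ins for real calls:

```python
def retrieve(query: str) -> list[str]:
    return [f"doc about '{query[:30]}'"]           # retriever stand-in

def generate(prompt: str) -> str:
    return f"[LLM output for: {prompt[:40]}...]"   # LLM stand-in

def iterative_rag(query: str, max_rounds: int = 3) -> str:
    context = retrieve(query)
    answer = generate(f"Context: {context}\nQuestion: {query}")
    for _ in range(max_rounds):
        gap = generate(f"List information missing from this answer, or say NONE:\n{answer}")
        if gap.strip().upper().startswith("NONE"):
            break                                  # nothing missing; stop iterating
        context += retrieve(gap)                   # fetch the missing pieces
        answer = generate(f"Context: {context}\nQuestion: {query}")
    return answer
```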
13. Corrective RAG
Workflow:
LLM outputs an answer.
If low confidence or factual error detected → trigger additional retrieval.
Refine answer with new context.
Example: Bot drafts “Paris is the capital of Germany” → error detected → correct information retrieved.
Benefit: Reduces hallucinations.
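A sketch of the grade-then-re-retrieve pattern; the self-check is just another LLM prompt here, and both helpers are stand-ins:

```python
def retrieve(query: str) -> list[str]:
    return [f"doc about '{query[:30]}'"]           # retriever stand-in

def generate(prompt: str) -> str:
    return f"[LLM output for: {prompt[:40]}...]"   # LLM stand-in

def corrective_rag(query: str) -> str:
    docs = retrieve(query)
    draft = generate(f"Context: {docs}\nQuestion: {query}")
    verdict = generate("Is this answer fully supported by the context? Reply YES or NO.\n"
                       f"Context: {docs}\nAnswer: {draft}")
    if not verdict.strip().upper().startswith("YES"):
        docs += retrieve(draft)                    # fetch corrective evidence
        draft = generate(f"Context: {docs}\nQuestion: {query}")
    return draft
```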
14. Self-RAG
Workflow:
LLM evaluates its own answer quality.
Decides whether more retrieval is needed.
Refines answer autonomously.
Example: LLM unsure about X → triggers further retrieval before answering.
Benefit: Makes system more autonomous.
15. Adaptive RAG
Workflow:
Analyze query complexity.
Adjust number of documents retrieved dynamically.
LLM generates answer using retrieved context.
Example: Simple query → 2–3 docs, complex query → 10+ docs.
Benefit: Optimizes accuracy vs cost.
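A sketch where a crude heuristic picks the retrieval depth; a production system might use a small classifier or the LLM itself to judge complexity:

```python
def retrieve(query: str, k: int) -> list[str]:
    return [f"doc {i} for '{query}'" for i in range(k)]        # retriever stand-in

def generate(prompt: str) -> str:
    return f"[LLM answer using {prompt.count('doc')} docs]"    # LLM stand-in

def query_complexity(query: str) -> int:
    """Crude heuristic; counts complexity signals in the query."""
    signals = sum(w in query.lower() for w in ("compare", "why", "how", "versus", " and "))
    return signals + len(query.split()) // 10

def adaptive_rag(query: str) -> str:
    k = 3 if query_complexity(query) <= 1 else 10   # simple → few docs, complex → many
    return generate(f"Context: {retrieve(query, k)}\nQuestion: {query}")
```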
16. ReAct RAG
Workflow:
LLM alternates between reasoning and retrieving/acting.
Each step can trigger new retrieval or intermediate reasoning.
Produces stepwise, multi-step solutions.
Example: “Plan a marketing campaign” → reason → retrieve trends → refine plan iteratively.
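A stripped-down sketch of the reason/act loop; the scratchpad format and the `FINAL:` convention are illustrative assumptions, not a fixed standard:

```python
def generate(prompt: str) -> str:
    return "FINAL: [answer]"                       # LLM stand-in; a real model reasons stepwise

def retrieve(query: str) -> list[str]:
    return [f"doc about '{query[:30]}'"]           # retriever stand-in

def react_rag(query: str, max_steps: int = 4) -> str:
    scratchpad = f"Question: {query}\n"
    for _ in range(max_steps):
        thought = generate(scratchpad + "Thought:")   # LLM continues the scratchpad
        scratchpad += f"Thought: {thought}\n"
        if "FINAL:" in thought:                       # model decided it has enough evidence
            return thought.split("FINAL:", 1)[1].strip()
        observation = retrieve(thought)               # otherwise treat the thought as a search
        scratchpad += f"Observation: {observation}\n"
    return generate(scratchpad + "Final answer:")
```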
17. Replug RAG
Workflow:
Retrieval module is pluggable.
LLM remains unchanged; only retrieval backend can change.
Modular architecture enables experimentation.
Example: Swap Elasticsearch for Pinecone without retraining LLM.
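The key idea is coding against a retriever interface rather than a concrete backend; here is a sketch using a Python `Protocol`, with `InMemoryRetriever` as one swappable implementation:

```python
from typing import Protocol

class Retriever(Protocol):
    def search(self, query: str, k: int) -> list[str]: ...

class InMemoryRetriever:
    """One swappable backend; an Elasticsearch or Pinecone client could replace it."""
    def __init__(self, docs: list[str]):
        self.docs = docs
    def search(self, query: str, k: int) -> list[str]:
        q = set(query.lower().split())
        return sorted(self.docs, key=lambda d: len(q & set(d.lower().split())),
                      reverse=True)[:k]

def generate(prompt: str) -> str:
    return f"[LLM answer from {len(prompt)} chars of prompt]"   # LLM stand-in

def rag_answer(query: str, retriever: Retriever) -> str:
    context = "\n".join(retriever.search(query, k=3))
    return generate(f"Context:\n{context}\n\nQuestion: {query}")
```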
18. Refeed RAG
Workflow:
LLM generates preliminary answer.
Feed generated output back to retriever as a hint query.
Retriever finds additional relevant documents.
LLM refines answer using new context.
Example: Draft answer identifies missing steps → retrieval fills gaps.
19. REALM RAG
Workflow:
Integrates retrieval directly into the LLM during generation.
Queries knowledge base dynamically as LLM generates tokens.
Example: LLM answers QA questions while pulling relevant knowledge from KB on-the-fly.
Benefit: Efficient open-domain QA with higher factual accuracy.
20. RETRO RAG
Workflow:
LLM accesses large-scale external data during generation.
Similar to REALM but targets very large corpora in real-time.
Example: LLM retrieves scientific articles while answering biology questions.
Benefit: Improves factual accuracy for domains with huge datasets.
21. RAPTOR RAG
Workflow:
Use hierarchical/tree-structured indexing.
Retrieve top-level category → sub-category → final documents.
LLM generates answer using refined subset.
Example: Enterprise knowledge base with millions of documents.
Benefit: Scales efficiently to massive datasets.
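A toy sketch of top-down tree retrieval; real RAPTOR builds the tree from recursive summaries and embeddings, which this skips:

```python
# Hierarchical index: category → sub-category → leaf documents.
tree = {
    "HR policies": {"Leave": ["leave_doc_1", "leave_doc_2"],
                    "Payroll": ["payroll_doc_1"]},
    "Engineering": {"Runbooks": ["runbook_db", "runbook_api"]},
}

def score(query: str, label: str) -> int:
    return len(set(query.lower().split()) & set(label.lower().split()))

def raptor_retrieve(query: str) -> list[str]:
    category = max(tree, key=lambda c: score(query, c))        # level 1
    sub = max(tree[category], key=lambda s: score(query, s))   # level 2
    return tree[category][sub]                                 # leaf documents only

print(raptor_retrieve("payroll hr policies"))
```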
22. Attention-Based RAG
Workflow:
LLM applies attention mechanisms to prioritize parts of retrieved content.
Focuses on most relevant passages while generating answer.
Benefit: Improves answer quality by filtering noise.
23. Explainable (XAI) RAG
Workflow:
Track which documents and chunks were retrieved.
LLM generates answer and cites sources.
System exposes reasoning path.
Example: “Answer derived from documents A, B, C.”
Benefit: Builds trust, useful for compliance.
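A sketch that simply carries document IDs through retrieval and returns them with the answer; `generate()` is an LLM stand-in and the citation format is an assumption:

```python
def generate(prompt: str) -> str:
    return "[answer citing sources]"               # LLM stand-in

def xai_rag(query: str, corpus: dict[str, str]) -> dict:
    q = set(query.lower().split())
    hits = sorted(corpus.items(),
                  key=lambda kv: len(q & set(kv[1].lower().split())),
                  reverse=True)[:3]
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in hits)
    answer = generate(f"Context:\n{context}\n\nQuestion: {query}\nCite sources as [doc_id].")
    return {"answer": answer, "sources": [doc_id for doc_id, _ in hits]}
```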
24. Speculative RAG
Workflow:
Generate preliminary answers with limited context.
Continue retrieval in the background.
Refine answer as more context becomes available.
Example: Early draft answer refined as more documents arrive.
Benefit: Reduces latency while maintaining accuracy.
25. Cost-Constrained / ECO RAG
Workflow:
Monitor retrieval cost: token usage, latency, and energy.
Optimize number/size of documents retrieved.
LLM generates answer efficiently.
Example: Skip low-relevance docs to save compute in production.
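A sketch of greedy context packing under a token budget; the token estimate and score threshold are illustrative assumptions:

```python
def pack_context(ranked_docs: list[tuple[float, str]],
                 max_tokens: int = 1500, min_score: float = 0.2) -> str:
    """Greedily keep the highest-scoring docs that still fit the token budget."""
    budget, kept = max_tokens, []
    for score, doc in ranked_docs:                 # assumed sorted best-first
        tokens = len(doc.split())                  # rough token estimate
        if score < min_score or tokens > budget:
            continue                               # skip low-relevance or oversized docs
        kept.append(doc)
        budget -= tokens
    return "\n\n".join(kept)

ranked = [(0.9, "Refunds are issued within 14 days."),
          (0.1, "Unrelated marketing copy that would waste tokens.")]
print(pack_context(ranked))   # keeps the relevant doc, drops the low-score one
```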
Workflow Patterns Across Techniques
| Focus Area | Representative Techniques | Workflow Pattern |
|---|---|---|
| Accuracy & relevance | Contextual, Fusion, Corrective, RETRO | Multi-step retrieval, query refinement, fact-checking |
| Efficiency & cost | Adaptive, ECO, Context Cache | Dynamic doc selection, caching, token optimization |
| Personalization | Memo, Contextual, Conversational | Store/retrieve user preferences, conversation history |
| Scalability | RAPTOR, REALM | Hierarchical or integrated retrieval for huge datasets |
| Trust & transparency | Explainable (XAI) | Source tracking, reasoning explanation |
RAG is not a single technique, but a design space. You can mix and match based on:
Query complexity
Dataset size
Budget and latency
Personalization needs
Factual reliability requirements