List of RAG (Retrieval-Augmented Generation) Patterns. This is still an evolving field, so the list is updated as of 04-Jan-2026.
- Foundational Patterns
- Standard/Naive/Single-hop RAG – The “Hello World” of RAG. You embed a user query, retrieve the top-k chunks from a vector database, and feed them to the LLM. It is fast but suffers from low precision if the document chunks are messy or the query is vague.
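A minimal sketch of the single-hop loop, with a toy bag-of-words "embedding" and an in-memory list standing in for a real embedding model and vector database (both stand-ins are assumptions, not a prescribed stack):

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy bag-of-words vector; a real pipeline calls an embedding model here.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks, k=2):
    # Score every chunk against the query embedding and keep the top-k.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

chunks = [
    "The Eiffel Tower is in Paris.",
    "Paris is the capital of France.",
    "Python is a programming language.",
]
context = retrieve("Where is the Eiffel Tower?", chunks)
# The retrieved chunks are stuffed into the prompt sent to the LLM.
prompt = "Answer using only this context:\n" + "\n".join(context)
```

Every later pattern on this list is a refinement of one stage of this loop: how chunks are indexed, how the query is formed, how results are ranked, or how the answer is checked.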
- Indexing & Context Optimization
- Parent-Child (Small-to-Big) RAG – Split documents into small “child” chunks for high-precision search. When a child is matched, you retrieve its larger “parent” chunk (or the whole document) to feed the LLM, so the model has enough surrounding context to answer correctly.
- Sentence Window Retrieval – Similar to Parent-Child. You index individual sentences for search. When a sentence is found, the system retrieves a “window” of 3–5 sentences before and after it to provide context.
- KG‑augmented RAG – Augments the vector search by querying a Knowledge Graph (KG) to inject structured relationships (entities and connections) that vector similarity might miss.
- GraphRAG – An evolution of KG-augmented RAG. It performs “community detection” on the graph to summarize entire clusters of information. It excels at answering global questions like “What are the major themes in these documents?” which standard RAG fails to address.
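The small-to-big idea behind Sentence Window Retrieval fits in a few lines; the regex splitter and token-overlap scoring below are naive stand-ins for a real sentence tokenizer and embedding search:

```python
import re

def split_sentences(document):
    # Naive splitter for illustration; production systems use a real tokenizer.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]

def retrieve_with_window(query, sentences, window=2):
    # Match on the single best sentence (toy scoring: shared tokens)...
    q = set(re.findall(r"\w+", query.lower()))
    hit = max(range(len(sentences)),
              key=lambda i: len(q & set(re.findall(r"\w+", sentences[i].lower()))))
    # ...but return the surrounding window so the LLM sees enough context.
    lo, hi = max(0, hit - window), min(len(sentences), hit + window + 1)
    return sentences[lo:hi]
```

Parent-Child retrieval is the same trade: search on the small unit for precision, hand the LLM the larger unit for context.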
- Query & Retrieval Refinement
- Hybrid RAG – Combines Vector Search (semantic meaning) with Keyword Search/BM25 (exact term matching). This is crucial for domains with specific jargon or product IDs where semantic search often fails.
- Rerank‑enhanced RAG – Retrieves a large number of documents (e.g., 50) first, then uses a specialized “Cross-Encoder” model (a Reranker) to meticulously score and re-order them, passing only the top 5 highly relevant ones to the LLM.
- RAG Fusion – Generates multiple variations of the user’s query (e.g., rewriting the question 3 times). It retrieves documents for all variations and uses an algorithm called Reciprocal Rank Fusion (RRF) to consolidate the results.
- HyDE (Hypothetical Document Embeddings) – The LLM first hallucinates a hypothetical answer to the user’s question. This hypothetical answer is then used for the vector search (instead of the raw question), often leading to better semantic matches.
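The consolidation step in RAG Fusion, Reciprocal Rank Fusion, is simple enough to show directly; `k=60` is the smoothing constant commonly used in the RRF literature:

```python
def reciprocal_rank_fusion(rankings, k=60):
    # rankings: one ranked list of doc ids per query variant.
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            # A document scores 1/(k + rank) per list it appears in;
            # appearing near the top of several lists beats one high rank.
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

The same function fuses the vector and BM25 result lists in Hybrid RAG, since RRF only needs ranks, not comparable scores.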
- Logic, Routing & Verification
- Adaptive RAG – A router classifies the complexity of the query before answering: trivial questions may skip retrieval entirely, simple ones get a single retrieval pass, and complex ones trigger multi-step or iterative retrieval.
- Corrective RAG (CRAG) – After retrieval, a lightweight evaluator checks if the retrieved documents are actually relevant. If they are irrelevant/ambiguous, it triggers a fallback action (like a web search) instead of hallucinating an answer.
- Self-RAG – The LLM is fine-tuned to generate “reflection tokens” (tags) during generation. It critiques its own retrieval and its own answer in real-time, deciding if it needs to retrieve more data or if the answer is fully supported.
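The CRAG gating logic can be sketched as a filter with a fallback; the lexical-overlap score here is a toy stand-in for the trained retrieval evaluator, and `web_search` is a caller-supplied function (both are assumptions for illustration):

```python
def relevance(query, doc):
    # Toy lexical-overlap score; CRAG proper uses a trained evaluator model.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0

def corrective_rag(query, retrieved, web_search, threshold=0.5):
    # Keep only the documents the evaluator judges relevant.
    relevant = [d for d in retrieved if relevance(query, d) >= threshold]
    if relevant:
        return relevant
    # Fallback action: retrieval failed, so pull context from another
    # source instead of letting the LLM answer from irrelevant chunks.
    return web_search(query)
```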
- Autonomous & Multi-Step Workflows
- Multi‑hop RAG – Used when the answer requires combining information from multiple distinct documents (e.g., “Compare the revenue of Company A in 2021 with Company B in 2022”). The system performs one retrieval, analyzes it, and uses that to generate a second retrieval query.
- Agentic RAG – The LLM acts as an autonomous agent with access to “tools,” one of which is a retrieval engine. The agent plans a sequence of actions and may retrieve multiple times, call a calculator, or search the web to solve the user’s problem.
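The two-step retrieval in Multi-hop RAG can be sketched with a toy keyword retriever over an in-memory corpus; the corpus, and pulling the entity out of hop 1 by taking the last word, are naive stand-ins (a real system would use an LLM for that extraction step):

```python
import re

corpus = [
    "WidgetPro is made by AcmeCorp.",
    "AcmeCorp reported revenue of 5 million in 2022.",
    "Globex reported revenue of 9 million in 2022.",
]

def tokens(text):
    return set(re.findall(r"\w+", text.lower()))

def retrieve(query, k=2):
    # Toy retriever ranked by token overlap with the query.
    q = tokens(query)
    hits = [c for c in corpus if q & tokens(c)]
    return sorted(hits, key=lambda c: len(q & tokens(c)), reverse=True)[:k]

# Hop 1: find who makes the product.
hop1 = retrieve("Who makes WidgetPro?")
# Hop 2: the entity surfaced in hop 1 seeds the next query.
maker = hop1[0].rstrip(".").split()[-1]
hop2 = retrieve(f"What was the revenue of {maker} in 2022?")
```

Agentic RAG generalizes this: instead of a hard-coded two-hop plan, the agent decides at runtime how many retrievals (and which other tools) the question needs.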
- Specialized Modalities
- SQL‑augmented RAG – Converts natural language into SQL queries to interact with structured relational databases, often combining the results with unstructured text retrieval.
- Multi‑modal RAG – Handles inputs and outputs involving images, audio, or video alongside text (e.g., searching a video archive using a text description).
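A minimal SQL-augmented sketch using Python's built-in sqlite3; the `nl_to_sql` router is a hard-coded stand-in for the LLM text-to-SQL step, and the table and question are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("EMEA", 120.0), ("APAC", 95.5)])

def nl_to_sql(question):
    # Stand-in for the LLM text-to-SQL step; a real system prompts a model
    # with the schema and validates the generated SQL before running it.
    if "total sales" in question.lower():
        return "SELECT SUM(amount) FROM sales"
    raise ValueError("question not supported by this toy router")

(total,) = conn.execute(nl_to_sql("What are the total sales?")).fetchone()
# The structured result is merged with any retrieved text in the final prompt.
context = f"SQL result: total sales = {total}"
```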

