Open Source
Explore the latest AI open-source projects from GitHub and HuggingFace.
RAG Techniques is one of the most comprehensive open-source collections of Retrieval-Augmented Generation (RAG) techniques on GitHub, with over 25,000 stars and a community of 50,000+ AI enthusiasts. Created by NirDiamant, the repository provides 34+ hands-on implementations, ranging from basic RAG to advanced architectures such as Graph RAG, Self-RAG, and Corrective RAG.

## Why RAG Techniques Matters

Retrieval-Augmented Generation has become a foundational pattern for building production-grade LLM applications. Rather than relying solely on a model's parametric knowledge, a RAG system retrieves relevant external documents before generating a response, which can substantially improve factual accuracy and reduce hallucinations. RAG Techniques offers a structured, hands-on curriculum that takes developers from beginner-level implementations to cutting-edge research architectures.

The repository stands out because every technique comes with runnable Jupyter notebooks and Python scripts built on popular frameworks such as LangChain and LlamaIndex. Developers can therefore experiment with each approach immediately instead of working from theoretical descriptions alone.

## Foundational RAG Methods

The collection starts with the fundamental techniques every RAG practitioner needs. Basic RAG demonstrates the core retrieve-then-generate pattern. CSV-based RAG shows how to work with structured tabular data. Proposition chunking and chunk size optimization address one of the most consequential decisions in any RAG pipeline: how to split documents into retrievable units. These foundational notebooks explain why each approach works and when to apply it.

## Query Enhancement Techniques

Query enhancement is often the highest-leverage improvement for a RAG system.
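To illustrate why query wording matters, consider one fixed index queried two ways. This is a toy sketch, not code from the repository. It scores chunks with bag-of-words cosine similarity instead of embeddings. The rewritten query is hard-coded to stand in for the output of an LLM rewrite step. Both are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of letters/digits as tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two term-frequency vectors.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 1) -> list[str]:
    # Rank every chunk against the query; the index itself never changes.
    q = Counter(tokenize(query))
    ranked = sorted(chunks, key=lambda c: cosine(q, Counter(tokenize(c))), reverse=True)
    return ranked[:k]


chunks = [
    "Refund policy: refunds are issued within 14 days of receiving the returned item.",
    "Money back guarantee applies to premium subscriptions only.",
    "Shipping is free for orders above 50 euros.",
]

user_query = "How do I get my money back for a returned item?"
# Hypothetical rewrite, hard-coded here in place of an LLM call.
rewritten_query = "refund policy for a returned item"

print(retrieve(user_query, chunks))       # "money back" wins: the subscription chunk
print(retrieve(rewritten_query, chunks))  # "refund policy" wins: the refund chunk
```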
The repository covers query transformations, which rewrite user questions for better retrieval. It also covers HyDE (Hypothetical Document Embedding), which generates a hypothetical answer and uses it as the retrieval query, and the newer HyPE variant. Query transformations and HyDE act only at query time, so they can improve retrieval recall without changing the underlying document index.

## Context Enrichment and Advanced Retrieval

For production systems that need higher accuracy, the repository provides semantic chunking, which splits documents by meaning rather than by fixed token counts. It also provides contextual compression, which trims retrieved context down to the relevant portions, and document augmentation techniques. Advanced retrieval methods include:

- fusion retrieval, which combines multiple retrieval strategies;
- reranking with cross-encoder models;
- hierarchical indices for multi-scale document access;
- ensemble retrieval, which blends results from different retrieval approaches.

## Cutting-Edge Architectures

The most advanced section covers architectures from recent research. Graph RAG and Microsoft's GraphRAG build knowledge graphs from documents for structured retrieval. RAPTOR creates hierarchical document summaries at multiple levels of abstraction. Self-RAG teaches the model to decide when retrieval is needed and to evaluate its own outputs. Corrective RAG adds a verification step that checks whether the retrieved documents are relevant to the question before generating. Agentic RAG combines retrieval with autonomous agent capabilities for complex, multi-step reasoning tasks.

## Multi-Modal and Evaluation

The repository also covers multi-modal RAG with image captioning, enabling retrieval over documents that contain both text and images. For measuring quality, it includes evaluation notebooks built on DeepEval and GroUSE. These provide standardized metrics for retrieval accuracy, answer faithfulness, and response relevance.
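DeepEval and GroUSE ship their own metric implementations. The sketch below does not use or reproduce their APIs. It is a framework-independent illustration of two common retrieval-accuracy measures, recall@k and mean reciprocal rank (MRR). They are computed over hypothetical labeled queries whose document IDs are invented for the example. Answer faithfulness and response relevance usually need a judge model and are not covered here.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    # Share of the relevant document IDs that appear in the top-k results.
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
    # 1 / rank of the first relevant document; 0 if none was retrieved.
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(runs: list[tuple[list[str], set[str]]], k: int) -> dict[str, float]:
    # Average each metric over all labeled queries.
    n = len(runs)
    return {
        f"recall@{k}": sum(recall_at_k(r, rel, k) for r, rel in runs) / n,
        "mrr": sum(reciprocal_rank(r, rel) for r, rel in runs) / n,
    }


# Hypothetical (retrieved IDs in rank order, relevant IDs) pairs.
runs = [
    (["d3", "d1", "d7"], {"d1"}),        # relevant doc at rank 2
    (["d2", "d5", "d9"], {"d2", "d4"}),  # 1 of 2 relevant docs, at rank 1
    (["d8", "d6", "d4"], {"d4"}),        # relevant doc only at rank 3
]

print(evaluate(runs, k=2))  # recall@2 is 0.5, MRR is about 0.611
```

Averaging per query keeps every question equally weighted. A query whose relevant document falls outside the top k still earns partial MRR credit but no recall@k credit, which is why the two numbers diverge here.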
## Community and Ecosystem

With 433 commits, 3,000+ forks, and an active Discord community, RAG Techniques has become a de facto reference resource for RAG practitioners. The creator maintains companion repositories on AI agents and prompt engineering, forming a comprehensive learning ecosystem for applied AI development.
