Open Source
Explore the latest AI open-source projects from GitHub and HuggingFace.
dLLM is an open-source unified library for training, inference, and evaluation of diffusion language models (dLMs), offering a fundamentally different approach to text generation than conventional autoregressive LLMs. Instead of producing tokens one by one from left to right, diffusion language models generate all tokens simultaneously through iterative denoising, similar to how image diffusion models like Stable Diffusion work, but applied to discrete text.

## Why Diffusion Language Models Matter

Autoregressive language models like GPT, Claude, and Llama generate text sequentially: each token depends on all previous tokens, creating a computational bottleneck that scales linearly with sequence length. Diffusion language models break this constraint by starting from a noisy or masked sequence and progressively refining all positions in parallel across multiple denoising steps. Once the number of diffusion steps is fixed, the number of forward passes no longer grows with output length, opening the door to dramatically faster inference for long-form content.

The tradeoff is that diffusion models typically require more denoising steps to match autoregressive quality, and the field is still maturing. But recent breakthroughs like LLaDA, Dream, and Fast-dLLM have closed much of the quality gap while keeping step counts low.

## Core Architecture

dLLM provides a modular framework built on top of HuggingFace Transformers, structuring the diffusion pipeline into three decoupled components: schedulers (noise scheduling), samplers (denoising strategies), and trainers (loss computation and optimization). This separation lets researchers mix and match components freely. For example, a masked diffusion scheduler can be paired with a confidence-threshold sampler and trained with a standard cross-entropy loss.
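To make the scheduler/sampler decoupling concrete, here is a minimal sketch of how such components might compose. The class and method names (`MaskedScheduler`, `ConfidenceSampler`, `DiffusionPipeline`) are illustrative assumptions, not dLLM's actual API:

```python
from dataclasses import dataclass

# Hypothetical sketch of the three-way decoupling described above.
# These names are illustrative, not dLLM's real classes.

@dataclass
class MaskedScheduler:
    """Noise schedule: what fraction of tokens is masked at step t."""
    num_steps: int

    def mask_ratio(self, t: int) -> float:
        # Linear schedule: fully masked at t=0, fully revealed at t=num_steps.
        return 1.0 - t / self.num_steps

@dataclass
class ConfidenceSampler:
    """Denoising strategy: accept a prediction once it clears a threshold."""
    threshold: float = 0.9

    def accept(self, confidence: float) -> bool:
        return confidence >= self.threshold

@dataclass
class DiffusionPipeline:
    """Mix-and-match container: any scheduler can pair with any sampler."""
    scheduler: MaskedScheduler
    sampler: ConfidenceSampler

pipe = DiffusionPipeline(MaskedScheduler(num_steps=8), ConfidenceSampler(0.8))
print(pipe.scheduler.mask_ratio(0))   # sequence starts fully masked
print(pipe.sampler.accept(0.95))
```

Because the pipeline only holds references to its parts, swapping in a different scheduler or sampler requires no changes to the rest of the stack, which is the point of the decoupled design.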
The library supports multiple diffusion paradigms under one roof:

- **Masked Diffusion (MDLM)**: Progressively unmasks tokens from a fully masked sequence, as used in LLaDA and LLaDA-MoE.
- **Block Diffusion (BD3LM)**: Denoises fixed-size blocks of tokens, balancing parallelism with local coherence.
- **Edit Flows**: Models text as a sequence of insertion, deletion, and substitution operations rather than token-level denoising.
- **A2D (AR-to-Diffusion)**: Converts any pretrained autoregressive model into a diffusion variant without retraining from scratch.

## Training Infrastructure

dLLM builds on the HuggingFace Trainer API, inheriting support for LoRA adapters, 4-bit quantization (QLoRA), DeepSpeed ZeRO stages 1-3, FSDP, and streaming datasets. This means researchers can fine-tune diffusion models with the same tooling they use for conventional LLMs. A single training script handles pretraining, supervised fine-tuning, and adapter training with minimal configuration changes.

## Fast-dLLM Inference Acceleration

Released in February 2026, Fast-dLLM introduces two key optimizations for inference speed:

1. **KV Cache for Diffusion**: Caches attention computations across denoising steps, recomputing only for positions that changed since the previous step. This reduces per-step cost by 40-60%, depending on the model and sequence length.
2. **Confidence-Threshold Decoding**: Dynamically skips denoising steps for positions where the model is already confident, reducing the effective number of steps without quality degradation.

Together, these optimizations bring diffusion model inference within 2-3x of autoregressive generation speed on most benchmarks, down from 5-10x slower without optimization.
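The masked-diffusion loop with confidence-threshold decoding described above can be sketched in a few lines. The "model" below is a toy stand-in that emits random per-position distributions; everything here (vocabulary, threshold, schedule) is an illustrative assumption, not dLLM's or Fast-dLLM's implementation:

```python
import numpy as np

# Toy sketch of masked-diffusion generation with confidence-threshold
# decoding. The denoiser is a random stand-in for a real diffusion LM.

VOCAB = ["<mask>", "the", "cat", "sat", "here"]
MASK_ID = 0

def toy_denoiser(tokens, rng):
    """Stand-in for a diffusion LM: per-position distributions over VOCAB."""
    logits = rng.normal(size=(len(tokens), len(VOCAB)))
    logits[:, MASK_ID] = -1e9  # the mask token itself is never predicted
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    return probs / probs.sum(axis=1, keepdims=True)

def generate(length=8, max_steps=16, threshold=0.5, seed=0):
    rng = np.random.default_rng(seed)
    tokens = np.full(length, MASK_ID)          # start fully masked
    for _ in range(max_steps):
        masked = np.flatnonzero(tokens == MASK_ID)
        if masked.size == 0:
            break                              # finished early: fewer steps used
        probs = toy_denoiser(tokens, rng)
        conf = probs[masked].max(axis=1)       # model confidence per position
        picks = probs[masked].argmax(axis=1)
        # Accept every position whose confidence clears the threshold;
        # always accept at least the single most confident one so each
        # step makes progress.
        accept = conf >= threshold
        if not accept.any():
            accept[conf.argmax()] = True
        tokens[masked[accept]] = picks[accept]
    return tokens

print(generate())
```

Note how confident positions are committed immediately while uncertain ones stay masked for another pass: that is the mechanism by which the effective step count shrinks without touching low-confidence predictions.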
## Supported Models

dLLM ships with implementations and training recipes for several open-weight diffusion models:

| Model | Parameters | Type | Notable Feature |
|-------|-----------|------|-----------------|
| LLaDA-8B | 8B | Masked diffusion | Best general-purpose dLM |
| LLaDA-MoE | 8x1B | Mixture of experts | Efficient sparse activation |
| Dream | 0.5B-7B | Masked diffusion | Lightweight research models |
| BERT-Chat | 110M-340M | Encoder-based | Conversational diffusion |
| Tiny-A2D | 0.5B-0.6B | AR-to-diffusion | SOTA small diffusion models |

## Evaluation

dLLM integrates with lm-evaluation-harness for standardized benchmarking, supporting the standard NLP benchmarks (MMLU, HellaSwag, ARC, WinoGrande, and more). The unified evaluation pipeline enables direct comparison between diffusion and autoregressive models on identical tasks.

## Community and Adoption

With 2,100 GitHub stars and 200 forks as of March 2026, dLLM has become the reference implementation for diffusion language modeling research. The project is backed by an academic paper (arXiv 2602.22661) authored by Zhanhui Zhou, Lingjie Chen, Hanghang Tong, and Dawn Song, and is licensed under Apache 2.0, permitting both research and commercial use.
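What makes a unified evaluation pipeline useful is that both model families are scored by exactly the same loop. Here is a framework-free sketch of that idea; the tiny task and the two stand-in "models" are toys invented for illustration, not the lm-evaluation-harness API:

```python
# Toy sketch of a unified evaluation loop: any model exposing the same
# answer(prompt, choices) -> choice-index interface can be benchmarked
# on identical tasks, whether it decodes autoregressively or by diffusion.

TASK = [  # tiny multiple-choice "benchmark"
    {"prompt": "2 + 2 =", "choices": ["3", "4"], "answer": 1},
    {"prompt": "Capital of France?", "choices": ["Paris", "Rome"], "answer": 0},
]

def evaluate(model, task):
    """Accuracy of `model` on `task`; the model returns a choice index."""
    correct = sum(model(ex["prompt"], ex["choices"]) == ex["answer"] for ex in task)
    return correct / len(task)

# Two stand-in "models" sharing one interface:
ar_model = lambda prompt, choices: 1 if "2" in prompt else 0
diffusion_model = lambda prompt, choices: 0

print(evaluate(ar_model, TASK))         # 1.0
print(evaluate(diffusion_model, TASK))  # 0.5
```

Because the scoring code never inspects how an answer was produced, the comparison between the two model families is apples-to-apples by construction.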

Shubhamsaboo
Collection of 100+ production-ready LLM apps with AI agents, RAG, voice agents, and MCP using OpenAI, Anthropic, Gemini, and open-source models
infiniflow
Leading open-source RAG engine with deep document understanding, grounded citations, and agent capabilities, with 73K+ GitHub stars.