DFlash

z-lab | MIT

Inference | 1.6K stars | 180 forks | 138 views

Lightweight block diffusion model for speculative decoding that enables parallel token drafting for faster LLM generation. Compatible with vLLM, SGLang, Transformers, and MLX backends.

Key Features

Block diffusion draft model generates entire token blocks in parallel rather than sequentially
Drop-in integration with vLLM via a single command-line flag for speculative decoding
SGLang integration combines DFlash drafting with RadixAttention prefix caching
Hugging Face Transformers support via the spec_generate() method for research workflows
MLX backend enables Apple Silicon inference acceleration on M-series Macs
Supports Qwen3/3.5/3.6, LLaMA 3.1, Kimi-K2.5, and GPT-OSS model families
Standardized evaluation harness covering GSM8K, MATH-500, HumanEval, MBPP, MT-Bench
MIT license with production-ready backend integrations
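
The block-drafting idea in the features above can be illustrated with a toy sketch of greedy speculative decoding: a draft model proposes a whole block of tokens at once, and the target model keeps the longest prefix it agrees with, then adds one token of its own. This is not DFlash's code or API. Every name here (`verify_block`, `speculative_generate`, `toy_draft_block`, `toy_target_next`) is hypothetical, the "models" are simple arithmetic stand-ins, and the target is queried one position at a time for readability, whereas a real backend would score the whole block in a single forward pass.

```python
from typing import Callable, List, Sequence

Token = int
TargetFn = Callable[[Sequence[Token]], Token]            # context -> next token
DraftFn = Callable[[Sequence[Token], int], List[Token]]  # context, block size -> block


def verify_block(prefix: Sequence[Token], draft: Sequence[Token],
                 target_next: TargetFn) -> List[Token]:
    """Keep the longest draft prefix that matches the target's greedy choice.

    Always returns at least one token: either the target's correction at the
    first mismatch, or a bonus token after a fully accepted block.
    """
    context = list(prefix)
    accepted: List[Token] = []
    for tok in draft:
        expected = target_next(context)
        if tok != expected:
            accepted.append(expected)  # replace the first wrong draft token
            return accepted
        accepted.append(tok)
        context.append(tok)
    accepted.append(target_next(context))  # whole block accepted: bonus token
    return accepted


def speculative_generate(prompt: Sequence[Token], draft_block: DraftFn,
                         target_next: TargetFn, block_size: int,
                         max_new_tokens: int) -> List[Token]:
    """Alternate block drafting and verification until enough tokens exist."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new_tokens:
        draft = draft_block(out, block_size)
        out.extend(verify_block(out, draft, target_next))
    return out[: len(prompt) + max_new_tokens]


def toy_target_next(context: Sequence[Token]) -> Token:
    """Toy target model: counts upward modulo 10."""
    return (context[-1] + 1) % 10


def toy_draft_block(context: Sequence[Token], block_size: int) -> List[Token]:
    """Toy draft model: fills every position of the block independently,
    and deliberately gets the token 7 wrong (it proposes 0 instead)."""
    block = []
    for i in range(block_size):
        tok = (context[-1] + i + 1) % 10
        block.append(0 if tok == 7 else tok)
    return block


if __name__ == "__main__":
    print(speculative_generate([1], toy_draft_block, toy_target_next,
                               block_size=4, max_new_tokens=8))
```

Because every accepted token is checked against the target model's own greedy choice, the output in this sketch is identical to what the target would produce alone. Any speedup in a real system comes from how many drafted tokens are accepted per verification step.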

Related Projects

Trending | Inference | GitHub | 165.0K stars | 15.0K forks

Ollama

ollama

A simple way to run LLMs locally, with 165K+ GitHub stars. Offers one-command deployment, 100+ models, a REST API, and multi-platform support.

llama.cpp

ggml-org

A pure C/C++ LLM inference engine supporting CPUs, Apple Silicon, CUDA, and Vulkan.

vLLM

vLLM Project

A high-throughput, memory-efficient LLM inference and serving engine built around PagedAttention, with an OpenAI-compatible API and 200+ model support.

Apache-2.0 | 74