
WebLLM

mlc-ai | Apache-2.0

Inference | 17.7K Stars | 1.2K Forks | 131 views

WebLLM is MLC-AI's high-performance in-browser LLM inference engine. It runs large language models entirely in the web browser using WebGPU hardware acceleration, with no server required. It exposes an OpenAI-compatible API with streaming, JSON mode, and function calling, so inference stays local and private. Models are cached in the browser after the first download, which enables offline operation. Supported models include Llama 3.2, Phi-4, Gemma 2, Mistral, Qwen 2.5, and DeepSeek-R1 distilled variants.
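To illustrate what OpenAI-style streaming looks like from calling code, here is a minimal sketch. The interfaces `ChatMessage`, `StreamChunk`, and `ChatEngine`, the helper `streamReply`, and the stand-in `fakeEngine` are all hypothetical names introduced for this example. They mirror only the parts of the OpenAI chat-completions shape that the sketch reads and are not WebLLM's own type definitions.

```typescript
// Structural types covering only the fields this sketch reads. They follow the
// OpenAI streaming chunk shape and are assumptions, not WebLLM's own typings.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface StreamChunk {
  choices: Array<{ delta: { content?: string | null } }>;
}

interface ChatEngine {
  chat: {
    completions: {
      create(request: {
        messages: ChatMessage[];
        stream: true;
      }): Promise<AsyncIterable<StreamChunk>>;
    };
  };
}

// Streams one completion, forwards each text delta to onToken,
// and returns the concatenated reply.
async function streamReply(
  engine: ChatEngine,
  messages: ChatMessage[],
  onToken: (text: string) => void = () => {},
): Promise<string> {
  const chunks = await engine.chat.completions.create({ messages, stream: true });
  let reply = "";
  for await (const chunk of chunks) {
    const text = chunk.choices[0]?.delta.content ?? "";
    if (text) {
      onToken(text);
      reply += text;
    }
  }
  return reply;
}

// Hypothetical stand-in engine so the sketch runs without a browser or WebGPU.
const fakeEngine: ChatEngine = {
  chat: {
    completions: {
      async create() {
        async function* gen(): AsyncGenerator<StreamChunk> {
          for (const piece of ["Hello", ", ", "world"]) {
            yield { choices: [{ delta: { content: piece } }] };
          }
        }
        return gen();
      },
    },
  },
};
```

In a browser, the engine would instead come from `CreateMLCEngine` in the `@mlc-ai/web-llm` package, given a model ID from the project's prebuilt model list. Depending on the library's typings, a cast may be needed to pass it to `streamReply`.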

Key Features

Full LLM inference in the browser via WebGPU, with zero server infrastructure required
OpenAI API compatible with streaming, JSON mode, and function calling
Offline capability after first model download via browser Cache API or IndexedDB
Web Worker and Service Worker integration for non-blocking UI thread operation
Supports Llama 3.2, Phi-4, Gemma 2, Mistral, Qwen 2.5, DeepSeek-R1 distilled models
Progressive model loading with download progress callbacks for polished UX
Chrome Extension support for browser-integrated AI features
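The download progress callbacks mentioned above can drive a simple loading indicator. The sketch below is one possible formatter, not part of WebLLM: `ProgressReport` and `formatProgress` are hypothetical names, and the field layout (a `progress` fraction between 0 and 1, `timeElapsed` in seconds, and a `text` message) is assumed to match the report WebLLM passes to its `initProgressCallback`.

```typescript
// Assumed shape of the report handed to a model-loading progress callback.
interface ProgressReport {
  progress: number;    // assumed: fraction of loading completed, 0 to 1
  timeElapsed: number; // assumed: seconds since loading started
  text: string;        // human-readable status message
}

// Renders a fixed-width text bar such as "[#####-----] 50% (12s)".
function formatProgress(report: ProgressReport, barWidth = 20): string {
  // Clamp to [0, 1] and treat non-finite values as 0 so the bar never breaks.
  const raw = Number.isFinite(report.progress) ? report.progress : 0;
  const fraction = Math.min(1, Math.max(0, raw));
  const filled = Math.round(fraction * barWidth);
  const bar = "#".repeat(filled) + "-".repeat(barWidth - filled);
  const percent = Math.round(fraction * 100);
  return `[${bar}] ${percent}% (${report.timeElapsed.toFixed(0)}s)`;
}
```

A callback such as `(report) => { statusElement.textContent = formatProgress(report); }` could then be supplied as `initProgressCallback` in the options passed to `CreateMLCEngine`, where `statusElement` is a hypothetical DOM node.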

Related Projects

Ollama

llama.cpp

Unsloth

SGLang