Trending

MLC-LLM

mlc-aiApache-2.0

Inference22.3K Stars2.0K Forks233 views

MLC-LLM is a universal LLM deployment engine powered by ML compilation via Apache TVM. It compiles models once and runs them natively across NVIDIA, AMD, Apple, and Intel GPUs as well as mobile platforms including iOS and Android, with WebGPU support for browser-based inference. The unified MLCEngine provides an OpenAI-compatible REST API, Python, JavaScript, and mobile bindings from the same compiled artifact, enabling developers to deploy quantized LLMs from cloud to edge without platform-specific rewrites.

Key Features

ML compilation via Apache TVM for cross-platform model deployment
Native support for NVIDIA, AMD, Apple Metal, Intel GPUs, and WebGPU
iOS and Android on-device inference with native mobile bindings
OpenAI-compatible REST API, Python, and JavaScript SDKs from one engine
4-bit and 8-bit quantization for memory-efficient edge deployment

Open Source

MLC-LLM

Key Features

Tags

Related Projects

Ollama

llama.cpp

vLLM

Unsloth