Open Source
Explore the latest AI open-source projects from GitHub and Hugging Face.
## Overview

LiteRT-LM is Google AI Edge's official open-source inference framework for deploying Large Language Models directly on edge devices. Announced and open-sourced in April 2026, LiteRT-LM brings production-grade LLM inference to Android, iOS, web browsers, desktop computers, and IoT devices such as the Raspberry Pi, eliminating the need for cloud round-trips. The framework already powers on-device AI in Google's own products, including Chrome, Chromebook Plus, and Pixel Watch, demonstrating battle-tested production readiness at scale. With 4,028 GitHub stars and active daily development, LiteRT-LM is rapidly establishing itself as a standard for on-device LLM inference.

## Key Features

- **Cross-Platform Deployment**: A single framework targets Android (Kotlin/Java), iOS (Swift), Web (WASM), desktop (Python/C++), and embedded Linux, including Raspberry Pi
- **Hardware Acceleration**: Automatically leverages GPU and NPU processors on each target platform for optimal throughput and energy efficiency
- **Multimodal Support**: Handles text, vision, and audio inputs, enabling on-device multimodal AI applications
- **Function Calling for Agents**: Built-in support for structured output and function calling enables fully on-device agentic workflows without cloud dependency
- **Broad Model Compatibility**: Works with Gemma, Llama, Phi-4, Qwen, and architecturally similar open-weight models
- **Constrained Decoding**: Fine-grained output control ensures reliability in production agentic scenarios

## Use Cases

LiteRT-LM is ideal for:

- Privacy-sensitive applications where data must not leave the device
- Offline-capable mobile and desktop AI assistants
- Low-latency interactive experiences that cannot tolerate cloud round-trip times
- Embedded systems and IoT deployments with limited or no connectivity
- Consumer apps whose developers want to avoid per-query API costs
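The function-calling pattern behind the agentic features above can be sketched in a framework-agnostic way: the model is steered (e.g., via constrained decoding) to emit a structured JSON tool call, which the app parses and dispatches to a local function, all without leaving the device. The tool name `get_battery_level`, the registry, and the hard-coded model reply below are hypothetical illustrations, not part of the LiteRT-LM API.

```python
import json

# Hypothetical local tool the on-device agent may call.
def get_battery_level() -> int:
    return 87  # stubbed device query for illustration

TOOLS = {"get_battery_level": get_battery_level}

def dispatch(model_output: str):
    """Parse a structured tool call emitted by the model and run it locally."""
    call = json.loads(model_output)
    fn = TOOLS[call["name"]]
    return fn(**call.get("arguments", {}))

# A constrained decoder would guarantee the model's reply matches this shape:
reply = '{"name": "get_battery_level", "arguments": {}}'
print(dispatch(reply))  # 87
```

In a real app, the return value would be fed back into the model's context as the tool result; constrained decoding is what makes the `json.loads` step safe, since the model cannot emit malformed output.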
## Technical Details

The core is written in C++ for maximum performance, with stable language bindings for Kotlin (Android production), Python (prototyping and desktop), and direct C++ for native applications. The framework uses platform-native acceleration APIs: NNAPI and GPU on Android, Metal on iOS, and WebGPU in browsers. Model weights are quantized and optimized at load time.

## Getting Started

Install the CLI and run a model (no coding required):

```bash
# Install CLI via uv
uv tool install litert-lm

# Run a model immediately
litert-lm run --model gemma-3-1b-it
```

Or use the Python API:

```bash
pip install litert-lm
```

```python
from litert_lm import LlmInference

model = LlmInference.create_from_model("gemma-3-1b-it-cpu.task")
print(model.generate_response("Hello, world!"))
```
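The load-time weight quantization mentioned under Technical Details can be sketched generically. This symmetric int8 example is a minimal illustration of the idea, not LiteRT-LM's actual quantization scheme: each float weight is mapped to an 8-bit integer plus one shared scale, roughly quartering memory at a small, bounded accuracy cost.

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: w ~= scale * q (illustrative)."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 codes."""
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.02, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q)        # [50, -127, 2, 100]
print(max_err)  # small; bounded by scale / 2
```

Production runtimes typically refine this with per-channel scales and lower-bit formats (e.g., int4), but the accuracy/size trade-off works the same way.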