Open Source
Explore the latest AI open-source projects from GitHub and HuggingFace.
Exo is an open-source framework that lets you run frontier AI models locally by connecting multiple everyday devices into a distributed AI cluster. Rather than requiring a single expensive GPU server, Exo pools compute resources across Apple Silicon Macs and other devices, enabling users to run models as large as DeepSeek v3.1 671B or Qwen3-235B on their own hardware.

## Why Exo Matters

Running large language models locally has been constrained by individual device memory limits. A single Mac with 192GB of unified memory cannot load a 671B-parameter model, but four Macs networked together can. Exo solves this distribution problem automatically, with zero manual configuration required: devices running Exo discover each other on the network and form a cluster without any setup beyond launching the application.

The project has rapidly gained traction, with over 41,600 GitHub stars reflecting strong demand for local AI inference that preserves data privacy and eliminates API costs.

## Automatic Device Discovery

Exo uses a peer-to-peer architecture in which each device running the framework automatically detects other instances on the local network. There is no concept of a master or worker node: any device in the cluster can initiate inference requests, and the system coordinates model distribution across all available resources transparently.

## Tensor Parallelism and Topology-Aware Distribution

The framework implements tensor parallelism to shard models across devices, achieving measured speedups of 1.8x on 2 devices and 3.2x on 4 devices. The topology-aware auto-parallel system analyzes real-time device resources and network characteristics to determine the optimal distribution strategy, so adding a device to the cluster immediately improves throughput without manual reconfiguration.
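The memory math behind clustering is simple to check. A rough sketch (assuming 8-bit quantization, so roughly one byte per weight, and ignoring activation and KV-cache overhead, which would push the real requirement higher):

```python
import math

PARAMS = 671e9            # DeepSeek v3.1 parameter count
BYTES_PER_PARAM = 1       # 8-bit quantization ~ 1 byte per weight
MAC_MEMORY_GB = 192       # unified memory on a high-end Mac

model_gb = PARAMS * BYTES_PER_PARAM / 1e9          # ~671 GB of weights
macs_needed = math.ceil(model_gb / MAC_MEMORY_GB)  # minimum devices to hold the weights

print(f"{model_gb:.0f} GB of weights -> at least {macs_needed} x 192GB Macs")
# -> 671 GB of weights -> at least 4 x 192GB Macs
```

This back-of-the-envelope count matches the "four Macs" figure above, though in practice runtime overhead means the cluster needs some headroom beyond the raw weight size.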
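Tensor parallelism itself can be illustrated with a toy example. This is plain NumPy, not Exo's actual MLX implementation: each "device" holds a column shard of a weight matrix, computes its partial output independently, and the shards are concatenated to reproduce the single-device result.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))     # activations: batch of 4, hidden size 8
W = rng.standard_normal((8, 16))    # full weight matrix

# Column-parallel sharding across 2 "devices": each holds half the output columns.
shards = np.split(W, 2, axis=1)

# Each device computes its slice of the output independently...
partials = [x @ shard for shard in shards]

# ...and the slices are concatenated (no cross-device reduction is needed
# for column-parallel layers).
y_parallel = np.concatenate(partials, axis=1)
y_single = x @ W

assert np.allclose(y_parallel, y_single)  # identical to the unsharded matmul
```

Real tensor parallelism interleaves column- and row-parallel layers to keep communication low, which is where the network-aware placement above matters.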
## RDMA Over Thunderbolt 5

Exo includes day-zero support for RDMA (Remote Direct Memory Access) over Thunderbolt 5 connections, reducing inter-device latency by up to 99% compared to standard network communication. This is particularly impactful for the prefill phase of inference, where large amounts of data must be exchanged between devices. RDMA support requires Apple Silicon devices with Thunderbolt 5 ports, such as the M4 Pro Mac mini and M4 Max MacBook Pro.

## Broad Model Support

Exo supports a wide range of models through HuggingFace integration, including Qwen3-235B at 8-bit quantization, DeepSeek v3.1 671B at 8-bit, Kimi K2 Thinking at 4-bit, and various Llama models. The MLX backend handles inference with optimized memory management, while the system automatically determines how to partition models based on available cluster resources.

## OpenAI-Compatible API

The framework exposes a REST API compatible with the OpenAI chat completions format at localhost:52415. Existing applications built against the OpenAI API can switch to local Exo inference by simply changing the base URL, with no code modifications required. A built-in web dashboard provides real-time cluster management and model interaction.

## Platform Support and Installation

Exo runs on macOS with Apple Silicon and on Linux. macOS users can install a native .dmg application (requiring macOS Tahoe 26.2 or later) or build from source using uv as the package manager. Linux support currently runs on CPU, with GPU support under active development. The project follows the XDG Base Directory specification for configuration management on Linux.
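Because the API follows the OpenAI chat completions format, a client call looks like any other OpenAI-style request. A minimal sketch using `requests` (the `/v1/chat/completions` path and the model id here are assumptions for illustration, not taken from Exo's documentation):

```python
import requests

BASE_URL = "http://localhost:52415/v1"  # Exo's local endpoint

payload = {
    "model": "llama-3.2-3b",            # hypothetical model id from your cluster
    "messages": [
        {"role": "user", "content": "Summarize what tensor parallelism does."}
    ],
}

def chat(payload: dict) -> str:
    """POST an OpenAI-style chat completion request to the local Exo cluster."""
    resp = requests.post(f"{BASE_URL}/chat/completions", json=payload, timeout=120)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

# chat(payload) returns the model's reply once an Exo cluster is running.
```

An application already using an OpenAI SDK would only need to point its base URL at `http://localhost:52415/v1`, as the article notes.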

Shubhamsaboo
Collection of 100+ production-ready LLM apps with AI agents, RAG, voice agents, and MCP using OpenAI, Anthropic, Gemini, and open-source models
infiniflow
Leading open-source RAG engine with deep document understanding, grounded citations, and agent capabilities, with 73K+ GitHub stars.