Open Source
Explore the latest AI open-source projects from GitHub and HuggingFace.
Exo is an open-source framework that lets you run frontier AI models locally by connecting multiple everyday devices into a distributed AI cluster. Rather than requiring a single expensive GPU server, Exo pools compute resources across Apple Silicon Macs and other devices, enabling users to run models as large as DeepSeek v3.1 671B or Qwen3-235B on their own hardware.

## Why Exo Matters

Running large language models locally has been constrained by individual device memory limits. A single Mac with 192GB of unified memory cannot load a 671B-parameter model, but four Macs networked together can. Exo solves this distribution problem automatically, with zero manual configuration required: devices running Exo discover each other on the network and form a cluster without any setup beyond launching the application.

The project has rapidly gained traction, with over 41,600 GitHub stars reflecting strong demand for local AI inference that preserves data privacy and eliminates API costs.

## Automatic Device Discovery

Exo uses a peer-to-peer architecture in which each device running the framework automatically detects other instances on the local network. There is no concept of a master or worker node: any device in the cluster can initiate inference requests, and the system coordinates model distribution across all available resources transparently.

## Tensor Parallelism and Topology-Aware Distribution

The framework implements tensor parallelism to shard models across devices, achieving measured speedups of 1.8x on 2 devices and 3.2x on 4 devices. The topology-aware auto-parallel system analyzes real-time device resources and network characteristics to determine the optimal distribution strategy, so adding a device to the cluster immediately improves throughput without manual reconfiguration.
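The memory math behind clustering is simple to check. A rough sketch (assuming 8-bit quantization, so roughly one byte per weight, and ignoring activation and KV-cache overhead, which would push the real requirement higher):

```python
import math

PARAMS = 671e9            # DeepSeek v3.1 parameter count
BYTES_PER_PARAM = 1       # 8-bit quantization ~ 1 byte per weight
MAC_MEMORY_GB = 192       # unified memory on a high-end Mac

model_gb = PARAMS * BYTES_PER_PARAM / 1e9          # ~671 GB of weights
macs_needed = math.ceil(model_gb / MAC_MEMORY_GB)  # minimum devices to hold the weights

print(f"{model_gb:.0f} GB of weights -> at least {macs_needed} x 192GB Macs")
# -> 671 GB of weights -> at least 4 x 192GB Macs
```

This back-of-the-envelope count matches the "four Macs" figure above, though in practice runtime overhead means the cluster needs some headroom beyond the raw weight size.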
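Tensor parallelism itself can be illustrated with a toy example. This is plain NumPy, not Exo's actual MLX implementation: each "device" holds a column shard of a weight matrix, computes its partial output independently, and the shards are concatenated to reproduce the single-device result.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))     # activations: batch of 4, hidden size 8
W = rng.standard_normal((8, 16))    # full weight matrix

# Column-parallel sharding across 2 "devices": each holds half the output columns.
shards = np.split(W, 2, axis=1)

# Each device computes its slice of the output independently...
partials = [x @ shard for shard in shards]

# ...and the slices are concatenated (no cross-device reduction is needed
# for column-parallel layers).
y_parallel = np.concatenate(partials, axis=1)
y_single = x @ W

assert np.allclose(y_parallel, y_single)  # identical to the unsharded matmul
```

Real tensor parallelism interleaves column- and row-parallel layers to keep communication low, which is where the network-aware placement above matters.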
## RDMA Over Thunderbolt 5

Exo includes day-zero support for RDMA (Remote Direct Memory Access) over Thunderbolt 5 connections, reducing inter-device latency by up to 99% compared to standard network communication. This is particularly impactful for the prefill phase of inference, where large amounts of data must be exchanged between devices. RDMA support requires Apple Silicon devices with Thunderbolt 5 ports, such as the M4 Pro Mac mini and M4 Max MacBook Pro.

## Broad Model Support

Exo supports a wide range of models through HuggingFace integration, including Qwen3-235B at 8-bit quantization, DeepSeek v3.1 671B at 8-bit, Kimi K2 Thinking at 4-bit, and various Llama models. The MLX backend handles inference with optimized memory management, while the system automatically determines how to partition models based on available cluster resources.

## OpenAI-Compatible API

The framework exposes a REST API compatible with the OpenAI chat completions format at localhost:52415. Existing applications built against the OpenAI API can switch to local Exo inference by simply changing the base URL, with no code modifications required. A built-in web dashboard provides real-time cluster management and model interaction.

## Platform Support and Installation

Exo runs on macOS with Apple Silicon and on Linux. macOS users can install a native .dmg application (requiring macOS Tahoe 26.2 or later) or build from source using uv as the package manager. Linux support currently runs on CPU, with GPU support under active development. The project follows the XDG Base Directory specification for configuration management on Linux.
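Because the API follows the OpenAI chat completions format, a client call looks like any other OpenAI-style request. A minimal sketch using `requests` (the `/v1/chat/completions` path and the model id here are assumptions for illustration, not taken from Exo's documentation):

```python
import requests

BASE_URL = "http://localhost:52415/v1"  # Exo's local endpoint

payload = {
    "model": "llama-3.2-3b",            # hypothetical model id from your cluster
    "messages": [
        {"role": "user", "content": "Summarize what tensor parallelism does."}
    ],
}

def chat(payload: dict) -> str:
    """POST an OpenAI-style chat completion request to the local Exo cluster."""
    resp = requests.post(f"{BASE_URL}/chat/completions", json=payload, timeout=120)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

# chat(payload) returns the model's reply once an Exo cluster is running.
```

An application already using an OpenAI SDK would only need to point its base URL at `http://localhost:52415/v1`, as the article notes.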

Shubhamsaboo
Collection of 100+ production-ready LLM apps with AI agents, RAG, voice agents, and MCP using OpenAI, Anthropic, Gemini, and open-source models
infiniflow
Leading open-source RAG engine with deep document understanding, grounded citations, and agent capabilities, with 73K+ GitHub stars.