Open Source
Explore the latest AI open-source projects from GitHub and HuggingFace.
Parlor is an open-source desktop assistant that brings real-time, fully on-device multimodal AI to consumer hardware. Built on Google's Gemma 4 E2B vision-language model and the Kokoro TTS engine, with Apple MLX and LiteRT-LM backends, Parlor lets a user hold a natural voice and vision conversation with an AI whose data never leaves the local machine. Released under Apache 2.0 by independent developer Fikri Karim in April 2026, the project has passed 1,700 GitHub stars and 200 forks in roughly six weeks.

## What Parlor Is

Most voice assistants in 2026 either stream audio to a cloud endpoint or run a single modality locally. Parlor unifies speech recognition, vision understanding, language reasoning, and speech synthesis into a single on-device loop that runs end-to-end on a modern Mac or Windows laptop. You can speak to it, show it your screen or a webcam frame, and hear it answer back in a natural voice, all without a network connection and with latency that approaches commercial hosted services.

## Modality Pipeline

The pipeline stitches together best-of-breed open components rather than reinventing each one. A local speech-recognition stage transcribes user speech in real time. Vision frames are fed to Gemma 4 E2B, a 2-billion-effective-parameter multimodal model that handles image and video understanding on-device. The text response is generated by the same model, then streamed to Kokoro, a compact open TTS engine, which produces natural speech in real time. On Apple Silicon, MLX accelerates the language model; on other hardware, LiteRT-LM, Google's lightweight runtime, takes over. The orchestration glue is written in Python and ships as a small desktop UI.

## Why On-Device Multimodal Matters

The privacy implications of sending continuous voice and camera streams to a cloud service are unappealing for many users. Parlor sidesteps that entirely: there are no API keys, no per-minute billing, and no telemetry.
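The turn loop the pipeline section describes can be sketched in plain Python. Every function below is a hypothetical stand-in, not Parlor's actual API: in the real project the stubs would call the local ASR stage, Gemma 4 E2B via MLX or LiteRT-LM, and the Kokoro TTS engine. The sketch only shows the shape of one end-to-end conversational turn.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Turn:
    transcript: str   # what the user said (from the ASR stage)
    reply_text: str   # text response from the vision-language model
    audio: bytes      # synthesized speech from the TTS stage


def transcribe(audio_chunk: bytes) -> str:
    """Stand-in for the local speech-recognition stage."""
    return "what is on my screen"


def generate_reply(transcript: str, frame: Optional[bytes]) -> str:
    """Stand-in for Gemma 4 E2B: text plus an optional vision frame in, text out."""
    seen = "a frame" if frame else "no frame"
    return f"I heard {transcript!r} and saw {seen}."


def synthesize(text: str) -> bytes:
    """Stand-in for Kokoro TTS: text in, audio bytes out."""
    return text.encode("utf-8")  # placeholder for real PCM audio


def run_turn(audio_chunk: bytes, frame: Optional[bytes] = None) -> Turn:
    """One conversational turn, entirely on-device: ASR -> VLM -> TTS."""
    transcript = transcribe(audio_chunk)
    reply = generate_reply(transcript, frame)
    return Turn(transcript=transcript, reply_text=reply, audio=synthesize(reply))


turn = run_turn(b"\x00\x01", frame=b"fake-webcam-frame")
print(turn.reply_text)
```

In the real pipeline each stage streams rather than returning whole values, which is what keeps perceived latency low; the batch-style calls here are purely for readability.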
For accessibility users who want a voice-driven screen-reading companion, for developers prototyping multimodal agents without burning through API credits, and for any environment with offline or air-gapped constraints, Parlor is one of the first projects to make a compelling end-to-end demonstration that this is now possible on a laptop.

## Hardware and Performance

Apple Silicon is the first-class target. On an M2 or newer Mac with 16GB of unified memory, Parlor delivers near-real-time turn-taking with Gemma 4 E2B in MLX. On Windows and Linux machines, LiteRT-LM provides a fallback path, with performance scaling to available CPU and GPU resources. The project's design assumes a consumer laptop rather than a workstation: there is no expectation of a discrete GPU, and the Kokoro TTS engine is specifically chosen for its tiny footprint and fast synthesis.

## Use Cases

The project's natural fits are voice-driven coding assistants that can see the screen, hands-free productivity for users with mobility constraints, language-learning practice partners that run offline, and rapid prototyping of multimodal agent UX without paying per-token costs to a hosted API. Because the entire stack is open and modular, individual components can be swapped out: a different vision model, a different TTS voice, or a different reasoning model can be dropped in without rewriting the orchestration.

## Limitations

Parlor's quality is bounded by the underlying open models. Gemma 4 E2B is excellent for its parameter class but still trails the largest hosted multimodal models on complex reasoning and on long-context vision tasks. Kokoro produces natural speech but does not match the most expressive proprietary voices. Real-time performance on non-Apple-Silicon hardware varies substantially with CPU class and is not yet guaranteed on older laptops. Native mobile builds are not part of the project today, and Windows packaging is less polished than the macOS path.
Because Parlor is a young project with a single primary maintainer, long-term maintenance is a risk for production deployments.
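The component-swapping modularity described under Use Cases, together with the MLX-first, LiteRT-LM-fallback backend choice, can be sketched as a small registry. The registry layout, function names, and lambda backends here are illustrative assumptions, not Parlor's actual code; the point is only that named stages let a different model be dropped in without touching the orchestration.

```python
import platform
from typing import Callable, Dict, Optional

# Registry mapping stage name -> backend name -> callable.
# Real backends would wrap Gemma 4 E2B (via MLX or LiteRT-LM) and Kokoro;
# these lambdas are placeholders that just tag their input.
REGISTRY: Dict[str, Dict[str, Callable[[str], str]]] = {
    "llm": {
        "mlx": lambda prompt: f"[mlx] {prompt}",        # Apple Silicon path
        "litert": lambda prompt: f"[litert] {prompt}",  # portable fallback
    },
    "tts": {
        "kokoro": lambda text: f"[kokoro audio for] {text}",
    },
}


def pick_llm_backend() -> str:
    """Prefer MLX on Apple Silicon; fall back to LiteRT-LM elsewhere."""
    if platform.system() == "Darwin" and platform.machine() == "arm64":
        return "mlx"
    return "litert"


def build_pipeline(llm: Optional[str] = None, tts: str = "kokoro"):
    """Assemble a text -> speech pipeline from named components."""
    generate = REGISTRY["llm"][llm or pick_llm_backend()]
    speak = REGISTRY["tts"][tts]
    return lambda prompt: speak(generate(prompt))


pipeline = build_pipeline(llm="litert")
print(pipeline("hello"))  # [kokoro audio for] [litert] hello
```

Swapping in a different TTS voice or reasoning model then amounts to registering one more entry, which is the property the article attributes to the open, modular stack.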