Open Source
Explore the latest AI open-source projects from GitHub and HuggingFace.
Handy is a free, open-source, and extensible speech-to-text application that runs completely offline. Built as a cross-platform desktop app under the MIT license, it does one job well: press a keyboard shortcut, speak, and have your words appear in whatever text field you are using, all without sending audio to the cloud. The project has clearly struck a nerve, gathering more than 24,000 GitHub stars by filling a real gap: a genuinely open, privacy-respecting dictation tool that anyone can fork and extend.

## Press, Speak, Paste

The core interaction is deliberately minimal. You press a configurable shortcut to start recording, speak, and press it again to stop (or, in push-to-talk mode, hold the shortcut while speaking and release it). Handy then transcribes the audio and pastes the result directly into the active application. There is no account, no subscription, and no separate window to manage; the tool is designed to disappear into the workflow and behave like a system-level dictation feature. That simplicity is the point. As the project puts it, Handy isn't trying to be the best speech-to-text app; it's trying to be the most forkable one.

## Fully Local Transcription

Everything happens on the user's own computer. Incoming audio first passes through Silero-based Voice Activity Detection (VAD), which filters out silence, and the remaining speech is then handed to a local speech-recognition model. Because no audio ever leaves the machine, Handy suits anyone with privacy, compliance, or offline requirements: journalists, clinicians, developers on air-gapped systems, or users who simply prefer not to stream their voice to a third-party service. The privacy guarantee is structural rather than a policy promise, because the data path has no cloud component.

## Whisper and Parakeet Models

Handy lets users choose the transcription engine that fits their hardware.
Whisper models (Small, Medium, Turbo, and Large) provide strong multilingual accuracy and use GPU acceleration when available, while NVIDIA's Parakeet V3 offers a CPU-optimized path with strong performance and automatic language detection. This flexibility lets the same app run on a modest laptop or scale up to high-accuracy transcription on a workstation with a discrete GPU, without locking the user into a single model.

## Built to Be Forked

Under the hood, Handy is a Tauri application that pairs a React, TypeScript, and Tailwind CSS settings interface with a Rust backend for system integration, audio processing, and ML inference. It relies on a focused set of Rust crates: whisper-rs and transcribe-rs for recognition, cpal for cross-platform audio, vad-rs for voice detection, rdev for global shortcuts, and rubato for resampling. The result is a small native footprint and an architecture explicitly designed to be extended. Handy offers a Raycast extension, command-line flags for controlling a running instance, and a debug mode, and it is distributed via direct downloads, a Homebrew cask, and winget.

## Considerations

Handy's single-minded simplicity also defines its limits. It focuses on transcription and pasting rather than the rich editing, formatting, or voice-command features of some commercial dictation suites. Large Whisper models benefit significantly from a capable GPU, so on lower-end, CPU-only machines, accuracy and latency depend on choosing the right model. As a fast-moving community project, Handy also requires microphone and accessibility permissions that users should review before granting, and some conveniences, such as the Homebrew and winget packages, are community-maintained rather than official. Still, for anyone who wants a private, offline, and hackable speech-to-text tool that stays out of the way, Handy delivers an unusually clean and extensible experience.
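To make the local audio path described above more concrete (silence removal, resampling for the model, then on-device inference), here is a minimal Python sketch of how such a pipeline can be wired together. It is not Handy's code. Handy does this in Rust with vad-rs (Silero VAD) and rubato, whereas this sketch swaps in a simple RMS energy gate for the learned VAD and a naive linear-interpolation resampler for rubato. The helper names (`resample_linear`, `drop_silence`, `transcribe_locally`) are hypothetical, and the `model` callable stands in for a Whisper or Parakeet backend. The 16 kHz target reflects the input rate Whisper models expect. Resampling before gating, the 30 ms frame size, and the 0.01 threshold are all assumptions chosen for illustration.

```python
import math
from typing import Callable, List, Sequence

# Whisper models expect 16 kHz mono audio; this sketch targets that rate.
TARGET_RATE = 16_000


def resample_linear(samples: Sequence[float], src_rate: int,
                    dst_rate: int = TARGET_RATE) -> List[float]:
    """Naive linear-interpolation resampler (a stand-in for rubato, not a port of it)."""
    if src_rate == dst_rate or not samples:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    step = src_rate / dst_rate  # input samples advanced per output sample
    out: List[float] = []
    for i in range(n_out):
        pos = i * step
        j = int(pos)
        frac = pos - j
        a = samples[j]
        b = samples[j + 1] if j + 1 < len(samples) else a  # hold the last sample
        out.append(a + (b - a) * frac)
    return out


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame)) if frame else 0.0


def drop_silence(samples: Sequence[float], rate: int = TARGET_RATE,
                 frame_ms: int = 30, threshold: float = 0.01) -> List[float]:
    """Keep only frames whose energy clears a fixed threshold.

    This energy gate is a crude substitute for a learned VAD such as Silero.
    """
    frame_len = max(1, rate * frame_ms // 1000)
    voiced: List[float] = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        if rms(frame) >= threshold:
            voiced.extend(frame)
    return voiced


def transcribe_locally(samples: Sequence[float], src_rate: int,
                       model: Callable[[List[float]], str]) -> str:
    """Resample, gate out silence, then hand the speech to a local model callable."""
    audio = resample_linear(samples, src_rate)
    speech = drop_silence(audio)
    if not speech:
        return ""  # nothing voiced: skip inference entirely
    return model(speech)


def stub_model(audio: List[float]) -> str:
    """Hypothetical placeholder for a real Whisper/Parakeet inference call."""
    return f"<{len(audio)} voiced samples>"


if __name__ == "__main__":
    rate = 48_000
    tone = [0.5 * math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]
    clip = tone + [0.0] * rate  # 1 s of "speech" followed by 1 s of silence
    print(transcribe_locally(clip, rate, stub_model))
```

Skipping inference when the gate finds no speech mirrors the motivation for VAD in the text: the recognition model, the expensive step, only ever sees audio that plausibly contains speech.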
ggml-org
Pure C/C++ port of OpenAI Whisper for edge deployment
SYSTRAN
A CTranslate2-based reimplementation of OpenAI's Whisper that runs up to 4x faster than the original at the same accuracy while using less memory, with support for 8-bit quantization, batched inference, and word-level timestamps. MIT-licensed, and it does not require FFmpeg to be installed on the system.
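The 8-bit quantization mentioned above saves memory by storing weights as small integers plus a shared scale factor. The Python sketch below shows the generic symmetric, per-tensor int8 scheme as an illustration of the idea only. It is not CTranslate2's implementation, which may use a different granularity (for example, per-row scales) and runs in optimized native kernels. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def quantize_int8(weights: Sequence[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor int8 quantization: each w is approximated by scale * q."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Map the largest magnitude to 127 so every code fits in [-127, 127].
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q: Sequence[int], scale: float) -> List[float]:
    """Recover approximate float weights from int8 codes and the shared scale."""
    return [v * scale for v in q]


if __name__ == "__main__":
    w = [0.31, -1.2, 0.004, 0.87]
    q, scale = quantize_int8(w)
    approx = dequantize_int8(q, scale)
    worst = max(abs(a - b) for a, b in zip(w, approx))
    print(q, f"scale={scale:.5f}", f"max error={worst:.5f}")
```

Each weight then occupies one byte instead of four for float32, and in this scheme the per-weight rounding error is at most half the scale. Small values near zero (like `0.004` above) lose the most relative precision, which is one reason real engines often use finer-grained scales.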