Open Source
Explore the latest AI open-source projects from GitHub and HuggingFace.
OpenWhispr is the MIT-licensed open-source dictation and voice-to-text desktop app positioned as the free alternative to WisprFlow and Granola. With 3,600+ GitHub stars and v1.7.2 released on May 20, 2026, the project gives macOS, Windows, and Linux users a global hotkey dictation surface, local-model meeting transcription, and an MCP-integrated voice agent without the per-seat SaaS pricing of the commercial equivalents.

## What OpenWhispr Actually Does

The app sits in the menu bar (or its platform equivalent) and listens for a global hotkey. When the user holds the key and speaks, the audio is transcribed and the resulting text is inserted into whatever application is in focus, which is the workflow popularized by WisprFlow. Around that core, OpenWhispr adds three additional surfaces: automatic transcription of Zoom, Teams, and FaceTime meetings; voice agent conversations with named assistants; and a notes system with semantic search over everything the user has dictated or recorded.

## Local Models: Whisper and NVIDIA Parakeet

The project supports both OpenAI Whisper and NVIDIA Parakeet as local on-device models. Parakeet is the differentiating choice: it is dramatically faster than Whisper on NVIDIA GPUs and competitive with cloud APIs on accuracy, which makes it a practical default for users on supported hardware. For users without a GPU, Whisper variants from tiny through large are bundled, and OpenWhispr handles the model download and caching. When local models are used, the app's privacy claim that audio never leaves the device is technically enforced rather than marketing copy.

## Cloud Provider Support with BYOK

For users who prefer cloud inference, OpenWhispr supports a bring-your-own-key configuration against GPT-5, Claude, Gemini, and Groq. This is most useful for the voice agent layer rather than raw transcription, because the agent's reasoning quality depends on the underlying model.
The BYOK model means the user pays the provider directly and the app does not act as a billing middleman, which is the structural difference from WisprFlow's subscription model.

## Meeting Transcription with Speaker ID

Automatic meeting transcription captures Zoom, Teams, and FaceTime calls and produces a transcript with on-device speaker identification. The speaker ID is computed locally rather than through a cloud diarization API, which matters for organizations with data residency or confidentiality requirements that block uploading meeting audio to a third party. Transcripts land in the notes system and become searchable alongside dictated notes.

## Voice Agent with MCP

Named voice assistants let the user have conversational interactions with a configured LLM. The agent layer exposes the Model Context Protocol, which means the assistant can call MCP servers (file systems, calendars, third-party tools) during the conversation rather than being limited to chat-only responses. This is the feature that pushes OpenWhispr from a dictation app into a desktop voice agent, and it is the area where the open-source positioning matters most because users can audit and extend the tool set.

## Public API

A public API exposes the transcription, notes, and agent surfaces to other applications. For users building custom workflows, this means OpenWhispr can serve as the voice input layer for a homegrown automation stack rather than being a closed dictation app.

## Limitations

The local Parakeet path requires an NVIDIA GPU, which excludes the large base of Apple Silicon and CPU-only users from the headline performance numbers; those users fall back to Whisper variants and should expect performance comparable to other Whisper-based desktop apps. Cross-platform parity is uneven because macOS, Windows, and Linux each have different audio routing stacks for meeting capture, and Linux users in particular have reported more configuration work for Zoom transcription.
The MCP-based agent is only as capable as the configured MCP servers and the chosen LLM, so out-of-the-box value is bounded by what the user wires in. The app's storage of audio, transcripts, and embeddings grows quickly for heavy users; there is no automatic retention policy, so users need to manage their own cleanup. Finally, the voice agent's named assistant personalities are presets rather than a fully tunable persona system, which limits enterprise customization compared to commercial alternatives.

Within those caveats, OpenWhispr is the most complete open-source voice-to-text desktop app in 2026 and the only one that combines local dictation, meeting transcription, and an MCP voice agent in a single MIT-licensed package.
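
Because the limitations note that there is no automatic retention policy, heavy users may want to script their own cleanup. The following is a minimal sketch of an age-based pruning helper, not anything shipped with OpenWhispr. The function name `prune_old_files`, its parameters, and the idea that stored audio and transcripts live as ordinary files under one directory are all assumptions made for illustration; check where your installation actually keeps its data before deleting anything.

```python
import time
from pathlib import Path
from typing import List, Optional


def prune_old_files(
    root: Path,
    max_age_days: float,
    now: Optional[float] = None,
    dry_run: bool = True,
) -> List[Path]:
    """Find files under `root` last modified more than `max_age_days` ago.

    Hypothetical helper: with dry_run=True (the default) nothing is deleted
    and the stale files are only reported; with dry_run=False they are removed.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day

    # Walk the tree recursively and keep only regular files older than the cutoff.
    stale = sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.stat().st_mtime < cutoff
    )

    if not dry_run:
        for path in stale:
            path.unlink()

    return stale
```

A cautious workflow would be to call `prune_old_files(Path("/path/to/openwhispr/data"), max_age_days=90)` first (the path is a placeholder), review the returned list, and only then rerun with `dry_run=False`.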