Open Source
Explore the latest AI open-source projects from GitHub and HuggingFace.
WhisperKit is Argmax's Swift implementation of OpenAI's Whisper for fully on-device speech-to-text on Apple Silicon. It now ships as the flagship component of the broader Argmax Open-Source SDK repository (argmax-oss-swift), which bundles WhisperKit alongside TTSKit (on-device text-to-speech via Qwen-TTS) and SpeakerKit (speaker diarization via Pyannote) as separate importable library products in one Swift package. The repository has reached 6,200+ GitHub stars and was pushed as recently as this week, making it one of the more actively maintained on-device speech projects in the current ecosystem, and the underlying WhisperKit CoreML models see regular monthly downloads on Hugging Face.

## Speech-to-Text Without a Server

WhisperKit's core promise is that transcription runs entirely on-device via CoreML: no audio leaves the phone or Mac, and there is no per-request API cost. A three-line Swift snippet initializes the pipeline and transcribes a local audio file (wav, mp3, m4a, or flac), with WhisperKit auto-selecting an appropriately sized Whisper variant for the device if none is specified. Argmax recommends the compressed `large-v3-v20240930_626MB` build for maximum multilingual accuracy on iOS and macOS, and a `tiny` build purely for fast local debugging.

## Three Kits, One Package

Beyond transcription, TTSKit adds on-device speech synthesis built on Qwen-TTS, including custom voice selection, real-time streaming playback, and style-instruction controls on the 1.7B variant. SpeakerKit adds diarization that can be combined with WhisperKit's transcription output and exported in standard RTTM format. All three ship as independent Swift Package Manager products under one `ArgmaxOSS` umbrella target, so an app can pull in only the capability it needs rather than the full suite.

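
As a rough illustration of pulling in only one capability, a consuming app's manifest might look like the sketch below. The repository URL, the `main` branch pin, the `WhisperKit` product name, and the `TranscriberApp` package name are all assumptions made for illustration and are not confirmed by this article; check the project's README for the actual values.

```swift
// swift-tools-version: 5.9
// Package.swift for a hypothetical app that depends only on the transcription kit.
import PackageDescription

let package = Package(
    name: "TranscriberApp", // hypothetical package name
    platforms: [
        .macOS(.v14) // the SDK is described as requiring macOS 14+
    ],
    dependencies: [
        // ASSUMPTION: URL and branch are illustrative; confirm both upstream.
        .package(url: "https://github.com/argmaxinc/argmax-oss-swift.git", branch: "main")
    ],
    targets: [
        .executableTarget(
            name: "TranscriberApp",
            dependencies: [
                // Depend on a single library product rather than the full suite.
                // ASSUMPTION: the product is exposed under the name "WhisperKit".
                .product(name: "WhisperKit", package: "argmax-oss-swift")
            ]
        )
    ]
)
```

Adding TTSKit or SpeakerKit would, under the same assumptions, be a matter of appending further `.product(...)` entries to the target's dependency list.
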
## Beyond the Swift Package

The project also ships a Homebrew-installable CLI (`brew install whisperkit-cli`) for quick command-line transcription, and a local server mode with documented REST endpoints and generated client bindings, letting non-Swift services on the same machine or network call into on-device Whisper without adopting Swift directly. A companion `whisperkittools` repository lets teams fine-tune their own Whisper variants and publish them to Hugging Face in CoreML format for direct use through WhisperKit's model-repo override.

## Open-Source vs. Pro SDK

Argmax positions this repository as the open-source tier of its offering: a separate, commercial Argmax Pro SDK layers on real-time transcription with live speaker attribution, custom-vocabulary accuracy tuning, a local server for non-native apps, and Android support via a Kotlin SDK. Teams evaluating WhisperKit for production should be aware that the free tier's roadmap and feature ceiling are deliberately capped relative to the paid product.

## Limitations

The SDK requires macOS 14+ and Xcode 16+, so it cannot be used with older OS deployment targets, and, unlike the Pro SDK, it lacks live streaming transcription with speaker labels out of the box. Because the repository was recently restructured from a standalone `WhisperKit` repo into the multi-kit `argmax-oss-swift` monorepo, projects that reference the old repository path will need to update their Swift Package Manager dependency URL.

## Who Should Use This

WhisperKit is the strongest choice for iOS/macOS developers who need offline, privacy-preserving speech recognition (and optionally TTS or diarization) without shipping audio to a cloud API. It suits note-taking, meeting-transcription, and accessibility apps in particular, where on-device latency and data locality matter more than the higher accuracy ceiling of a hosted Whisper API.
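
For developers weighing adoption, the short initialize-then-transcribe flow described earlier can be sketched roughly as follows. This is a minimal sketch, not the project's official sample: `isSupportedAudioFile` and `transcribeFile` are hypothetical helpers introduced here, and the `WhisperKit()` initializer and `transcribe(audioPath:)` call reflect the library's API as commonly documented, which may differ between releases. The WhisperKit-specific part is wrapped in `canImport` so the file still compiles where the package is absent.

```swift
import Foundation
#if canImport(WhisperKit)
import WhisperKit
#endif

/// Hypothetical helper (not part of WhisperKit): checks that a path has one of
/// the audio extensions the article lists as supported (wav, mp3, m4a, flac).
func isSupportedAudioFile(_ path: String) -> Bool {
    let supported: Set<String> = ["wav", "mp3", "m4a", "flac"]
    let ext = URL(fileURLWithPath: path).pathExtension.lowercased()
    return supported.contains(ext)
}

#if canImport(WhisperKit)
/// Hypothetical wrapper: transcribes a local audio file on-device and returns
/// the joined text, or nil when the file extension is not a supported format.
/// ASSUMPTION: `WhisperKit()` and `transcribe(audioPath:)` match the installed
/// version of the library; verify against its documentation.
func transcribeFile(at path: String) async throws -> String? {
    guard isSupportedAudioFile(path) else { return nil }
    // No model is named, so the library selects a variant suited to the device.
    let pipe = try await WhisperKit()
    let results: [TranscriptionResult] = try await pipe.transcribe(audioPath: path)
    return results.map(\.text).joined(separator: " ")
}
#endif
```

From an async context this would be called as `try await transcribeFile(at: "path/to/audio.wav")`. Checking the extension up front is a design choice of this sketch, meant to fail fast before any model is loaded.
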
ggml-org
Pure C/C++ port of OpenAI Whisper for edge deployment
CJ Pais
A free, open-source, cross-platform speech-to-text app that transcribes your voice entirely offline — press a shortcut, speak, and have the text pasted into any app.