

WhisperKit (Argmax Open-Source SDK) - Open Source | Evermx



WhisperKit (Argmax Open-Source SDK)

argmaxinc | MIT


STT | 6.2K Stars | 576 Forks

WhisperKit is Argmax's Swift implementation of OpenAI's Whisper for fully on-device speech-to-text on Apple Silicon, and it now ships as the flagship component of the broader Argmax Open-Source SDK repository (argmax-oss-swift), which bundles WhisperKit alongside TTSKit (on-device text-to-speech via Qwen-TTS) and SpeakerKit (speaker diarization via Pyannote) as separate importable library products in one Swift package. The repository has reached 6,200+ GitHub stars and was pushed as recently as this week, making it one of the more actively maintained on-device speech projects in the current ecosystem, with the underlying WhisperKit CoreML models pulling regular monthly downloads on Hugging Face.

## Speech-to-Text Without a Server

WhisperKit's core promise is that transcription runs entirely on-device via CoreML: no audio leaves the phone or Mac, and there is no per-request API cost. A three-line Swift snippet initializes the pipeline and transcribes a local audio file (wav, mp3, m4a, or flac), with WhisperKit auto-selecting an appropriately sized Whisper variant for the device if none is specified. Argmax recommends the compressed `large-v3-v20240930_626MB` build for maximum multilingual accuracy on iOS and macOS, and a `tiny` build purely for fast local debugging.

## Three Kits, One Package

Beyond transcription, TTSKit adds on-device speech synthesis built on Qwen-TTS, including custom voice selection, real-time streaming playback, and style-instruction controls on the 1.7B variant, while SpeakerKit adds diarization that can be combined with WhisperKit's transcription output and exported in standard RTTM format. All three ship as independent Swift Package Manager products under one `ArgmaxOSS` umbrella target, so an app can pull in only the capability it needs rather than the full suite.
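To make the combination of diarization and transcription concrete, here is a minimal Python sketch of how a downstream, non-Swift consumer might attach speaker labels from an RTTM file to timestamped transcript segments. It does not use SpeakerKit or WhisperKit APIs: the `Turn` type, the `parse_rttm` and `label_segments` helpers, and the `(start, end, text)` segment shape are all hypothetical names chosen for illustration, and the maximum-overlap rule is one simple assignment strategy, not necessarily what the SDK does internally.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str
    start: float  # seconds
    end: float    # seconds


def parse_rttm(text):
    """Parse SPEAKER records from RTTM text into Turn objects.

    RTTM fields: type, file, channel, onset, duration, ortho,
    speaker-type, speaker-name, confidence, lookahead.
    """
    turns = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue  # skip blank lines and other record types
        start = float(fields[3])
        duration = float(fields[4])
        turns.append(Turn(fields[7], start, start + duration))
    return turns


def label_segments(segments, turns):
    """Assign each (start, end, text) segment the speaker whose turn
    overlaps it the most; None if no turn overlaps at all."""
    labeled = []
    for seg_start, seg_end, text in segments:
        best_speaker, best_overlap = None, 0.0
        for turn in turns:
            overlap = min(seg_end, turn.end) - max(seg_start, turn.start)
            if overlap > best_overlap:
                best_speaker, best_overlap = turn.speaker, overlap
        labeled.append((best_speaker, text))
    return labeled
```

Given an RTTM string and a list of transcript segments, `label_segments(segments, parse_rttm(rttm_text))` returns `(speaker, text)` pairs. Segments that straddle a speaker change are assigned to whichever speaker covers more of the segment, which is a deliberate simplification.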
## Beyond the Swift Package

The project also ships a Homebrew-installable CLI (`brew install whisperkit-cli`) for quick command-line transcription, and a local server mode with documented REST endpoints and generated client bindings, letting non-Swift services on the same machine or network call into on-device Whisper without adopting Swift directly. A companion `whisperkittools` repository lets teams fine-tune their own Whisper variants and publish them to Hugging Face in CoreML format for direct use through WhisperKit's model-repo override.

## Open-Source vs. Pro SDK

Argmax positions this repository as the open-source tier of its offering: a separate, commercial Argmax Pro SDK layers on real-time transcription with live speaker attribution, custom-vocabulary accuracy tuning, a local server for non-native apps, and Android support via a Kotlin SDK. Teams evaluating WhisperKit for production should be aware the free tier's roadmap and feature ceiling are deliberately capped relative to the paid product.

## Limitations

The SDK requires macOS 14+ and Xcode 16+, so it is not usable for older-OS deployment targets, and, unlike the Pro SDK, it lacks live streaming transcription with speaker labels out of the box. Because the repository was recently restructured from a standalone `WhisperKit` repo into the multi-kit `argmax-oss-swift` monorepo, projects that reference the old repository path will need to update their Swift Package Manager dependency URL.

## Who Should Use This

WhisperKit is the strongest choice for iOS/macOS developers who need offline, privacy-preserving speech recognition (and optionally TTS or diarization) without shipping audio to a cloud API, particularly for note-taking, meeting-transcription, or accessibility apps where on-device latency and data locality matter more than the extra accuracy ceiling of a hosted Whisper API.

## Key Features

- Fully on-device Whisper transcription via CoreML, with no audio sent to any server
- Auto-selects an appropriately sized Whisper model per device, or pick a specific variant
- Bundled TTSKit (Qwen-TTS on-device synthesis) and SpeakerKit (Pyannote diarization) in one package
- Homebrew CLI and local REST server mode for non-Swift integrations
- whisperkittools companion repo for fine-tuning and publishing custom CoreML Whisper models
- Actively maintained: pushed within the past week, 6,200+ stars
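
For the local REST server mode mentioned among the features, a non-Swift service would typically upload an audio file over HTTP. The Python sketch below only builds such a request using the standard library, without sending it. The host, port, endpoint path, and the `file` form-field name are assumptions made for illustration and are not taken from the project's documentation, so check the server's documented endpoints before relying on any of them.

```python
import urllib.request
import uuid

# Assumption: placeholder address and path, not confirmed by the project docs.
SERVER_URL = "http://localhost:8080/v1/audio/transcriptions"


def build_transcription_request(audio_bytes, filename, url=SERVER_URL, fields=None):
    """Build (but do not send) a multipart/form-data POST with one audio file.

    `fields` holds optional extra text form fields, e.g. {"language": "en"};
    which fields the server accepts is an assumption left to the caller.
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in (fields or {}).items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode()
        )
    # The "file" field name is hypothetical; adjust to the documented schema.
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        'Content-Type: application/octet-stream\r\n\r\n'.encode()
        + audio_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return urllib.request.Request(
        url,
        data=b"".join(parts),
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
```

With a server actually running, the request could be sent with `urllib.request.urlopen(request)` and the response body decoded according to whatever format the server documents. Building the request separately from sending it keeps the sketch testable without a live server.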