WhisperX - Open Source | Evermx

WhisperX

m-bain · BSD-2-Clause

STT · 20.9K Stars · 2.2K Forks · 366 views

WhisperX is an automatic speech recognition (ASR) library that extends OpenAI's Whisper with word-level timestamps and multi-speaker diarization. Built on the faster-whisper backend and the CTranslate2 runtime, it transcribes up to 70x faster than real time with the Whisper large-v2 model while needing less than 8GB of GPU memory.

The project's standout feature is its forced alignment pipeline, powered by wav2vec2, which refines Whisper's coarse and often imprecise utterance-level timestamps to word-level granularity. This matters for subtitle generation, legal transcription, media indexing, and any downstream task that requires precise timing. Voice Activity Detection (VAD) preprocessing segments the audio before transcription, which reduces hallucinations, a common problem when vanilla Whisper runs on noisy or silent audio.

Multi-speaker support comes from integration with pyannote-audio, a widely used speaker diarization toolkit. WhisperX can tag each transcribed word with a speaker label (SPEAKER_00, SPEAKER_01, etc.), which makes it useful for meeting transcription, podcast processing, interview analysis, and conversational AI pipelines.

WhisperX supports multiple languages and automatically selects an alignment model based on the detected language. It also supports batched inference, which makes it suitable for large-scale audio processing workloads. The BSD-2-Clause license permits both research and commercial use.

With roughly 20,900 GitHub stars and 2,200 forks, WhisperX has become one of the most widely adopted Whisper extensions in the open-source community. Researchers, media companies, and developers choose it when they need more accurate timestamps or speaker attribution than base Whisper provides.
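The per-word speaker tagging described above can be illustrated with a small, self-contained sketch. It is not WhisperX's own code and does not call the library: it assumes word timestamps (as alignment would produce) and speaker turns (as diarization would produce) are already available, and labels each word with the speaker whose turn overlaps it the most. The `Word`, `Turn`, and `assign_speakers` names and the sample data are hypothetical, chosen only for this illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    """A transcribed word with word-level timestamps (hypothetical shape)."""
    text: str
    start: float  # seconds
    end: float    # seconds
    speaker: Optional[str] = None


@dataclass
class Turn:
    """A diarization segment: one speaker talking over a time span."""
    speaker: str
    start: float  # seconds
    end: float    # seconds


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(words: List[Word], turns: List[Turn]) -> List[Word]:
    """Label each word with the speaker whose turn overlaps it the most.

    Words that overlap no turn keep speaker=None.
    """
    for word in words:
        best_speaker, best_overlap = None, 0.0
        for turn in turns:
            shared = overlap(word.start, word.end, turn.start, turn.end)
            if shared > best_overlap:
                best_speaker, best_overlap = turn.speaker, shared
        word.speaker = best_speaker
    return words


# Made-up sample data for illustration only.
words = [
    Word("hello", 0.10, 0.45),
    Word("there", 0.50, 0.90),
    Word("hi", 1.20, 1.40),
    Word("again", 1.95, 2.30),  # straddles a turn boundary at 2.0 s
]
turns = [
    Turn("SPEAKER_00", 0.0, 1.0),
    Turn("SPEAKER_01", 1.0, 2.0),
    Turn("SPEAKER_00", 2.0, 3.0),
]

labeled = assign_speakers(words, turns)
for w in labeled:
    print(f"[{w.start:5.2f}-{w.end:5.2f}] {w.speaker}: {w.text}")
```

Choosing the speaker by maximum overlap means a word that straddles a turn boundary goes to the speaker who holds most of its duration. That only works well when the word timestamps are precise, which is why alignment precedes speaker assignment in this kind of pipeline.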