Open Source
Explore the latest AI open-source projects from GitHub and HuggingFace.
Insanely Fast Whisper is an opinionated CLI tool for transcribing audio files locally with OpenAI's Whisper model, optimized for extreme speed through Hugging Face Transformers, Optimum, and Flash Attention 2. On an NVIDIA A100 GPU it can transcribe 150 minutes (2.5 hours) of audio in under 98 seconds with Whisper Large v3, roughly 92x real-time. It supports multiple Whisper variants, including openai/whisper-large-v3 and the speed-optimized distil-whisper models.

Features include FP16 precision for a halved memory footprint, configurable batch processing (default batch size 24), BetterTransformer optimization, automatic language detection or manual language selection, speaker diarization via Pyannote.audio integration to identify different speakers, and both chunk-level and word-level timestamp generation for subtitle creation. It runs on NVIDIA GPUs with full Flash Attention 2 support, on Apple Silicon via the MPS backend, and falls back to CPU when no accelerator is available. Installation is via pipx, with simple CLI usage. Originally created to showcase Transformers benchmarks, it has evolved into a community-driven production transcription utility.
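To illustrate the pipx installation and CLI usage mentioned above, a typical session might look like the sketch below. The flag names (`--file-name`, `--model-name`, `--batch-size`, `--timestamp`, `--device-id`) follow the project's README, but treat them as illustrative; exact options can change between releases, and the model identifier shown for distil-whisper is an assumption.

```shell
# Install the CLI into an isolated environment (requires pipx)
pipx install insanely-fast-whisper

# Basic transcription; by default the tool uses openai/whisper-large-v3
insanely-fast-whisper --file-name audio.mp3

# Illustrative tuned invocation (flag names per the README; verify
# against your installed version before relying on them):
insanely-fast-whisper \
  --file-name meeting.wav \
  --model-name distil-whisper/distil-large-v2 \
  --batch-size 24 \
  --timestamp word \
  --device-id mps   # Apple Silicon MPS backend; use a GPU index (e.g. 0) for CUDA
```

Lowering `--batch-size` is the usual first step if the default of 24 exhausts GPU memory, and `--timestamp word` trades some speed for the word-level timestamps needed for subtitle alignment.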