Open Source
Explore the latest AI open-source projects from GitHub and HuggingFace.
Insanely Fast Whisper is an opinionated CLI tool for transcribing audio files locally with OpenAI's Whisper model, optimized for extreme speed through Hugging Face Transformers, Optimum, and Flash Attention 2. On an NVIDIA A100 GPU it can transcribe 150 minutes (2.5 hours) of audio in under 98 seconds with Whisper Large v3, roughly 92x real-time. It supports multiple Whisper variants, including openai/whisper-large-v3 and the speed-optimized distil-whisper models.

Features include FP16 precision for a halved memory footprint, configurable batch processing (default batch size 24), BetterTransformer optimization, automatic language detection or manual language selection, speaker diarization via Pyannote.audio integration to identify different speakers, and both chunk-level and word-level timestamp generation for subtitle creation. It runs on NVIDIA GPUs with full Flash Attention 2 support, on Apple Silicon via the MPS backend, and falls back to CPU when no accelerator is available. Installation is via pipx, with simple CLI usage. Originally created to showcase Transformers benchmarks, it has evolved into a community-driven production transcription utility.
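To illustrate the pipx installation and CLI usage mentioned above, a typical session might look like the sketch below. The flag names (`--file-name`, `--model-name`, `--batch-size`, `--timestamp`, `--device-id`) follow the project's README, but treat them as illustrative; exact options can change between releases, and the model identifier shown for distil-whisper is an assumption.

```shell
# Install the CLI into an isolated environment (requires pipx)
pipx install insanely-fast-whisper

# Basic transcription; by default the tool uses openai/whisper-large-v3
insanely-fast-whisper --file-name audio.mp3

# Illustrative tuned invocation (flag names per the README; verify
# against your installed version before relying on them):
insanely-fast-whisper \
  --file-name meeting.wav \
  --model-name distil-whisper/distil-large-v2 \
  --batch-size 24 \
  --timestamp word \
  --device-id mps   # Apple Silicon MPS backend; use a GPU index (e.g. 0) for CUDA
```

Lowering `--batch-size` is the usual first step if the default of 24 exhausts GPU memory, and `--timestamp word` trades some speed for the word-level timestamps needed for subtitle alignment.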