Open Source
Explore the latest AI open-source projects from GitHub and HuggingFace.
WhisperLive is an open-source, nearly-live transcription application built on OpenAI's Whisper automatic speech recognition model. Maintained by Collabora under an MIT license, it provides a client-server architecture that streams audio to a Whisper backend and returns text in near real time, working with both live microphone input and pre-recorded audio files.

## Why WhisperLive Matters

Whisper produces high-quality transcriptions but is designed around batch processing of complete audio files, which makes it awkward for live use cases such as captioning, dictation, or meeting notes. WhisperLive fills that gap by wrapping Whisper in a streaming server that incrementally transcribes audio as it arrives. With over 4,000 GitHub stars, it has become a common building block for developers who need real-time speech-to-text without depending on a hosted API.

## Multiple Inference Backends

A defining characteristic of WhisperLive is backend flexibility. The server can run on faster-whisper for efficient CPU and GPU inference, on NVIDIA TensorRT-LLM for accelerated GPU throughput, or on OpenVINO for optimized execution on Intel hardware. This lets teams match the deployment to their available hardware, from a laptop CPU to a datacenter GPU, without changing the client code.

## Real-Time Features

WhisperLive goes beyond basic streaming with a set of features aimed at production transcription. It supports word-level timestamps for precise alignment, custom vocabulary and hotwords to improve recognition of domain-specific terms, and speaker diarization to attribute speech to different speakers. It also handles batch inference and raw PCM audio input, giving developers control over how audio is fed into the pipeline.

## Client, Server, and Integrations

The project exposes an OpenAI-compatible REST interface alongside its native WebSocket client, easing integration into existing tooling. Browser extensions for Chrome and Firefox allow live transcription of audio playing in the browser, and an official Docker image simplifies running the server in containerized environments. These integration points make WhisperLive usable both as a standalone tool and as a component inside larger applications.

## Deployment Flexibility

WhisperLive can be installed directly from pip as the `whisper-live` package or run from source, and the server accepts configuration for the number of concurrent clients and maximum connection time. This makes it straightforward to host a shared transcription endpoint for multiple users while bounding resource usage, which is useful for self-hosted captioning or dictation services.

## Considerations

Because WhisperLive is a near-live rather than fully synchronous system, there is an inherent trade-off between latency and accuracy that depends on the chosen model size and backend. Lower-latency configurations on smaller models reduce transcription quality, while the most accurate setups require capable GPUs and additional setup such as building a Whisper-TensorRT engine. Speaker diarization and other advanced features add configuration complexity that teams should account for when planning a deployment.
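To make the resource bounding described under Deployment Flexibility more concrete, here is a minimal sketch of how a shared endpoint might cap concurrent clients and connection time. It is an illustration only, not WhisperLive's code: the `ConnectionLimiter` class, its method names, and the example limits are all hypothetical.

```python
import time
from dataclasses import dataclass, field


@dataclass
class ConnectionLimiter:
    """Hypothetical admission policy; not WhisperLive's actual implementation."""

    max_clients: int = 4                 # assumed cap on simultaneous clients
    max_connection_time: float = 600.0   # assumed per-client budget in seconds
    _started: dict = field(default_factory=dict)  # client id -> start time

    def try_admit(self, client_id, now=None):
        """Register a client if a slot is free; return True when admitted."""
        now = time.monotonic() if now is None else now
        self.evict_expired(now)          # free slots held by timed-out clients
        if client_id in self._started:
            return True                  # already connected
        if len(self._started) >= self.max_clients:
            return False                 # server is full
        self._started[client_id] = now
        return True

    def evict_expired(self, now=None):
        """Drop clients that used up their time budget; return their ids."""
        now = time.monotonic() if now is None else now
        expired = [cid for cid, start in self._started.items()
                   if now - start >= self.max_connection_time]
        for cid in expired:
            del self._started[cid]
        return expired

    def release(self, client_id):
        """Free the slot when a client disconnects on its own."""
        self._started.pop(client_id, None)

    @property
    def active(self):
        """Number of clients currently holding a slot."""
        return len(self._started)


if __name__ == "__main__":
    limiter = ConnectionLimiter(max_clients=2, max_connection_time=10.0)
    print(limiter.try_admit("a", now=0.0))   # True
    print(limiter.try_admit("b", now=1.0))   # True
    print(limiter.try_admit("c", now=2.0))   # False: both slots are taken
    print(limiter.try_admit("c", now=10.0))  # True: "a" has timed out
```

Passing `now` explicitly keeps the sketch deterministic. A real server would rely on a monotonic clock and would also close the underlying connection when a client is evicted.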
ggml-org
Pure C/C++ port of OpenAI Whisper for edge deployment
SYSTRAN
A CTranslate2-based reimplementation of OpenAI's Whisper that runs up to 4x faster at the same accuracy with lower memory, adding 8-bit quantization, batched inference, and word-level timestamps. MIT-licensed and FFmpeg-free.