SenseVoice

FunAudioLLM | NOASSERTION

STT | 7.9K Stars | 716 Forks | 292 views

SenseVoice is a speech foundation model from Alibaba's FunAudioLLM team that combines several speech understanding capabilities: automatic speech recognition (ASR), spoken language identification (LID), speech emotion recognition (SER), and audio event detection (AED). It is trained on more than 400,000 hours of data, supports more than 50 languages, and is reported to surpass Whisper in recognition performance. The model family includes SenseVoice-Small, for low-latency ASR in 5 languages, and SenseVoice-Large, for high-precision ASR in 50+ languages. SenseVoice-Small is reported to run more than 5x faster than Whisper-small and more than 15x faster than Whisper-large.

Key Features

High-precision multilingual ASR supporting 50+ languages, with accuracy reported to exceed Whisper's
Speech emotion recognition (SER) for detecting speaker emotional states
Audio event detection (AED) for identifying non-speech audio events
Fast inference: SenseVoice-Small is reported to run more than 5x faster than Whisper-small and more than 15x faster than Whisper-large
Spoken language identification (LID) for automatic language detection
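
The features above (LID, SER, AED, and ASR) can all surface in a single model output. The sketch below is a minimal, hypothetical parser that assumes the raw output is a string with leading tags of the form `<|lang|><|EMOTION|><|Event|><|itn-flag|>` followed by the transcript. That layout, the sample string, and the names `parse_tagged_output` and `ParsedUtterance` are assumptions made for illustration and are not taken from this page, so check them against the output of the model version you actually run.

```python
import re
from typing import NamedTuple

# Assumed tag layout (not confirmed by this page):
#   "<|lang|><|EMOTION|><|Event|><|itn-flag|>transcript"
TAG_PATTERN = re.compile(r"<\|([^|]*)\|>")


class ParsedUtterance(NamedTuple):
    language: str  # LID result, e.g. "en"
    emotion: str   # SER result, e.g. "NEUTRAL"
    event: str     # AED result, e.g. "Speech"
    text: str      # ASR transcript with all tags removed


def parse_tagged_output(raw: str) -> ParsedUtterance:
    """Split a tagged output string into its labels and plain transcript."""
    tags = TAG_PATTERN.findall(raw)
    text = TAG_PATTERN.sub("", raw).strip()
    # Pad with empty strings so untagged or partially tagged input still parses.
    language, emotion, event = (tags + ["", "", ""])[:3]
    return ParsedUtterance(language, emotion, event, text)


# Hypothetical sample string written by hand for this sketch.
example = "<|en|><|NEUTRAL|><|Speech|><|withitn|>Hello world."
print(parse_tagged_output(example))
```

Keeping the labels in a named tuple lets downstream code filter on language, emotion, or event without re-parsing the transcript string.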

Related Projects

whisper.cpp

Handy

faster-whisper

WhisperX