Open Source
Explore the latest AI open-source projects from GitHub and HuggingFace.
Higgs Audio is an expressive audio foundation model from Boson AI, pretrained on over 10 million hours of audio and text data, capable of generating natural multi-speaker dialogues, melodic humming with cloned voices, and simultaneous speech with background music. The latest V2.5 release condenses the architecture to 1B parameters while surpassing the prior 3B model in speed and accuracy through Group Relative Policy Optimization (GRPO) alignment on a curated Voice Bank dataset. It achieves state-of-the-art results on EmergentTTS-Eval, outperforming GPT-4o-mini-TTS on expressive emotion and intonation tasks.
FunAudioLLM
Multilingual LLM-based TTS with zero-shot voice cloning, 9 languages, and 150ms streaming latency.
speechbrain
Comprehensive PyTorch speech toolkit supporting 16+ tasks, from ASR to TTS, with 200+ training recipes.
Vaibhavs10
Blazing-fast Whisper transcription CLI that processes 150 minutes of audio in 98 seconds using Flash Attention 2.
pipecat-ai
Open-source Python framework for real-time voice and multimodal conversational AI with 20+ STT/TTS/LLM providers and 10.6K+ GitHub stars.