Open Source
Explore the latest AI open-source projects from GitHub and HuggingFace.
Higgs Audio is an expressive audio foundation model from Boson AI, pretrained on over 10 million hours of audio and text data, capable of generating natural multi-speaker dialogues, melodic humming with cloned voices, and simultaneous speech with background music. The latest V2.5 release condenses the architecture to 1B parameters while surpassing the prior 3B model in speed and accuracy through Group Relative Policy Optimization (GRPO) alignment on a curated Voice Bank dataset. It achieves state-of-the-art results on EmergentTTS-Eval, outperforming GPT-4o-mini-TTS on expressive emotion and intonation tasks.
myshell-ai
Instant voice cloning framework by MIT and MyShell with 36k+ GitHub stars, enabling zero-shot cross-lingual voice replication from just a few seconds of reference audio.
FunAudioLLM
Multilingual LLM-based TTS with zero-shot voice cloning, 9 languages, and 150 ms streaming latency.
OpenBMB
OpenBMB's 2B-parameter tokenizer-free TTS model with 48 kHz output, 30-language support, voice cloning, and an Apache-2.0 license.
Alibaba Cloud Qwen Team
Open-source TTS series from Alibaba's Qwen team with 97 ms streaming latency, 10-language support, 3-second voice cloning, and natural-language voice design. Apache-2.0 licensed.