Open Source
Explore the latest AI open-source projects from GitHub and HuggingFace.
F5-TTS (A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching) is a zero-shot text-to-speech system from Shanghai Jiao Tong University's X-LANCE Lab, built on a Diffusion Transformer with ConvNeXt V2 and trained with flow matching. It clones a voice from a short reference audio clip and features Sway Sampling, an inference-time strategy for choosing flow steps that improves both generation quality and speed. The v1 base model, released in March 2025, improves training stability and inference performance and achieves state-of-the-art naturalness on multiple benchmarks.
RVC-Boss
Open-source WebUI for few-shot and zero-shot voice cloning and text-to-speech, producing a usable voice from as little as a 5-second sample.
Microsoft
Microsoft's MIT-licensed open frontier voice AI: 1.5B long-form TTS up to 90 minutes with 4 speakers, 0.5B streaming TTS at 300 ms latency, and 7B ASR for 60-minute single-pass transcription. 47k+ stars.
2noise
A dialogue-optimized open TTS model trained on 100,000+ hours of audio that adds fine-grained prosody (laughter, pauses, interjections) and supports multiple speakers in English and Chinese.
OpenBMB
OpenBMB's tokenizer-free TTS system, now a 2B model trained on 2M+ hours across 30 languages with Voice Design, controllable cloning, and 48kHz audio.