Open Source
Explore the latest AI open-source projects from GitHub and HuggingFace.
F5-TTS (A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching) is a high-quality, zero-shot text-to-speech system from Shanghai Jiao Tong University's X-LANCE Lab. It is built on a Diffusion Transformer with ConvNeXt V2 and trained with flow matching. It synthesizes natural speech, cloning a voice from a short reference audio clip, and features Sway Sampling, an inference-time strategy for sampling flow steps that improves generation quality and speed. The v1 base model, released in March 2025, delivers better training stability and inference performance, achieving state-of-the-art naturalness on multiple benchmarks.
Microsoft
Microsoft's MIT-licensed open frontier voice AI: 1.5B long-form TTS up to 90 minutes with 4 speakers, 0.5B streaming TTS at 300 ms latency, and 7B ASR for 60-minute single-pass transcription. 47k+ stars.
OpenBMB
OpenBMB's tokenizer-free TTS system, now a 2B model trained on 2M+ hours across 30 languages with Voice Design, controllable cloning, and 48 kHz audio.
microsoft
Open-source frontier voice AI for TTS and ASR
resemble-ai
Family of SoTA open-source TTS models by Resemble AI with zero-shot voice cloning, 23+ language support, and paralinguistic controls across 350M-500M parameter variants.