F5-TTS

SWivid · MIT

TTS · 14.3K Stars · 2.1K Forks · 175 views

F5-TTS (A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching) is a zero-shot text-to-speech system from the X-LANCE Lab at Shanghai Jiao Tong University. It is built on a Diffusion Transformer with ConvNeXt V2 and trained with flow matching. It synthesizes natural speech and clones a voice from a short reference audio clip. It also features Sway Sampling, an inference-time strategy for choosing flow steps that improves generation quality and speed. The v1 base model, released in March 2025, delivers better training stability and inference performance, and is reported to reach state-of-the-art naturalness on multiple benchmarks.
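To make the Sway Sampling idea concrete, here is a minimal sketch in plain Python. This page does not state the formula, so the sketch assumes the sway function has the form t = u + s * (cos(pi/2 * u) - 1 + u), where u is a uniformly spaced flow step in [0, 1] and s is a sway coefficient. The function names and the default s = -1 are illustrative assumptions, not the project's API.

```python
import math


def sway_sample(u: float, s: float = -1.0) -> float:
    """Map a uniform flow step u in [0, 1] to a swayed step in [0, 1].

    Assumed form: t = u + s * (cos(pi/2 * u) - 1 + u).
    With s = 0 the mapping is the identity; with s < 0 steps are pushed
    toward 0, so more of the step budget is spent early in the flow.
    """
    return u + s * (math.cos(math.pi / 2 * u) - 1 + u)


def sway_schedule(num_steps: int, s: float = -1.0) -> list[float]:
    """Return num_steps + 1 swayed time points, endpoints included."""
    uniform = [i / num_steps for i in range(num_steps + 1)]
    return [sway_sample(u, s) for u in uniform]


if __name__ == "__main__":
    # Compare a uniform grid with a swayed grid for 8 solver steps.
    print([round(i / 8, 3) for i in range(9)])
    print([round(t, 3) for t in sway_schedule(8)])
```

Under these assumptions the endpoints stay fixed at 0 and 1 and only the interior steps move, so the schedule can replace a uniform time grid in an ODE solver without changing the number of function evaluations.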

Key Features

Zero-shot voice cloning from short reference audio clips with no fine-tuning required
Diffusion Transformer with ConvNeXt V2 for fast training and inference
Sway Sampling strategy for inference-time quality improvement
E2 TTS flat-UNet variant included as an alternative architecture
Inference on NVIDIA, AMD, and Intel GPUs and on Apple Silicon (MPS)

Related Projects

GPT-SoVITS

VibeVoice

ChatTTS

VoxCPM2