Seed-VC - Open Source | Evermx

Seed-VC

Plachtaa | GPL-3.0

Audio | 3.8K Stars | 505 Forks

Seed-VC is an open-source voice conversion project that can clone a voice from a short reference clip without any training. Given 1 to 30 seconds of reference speech, it converts a source recording so it sounds like the target speaker, and the same framework also handles singing voice conversion. With a real-time mode and a range of openly released pretrained models, it has gathered a sizable following on GitHub and offers a public Hugging Face demo.

## What It Does

The project centers on three zero-shot capabilities: standard voice conversion, real-time voice conversion, and singing voice conversion. "Zero-shot" means no per-speaker training is required: the model reads a reference voice at inference time and transfers its timbre onto the input speech while preserving the linguistic content and, for singing, the melody of the original. This makes it practical for one-off conversions where collecting a per-speaker training dataset is not feasible.

## Real-Time Conversion

A standout feature is low-latency streaming conversion, with a reported algorithm delay of roughly 300 ms plus a device-side delay of about 100 ms. That latency is low enough for online meetings, gaming, and live streaming, where audio must be converted continuously rather than in an offline batch. A dedicated lightweight model (around 25M parameters) is tuned specifically for this real-time path.

## Models and Fine-Tuning

Seed-VC ships several checkpoints with different trade-offs, from the small real-time model up to larger offline and singing-focused variants (around 98M and 200M parameters, respectively), plus a V2 line. The architecture uses a diffusion-transformer (DiT) design with content encoders such as Whisper and XLSR and neural vocoders such as BigVGAN.
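The real-time figures quoted above can be turned into a quick latency budget. The Python sketch below adds the reported ~300 ms algorithm delay and ~100 ms device delay, converts a delay into a sample count, and shows the generic fixed-block pattern a streaming converter relies on. The 22,050 Hz sample rate, the block length, and the helper names (`total_latency_ms`, `samples_for`, `iter_blocks`) are assumptions for illustration, not Seed-VC's API or its actual streaming code.

```python
from collections.abc import Iterator

# Reported Seed-VC real-time figures (from the description above).
ALGORITHM_DELAY_MS = 300  # model-side delay
DEVICE_DELAY_MS = 100     # audio device / driver delay

# Assumption for illustration only: output sample rate of the streaming model.
SAMPLE_RATE_HZ = 22_050


def total_latency_ms(algorithm_ms: float, device_ms: float) -> float:
    """End-to-end delay a listener perceives: model delay plus device delay."""
    return algorithm_ms + device_ms


def samples_for(ms: float, sample_rate: int) -> int:
    """Number of audio samples spanning `ms` milliseconds at `sample_rate`."""
    return round(sample_rate * ms / 1000)


def iter_blocks(signal: list[float], block_len: int) -> Iterator[list[float]]:
    """Yield consecutive fixed-size blocks, zero-padding the final partial block.

    A streaming converter hands each block to the model as it arrives instead
    of waiting for the whole recording (generic pattern, not Seed-VC code).
    `block_len` is a hypothetical parameter of this sketch.
    """
    if block_len <= 0:
        raise ValueError("block_len must be positive")
    for start in range(0, len(signal), block_len):
        block = signal[start:start + block_len]
        if len(block) < block_len:
            block = block + [0.0] * (block_len - len(block))
        yield block


if __name__ == "__main__":
    total = total_latency_ms(ALGORITHM_DELAY_MS, DEVICE_DELAY_MS)
    buffered = samples_for(ALGORITHM_DELAY_MS, SAMPLE_RATE_HZ)
    print(f"end-to-end delay: {total:.0f} ms")  # end-to-end delay: 400 ms
    print(f"samples spanned by the algorithm delay: {buffered}")  # 6615
```

At the assumed 22,050 Hz, 300 ms spans 6,615 samples, and the total perceived delay comes to roughly 400 ms; a different sample rate changes the sample count but not the millisecond budget.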
For users who want higher fidelity on specific speakers, optional fine-tuning is supported with strikingly low requirements: as little as one utterance per speaker and roughly 100 training steps, which the authors report completing in about two minutes on a T4 GPU.

## Practical Use

Installation targets Python 3.10 on Windows, Linux, and Apple Silicon Macs, with separate requirement files per platform and an optional compile path for extra speed with V2 models. A Hugging Face Space offers a no-install way to try conversions, and the repository links to demos and objective evaluations comparing Seed-VC with earlier voice conversion baselines.

## Considerations

The project is licensed under GPL-3.0, whose copyleft obligations commercial integrators should review carefully. As with all voice cloning technology, the ability to mimic a voice from seconds of audio raises clear consent and misuse concerns, so responsible use is essential. Output quality also depends on the reference clip and the chosen model size, so some experimentation should be expected.

For developers exploring zero-shot voice or singing conversion, especially with real-time requirements, Seed-VC is a capable, well-documented option.
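Because output quality depends on the reference clip, it can help to check the clip's length before conversion. This is a minimal sketch, assuming uncompressed WAV input read with Python's standard `wave` module: it rejects clips shorter than 1 second and keeps only the first 30 seconds of longer ones, matching the 1 to 30 second range described above. The function names are hypothetical, and truncating (rather than, say, selecting the cleanest 30 seconds) is this sketch's own choice, not documented Seed-VC behavior.

```python
import wave

# Reference length range quoted for Seed-VC: 1 to 30 seconds of speech.
MIN_REF_SECONDS = 1.0
MAX_REF_SECONDS = 30.0


def reference_window(num_frames: int, sample_rate: int,
                     min_s: float = MIN_REF_SECONDS,
                     max_s: float = MAX_REF_SECONDS) -> int:
    """Return how many frames of a reference clip to keep.

    Clips shorter than `min_s` are rejected; clips longer than `max_s` are
    truncated to their first `max_s` seconds (a choice made by this sketch).
    """
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    duration = num_frames / sample_rate
    if duration < min_s:
        raise ValueError(f"reference is {duration:.2f} s; need at least {min_s:g} s")
    return min(num_frames, int(max_s * sample_rate))


def load_reference(path: str) -> tuple[bytes, int]:
    """Read a WAV file with the stdlib `wave` module.

    Returns the (possibly truncated) raw PCM bytes and the sample rate.
    """
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        keep = reference_window(wf.getnframes(), rate)
        return wf.readframes(keep), rate
```

The sketch does not resample, normalize, or strip silence; it only enforces the length window before the clip is passed to whichever inference entry point you use.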

Key Features

Zero-shot voice conversion from a 1–30s reference clip, no per-speaker training
Zero-shot singing voice conversion (SVC)
Real-time conversion with ~300ms algorithm delay (plus ~100ms device delay) for meetings, gaming, and streaming
Optional fine-tuning from as little as 1 utterance / ~100 steps (~2 min on a T4)
Multiple model sizes, from a 25M real-time model to ~200M singing-focused checkpoints
Diffusion-transformer (DiT) architecture with Whisper/XLSR encoders and BigVGAN vocoder
Cross-platform support for Windows, Linux, and Apple Silicon Macs
Public Hugging Face demo and objective evaluation comparisons
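The model sizes in the list above can be captured in a small lookup table when scripting around several checkpoints. This is a hypothetical sketch: the labels are placeholders rather than real checkpoint file names, and it assumes the ~98M model is the offline speech checkpoint and the ~200M model the singing one, as the ordering in the description suggests. The memory estimate counts fp32 weights only (4 bytes per parameter) and ignores the separately loaded content encoder, vocoder, and runtime activations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Checkpoint:
    label: str            # placeholder label, not an actual Seed-VC file name
    params_millions: int  # approximate size quoted on this page
    use_case: str         # "realtime", "voice" (offline speech) or "singing"

    def fp32_weight_mb(self) -> int:
        """Approximate weight memory at 4 bytes per parameter (fp32), in MB."""
        return self.params_millions * 4


# Hypothetical table; the V2 line is not included here.
CHECKPOINTS = (
    Checkpoint("realtime-tiny", 25, "realtime"),
    Checkpoint("offline-voice", 98, "voice"),
    Checkpoint("singing", 200, "singing"),
)


def pick_checkpoint(use_case: str) -> Checkpoint:
    """Return the checkpoint entry registered for `use_case`."""
    for ckpt in CHECKPOINTS:
        if ckpt.use_case == use_case:
            return ckpt
    valid = ", ".join(c.use_case for c in CHECKPOINTS)
    raise ValueError(f"unknown use case {use_case!r}; expected one of: {valid}")


if __name__ == "__main__":
    for name in ("realtime", "voice", "singing"):
        ckpt = pick_checkpoint(name)
        print(f"{name:>8}: ~{ckpt.params_millions}M params, "
              f"~{ckpt.fp32_weight_mb()} MB fp32 weights")
```

Keeping the table explicit makes the size trade-off visible: the real-time model needs roughly 100 MB of fp32 weights, versus about 800 MB for the singing-focused checkpoint.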

Related Projects

OpenVoice (myshell-ai) | MIT | Audio | 36.2K Stars | 4.0K Forks

Voicebox

Ultimate Vocal Remover GUI

Audiocraft