RVC (Retrieval-based Voice Conversion WebUI)

RVC-Project | MIT

Audio | 36.3K Stars | 5.1K Forks

RVC is a VITS-based voice-conversion framework that changes the timbre of a source recording into a target speaker's voice, and it has become the de facto open-source foundation for the AI-cover and voice-changer community. The project has accumulated more than 36,000 GitHub stars and over 5,100 forks. Its pretrained base models were trained on roughly 50 hours of the openly licensed VCTK corpus, so they can be reused without the copyright ambiguity that plagues voice models trained on scraped celebrity audio.

## Retrieval Is the Key Idea

Most voice-conversion systems feed features extracted from the source speaker directly into a decoder conditioned on the target voice, which tends to leak residual source timbre into the output. RVC instead performs a top-1 nearest-neighbor retrieval against a feature index built from the target speaker's training data, replacing each source feature with the closest matching target feature before synthesis. This retrieval step is the project's core contribution and the source of its name; it is intended to reduce timbre leakage compared with purely generative approaches.

## Fast, Low-Data Training

RVC is explicitly optimized for accessibility: it can train a usable model from as little as 10 minutes of clean target-voice audio, and training completes quickly even on modest consumer GPUs. A model-fusion feature in the checkpoint-processing tab lets users blend two trained voice models to interpolate timbre. The project also bundles UVR5 vocal-separation models directly in the pipeline, so users can strip the instrumental from a song before conversion without a separate tool.

## Pitch Extraction and Real-Time Mode

The project ships RMVPE, a pitch-extraction algorithm published at INTERSPEECH 2023, as its recommended F0 estimator. It avoids the dulled or muffled-voice artifacts common with older pitch trackers such as CREPE while running faster and using fewer resources. Beyond the standard train-and-infer web UI (`go-web`), RVC includes a dedicated real-time conversion GUI (`go-realtime-gui`) with end-to-end latency of around 170 ms, dropping to roughly 90 ms with ASIO-compatible audio hardware, which makes live voice-changer use viable for streaming and calls.

## Hardware Flexibility

Unlike many voice-AI projects that assume an NVIDIA GPU, RVC ships separate requirements files for AMD (ROCm on Linux, DirectML on Windows) and Intel (IPEX) graphics cards, plus a documented CPU fallback path, which meaningfully widens the range of people who can run it.

## Limitations

Development in the upstream repository has slowed: its most recent push is older than much of the activity in its large fork and derivative ecosystem, so day-to-day improvements now largely happen in community forks rather than the base project. Voice-conversion technology also carries clear misuse potential for impersonation and non-consensual deepfakes. The project's documentation and community norms lean on the clean VCTK base model to avoid direct copyright issues, but downstream users who train on a specific real person's voice bear responsibility for consent and for compliance with applicable law. Setup remains more involved than a single-command install: several pretrained assets (HuBERT base model, RMVPE weights, UVR5 weights) and ffmpeg must be obtained separately before first use.

## Who Should Use This

RVC is the standard starting point for hobbyists and small studios building AI singing covers, dubbing pipelines, or live voice-changer setups who want a mature, well-documented conversion framework with low data requirements and an active surrounding community, rather than a bleeding-edge but less battle-tested alternative.
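
The retrieval step described under "Retrieval Is the Key Idea" can be pictured with a small sketch. This is an illustration only, not the project's code: the function name `retrieve_top1`, the brute-force Euclidean search, and the toy two-dimensional "features" are all assumptions made for the example. A real system would likely work with high-dimensional features and a dedicated nearest-neighbor index.

```python
from math import dist
from typing import List, Sequence

Vector = Sequence[float]


def retrieve_top1(source_feats: Sequence[Vector],
                  target_index: Sequence[Vector]) -> List[Vector]:
    """Replace every source frame with its nearest target-speaker frame.

    Hypothetical helper: a brute-force top-1 search using Euclidean
    distance, standing in for whatever index the real project uses.
    """
    if not target_index:
        raise ValueError("target_index must contain at least one feature")
    # For each source frame, pick the single closest entry in the index.
    return [min(target_index, key=lambda t: dist(s, t)) for s in source_feats]


# Toy "feature index" built from the target speaker's training data.
target_index = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
# Toy features extracted from the source recording.
source_feats = [[0.9, 1.2], [4.0, 4.5], [0.1, -0.2]]

retrieved = retrieve_top1(source_feats, target_index)
print(retrieved)
```

Because every output frame is drawn from the target speaker's own data, nothing specific to the source speaker's feature vectors survives this step, which is the intuition behind the reduced timbre leakage.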

Key Features

Top-1 feature retrieval against target-speaker index to reduce source timbre leakage
Trains a usable voice-conversion model from as little as 10 minutes of audio
RMVPE pitch extraction (INTERSPEECH 2023) for cleaner F0 tracking than CREPE
Real-time conversion GUI with ~170ms end-to-end latency (~90ms with ASIO)
Built-in UVR5 vocal/instrumental separation and checkpoint model-fusion tools
Cross-vendor GPU support: NVIDIA, AMD (ROCm/DirectML), Intel (IPEX), and CPU fallback
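
One way to picture the checkpoint model-fusion tool listed above is as a per-parameter weighted average of two trained models. The sketch below assumes exactly that; the article only says the tool blends two voice models to interpolate timbre, so the function `fuse_checkpoints`, the `alpha` parameter, and the toy parameter names are hypothetical and the real tool may work differently.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]


def fuse_checkpoints(a: Weights, b: Weights, alpha: float) -> Weights:
    """Blend two checkpoints as alpha * a + (1 - alpha) * b.

    Hypothetical helper: assumes both models share one architecture,
    so parameter names and sizes must match exactly.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if a.keys() != b.keys():
        raise ValueError("checkpoints have different parameter names")
    fused: Weights = {}
    for name in a:
        if len(a[name]) != len(b[name]):
            raise ValueError(f"size mismatch for parameter {name!r}")
        # Linear interpolation of each scalar weight.
        fused[name] = [alpha * x + (1.0 - alpha) * y
                       for x, y in zip(a[name], b[name])]
    return fused


# Toy checkpoints with made-up parameter names.
voice_a = {"decoder.weight": [1.0, 2.0], "decoder.bias": [0.0]}
voice_b = {"decoder.weight": [3.0, 6.0], "decoder.bias": [1.0]}

fused = fuse_checkpoints(voice_a, voice_b, alpha=0.5)
print(fused)
```

With `alpha=1.0` the result equals the first model and with `alpha=0.0` the second, so intermediate values move the blended timbre between the two voices under this assumption.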