Amphion

open-mmlab · MIT

Audio · 9.7K Stars · 798 Forks · 172 views

Amphion is an open-source toolkit by OpenMMLab for research and production work in audio, music, and speech generation. It covers text-to-speech, voice conversion, singing voice synthesis, text-to-audio, and vocoder training, and it includes architectures with zero-shot capabilities such as VALL-E, NaturalSpeech2, MaskGCT, and Vevo. Pre-trained models are available on HuggingFace and ModelScope, which makes the toolkit accessible to both researchers and engineers.

Key Features

Supports TTS, voice conversion, singing synthesis, and text-to-audio generation tasks
Implements the VALL-E, NaturalSpeech2, MaskGCT, Vevo, and FastSpeech2 architectures
Includes neural audio codecs (DualCodec, FACodec) for discrete speech token extraction
Provides evaluation metrics covering F0, energy, intelligibility, and speaker similarity
Offers pre-trained models on HuggingFace and ModelScope, plus Docker support for deployment

Related Projects

OpenVoice

Ultimate Vocal Remover GUI

Audiocraft

CosyVoice