Higgs Audio

boson-ai · Apache-2.0

Audio · 8.0K Stars · 612 Forks · 144 views

Higgs Audio is an expressive audio foundation model from Boson AI, pretrained on over 10 million hours of audio and text data, capable of generating natural multi-speaker dialogues, melodic humming with cloned voices, and simultaneous speech with background music. The latest V2.5 release condenses the architecture to 1B parameters while surpassing the prior 3B model in speed and accuracy through Group Relative Policy Optimization (GRPO) alignment on a curated Voice Bank dataset. It achieves state-of-the-art results on EmergentTTS-Eval, outperforming GPT-4o-mini-TTS on expressive emotion and intonation tasks.

Key Features

Pretrained on 10M+ hours of audio and text data for rich acoustic and language understanding
Emergent multi-speaker dialogue generation in multiple languages
True-to-life zero-shot voice cloning with 3-10 seconds of reference audio
Simultaneous speech and background music generation
V2.5: 1B parameter model surpassing prior 3B model via GRPO alignment
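
The GRPO alignment mentioned for V2.5 rests on one simple idea: several candidate outputs are sampled for the same prompt, and each one is scored relative to the others in its group rather than against a separately learned value model. The snippet below is a minimal sketch of that group-relative normalization only. It is not Boson AI's training code; the function name, the reward values, and the use of population standard deviation are assumptions chosen for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and spread of its own group.

    `rewards` holds one scalar score per candidate sampled for the same prompt.
    `eps` avoids division by zero when every candidate scores the same.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical reward scores for four audio clips generated from one prompt.
rewards = [0.2, 0.5, 0.9, 0.4]
advantages = group_relative_advantages(rewards)

for reward, advantage in zip(rewards, advantages):
    print(f"reward={reward:.2f}  advantage={advantage:+.3f}")
```

Candidates that score above their group's mean get a positive advantage and are reinforced, while those below the mean are pushed down. How Higgs Audio defines the reward itself is not described here.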

Related Projects

RVC (Retrieval-based Voice Conversion WebUI)

OpenVoice

Voicebox

Ultimate Vocal Remover GUI