
ACE-Step 1.5 - Open Source | Evermx


ACE-Step 1.5

ace-step · MIT license


Audio · 11.0K stars · 1.3K forks

ACE-Step 1.5 is an open-source music generation foundation model from the ACE-Step team that aims to bring commercial-grade text-to-music synthesis to consumer hardware. Released under the permissive MIT license, the project has gathered 11,026 GitHub stars and 1,333 forks, making it one of the most popular open-source audio AI projects of 2026. Rather than a single model, ACE-Step 1.5 ships as a full local music studio covering text-to-music, cover generation, audio repainting, track separation, and LoRA fine-tuning, and it runs on NVIDIA, AMD, Intel, and Apple Silicon hardware.

## A Planner-Plus-Diffusion Architecture

ACE-Step 1.5 pairs a language-model (LM) planner with a Diffusion Transformer (DiT) decoder. The LM component, built on Qwen3 and available in 0.6B, 1.7B, and 4B parameter sizes, handles composition planning and metadata synthesis: it decides structure, BPM, key, and time signature before any audio is rendered. The DiT decoder then synthesizes the audio. The standard decoder has 2B parameters and comes in base, SFT, and turbo variants; an XL series at 4B parameters targets higher audio quality and needs roughly 9GB of VRAM. Depending on the variant, inference takes between 8 and 50 diffusion steps, letting users trade speed against fidelity.

## Speed and Hardware Reach

The headline claim is speed on accessible hardware. On an NVIDIA A100, ACE-Step 1.5 reportedly generates a full song in under two seconds; on a consumer RTX 3090, in under ten seconds. A quantization-and-offload path lowers the minimum VRAM requirement to around 4GB, and batch generation can produce up to eight songs at once. The project is not tied to CUDA: it supports NVIDIA CUDA, AMD ROCm, Apple Metal via MLX, Intel XPU, and a CPU fallback on Windows, Linux, and macOS, with Python 3.11-3.12. Hardware support this broad is unusual for a music model, and it is a large part of the project's appeal to hobbyists without data-center GPUs.
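Supporting this many platforms comes down to a backend fallback chain. The sketch below is hypothetical and is not ACE-Step's detection code: the backend names mirror the platforms listed above, but the priority order is an assumption, and real code would probe the installed frameworks instead of taking a set of names as input.

```python
from typing import Iterable

# Hypothetical priority order (an assumption, not ACE-Step's): GPU stacks
# first, CPU last. Names mirror the platforms the project lists.
BACKEND_PRIORITY = ("cuda", "rocm", "mlx", "xpu", "cpu")


def pick_backend(available: Iterable[str]) -> str:
    """Return the highest-priority backend found in `available`.

    "cpu" is always added, so a machine with no supported accelerator
    still gets the CPU fallback instead of an error.
    """
    present = set(available) | {"cpu"}
    for name in BACKEND_PRIORITY:
        if name in present:
            return name
    raise AssertionError("unreachable: 'cpu' is always present")


if __name__ == "__main__":
    print(pick_backend({"mlx"}))         # Apple Silicon machine -> mlx
    print(pick_backend({"xpu", "cpu"}))  # Intel GPU machine     -> xpu
    print(pick_backend([]))              # nothing detected      -> cpu
```

Keeping the order in a single tuple makes the preference explicit; passing only `["cpu"]` forces the CPU path, which can help when debugging.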
## More Than Text-to-Music

Where ACE-Step 1.5 stands out is the range of tasks beyond simple prompting. It supports cover generation, audio repainting (regenerating a region of an existing track), and track separation, along with reference-audio guidance and explicit control over BPM, key, and time signature. Multi-track generation and vocal-to-accompaniment conversion let users build full arrangements rather than one-shot clips, and generation length ranges from 10 seconds to 10 minutes. Lyrics are supported in more than 50 languages, extending the model's reach well beyond English-centric tools.

## Fine-Tuning and Personalization

For users who want a particular sound, ACE-Step 1.5 supports LoRA fine-tuning on user-provided audio samples. Creators can adapt the model toward a specific style, instrument palette, or vocal character without retraining the foundation model from scratch. Combined with the metadata controls, this fine-tuning path turns the project from a novelty generator into a tool that can be steered toward a consistent creative identity, an important distinction for musicians and producers evaluating it for real work.

## Interfaces and Ecosystem

The project is designed to meet users wherever they work. It ships with a Gradio web UI (served at localhost:7860), a REST API, a Python API, a command-line interface, and a VST3 plugin for use inside digital audio workstations. The web UI selects models automatically based on available VRAM, with tiers ranging from 6GB to 24GB-plus, which lowers the setup burden for newcomers. Installation uses the uv package manager: clone the repository, run `uv sync`, then launch with `uv run acestep`. The project lists ComfyUI, Zilliz, and Milvus among its ecosystem partners, and a free online demo at acemusic.ai lets prospective users try the model without a local GPU.
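The automatic VRAM-based model selection described above amounts to a threshold lookup. The sketch below is hypothetical: the 6GB floor, the 24GB-plus top tier, and the roughly 4GB quantization-and-offload minimum come from the description, while the 12GB middle tier and every tier label are placeholders, not the web UI's actual table.

```python
import bisect

# Hypothetical tier table: (minimum VRAM in GB, label), sorted ascending.
# Only the 6 GB floor and the 24 GB-plus top tier come from the article;
# the 12 GB row and all labels are illustrative placeholders.
TIERS = [
    (6.0, "tier-6gb"),
    (12.0, "tier-12gb"),
    (24.0, "tier-24gb-plus"),
]
_THRESHOLDS = [gb for gb, _ in TIERS]


def select_tier(vram_gb: float) -> str:
    """Return the largest tier whose minimum VRAM fits in `vram_gb`."""
    # bisect_right counts thresholds <= vram_gb, so a card exactly at a
    # threshold qualifies for that tier.
    idx = bisect.bisect_right(_THRESHOLDS, vram_gb) - 1
    if idx < 0:
        raise ValueError(
            f"{vram_gb} GB is below the smallest tier; the article mentions a "
            "quantization-and-offload path that needs around 4 GB"
        )
    return TIERS[idx][1]


if __name__ == "__main__":
    for gb in (6, 8, 16, 24, 48):
        print(f"{gb:>3} GB -> {select_tier(gb)}")
```

Binary search over a sorted threshold list keeps the table declarative: adding a tier is a one-line change, with no branching logic to update.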
## Pros, Cons, and Practical Considerations

The strengths are clear: broad hardware support, fast generation, an MIT license that permits commercial use, and a deep feature set covering generation, editing, and separation. The trade-offs are just as real. Generation quality is subjective and varies by genre and prompt, the XL decoder and the 4B LM planner need substantial VRAM, and the Python 3.11-3.12 requirement can complicate environments pinned to other versions. The project itself flags legal risk: generated output can resemble copyrighted material, and the maintainers explicitly ask users to verify originality and obtain permission where protected styles are involved. That candor is welcome, but it places real responsibility on anyone deploying the output commercially.

## Outlook

With more than 11,000 stars, an arXiv technical report, and unusually wide hardware and interface support, ACE-Step 1.5 marks the maturing of local, open-source music generation. Its LM-planner-plus-DiT design, MIT licensing, and support for AMD, Intel, and Apple hardware make it a natural starting point for creators who want commercial-grade music tooling without a cloud dependency or a copyleft license. As consumer GPUs gain memory, the gap between ACE-Step 1.5 and closed commercial services looks set to narrow further through 2026.
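Because the Python 3.11-3.12 requirement noted under practical considerations can trip up pinned environments, a pre-install check is cheap insurance. The guard below is a generic sketch, not part of ACE-Step: it compares an interpreter version against the stated range, and `is_supported` is a hypothetical helper name.

```python
import sys
from typing import Optional, Sequence

# Inclusive (major, minor) bounds from the stated requirement: Python 3.11-3.12.
MIN_VERSION = (3, 11)
MAX_VERSION = (3, 12)


def is_supported(version: Optional[Sequence[int]] = None) -> bool:
    """Check a version (default: the running interpreter) against the range.

    Only (major, minor) matter, so patch releases such as 3.12.4 pass.
    """
    v = sys.version_info if version is None else version
    major_minor = (v[0], v[1])
    return MIN_VERSION <= major_minor <= MAX_VERSION


if __name__ == "__main__":
    v = sys.version_info
    status = "OK" if is_supported() else "unsupported (need 3.11-3.12)"
    print(f"Python {v.major}.{v.minor}: {status}")
```

Tuple comparison handles the range check in one expression, so the bounds live in two constants that are easy to update if the supported range changes.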

## Key Features

- Planner-plus-diffusion design: Qwen3-based LM planner (0.6B/1.7B/4B) with a Diffusion Transformer decoder
- Text-to-music generation with lyric support in 50+ languages
- Cover generation, audio repainting, and track separation in one tool
- Cross-platform GPU support: NVIDIA CUDA, AMD ROCm, Apple Metal (MLX), Intel XPU, and CPU fallback
- Sub-2-second song generation on an A100; under 10 seconds on an RTX 3090
- LoRA fine-tuning from user audio samples for personalized styles
- Flexible duration from 10 seconds to 10 minutes with BPM/key/time-signature control
- Multiple interfaces: Gradio web UI, REST API, Python API, CLI, and a VST3 plugin
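The duration range and the BPM, key, and time-signature controls listed above lend themselves to up-front validation before a generation job is queued. The sketch below is hypothetical: `GenerationRequest` and its fields are illustrative names, not ACE-Step's Python API, and the key-string format is an assumption. Only the 10-second to 10-minute bounds come from the description.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Duration bounds from the feature list: 10 seconds to 10 minutes.
MIN_DURATION_S = 10
MAX_DURATION_S = 10 * 60

# Illustrative key format such as "C major" or "F# minor" (an assumption,
# not the project's accepted syntax).
_KEY_RE = re.compile(r"^[A-G][#b]? (major|minor)$")


@dataclass(frozen=True)
class GenerationRequest:
    """Hypothetical request object; field names are not ACE-Step's API."""

    prompt: str
    duration_s: float
    bpm: Optional[int] = None
    key: Optional[str] = None
    time_signature: Optional[str] = None  # e.g. "4/4" or "7/8"

    def __post_init__(self) -> None:
        if not MIN_DURATION_S <= self.duration_s <= MAX_DURATION_S:
            raise ValueError(
                f"duration must be between {MIN_DURATION_S} and {MAX_DURATION_S} s"
            )
        if self.bpm is not None and self.bpm <= 0:
            raise ValueError("bpm must be a positive integer")
        if self.key is not None and not _KEY_RE.match(self.key):
            raise ValueError(f"unrecognized key: {self.key!r}")
        if self.time_signature is not None:
            self._check_time_signature(self.time_signature)

    @staticmethod
    def _check_time_signature(sig: str) -> None:
        beats, sep, unit = sig.partition("/")
        if not (sep and beats.isdigit() and unit.isdigit()):
            raise ValueError(f"time signature must look like '4/4': {sig!r}")
        beats_n, unit_n = int(beats), int(unit)
        # The lower number is a note value, so it must be a power of two.
        if beats_n == 0 or unit_n == 0 or unit_n & (unit_n - 1):
            raise ValueError(f"invalid time signature: {sig!r}")


if __name__ == "__main__":
    req = GenerationRequest("lo-fi piano, rainy night", 120, bpm=80,
                            key="F# minor", time_signature="4/4")
    print(req)
```

Validating in `__post_init__` means an invalid request can never be constructed, so downstream code can trust every field it receives.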