MLX-Audio - Open Source | Evermx

MLX-Audio

Blaizzy | MIT

Audio | 6.2K Stars | 478 Forks | 449 views

MLX-Audio is a comprehensive text-to-speech (TTS), speech-to-text (STT), and speech-to-speech (STS) library built on Apple's MLX framework, optimized for fast and efficient audio processing on Apple Silicon. With over 6,200 GitHub stars and 383 commits from 33 contributors, it has become the go-to audio AI library for the Apple ecosystem.

## Why MLX-Audio Matters

Running speech models locally on Apple Silicon has historically required converting models from PyTorch and dealing with performance bottlenecks. MLX-Audio solves this by providing native MLX implementations of popular speech models, delivering faster inference with lower memory usage compared to generic frameworks.

For developers building voice-enabled applications on Mac, iPhone, or iPad, MLX-Audio eliminates the need for cloud API calls and their associated latency and costs.

## Key Features

### Multi-Task Audio Processing

MLX-Audio supports three core audio tasks in a single unified library:

| Task | Models | Capabilities |
|------|--------|--------------|
| TTS | Kokoro, Qwen3-TTS, CSM, Dia, OuteTTS, Spark, Chatterbox, Soprano, Ming Omni TTS | Multilingual synthesis, voice cloning, speed control |
| STT | Whisper, Qwen3-ASR, Parakeet, Voxtral, VibeVoice-ASR, Canary, Moonshine, MMS | Transcription, timestamps, long-form audio (up to 60 min) |
| STS | Speech-to-speech models | Direct voice transformation |

### Voice Cloning and Customization

The CSM (Conversational Speech Model) enables voice cloning from reference audio samples. Users can generate speech in a target voice with minimal reference data. Additional customization includes adjustable speech speed control and multilingual voice switching across supported models.

### Quantization Support

MLX-Audio supports model quantization at multiple bit levels: 3-bit, 4-bit, 6-bit, and 8-bit. This enables running larger models on devices with limited memory while maintaining acceptable quality.
Quantized models load faster and consume less RAM, making them practical for mobile and embedded applications.

### OpenAI-Compatible REST API

The library includes a built-in REST API server that follows OpenAI's API format. This means applications already using OpenAI's speech APIs can switch to MLX-Audio with minimal code changes, running entirely locally without API costs.

### Interactive Web Interface

MLX-Audio ships with a web-based UI featuring 3D audio visualization. Users can test TTS and STT models interactively through the browser, making it easy to evaluate different models and configurations without writing code.

### Speaker Diarization

The Sortformer v1 and v2.1 voice activity detection models provide speaker diarization capabilities, identifying and separating different speakers in multi-speaker audio. This is essential for meeting transcription, podcast processing, and interview analysis.

## Cross-Platform SDK

Beyond the Python library, MLX-Audio provides:

- **Python package**: `pip install mlx-audio` for scripting and integration
- **CLI tools**: `mlx-audio-generate` and `mlx-audio-ui` for command-line usage
- **Swift package**: `mlx-audio-swift` for native iOS/macOS application development

The Swift SDK enables direct integration into Xcode projects, making it straightforward to add on-device speech capabilities to iOS and macOS apps.

## Recent Development

Active development continues with recent additions including Ming Omni TTS model support, expanded STT capabilities with VibeVoice-ASR for long-form audio with diarization, and Qwen3-TTS and Qwen3-ASR integrations. The project maintains regular releases with responsive issue tracking.
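
To make the OpenAI-compatible REST API mentioned under Key Features more concrete, here is a minimal sketch of what a client request might look like, using only the Python standard library. Everything specific in it is an assumption rather than something this article confirms: the server address `http://localhost:8000`, the `/v1/audio/speech` path (borrowed from OpenAI's speech API), and the placeholder model and voice names. The helper names `build_speech_request` and `synthesize` are hypothetical. Check the project's own documentation for the actual endpoint and supported fields.

```python
import json
import urllib.request

# Assumptions (not confirmed by this article): the server address, the
# /v1/audio/speech path (taken from OpenAI's speech API format), and the
# model and voice names passed in by the caller are all placeholders.
BASE_URL = "http://localhost:8000"


def build_speech_request(text, model, voice, speed=1.0, base_url=BASE_URL):
    """Build (but do not send) an OpenAI-style text-to-speech POST request."""
    payload = {"model": model, "input": text, "voice": voice, "speed": speed}
    return urllib.request.Request(
        url=f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, model, voice, out_path, speed=1.0):
    """Send the request and write the returned audio bytes to out_path."""
    request = build_speech_request(text, model, voice, speed)
    with urllib.request.urlopen(request) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return len(audio)  # number of audio bytes written
```

With a local server running, a call such as `synthesize("Hello from my Mac", "placeholder-model", "placeholder-voice", "hello.wav")` would save the response body to disk. Keeping request construction separate from sending makes it easy to inspect the payload before any network traffic happens.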

## Limitations

- Requires Apple Silicon (M-series chips) and cannot run on Intel Macs or non-Apple hardware
- Some models may produce lower quality output compared to cloud-based alternatives
- Limited documentation for advanced use cases like fine-tuning custom voices
- Swift SDK is newer and has fewer features than the Python library

## Conclusion

MLX-Audio fills a critical gap in the Apple Silicon AI ecosystem by providing a unified, high-performance library for all speech-related tasks. Its combination of broad model support, quantization options, and cross-platform SDKs makes it the most complete on-device audio AI solution available for Apple hardware. For developers building privacy-focused or offline-capable voice applications, MLX-Audio is the clear choice on Apple platforms.

Key Features

Unified TTS, STT, and STS processing on Apple Silicon with 9+ TTS and 8+ STT model architectures
Voice cloning via CSM with minimal reference audio and adjustable speech speed control
Model quantization support from 3-bit to 8-bit for memory-efficient deployment
OpenAI-compatible REST API for drop-in replacement of cloud speech services
Interactive web interface with 3D audio visualization for model testing
Speaker diarization with Sortformer VAD models for multi-speaker audio
Cross-platform SDKs: Python pip package, CLI tools, and Swift package for iOS/macOS
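
As a rough illustration of why the 3-bit to 8-bit quantization levels listed above matter for memory, the sketch below estimates weight storage as parameters times bits divided by 8. The 1-billion-parameter model size is a hypothetical figure chosen for round numbers, not a measurement of any model in MLX-Audio, and the function name `weight_memory_gb` is made up for this example. Real quantized checkpoints typically also store some per-group scaling metadata, so actual sizes will be somewhat larger.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Approximate weight storage in GB (10^9 bytes), ignoring any overhead."""
    return num_params * bits_per_weight / 8 / 1e9


# Hypothetical model size, for illustration only.
PARAMS = 1_000_000_000

# 16-bit is included as an unquantized baseline for comparison.
for bits in (16, 8, 6, 4, 3):
    print(f"{bits:>2}-bit: {weight_memory_gb(PARAMS, bits):.3f} GB")
```

Under these assumptions, going from 16-bit to 4-bit weights cuts the estimated footprint to a quarter, which is the kind of saving that makes larger models fit on memory-constrained devices.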

Related Projects

OpenVoice

Ultimate Vocal Remover GUI

CosyVoice

VoxCPM