TRELLIS

microsoft · MIT

3D · 12.9K Stars · 1.3K Forks · 84 views

## Introduction

TRELLIS is a large 3D asset generation model from Microsoft that turns text or image prompts into high-quality 3D assets. Released alongside the CVPR 2025 Spotlight paper "Structured 3D Latents for Scalable and Versatile 3D Generation," the project has gathered nearly 13,000 GitHub stars and positioned itself as one of the most capable open 3D generation systems available. It is notable not just for output quality but for its ability to emit several different 3D representations from a single unified model.

## Architecture

The cornerstone of TRELLIS is a unified Structured LATent (SLAT) representation. Instead of committing to one output geometry format, SLAT encodes a 3D asset in a latent space that can be decoded into multiple representations, including Radiance Fields, 3D Gaussians, and meshes. On top of this representation, TRELLIS uses Rectified Flow Transformers as its generative backbone, which provide the powerful, scalable modeling needed to produce detailed and consistent geometry. The model is provided as large-scale pre-trained checkpoints with up to 2 billion parameters.

## Training Data

TRELLIS was trained on a large 3D asset dataset of 500,000 diverse objects. Microsoft has also released the TRELLIS-500K dataset along with toolkits for data preparation, giving researchers the resources to reproduce results or build their own 3D generation pipelines. According to the authors, TRELLIS significantly surpasses existing methods, including recent approaches at similar scales.

## Key Capabilities

### High-Quality Generation

TRELLIS produces diverse 3D assets with intricate shape and texture detail, aiming for output quality suitable for downstream creative and production use rather than rough prototypes.

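The rectified-flow backbone mentioned under Architecture generates a sample by integrating a velocity field from noise toward data. The loop below is a toy sketch of that idea only, not TRELLIS code: `ideal_velocity` is a hypothetical stand-in for the trained transformer, the time convention (noise at t = 0, data at t = 1) is an assumption that may differ from the project's own, and the "data" is a single fixed 3-vector rather than a structured latent.

```python
import random
from typing import Callable, List

Vector = List[float]


def ideal_velocity(x: Vector, t: float, target: Vector) -> Vector:
    """Velocity that moves x along a straight line so it reaches `target` at t = 1.

    Hypothetical stand-in for a learned network v(x, t); a real model
    would predict this quantity instead of being given the target.
    """
    return [(g - xi) / (1.0 - t) for xi, g in zip(x, target)]


def euler_sample(velocity: Callable[[Vector, float], Vector],
                 x0: Vector, steps: int = 8) -> Vector:
    """Integrate dx/dt = velocity(x, t) from t = 0 to t = 1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt  # t stays strictly below 1, so (1 - t) is never zero
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


random.seed(0)
target = [1.0, -2.0, 0.5]                          # toy "data" point
noise = [random.gauss(0.0, 1.0) for _ in target]   # starting noise sample
sample = euler_sample(lambda x, t: ideal_velocity(x, t, target), noise)
```

Because the path from noise to data is a straight line in this toy setting, a handful of Euler steps is enough to land on the target. Straight paths like this are the usual motivation for rectified flow: they keep the number of sampling steps small.
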
### Format Versatility

Because the SLAT representation decodes to multiple formats, a single generation can be exported as Radiance Fields, 3D Gaussians, or meshes, accommodating different rendering engines and downstream requirements without retraining.

### Flexible Editing

The model supports editing of generated assets, including producing variants of the same object and performing local edits on a 3D asset. The authors note that previous models at this scale did not offer these capabilities.

### Text and Image Conditioning

TRELLIS accepts both image and text prompts. The image-conditioned models are the primary path, and the project has also released TRELLIS-text models. For the best detail, the authors recommend generating an image from the text prompt first and then passing that image to the image-conditioned models.

## Installation and Requirements

TRELLIS currently targets Linux and requires an NVIDIA GPU with at least 16GB of memory; the code has been verified on A100 and A6000 GPUs. It needs the CUDA Toolkit (tested with 11.8 and 12.2) to compile certain submodules, and Python 3.8 or higher. Conda is recommended for managing the environment, and the installer can create a dedicated `trellis` environment with PyTorch and the required attention backends such as flash-attn or xformers. A Gradio demo and example scripts are provided, and a live demo is hosted on Hugging Face Spaces.

## Why It Matters

3D generation has lagged behind image and video generation in both quality and usability, often locking users into a single output format tied to a specific renderer. TRELLIS addresses both problems at once: it raises the quality bar for open 3D generation and, through its SLAT representation, decouples generation from output format so the same model can serve game engines, graphics pipelines, and research workflows. By releasing large pre-trained models, the training code, and the 500K-object dataset, Microsoft has given the community a strong, reproducible foundation for further 3D work.

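The decoupling of generation from output format described above can be pictured as one latent object handed to interchangeable decoders. The following is a conceptual sketch only and does not use the TRELLIS API: `StructuredLatent`, the decoder functions, and `export` are hypothetical names, the latent is assumed to be feature vectors attached to a sparse set of 3D positions, and the "decoders" return placeholder summaries rather than real geometry.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class StructuredLatent:
    """Toy stand-in for a structured latent: features on sparse 3D positions."""
    coords: List[Tuple[int, int, int]]  # active positions in a 3D grid
    feats: List[List[float]]            # one feature vector per position


# Placeholder decoders: each consumes the same latent and reports only
# which format it would produce. Real decoders would be neural networks.
def decode_radiance_field(latent: StructuredLatent) -> dict:
    return {"format": "radiance_field", "num_sites": len(latent.coords)}


def decode_gaussians(latent: StructuredLatent) -> dict:
    return {"format": "3d_gaussians", "num_sites": len(latent.coords)}


def decode_mesh(latent: StructuredLatent) -> dict:
    return {"format": "mesh", "num_sites": len(latent.coords)}


DECODERS: Dict[str, Callable[[StructuredLatent], dict]] = {
    "radiance_field": decode_radiance_field,
    "gaussian": decode_gaussians,
    "mesh": decode_mesh,
}


def export(latent: StructuredLatent, formats: List[str]) -> Dict[str, dict]:
    """Decode one generated latent into every requested output format."""
    return {name: DECODERS[name](latent) for name in formats}


# One "generation" (here just a hand-written latent) exported three ways.
latent = StructuredLatent(
    coords=[(0, 0, 0), (1, 0, 2)],
    feats=[[0.1, 0.2], [0.3, 0.4]],
)
outputs = export(latent, ["radiance_field", "gaussian", "mesh"])
```

The design point is that the generative step runs once and produces `latent`. Supporting another renderer then means adding a decoder, not retraining or re-running the generator.
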
## Limitations

The hardware requirements are steep: a 16GB-plus NVIDIA GPU is necessary, and the code is tested only on Linux, with Windows support unverified. Installation is involved, requiring CUDA toolkit compilation of submodules and careful version matching. The authors caution that text-conditioned generation is less creative and detailed than image-conditioned generation because of data limitations, which is why they recommend a text-to-image-to-3D pipeline. As with all generative 3D models, results can still require manual cleanup before they are production-ready.
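Since the 16GB GPU requirement is the first hurdle, it can be worth checking a machine before starting the installation. The sketch below parses the output of `nvidia-smi --query-gpu=name,memory.total --format=csv,noheader,nounits`, which reports total memory in MiB. The helper names are hypothetical, the sample report is illustrative rather than captured from a real machine, and reading "16GB" as 16 × 1024 MiB is an assumption: some cards marketed as 16GB report slightly less, so the threshold is left as a parameter.

```python
import subprocess
from typing import List, Tuple

# Real nvidia-smi query flags; memory.total is reported in MiB with nounits.
QUERY = [
    "nvidia-smi",
    "--query-gpu=name,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_report(text: str) -> List[Tuple[str, int]]:
    """Turn 'name, memory_mib' CSV lines into (name, memory_mib) tuples."""
    gpus = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, mem = line.rsplit(",", 1)
        gpus.append((name.strip(), int(mem.strip())))
    return gpus


def gpus_meeting_minimum(gpus: List[Tuple[str, int]],
                         min_mib: int = 16 * 1024) -> List[str]:
    """Names of GPUs whose total memory is at least `min_mib` (assumed threshold)."""
    return [name for name, mib in gpus if mib >= min_mib]


def query_local_gpus() -> List[Tuple[str, int]]:
    """Run nvidia-smi on this machine; raises if the tool is missing or fails."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_report(result.stdout)


# Illustrative report text, not real measurements.
SAMPLE_REPORT = "NVIDIA A100-SXM4-40GB, 40960\nTesla T4, 15360\n"
sample_gpus = parse_gpu_report(SAMPLE_REPORT)
eligible = gpus_meeting_minimum(sample_gpus)
```

On a real machine, `gpus_meeting_minimum(query_local_gpus())` gives the same check against the installed hardware. An empty list suggests the local GPU is below the stated minimum.
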

Key Features

Generates high-quality 3D assets from text or image prompts (CVPR'25 Spotlight)
Unified Structured LATent (SLAT) representation decoding to Radiance Fields, 3D Gaussians, and meshes
Rectified Flow Transformer backbones for scalable, detailed generation
Large-scale pre-trained checkpoints with up to 2 billion parameters
Trained on the released TRELLIS-500K dataset of 500,000 diverse objects
Flexible editing including object variants and local 3D edits
Gradio demo, example scripts, and a Hugging Face Spaces live demo
MIT licensed with released training code and data-prep toolkits