Open Source
Explore the latest AI open-source projects from GitHub and HuggingFace.
Hunyuan3D-2 is Tencent's open-source high-resolution 3D asset generation system that creates fully textured 3D models from a single image or text prompt. With 13,200 GitHub stars, it has become the leading open-source solution for professional-grade 3D content creation, outperforming competing proprietary and open-source models on geometry quality, condition alignment, and texture fidelity benchmarks.

## Two-Stage Pipeline Architecture

The core innovation of Hunyuan3D-2 is its two-stage generation architecture. The first stage, Hunyuan3D-DiT, uses a scalable diffusion transformer to generate clean 3D geometry from an input image or text. The second stage, Hunyuan3D-Paint, synthesizes high-quality textures onto the generated mesh. This decoupled design addresses a fundamental challenge in 3D generation: shape and texture have very different geometric and appearance properties that are difficult to optimize jointly.

The two-stage approach also provides practical flexibility: users can apply the Hunyuan3D-Paint texture model to their own hand-crafted meshes, not just those generated by the shape model. This enables texture synthesis as a standalone workflow for existing 3D assets.

## Model Variants for Different Use Cases

Hunyuan3D-2 ships in multiple model sizes optimized for different scenarios. The standard model (1.1B parameters) balances quality and speed for general use. The mini variant (0.6B parameters) targets resource-constrained environments and faster inference. The multiview model (1.1B parameters) accepts multi-angle input for improved geometric accuracy. The Turbo models offer accelerated inference for production pipelines.

The Hunyuan3D-2.1 release (June 2025) introduced physically-based rendering (PBR) texture synthesis as a major upgrade.
Rather than standard RGB textures, the updated Hunyuan3D-2.1 Paint model generates materials that simulate real-world light interaction, enabling metallic reflections and subsurface scattering effects that dramatically improve photorealism.

## Hardware Accessibility

A key strength is the relatively modest hardware requirement. Shape generation runs in 6GB of VRAM, putting it within reach of mid-range consumer GPUs; full generation including texturing requires 16GB. This makes professional-quality 3D generation accessible without datacenter-grade hardware.

The project supports Windows, macOS, and Linux, with community-contributed Windows portable versions that eliminate the need for manual Python environment setup.

## Deployment Flexibility

Hunyuan3D-2 offers multiple integration paths: a Python API following Diffusers library conventions for programmatic use, a Gradio web interface for interactive experimentation, a REST API server for production deployment, and a Blender addon for integration with the industry-standard 3D modeling tool. The official web interface at 3d.hunyuan.tencent.com provides cloud-based access without local setup, and ComfyUI integration enables the node-based workflow composition popular in AI art communities.

## Performance Benchmarks

On standardized benchmarks evaluating geometry quality (Chamfer Distance), condition alignment (CLIP score), and texture fidelity (FID, CMMD), Hunyuan3D-2 outperforms comparable open-source models and matches or exceeds proprietary solutions. The introduction of PBR texturing in version 2.1 further improved texture quality, with CLIP-FID (lower is better) decreasing from 26.44 to 24.78.

## Community and Ecosystem

The project has developed an active community with Discord and WeChat groups. The official Hunyuan3D Studio platform provides mesh manipulation and animation tools built on top of the generation models.
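The VRAM figures given for the pipeline translate into a simple capability check. The sketch below is illustrative only: the 6GB and 16GB thresholds come from the stated requirements, while the function name and return labels are hypothetical:

```python
def choose_pipeline(vram_gb: float) -> str:
    """Map available GPU memory to the largest Hunyuan3D-2 workflow
    it can run: 6 GB for shape-only, 16 GB for the full pipeline."""
    if vram_gb >= 16:
        return "shape + texture"    # full pipeline incl. Hunyuan3D-Paint
    if vram_gb >= 6:
        return "shape only"         # Hunyuan3D-DiT geometry generation
    return "insufficient VRAM"      # fall back to the hosted web interface

print(choose_pipeline(8))   # a mid-range consumer GPU covers shape generation
```

A real deployment would query the actual device (e.g. via its GPU runtime) rather than take the figure as a parameter; the point here is just that the two published thresholds cleanly partition consumer hardware.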
The open-source release includes complete model weights on HuggingFace and full inference code, enabling researchers to study and extend the architecture.

## Limitations

Generation quality degrades on complex multi-object scenes; the system works best with single objects or simple compositions. The full pipeline is computationally intensive, with generation times of several minutes on consumer hardware. Text-to-3D quality lags behind image-to-3D, because an image provides richer geometric constraints than a text description. Some fine geometric details, such as thin structures or small holes, can be lost during mesh generation.

## Related Projects

- **microsoft** — Microsoft's CVPR'25 Spotlight 3D generation model, converting text or images to high-quality 3D assets with up to 2B parameters.
- **stepfun-ai** — Open-source framework for high-fidelity and controllable textured 3D asset generation from text or images, using a two-stage VAE-DiT architecture.