ViMax

HKUDS · MIT License


Agent · 8.4K stars · 1.3K forks

ViMax is HKUDS's agentic video generation system that packages a director, screenwriter, producer, and video generator into one orchestrated multi-agent pipeline. With 8,300+ GitHub stars and MIT licensing, it is one of the most ambitious open attempts to fix the core problem with current AI video: most systems generate a single high-quality clip of a few seconds and stop, while ViMax is built to plan, storyboard, and assemble minute-scale narrative content from a single idea, novel, or screenplay.

## The Multi-Agent Architecture

Unlike single-shot text-to-video tools, ViMax decomposes video creation into role-specific agents that hand off structured artifacts. A screenwriter agent ingests an idea, novel chapter, or script and produces a shot-aware screenplay with character bibles and environment descriptions. A director agent breaks that screenplay into a storyboard with cinematography decisions, camera angles, and pacing notes. A producer agent handles resource coordination, scheduling shot generation in parallel, and managing reference-image selection so that the same character looks consistent across shots. The video generator agent then runs the final visual synthesis against the chosen backend. A central orchestration layer handles agent scheduling, stage transitions, resource management, and retry/fallback logic, which is what keeps the pipeline from collapsing when a single shot fails.

## Backend-Pluggable, Not Tied to One Model

The framework treats foundation models as swappable backends configured through YAML. Chat reasoning runs on Google Gemini, OpenRouter, or MiniMax (M2.7 and M2.5 variants are documented, with up to 1M tokens of context for novel-length inputs). Image generation defaults to Nano Banana via Google's API. Video generation runs through Veo. This means ViMax is not betting on any single video model winning; if a stronger video backend ships, you swap the config and keep the rest of the pipeline.

## Four Input Modes

ViMax exposes four named workflows. Idea2Video takes a raw concept ('a samurai discovers a glowing tree at midnight') and runs the full pipeline. Novel2Video chunks literary content into episodes, with RAG-based retrieval keeping character and place facts consistent across chapters. Script2Video takes an existing screenplay and gives the user direct control over the director's interpretation. AutoCameo lets users insert a personal photo as a character that recurs consistently across scenes, which is unusual in open video frameworks and powers most of the social-media demo content.

## How It Beats Single-Shot Tools

Most open video models, including the strongest ones, generate around 5 to 10 seconds at a time and lose character consistency between clips. ViMax explicitly addresses this by selecting reference images per shot, validating character/environment continuity between shots, and assembling many short generations into structured longer sequences. It is the assembly layer, not a new video model, and that is the right level of abstraction for the current state of the field.

Parallel shot generation is a real performance win. When multiple shots share the same camera position and environment, ViMax batches them, which both speeds up production and improves visual consistency by running them under identical conditioning.

## Output Artifacts

The system emits frame sequences, individual shot clips, an assembled final video, and processing logs that document agent decisions per stage. The logs are themselves useful because they expose where the director chose specific shot lengths or where the producer selected one reference image over another, making it possible to inspect and rerun individual stages.

## Where It Fits

ViMax is the right pick when the goal is narrative video at minute-scale rather than a single eye-catching short clip.
It is also useful as a reference architecture for anyone building their own multi-agent media pipeline, because the agent boundaries are clean and the YAML-driven backend configuration is easy to repoint.

## Limitations

ViMax depends on commercial APIs (Gemini, Veo, Nano Banana) for the heavy generation, so it is not fully self-hosted out of the box, and total cost scales with output length. Output quality is bounded by whichever video backend is plugged in, and Veo's per-shot capabilities still cap what a longer assembled video can look like. Character consistency, while better than plain single-shot generation, is not perfect across very long sequences, especially when face geometry is challenging. Finally, the Python 3.12 / uv-managed environment is opinionated, so dropping ViMax into existing Python infrastructure may require some setup work.
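
Why cost scales with output length follows from the per-clip ceiling of roughly 5 to 10 seconds: if each backend call yields one short clip, the number of calls grows linearly with target duration. The back-of-the-envelope sketch below assumes a fixed 8-second clip, a flat per-clip price passed in as the hypothetical `cost_per_clip` parameter (no real Veo pricing is implied), and an optional `retry_rate` for shots that get regenerated once.

```python
import math


def plan_shot_count(target_seconds: float, clip_seconds: float = 8.0) -> int:
    """Backend calls needed to cover target_seconds with clips of clip_seconds each."""
    if target_seconds <= 0 or clip_seconds <= 0:
        raise ValueError("durations must be positive")
    return math.ceil(target_seconds / clip_seconds)


def estimate_cost(target_seconds: float, clip_seconds: float,
                  cost_per_clip: float, retry_rate: float = 0.0) -> float:
    """Expected spend, assuming a fraction `retry_rate` of shots is generated once more."""
    calls = plan_shot_count(target_seconds, clip_seconds) * (1 + retry_rate)
    return calls * cost_per_clip


if __name__ == "__main__":
    for minutes in (1, 2, 5):
        clips = plan_shot_count(minutes * 60, clip_seconds=8.0)
        cost = estimate_cost(minutes * 60, 8.0, cost_per_clip=1.0)
        print(f"{minutes} min -> {clips} clips, {cost:.1f} cost units")
```

With 8-second clips, one minute needs 8 calls, two minutes 15, and five minutes 38, so spend grows in step with duration whatever the actual per-clip price turns out to be.
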

## Key Features

- Multi-agent pipeline with director, screenwriter, producer, and video generator roles
- Central orchestration with agent scheduling, stage transitions, and retry/fallback logic
- Backend-pluggable: Gemini/OpenRouter/MiniMax for chat, Nano Banana for images, Veo for video
- Four workflows: Idea2Video, Novel2Video, Script2Video, and AutoCameo for personal-photo cameos
- RAG-based script generation for novel-length content with automatic chapter segmentation
- Parallel shot generation with shared-camera batching for speed and consistency
- Reference-image selection per shot for character and environment continuity
- YAML-driven configuration and a Python 3.12 / uv environment; MIT licensed
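
Shared-camera batching can be illustrated with a small sketch: group shots by a (camera, environment) key, give every shot in a group the same conditioning prefix, and render the groups concurrently. The names `ShotSpec`, `batch_by_setup`, `render_batch`, and `render_all` are hypothetical, the `generate` callable stands in for a real video backend, and ViMax's actual grouping rules may differ. A thread pool fits here because backend calls are network-bound rather than CPU-bound.

```python
# Hypothetical sketch of shared-camera batching; not ViMax's implementation.
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ShotSpec:
    shot_id: str
    camera: str       # e.g. "wide, low angle"
    environment: str  # e.g. "forest clearing, night"
    action: str


def batch_by_setup(shots: list[ShotSpec]) -> dict[tuple[str, str], list[ShotSpec]]:
    """Group shots sharing a camera position and environment, keeping first-seen order."""
    groups: defaultdict[tuple[str, str], list[ShotSpec]] = defaultdict(list)
    for shot in shots:
        groups[(shot.camera, shot.environment)].append(shot)
    return dict(groups)


def render_batch(key: tuple[str, str], batch: list[ShotSpec],
                 generate: Callable[[str], str]) -> dict[str, str]:
    camera, environment = key
    shared = f"{camera} | {environment}"  # identical conditioning for the whole batch
    return {shot.shot_id: generate(f"{shared} | {shot.action}") for shot in batch}


def render_all(shots: list[ShotSpec], generate: Callable[[str], str],
               max_workers: int = 4) -> dict[str, str]:
    """Render each batch on a worker thread and merge the per-shot results."""
    results: dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(render_batch, key, batch, generate)
                   for key, batch in batch_by_setup(shots).items()]
        for future in futures:
            results.update(future.result())
    return results


if __name__ == "__main__":
    demo = [
        ShotSpec("s1", "wide", "forest at night", "samurai walks in"),
        ShotSpec("s2", "close-up", "forest at night", "samurai looks up"),
        ShotSpec("s3", "wide", "forest at night", "samurai kneels"),
    ]
    print(list(batch_by_setup(demo)))
    print(render_all(demo, generate=lambda prompt: f"clip<{prompt}>"))
```

Here shots s1 and s3 share a setup and land in the same batch, while s2 gets its own. In a real pipeline the shared conditioning would more likely be a reference image or camera parameters than a text prefix.
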
