
ViMax

HKUDS (Data Intelligence Lab, University of Hong Kong) · MIT License


Multimodal · 5.5K Stars · 932 Forks · 74 views

ViMax is an open-source agentic video generation framework from HKUDS (the Data Intelligence Lab at the University of Hong Kong) that orchestrates Director, Screenwriter, Producer, and Video Generator agents into a single end-to-end pipeline. Released under the MIT license, the project has accumulated 5.5k GitHub stars and 932 forks, climbing the trending charts as creators discover its ability to convert raw ideas, scripts, or even entire novels into minute-to-hour-length AI-generated videos with cross-scene continuity.

## The Long-Form Video Problem

Individual diffusion video models (Veo, Sora, Kling) generate stunning 8-second clips, but stringing them into coherent long-form narratives requires production work that has remained stubbornly manual: writing scripts, designing characters, planning shots, tracking continuity, and assembling timelines. ViMax automates this production stack through specialist agents, addressing what HKUDS calls "the long-form video bottleneck." The result is a system where a one-sentence concept can become a 30-minute video with consistent characters and coherent storytelling.

## Four Specialist Agents

The architecture mirrors a real production studio.

- The **Director** agent orchestrates multi-agent scheduling, stage transitions, and resource management. It is the conductor that keeps the workflow moving.
- The **Screenwriter** parses the input (idea, script, or novel), extracts characters and environments, identifies scene boundaries, and produces a shot-by-shot screenplay.
- The **Producer** handles storyboarding, shot planning, and visual asset coordination, including reference image selection to maintain character consistency.
- The **Video Generator** executes image synthesis, frame selection, and final timeline assembly into the rendered output.

## Four Production Modes

ViMax exposes four entry points tuned to different creative starting positions.

- **Idea2Video** turns a raw concept into a complete video story through automated storytelling and character design.
- **Script2Video** accepts a full screenplay and renders it as video without a fixed length limit, with creative control retained by the human writer.
- **Novel2Video** is the most ambitious mode: it ingests complete literary works, compresses the narrative, tracks characters across hundreds of shots, and outputs episodic video adaptations.
- **AutoCameo** personalizes generation by integrating user photos as consistent characters within creative scripts, a feature suited to memes and tribute videos.

## RAG-Based Script Engine

Long-form scripts require coherent character arcs and consistent worldbuilding that exceed the context windows of even frontier LLMs. ViMax addresses this with a retrieval-augmented script generation engine that indexes characters, locations, and plot threads, then retrieves the relevant context each time a new scene is generated. This retrieval step is what enables novel-length adaptations without character drift or world-rule contradictions.

## Consistency and Continuity

Maintaining the same character appearance across hundreds of independently generated shots is the central technical challenge of agentic video. ViMax uses MLLM/VLM-based consistency verification: after each shot is generated, vision-language models compare it against reference images and prior shots, flagging continuity errors for regeneration. Reference image selection is automated based on scene requirements, and parallel shot generation processes sequential frames from the same virtual camera in batches for efficiency.

## Model Integration

ViMax is designed as an orchestration layer rather than a model trainer. It calls Google Gemini-2.5-Flash-Lite and MiniMax-M2.7/M2.5 for chat and reasoning via OpenAI-compatible APIs, Google's Nano Banana API for image generation, and Google Veo for video synthesis. Because the agents depend on these roles rather than on specific models, the framework can swap in newer models as they are released while the agent workflow stays the same. Setup requires Python 3.12+ on Linux or Windows, with the uv package manager handling dependencies.
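The orchestration-layer idea can be illustrated with a minimal sketch in which the pipeline depends on roles (chat, video) rather than on a particular vendor. Everything named here is hypothetical and chosen for illustration: the `ChatModel` and `VideoModel` protocols, the `Backends` container, the stub classes, and `run_pipeline` are not ViMax's actual classes or API, and the stubs return strings instead of calling any real model.

```python
from dataclasses import dataclass
from typing import Protocol


class ChatModel(Protocol):
    """Hypothetical role: turns a prompt into text."""

    def complete(self, prompt: str) -> str: ...


class VideoModel(Protocol):
    """Hypothetical role: turns a script into a rendered result."""

    def render(self, script: str) -> str: ...


@dataclass
class Backends:
    """Bundle of interchangeable model clients used by the pipeline."""

    chat: ChatModel
    video: VideoModel


class StubChat:
    """Stand-in for a real chat client; only labels its output."""

    def __init__(self, name: str) -> None:
        self.name = name

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] shot list for: {prompt}"


class StubVideo:
    """Stand-in for a real video client; only labels its output."""

    def __init__(self, name: str) -> None:
        self.name = name

    def render(self, script: str) -> str:
        return f"{self.name} <- {script}"


def run_pipeline(idea: str, backends: Backends) -> str:
    """Write a script with the chat role, then render it with the video role."""
    script = backends.chat.complete(idea)
    return backends.video.render(script)


if __name__ == "__main__":
    idea = "a fox learns to paint"
    first = Backends(chat=StubChat("chat-a"), video=StubVideo("video-a"))
    # Swapping the chat model changes one field; run_pipeline is untouched.
    second = Backends(chat=StubChat("chat-b"), video=first.video)
    print(run_pipeline(idea, first))
    print(run_pipeline(idea, second))
```

The design choice worth noting is that `run_pipeline` never names a concrete model, so replacing a backend is a configuration change rather than a code change.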

Key Features

Four specialist agents (Director, Screenwriter, Producer, Video Generator) orchestrating end-to-end video production
Idea2Video, Script2Video, Novel2Video, and AutoCameo entry points for different creative starting points
RAG-based long-form script engine indexing characters and plot threads for novel-length adaptations
MLLM/VLM-based consistency verification to maintain character appearance across hundreds of shots
Automated reference image selection ensuring environmental and character continuity
Parallel shot generation for high-efficiency processing of sequential frames
Pluggable model backend supporting Gemini, MiniMax, Nano Banana image generation, and Google Veo video generation
Personalized cameo generation integrating user photos as consistent characters in scripts
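
The consistency verification listed above follows a generate, check, regenerate pattern, which can be sketched as a small retry loop. This is an illustration under stated assumptions, not ViMax's implementation: the `Shot` type, the `generate_until_consistent` function, the `threshold` and `max_attempts` parameters, and the two stub callables are all hypothetical, and the stub scorer compares strings where a real system would ask a vision-language model to compare images.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Shot:
    description: str
    attempt: int
    image: str  # placeholder for real pixel data or a file path


def generate_until_consistent(
    description: str,
    reference: str,
    generate: Callable[[str, int], Shot],
    score: Callable[[Shot, str], float],
    threshold: float = 0.8,
    max_attempts: int = 3,
) -> Tuple[Shot, float]:
    """Regenerate a shot until its consistency score reaches the threshold.

    Returns the best-scoring attempt, even if none reached the threshold.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    best: Optional[Tuple[Shot, float]] = None
    for attempt in range(1, max_attempts + 1):
        shot = generate(description, attempt)
        value = score(shot, reference)
        # Keep the earliest attempt among equal scores.
        if best is None or value > best[1]:
            best = (shot, value)
        if value >= threshold:
            break  # consistent enough, stop regenerating
    assert best is not None
    return best


def stub_generate(description: str, attempt: int) -> Shot:
    """Hypothetical generator: each attempt yields a different image label."""
    return Shot(description, attempt, image=f"hero-v{attempt}")


def stub_score(shot: Shot, reference: str) -> float:
    """Hypothetical verifier: full score only on an exact label match."""
    return 1.0 if shot.image == reference else 0.4


if __name__ == "__main__":
    shot, value = generate_until_consistent(
        "hero enters the hall", "hero-v2", stub_generate, stub_score
    )
    print(shot.attempt, value)
```

Capping the number of attempts and returning the best candidate keeps the loop from running indefinitely when no generated shot matches the reference.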