Open Source
Explore the latest AI open-source projects from GitHub and HuggingFace.
Page Agent is an open-source GUI agent from Alibaba that lives directly inside your web page and controls its interface through natural language. With more than 20,000 GitHub stars, it takes a deliberately lightweight approach to web automation: instead of a browser extension, a headless browser, or a Python backend, everything runs as in-page JavaScript. A user types or speaks an instruction such as "Click the login button" and the agent carries it out within the live DOM.

## Text-Based DOM Control

The defining design choice is that Page Agent reads and manipulates the page through its text-based DOM rather than screenshots. Because it never sends images to the model, it does not need a multimodal LLM or special screen-capture permissions, which keeps latency and token costs low and avoids the brittleness of pixel-level vision. The agent reasons over the structured elements that already exist in the page and acts on them directly.

## Easy Integration and Bring-Your-Own LLM

Developers can add Page Agent with a single script tag for a quick evaluation or install it from npm for production use. It is model-agnostic: you supply your own OpenAI-compatible endpoint and API key, so teams keep control over which LLM powers the experience. An optional Chrome extension extends tasks across multiple browser tabs, and a beta MCP server lets external agent clients drive the page from outside.

## Use Cases

The project targets practical product scenarios: shipping an in-app AI copilot without rewriting the backend, collapsing twenty-click ERP or CRM workflows into a single sentence, and making any web app more accessible through voice or natural-language commands. For builders of larger web agents, Page Agent can also act as the in-page execution layer that actually clicks, types, and navigates.

## Considerations

Because it operates on the DOM, Page Agent works best on conventional HTML interfaces and may struggle with heavily canvas-rendered or non-semantic UIs. Reliability still depends on the quality of the LLM you connect and on clear, well-structured markup. The MCP server and some advanced features are marked beta. Even so, for teams that want to add a natural-language control layer to an existing web product with minimal infrastructure, Page Agent is one of the most pragmatic open-source options available.
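
To make the text-based DOM idea and the bring-your-own-LLM setup more concrete, here is a minimal TypeScript sketch. It is not Page Agent's actual implementation or API. The `UiNode` type, the `serializeInteractive` and `buildChatRequest` helpers, the list of interactive tags, the prompt wording, and the `example-model` name are all hypothetical and chosen only for illustration. The sketch flattens a simplified element tree into indexed text lines and wraps them in a request body shaped like the one an OpenAI-compatible chat completions endpoint accepts.

```typescript
// Hypothetical, simplified stand-in for a DOM element (not Page Agent's data model).
interface UiNode {
  tag: string;
  text?: string;
  attrs?: Record<string, string>;
  children?: UiNode[];
}

// Tags treated as interactive in this sketch (an assumption for illustration).
const INTERACTIVE_TAGS = new Set(["a", "button", "input", "select", "textarea"]);

// Walk the tree depth-first and emit one indexed text line per interactive element.
function serializeInteractive(root: UiNode): string[] {
  const lines: string[] = [];
  const visit = (node: UiNode): void => {
    if (INTERACTIVE_TAGS.has(node.tag)) {
      const label = node.text ?? node.attrs?.placeholder ?? node.attrs?.name ?? "";
      lines.push(`[${lines.length}] <${node.tag}> ${label}`.trimEnd());
    }
    for (const child of node.children ?? []) visit(child);
  };
  visit(root);
  return lines;
}

// Build a JSON body in the shape used by OpenAI-compatible chat completions endpoints.
// The prompt wording is made up for this example.
function buildChatRequest(model: string, instruction: string, pageLines: string[]) {
  return {
    model,
    messages: [
      { role: "system", content: "You control a web page. Reply with the index of the element to act on." },
      { role: "user", content: `Page elements:\n${pageLines.join("\n")}\n\nInstruction: ${instruction}` },
    ],
  };
}

// A tiny login form: only the input and the button are interactive.
const page: UiNode = {
  tag: "form",
  children: [
    { tag: "input", attrs: { name: "email", placeholder: "Email" } },
    { tag: "p", text: "Forgot your password?" },
    { tag: "button", text: "Log in" },
  ],
};

const pageLines = serializeInteractive(page);
const request = buildChatRequest("example-model", "Click the login button", pageLines);

console.log(pageLines.join("\n"));
// [0] <input> Email
// [1] <button> Log in
```

Because the model only sees a few short text lines rather than an image, the prompt stays small, which is the latency and token-cost argument made above. In a real integration the request body would be sent to whichever endpoint and API key the team supplies, and the returned index would be mapped back to a live element.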
OpenClaw is an open-source, local-first AI gateway with 366K GitHub stars that routes AI responses through WhatsApp, Telegram, Slack, Discord, iMessage, Teams, and 15+ other platforms — zero cloud dependency.
OpenClaw
Open-source personal AI assistant connecting to 13+ messaging platforms with local gateway architecture, voice support, and multi-agent routing.