PageAgent

Alibaba · MIT license

Agent · 1.4K stars · 127 forks · 572 views

Alibaba has released PageAgent, a JavaScript-based GUI agent that lives directly inside web pages and enables natural language control of web interfaces. Unlike traditional web automation tools that rely on browser extensions, headless browsers, or screenshot-based approaches, PageAgent operates entirely within the page using text-based DOM manipulation. This approach removes the need for multimodal LLMs, OCR, or external automation dependencies. Released under the MIT license, PageAgent has reached version 1.5.2 and is currently trending on GitHub with over 1,400 stars, including 137 gained in a single day.

## Key Features

### Pure In-Page JavaScript Architecture

PageAgent's most distinctive characteristic is its architectural decision to run entirely within the browser page. There is no Python backend, no headless browser process, and no screenshot pipeline. The agent reads and manipulates the DOM directly as text, so it avoids the cost of capturing and interpreting screenshots that vision-based alternatives incur.

### Flexible LLM Backend Support

PageAgent supports a bring-your-own-model approach. Developers can connect any LLM backend, including Alibaba's Qwen, OpenAI, or self-hosted models, through a simple configuration object. This flexibility means teams are not locked into a single AI provider.

```javascript
import { PageAgent } from 'page-agent'

// Point the agent at an LLM endpoint; swap model and baseURL for another provider
const agent = new PageAgent({
  model: 'qwen3.5-plus',
  baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1',
  apiKey: 'YOUR_API_KEY',
  language: 'en-US'
})

// Describe the task in natural language; the agent acts on the current page
await agent.execute('Click the login button')
```

### Human-in-the-Loop UI

For enterprise and production scenarios, PageAgent includes an interactive approval workflow. Before the agent executes potentially destructive actions, it can present a confirmation dialog to the user. This addresses a critical trust barrier for autonomous web agents: users can see and approve what the agent is about to do.
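To make the approval idea concrete, here is a minimal sketch of a confirmation gate placed in front of an action executor. It is not PageAgent's API: `withApproval`, `DESTRUCTIVE_KINDS`, and the `{ kind, target }` action shape are hypothetical names chosen for illustration, and which actions count as destructive is an assumption.

```javascript
// Hypothetical sketch of a human-in-the-loop gate; not taken from PageAgent.

// Action kinds this sketch treats as destructive (an assumption for illustration).
const DESTRUCTIVE_KINDS = new Set(['submit', 'delete', 'purchase'])

/**
 * Wrap an executor so that destructive actions run only after approval.
 * @param {(action: object) => Promise<any>} execute - performs the action
 * @param {(action: object) => Promise<boolean>} confirm - asks the user, resolves to true or false
 */
function withApproval(execute, confirm) {
  return async function guardedExecute(action) {
    if (DESTRUCTIVE_KINDS.has(action.kind)) {
      const approved = await confirm(action)
      if (!approved) {
        // The user declined, so the action is never executed.
        return { status: 'rejected', action }
      }
    }
    const result = await execute(action)
    return { status: 'done', action, result }
  }
}
```

In a browser, `confirm` could be as simple as `async (a) => window.confirm(`Allow ${a.kind} on ${a.target}?`)`, while a production UI would more likely render its own dialog. Non-destructive actions pass straight through, so the user is interrupted only when something irreversible is about to happen.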
### Multi-Page Automation via Chrome Extension

While the core library operates within a single page, an optional Chrome extension enables cross-tab automation. This allows PageAgent to coordinate actions across multiple browser tabs, making it suitable for complex workflows that span several pages.

## Practical Use Cases

| Use Case | Description | Benefit or Target |
|----------|-------------|-------------------|
| SaaS AI Copilot | Embed directly in products with no backend rewrite | Ship AI features fast |
| Smart Form Filling | Turn 20-click workflows into one sentence | ERP, CRM, admin systems |
| Accessibility | Natural language commands for any web app | Voice commands, screen readers |
| Browser Automation | Cross-tab coordination via extension | Complex multi-page workflows |

## Technical Architecture

PageAgent is built as a TypeScript monorepo (82.3% TypeScript, 11.0% JavaScript, 6.2% CSS) with 652 commits on the main branch. Its DOM processing pipeline derives components from the browser-use project (MIT licensed), adapted for the in-page execution model.

Installation is straightforward via npm:

```bash
npm install page-agent
```

## Conclusion

PageAgent represents a pragmatic approach to web GUI agents. By staying inside the page and avoiding the overhead of screenshots, OCR, and external browsers, it stays simple enough to embed in an existing product without a backend rewrite. The flexible LLM backend support and human-in-the-loop design make it a strong candidate for enterprise deployments where trust and control are non-negotiable.

Key Features

Pure in-page JavaScript execution with no browser extension or headless browser required
Text-based DOM manipulation without screenshots, OCR, or multimodal LLMs
Flexible LLM backend support including Qwen, OpenAI, and self-hosted models
Human-in-the-loop approval UI for enterprise safety
Optional Chrome extension for multi-page cross-tab automation
Simple npm installation and minimal integration overhead
TypeScript monorepo architecture with comprehensive documentation
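
The text-based DOM approach listed above can be pictured as flattening a page into a numbered list of interactive elements that a text-only LLM can refer to by index. The sketch below illustrates that general idea only; it is not PageAgent's actual pipeline. The function name, the tag list, and the simplified `{ tag, attrs, text, children }` node shape (used instead of real DOM nodes so the example runs outside a browser) are all assumptions.

```javascript
// Hypothetical sketch of text-based page description; not PageAgent's pipeline.

// Tags this sketch treats as interactive (an assumption for illustration).
const INTERACTIVE_TAGS = new Set(['a', 'button', 'input', 'select', 'textarea'])

/**
 * Walk a simplified node tree depth-first and return one indexed line
 * per interactive element, e.g. "[1] <button> Log in".
 */
function describeInteractiveElements(root) {
  const lines = []
  function visit(node) {
    if (INTERACTIVE_TAGS.has(node.tag)) {
      const attrs = node.attrs || {}
      // Prefer visible text, then fall back to common labelling attributes.
      const label = node.text || attrs.placeholder || attrs['aria-label'] || ''
      lines.push(`[${lines.length}] <${node.tag}> ${label}`.trimEnd())
    }
    for (const child of node.children || []) visit(child)
  }
  visit(root)
  return lines.join('\n')
}

// Usage with a hand-written tree standing in for a login form
const page = {
  tag: 'form',
  children: [
    { tag: 'input', attrs: { placeholder: 'Email' } },
    { tag: 'button', text: 'Log in' },
    { tag: 'div', children: [{ tag: 'a', text: 'Forgot password?' }] }
  ]
}
console.log(describeInteractiveElements(page))
// [0] <input> Email
// [1] <button> Log in
// [2] <a> Forgot password?
```

A model given this listing can answer with something like "click element 1", which the agent then maps back to the corresponding element, with no screenshot or OCR involved.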
