Open Source

Claude Video (/watch)

bradautomates

An MIT-licensed /watch command that lets an AI agent 'watch' any video — captions first, then yt-dlp + ffmpeg frame extraction and Whisper fallback — grounding answers in what's actually on screen.

video-use

browser-use

MIT-licensed open-source pipeline that edits videos through a coding agent — drop in raw footage, chat with Claude Code, and get a finished cut with filler-word removal, color grading, subtitles, and animation overlays.

OfficeCLI

iOfficeAI

Apache-2.0 Office suite built for AI agents — read, edit, and automate Word, Excel, and PowerPoint from a single dependency-free binary, with a built-in HTML/PNG rendering engine that lets agents visually verify output.

Orca

stablyai

Open-source Agentic Development Environment (ADE) that runs Codex, Claude Code, OpenCode, and Pi in parallel — each in its own git worktree — with a mobile companion, on macOS/Windows/Linux (MIT).

MIT17

625 projects

Trending · 3D

GitHub

13.8K stars · 1.5K forks

VGGT

facebookresearch

Oxford VGG and Meta AI's CVPR 2025 Best Paper: a feed-forward transformer that infers camera poses, depth, point maps, and 3D tracks from images in seconds.

VGGT License (Custom, Meta)13

Trending · Agent

GitHub

57.2K stars · 10.0K forks

OpenManus

FoundationAgents

An MIT-licensed, open-source general AI agent from the MetaGPT team — run an autonomous, browser- and code-using assistant with your own LLM keys, no invite code.

Nano-vLLM

GeeeekExplorer

A from-scratch vLLM reimplementation in ~1,200 lines of readable Python with prefix caching, tensor parallelism, torch.compile, and CUDA graphs, reaching offline inference speed comparable to vLLM.

Chatterbox

resemble-ai

Resemble AI's open-source TTS family with zero-shot voice cloning from seconds of audio, a low-latency Turbo model, and 23+ language multilingual synthesis (MIT).

MiniCPM-V

OpenBMB

OpenBMB's efficient multimodal LLM series that runs image and video understanding on phones — MiniCPM-V 4.6 hits GPT-4V-class quality at just 1.3B params (Apache-2.0).

WhisperX

m-bain

A fast open-source ASR pipeline on top of Whisper: word-level timestamps via forced alignment, 70x realtime batched inference, and speaker diarization (BSD-2).

DSPy

stanfordnlp

Stanford NLP's framework for programming — not prompting — language models: declare tasks as typed Python modules and let optimizers auto-tune prompts, few-shot examples, and weights (MIT).

Speech To Speech

huggingface

Hugging Face's open-source voice-agent pipeline (VAD -> STT -> LLM -> TTS) exposed through an OpenAI Realtime-compatible WebSocket API, with every component swappable and self-hostable (Apache-2.0).
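
As a rough illustration of what "every component swappable" can mean for a VAD -> STT -> LLM -> TTS chain, here is a minimal sketch. It is not the project's code or API: the `VoicePipeline` class, the stage type aliases, and the stub functions (`energy_vad`, `stub_stt`, `stub_llm`, `stub_tts`) are all hypothetical names invented for this example, and the stubs stand in for real models.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stage signatures, chosen for illustration only.
Audio = List[float]
VAD = Callable[[Audio], List[Audio]]  # raw samples -> speech segments
STT = Callable[[Audio], str]          # speech segment -> transcript
LLM = Callable[[str], str]            # transcript -> reply text
TTS = Callable[[str], Audio]          # reply text -> synthesized samples


@dataclass
class VoicePipeline:
    """Chains four stages; any stage can be replaced by another callable."""
    vad: VAD
    stt: STT
    llm: LLM
    tts: TTS

    def run(self, audio: Audio) -> List[Audio]:
        replies = []
        for segment in self.vad(audio):
            text = self.stt(segment)
            if not text:
                continue  # skip segments that produced no transcript
            replies.append(self.tts(self.llm(text)))
        return replies


def energy_vad(audio: Audio, threshold: float = 0.1) -> List[Audio]:
    """Toy VAD: group consecutive samples whose magnitude meets the threshold."""
    segments, current = [], []
    for sample in audio:
        if abs(sample) >= threshold:
            current.append(sample)
        elif current:
            segments.append(current)
            current = []
    if current:
        segments.append(current)
    return segments


# Stub stages standing in for real speech and language models.
def stub_stt(segment: Audio) -> str:
    return f"{len(segment)} samples"


def stub_llm(text: str) -> str:
    return f"heard {text}"


def stub_tts(text: str) -> Audio:
    return [0.0] * len(text)  # one silent sample per character


pipeline = VoicePipeline(energy_vad, stub_stt, stub_llm, stub_tts)
```

Because each stage is just a callable with a fixed input and output type, swapping one out (for example, passing a different `llm` function) leaves the rest of the chain untouched. A streaming, WebSocket-facing server would need considerably more than this, such as queues between stages and interruption handling.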