Grounding DINO

IDEA-Research | Apache-2.0


Vision | 10.3K Stars | 1.0K Forks | 2 views

Grounding DINO is an open-set object detector from IDEA-Research that finds objects described by arbitrary natural-language text instead of a fixed list of categories. Released as the official implementation of the ECCV 2024 paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection," it has become one of the most widely referenced open-vocabulary detection projects on GitHub, with more than 10,000 stars. The repository ships PyTorch code and pretrained weights, and the model is also integrated into Hugging Face Transformers for convenient use.

## How It Works

The core idea is in the title: it marries the Transformer-based DINO detector with grounded pre-training that aligns image regions with language. A user supplies an image and a text prompt (a list of class names or a free-form phrase), and the model returns bounding boxes for the matching objects. Because detection is driven by text rather than a closed label set, the same model can locate categories it was never explicitly trained to detect, which is the defining property of open-set (open-vocabulary) detection.

## Capabilities

Grounding DINO reports strong zero-shot results on standard benchmarks, including COCO, LVIS, and ODinW (Object Detection in the Wild), and supports referring expression comprehension, where objects are selected by descriptive phrases rather than single nouns. This flexibility makes it useful well beyond classic detection: a common pattern is automated data labeling, where the model proposes boxes from text prompts that humans then verify, dramatically reducing annotation cost.

## Ecosystem

The project sits at the center of a broader toolchain. It is the detection front end for Grounded SAM and Grounded SAM 2, which pair it with Meta's Segment Anything models to turn text prompts into segmentation masks and open-world object tracking. The team also released Grounding DINO 1.5 as a more capable successor, and the original model is available through Hugging Face, Colab demos, and Roboflow tutorials, lowering the barrier to experimentation.

## Considerations

The public repository reflects a research codebase: setup involves building CUDA extensions, and the main branch has not seen frequent updates since the 1.5 line and Hugging Face integration arrived, so many users now access the model through Transformers instead. As an open-vocabulary detector, accuracy varies with how prompts are phrased, and very fine-grained or ambiguous descriptions can be hit or miss. Even so, for teams that need flexible, prompt-driven detection, or a foundation for segmentation and tracking pipelines, Grounding DINO remains a landmark Apache-2.0 licensed reference implementation.
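
The prompt-in, boxes-out flow described under How It Works can be illustrated with a small sketch of the pre- and post-processing around the model. This is not the repository's code: the helper names (`build_prompt`, `cxcywh_to_xyxy`, `keep_confident`) and the 0.35 threshold are assumptions made for illustration. It also assumes the prompt convention of lowercase phrases each ending in a period, and that raw boxes arrive as normalized (center x, center y, width, height) values, as is usual for DETR-style detectors. The model call itself is deliberately left out.

```python
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]


def build_prompt(class_names: Sequence[str]) -> str:
    """Join class names into one lowercase caption, each phrase ending with a period."""
    phrases = [name.strip().lower() for name in class_names if name.strip()]
    return " ".join(f"{phrase}." for phrase in phrases)


def cxcywh_to_xyxy(box: Box, width: int, height: int) -> Box:
    """Convert a normalized (cx, cy, w, h) box to pixel (x_min, y_min, x_max, y_max)."""
    cx, cy, w, h = box
    return (
        (cx - w / 2) * width,
        (cy - h / 2) * height,
        (cx + w / 2) * width,
        (cy + h / 2) * height,
    )


def keep_confident(
    boxes: Sequence[Box],
    scores: Sequence[float],
    labels: Sequence[str],
    width: int,
    height: int,
    box_threshold: float = 0.35,  # illustrative value, tune per dataset
) -> List[Dict]:
    """Keep detections scoring at or above box_threshold, with boxes in pixels."""
    kept = []
    for box, score, label in zip(boxes, scores, labels):
        if score >= box_threshold:
            kept.append(
                {
                    "label": label,
                    "score": score,
                    "box": cxcywh_to_xyxy(box, width, height),
                }
            )
    return kept


if __name__ == "__main__":
    print(build_prompt(["Cat", "remote control"]))
    # Hypothetical raw outputs for a 640x480 image.
    raw_boxes = [(0.5, 0.5, 0.5, 0.5), (0.1, 0.1, 0.1, 0.1)]
    print(keep_confident(raw_boxes, [0.8, 0.2], ["cat", "remote control"], 640, 480))
```

Because prompt phrasing affects accuracy, keeping prompt construction in one small function makes it easier to try alternative wordings against the same images.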

Key Features

Open-set detection: finds objects from arbitrary text prompts, not a fixed category list
Marries the Transformer-based DINO detector with grounded language pre-training
Strong zero-shot performance on COCO, LVIS, and ODinW benchmarks
Referring expression comprehension (detect by descriptive phrases)
Pretrained checkpoints with Colab and Hugging Face Space demos
Integrated into Hugging Face Transformers for easy use
Foundation for Grounded SAM / Grounded SAM 2 segmentation and tracking
Widely used for automated, prompt-driven dataset annotation
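
The prompt-driven annotation pattern in the last feature can be sketched as a conversion from detections to COCO-style pre-annotations that a human then verifies. This is a minimal illustration, not part of the project: the function name `to_coco_annotations`, the `needs_review` field, and the 0.5 review cutoff are hypothetical, and the input is assumed to be a list of dictionaries with a label, a score, and a pixel box in (x_min, y_min, x_max, y_max) order. The only format detail relied on is that COCO boxes are stored as [x, y, width, height].

```python
from typing import Dict, List, Sequence


def to_coco_annotations(
    detections: Sequence[Dict],
    image_id: int,
    category_ids: Dict[str, int],
    review_below: float = 0.5,  # hypothetical cutoff for flagging low-confidence boxes
) -> List[Dict]:
    """Turn pixel xyxy detections into COCO-style annotation records."""
    annotations = []
    for index, det in enumerate(detections, start=1):
        x_min, y_min, x_max, y_max = det["box"]
        width, height = x_max - x_min, y_max - y_min
        annotations.append(
            {
                "id": index,
                "image_id": image_id,
                "category_id": category_ids[det["label"]],
                "bbox": [x_min, y_min, width, height],  # COCO uses x, y, w, h
                "area": width * height,
                "iscrowd": 0,
                "score": det["score"],
                # Not a COCO field: a marker for the human verification pass.
                "needs_review": det["score"] < review_below,
            }
        )
    return annotations


if __name__ == "__main__":
    proposals = [
        {"label": "cat", "score": 0.8, "box": (160.0, 120.0, 480.0, 360.0)},
        {"label": "dog", "score": 0.4, "box": (0.0, 0.0, 10.0, 20.0)},
    ]
    for record in to_coco_annotations(proposals, 7, {"cat": 1, "dog": 2}):
        print(record)
```

Flagging low-score proposals rather than discarding them keeps the human reviewer in control of borderline cases, which matches the propose-then-verify workflow described above.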

Related Projects

ComfyUI | Comfy-Org | GPL-3.0 | Trending | Vision | 108.4K Stars | 12.6K Forks | 186 views


PaddleOCR

Ultralytics YOLO

Roboflow Supervision