Open Source
Explore the latest AI open-source projects from GitHub and HuggingFace.
## Introduction

RF-DETR is a real-time transformer architecture for object detection and instance segmentation developed by Roboflow. With 5,900+ GitHub stars, 715+ forks, and a dual Apache 2.0/PML 1.0 license, RF-DETR has established itself as a serious contender in the edge-deployable computer vision space. The model uses a DINOv2 vision transformer backbone and delivers state-of-the-art accuracy-latency tradeoffs on both the Microsoft COCO and RF100-VL benchmarks. Unlike traditional CNN-based detectors, RF-DETR brings vision transformers to real-time inference scenarios while maintaining latency competitive with models like YOLO26 and D-FINE.

## Architecture and Performance

RF-DETR is built on the DETR (DEtection TRansformer) paradigm but optimized for real-time deployment. The architecture uses a DINOv2 vision transformer backbone, which provides rich feature representations learned through self-supervised pre-training on large-scale image datasets.

| Model | AP50 (COCO) | AP50:95 (COCO) | Latency |
|-------|-------------|----------------|---------|
| RF-DETR-Nano | — | — | Ultra-low |
| RF-DETR-L | 75.1 | 56.5 | 6.8 ms |
| RF-DETR-2XL | 78.5 | 60.1 | 17.2 ms |
| RF-DETR-Seg-L | 70.5 | 47.1 | 8.8 ms |
| RF-DETR-Seg-2XL | 73.1 | 49.9 | 21.8 ms |

The model family spans Nano to 2XL variants, letting developers choose the right accuracy-speed tradeoff for their deployment target. The largest 2XL detection model reaches 60.1 AP50:95 on COCO, competitive with the best detectors available.

## Key Capabilities

**Unified Detection and Segmentation**: RF-DETR supports both object detection and instance segmentation through a single, consistent API. Developers can switch between tasks without changing their pipeline architecture.

**Fine-Tuning Ready**: The framework is designed from the ground up for fine-tuning on custom datasets. This is critical for real-world applications, where pre-trained COCO weights serve as a starting point, not the final model.
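To make the table's accuracy-latency tradeoff concrete, here is a small, self-contained sketch of variant selection under a latency budget. The `pick_variant` helper and the `VARIANTS` list are hypothetical illustrations built from the benchmark figures above, not part of the rfdetr package:

```python
# Hypothetical helper illustrating the accuracy-latency tradeoff from the
# benchmark table above. The numbers are the COCO figures quoted there;
# neither VARIANTS nor pick_variant exists in the rfdetr package.
VARIANTS = [
    # (name, AP50:95 on COCO, latency in ms)
    ("RF-DETR-L", 56.5, 6.8),
    ("RF-DETR-2XL", 60.1, 17.2),
    ("RF-DETR-Seg-L", 47.1, 8.8),
    ("RF-DETR-Seg-2XL", 49.9, 21.8),
]

def pick_variant(min_ap: float, segmentation: bool = False):
    """Return the lowest-latency variant meeting the AP50:95 target."""
    candidates = [
        (latency, name)
        for name, ap, latency in VARIANTS
        if ap >= min_ap and ("Seg" in name) == segmentation
    ]
    return min(candidates)[1] if candidates else None

print(pick_variant(55.0))        # → RF-DETR-L (faster than 2XL, still over target)
print(pick_variant(58.0))        # → RF-DETR-2XL (only variant above 58 AP50:95)
print(pick_variant(45.0, True))  # → RF-DETR-Seg-L
```

The same logic applies in reverse for a hard latency ceiling: filter on latency first, then take the highest-accuracy survivor.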
**TensorRT Optimization**: Built-in support for NVIDIA TensorRT acceleration enables deployment on edge devices and production servers with optimized inference speeds.

**Multi-Scale Model Family**: With variants from Nano to 2XL, RF-DETR covers the full spectrum from resource-constrained edge devices to GPU-powered cloud deployments.

**Roboflow Inference Integration**: RF-DETR integrates seamlessly with the Roboflow Inference library for production serving, including model management, batching, and API hosting.

## Getting Started

Installation is straightforward via pip:

```bash
pip install rfdetr

# For Plus models (XL, 2XL)
pip install rfdetr[plus]
```

Running detection on an image:

```python
from rfdetr import RFDETRBase

model = RFDETRBase()
detections = model.predict("image.jpg")
```

## Limitations

- The Plus models (XL and 2XL) are released under the more restrictive PML 1.0 license rather than Apache 2.0, which may limit commercial usage for some organizations.
- The DINOv2 backbone requires more GPU memory during training than lightweight CNN backbones, and fine-tuning the larger variants requires significant compute resources.
- The segmentation models add roughly 2-4 ms of latency over the detection-only variants.
- The community ecosystem and third-party integrations are still growing compared to more established detection frameworks.

## Who Should Use This

RF-DETR is ideal for computer vision engineers who need transformer-based detection accuracy at real-time inference speeds. Teams building custom detection pipelines benefit from the fine-tuning workflow and multi-scale model family, and organizations already using Roboflow's ecosystem get seamless integration. Researchers exploring DETR-based architectures can use RF-DETR as a strong production baseline, while edge AI developers looking for TensorRT-optimized models will find the Nano and Base variants particularly useful.
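To close the loop on the Getting Started example above: the `predict` call returns detections that typically carry bounding boxes, confidence scores, and class IDs. The sketch below shows common confidence-threshold post-processing in plain Python. The `(boxes, scores, class_ids)` layout is an assumption for illustration; rfdetr's actual return object may expose these fields differently:

```python
# Sketch of confidence-threshold filtering on raw detection output.
# The parallel-list layout (boxes, scores, class_ids) is assumed for
# illustration and is not rfdetr's documented return type.
def filter_detections(boxes, scores, class_ids, threshold=0.5):
    """Keep only detections whose confidence meets the threshold."""
    return [
        (box, score, cid)
        for box, score, cid in zip(boxes, scores, class_ids)
        if score >= threshold
    ]

# Toy detections: (x1, y1, x2, y2) boxes with scores and class IDs.
boxes = [(10, 10, 50, 50), (20, 20, 80, 80), (0, 0, 30, 30)]
scores = [0.92, 0.41, 0.77]
class_ids = [0, 1, 0]

for box, score, cid in filter_detections(boxes, scores, class_ids):
    print(cid, score, box)  # the 0.41 detection is dropped
```

In production the same step is usually handled by the serving layer (e.g., a confidence parameter at inference time), but an explicit filter like this is useful when tuning thresholds per class.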