Chandra

datalab-to | Apache-2.0

Vision | 6.2K stars | 679 forks | 218 views

Chandra is an optical character recognition model developed by Datalab that converts images and PDFs into structured HTML, Markdown, and JSON while preserving document layout. It is built for the document elements that traditionally cause the most errors: complex nested tables, mathematical equations, handwritten text, form fields with checkboxes, and mixed-language documents. It supports 90+ languages across Latin, CJK, Arabic, Devanagari, and Cyrillic scripts, and it achieves 86.7% overall accuracy on diverse document benchmarks, topping the external olmOCR benchmark.

Chandra offers two inference modes: local inference through a HuggingFace backend for privacy-sensitive documents, and remote inference through a vLLM server, which reaches approximately 1.44 pages/second on NVIDIA H100 hardware. Install it with pip install chandra-ocr; optional extras add the HuggingFace backend and a Streamlit web interface.

The Chandra 2 release in March 2026 brought significant improvements in table recognition accuracy and processing speed. The code is Apache 2.0 licensed, while the model weights use a modified OpenRAIL-M license that permits research and personal use, with commercial licensing available.
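To put the quoted vLLM throughput in practical terms, the following minimal sketch estimates how long a batch of pages might take at roughly 1.44 pages/second. It does not call Chandra at all; the function names `estimated_seconds` and `format_duration` are hypothetical helpers written for this illustration, and the figure is the approximate H100 number from the description, so real runs will vary with hardware, page complexity, and batching.

```python
import math

# Approximate vLLM throughput on an NVIDIA H100, taken from the project description.
# Treat it as a rough planning figure, not a guarantee.
PAGES_PER_SECOND = 1.44


def estimated_seconds(num_pages: int, pages_per_second: float = PAGES_PER_SECOND) -> float:
    """Return a rough wall-clock estimate, in seconds, for processing num_pages."""
    if num_pages < 0:
        raise ValueError("num_pages must be non-negative")
    if pages_per_second <= 0:
        raise ValueError("pages_per_second must be positive")
    return num_pages / pages_per_second


def format_duration(seconds: float) -> str:
    """Format a duration as 'Hh MMm SSs', rounding up to the next whole second."""
    total = math.ceil(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"


if __name__ == "__main__":
    for pages in (100, 1_000, 10_000):
        print(f"{pages:>6} pages: about {format_duration(estimated_seconds(pages))}")
```

Passing a different `pages_per_second` lets the same estimate be reused for other hardware once a throughput figure has been measured.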

Key Features

86.7% overall accuracy across diverse document types, the top score on the olmOCR benchmark
Complex nested table recognition with merged rows, columns, and multi-page tables
Mathematical equation recognition with LaTeX output for inline and display math
Handwriting recognition for both cursive and print handwriting
Support for 90+ languages across Latin, CJK, Arabic, Devanagari, and Cyrillic scripts
Multi-format output: Markdown, HTML, and JSON with bounding box coordinates
Dual inference modes: local HuggingFace backend and remote vLLM server at approximately 1.44 pages/sec on an NVIDIA H100
Automatic image extraction with generated captions and checkbox state recognition

Related Projects

ComfyUI (Comfy-Org, GPL-3.0, 108.4K stars, 12.6K forks)

PaddleOCR

Ultralytics YOLO

Roboflow Supervision