LangExtract

Apache-2.0

Other | 35.9K Stars | 2.4K Forks | 97 views

LangExtract is an open-source Python library from Google that uses large language models to extract structured information from unstructured text documents. Released in February 2026 and now at v1.2.1, the library has accumulated about 35,900 stars on GitHub, which points to strong developer demand for reliable information extraction tooling.

The library's standout feature is precise source grounding: every extraction is mapped back to its exact location in the source document, which enables interactive HTML visualizations that highlight where each piece of data came from. This makes LangExtract particularly valuable in high-stakes domains such as healthcare, legal work, and scientific research, where verifiability is critical.

To handle large documents, LangExtract splits text into chunks, processes the chunks in parallel, and can run multiple extraction passes for higher recall. This addresses the classic "needle in a haystack" problem of finding information in lengthy texts. The library enforces structured outputs using schema definitions and few-shot examples, and it uses Controlled Generation in supported models such as Gemini to keep results consistent and schema-compliant.

Beyond Gemini, LangExtract supports other LLM backends, including OpenAI models and local models via Ollama, so it is not tied to a single provider. Demonstrated applications include clinical information extraction from medical notes, radiology report structuring, medication extraction with dosage and relationship mapping, and full-text literary analysis. It is installable from PyPI with `pip install langextract`.
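The idea of source grounding can be illustrated with a minimal standard-library sketch. This is not LangExtract's implementation or API: the `GroundedSpan` record and the `ground` and `highlight` helpers are hypothetical names introduced here, and the sketch assumes the extracted string appears verbatim in the source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GroundedSpan:
    """Hypothetical record tying an extracted string to source offsets."""
    extraction_class: str
    extraction_text: str
    start: Optional[int]  # inclusive character offset, None if not found
    end: Optional[int]    # exclusive character offset, None if not found


def ground(source: str, extraction_class: str, extraction_text: str) -> GroundedSpan:
    """Locate the first exact occurrence of extraction_text in source."""
    start = source.find(extraction_text)
    if start == -1:
        # The extraction could not be verified against the source text.
        return GroundedSpan(extraction_class, extraction_text, None, None)
    return GroundedSpan(extraction_class, extraction_text,
                        start, start + len(extraction_text))


def highlight(source: str, span: GroundedSpan) -> str:
    """Wrap the grounded span in [[...]] so a reader can see where it came from."""
    if span.start is None:
        return source
    return (source[:span.start] + "[[" + source[span.start:span.end] + "]]"
            + source[span.end:])


note = "Patient was given 250 mg amoxicillin twice daily."
span = ground(note, "medication", "amoxicillin")
marked = highlight(note, span)
missing = ground(note, "medication", "ibuprofen")
```

Storing character offsets rather than only the extracted string is what lets a viewer highlight the exact passage, and an extraction that cannot be located (as with `missing` above) can be flagged for review instead of being trusted.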

Key Features

Precise source grounding: maps every extraction to its exact location in the source text
Interactive HTML visualization for exploring and verifying extracted entities
Parallel chunking strategy for high-recall extraction from long documents
Provider-agnostic: supports Gemini, OpenAI, and local models via Ollama
Schema-enforced structured output using few-shot examples and Controlled Generation
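The chunk-and-parallelize strategy can be sketched roughly as follows, using only the Python standard library. Everything here is an assumption made for illustration: `chunk_sentences`, `find_doses`, and `extract_parallel` are hypothetical helpers, the regex-based `find_doses` stands in for an LLM call, and the sentence splitter is deliberately naive (it would split a decimal such as "0.5 mg"). Multiple extraction passes are not modeled.

```python
import re
from concurrent.futures import ThreadPoolExecutor


def chunk_sentences(text, max_chars):
    """Group whole sentences into chunks of at most max_chars where possible.

    Returns a list of (offset, chunk) pairs so results can be mapped back
    to positions in the full text. A single sentence longer than max_chars
    is kept whole rather than cut.
    """
    chunks = []
    start, end = 0, 0
    for m in re.finditer(r"[^.]+\.?\s*", text):
        # Close the current chunk if adding this sentence would overflow it.
        if end > start and m.end() - start > max_chars:
            chunks.append((start, text[start:end]))
            start = end
        end = m.end()
    if end > start:
        chunks.append((start, text[start:end]))
    return chunks


def find_doses(chunk):
    """Stand-in extractor: return (local_offset, match) for each dose mention."""
    return [(m.start(), m.group()) for m in re.finditer(r"\d+ mg", chunk)]


def extract_parallel(text, max_chars=50, max_workers=4):
    """Run the extractor over chunks in parallel and re-base offsets."""
    chunks = chunk_sentences(text, max_chars)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_chunk = list(pool.map(find_doses, [c for _, c in chunks]))
    results = []
    for (offset, _), hits in zip(chunks, per_chunk):
        for local_start, value in hits:
            # Convert chunk-local offsets to offsets in the full document.
            results.append((offset + local_start, value))
    return sorted(results)


text = ("Start amoxicillin 250 mg twice daily. "
        "Continue ibuprofen 400 mg as needed. "
        "Stop aspirin 81 mg today.")
doses = extract_parallel(text)
```

Chunking on sentence boundaries keeps a mention from being cut in half at a chunk edge, and carrying each chunk's offset through the pipeline is what allows per-chunk results to stay grounded in the original document.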
