TurboVec

RyanCodrai · MIT

Inference · 7.3K Stars · 708 Forks · 59 views

TurboVec is a vector search index written in Rust with Python bindings and built on Google Research's TurboQuant algorithm. It has crossed 7,300 GitHub stars by attacking the single most expensive line item in modern retrieval stacks: the RAM bill. A 10 million document corpus of OpenAI 1536-dimensional embeddings takes roughly 61 GB as float32 (10 million × 1,536 × 4 bytes); TurboVec serves the same corpus from a footprint of about 4 GB at 16x compression, and beats FAISS IndexPQ by 0.4 to 3.4 points of R@1 recall while doing it. For teams running production RAG, this is the difference between a single VM and a cluster.

## What TurboVec Is For

The project targets one specific deployment shape: in-process, fully local vector search where every byte of RAM and every microsecond of latency is being paid for by an engineer who can read the numbers. The Rust core does the math, the Python bindings make it usable from any modern LLM stack, and the explicit goal is to drop into LangChain, LlamaIndex, Haystack, or Agno as a replacement vector store with no service to deploy and no external API to call. Ingestion is online: there is no training phase, no parameter tuning ritual, and no separate index-building job. Vectors go in, queries come out.

## TurboQuant: The Quantization Idea

The underlying TurboQuant algorithm from Google Research is what makes the compression and recall numbers possible at the same time. The pipeline is short and surprisingly principled: vectors are normalized and randomly rotated to make per-coordinate distributions predictable; each coordinate is calibrated on the first ingest pass; Lloyd-Max scalar quantization assigns precomputed optimal buckets; the result is bit-packed for the 16x reduction; and a length-renormalized scoring step corrects for the quantization bias at query time.
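
The pipeline just described can be illustrated with a toy, pure-Python sketch. This is not TurboVec's code or API: every function name here is hypothetical, the per-coordinate distribution is approximated as Gaussian, the quantizer uses the classic 4-level (2-bit) Lloyd-Max values for a unit-variance Gaussian, and "length renormalization" is interpreted as rescaling the reconstruction to unit length before scoring.

```python
import math
import random

# Classic 4-level Lloyd-Max quantizer for a unit-variance Gaussian
# (assumed here; the real library may use different codebooks).
THRESHOLDS = (-0.9816, 0.0, 0.9816)
LEVELS = (-1.510, -0.4528, 0.4528, 1.510)


def random_rotation(dim, seed=0):
    """Random orthogonal matrix built by Gram-Schmidt on Gaussian rows."""
    rng = random.Random(seed)
    rows = []
    while len(rows) < dim:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for r in rows:  # strip components along the rows found so far
            proj = sum(a * b for a, b in zip(v, r))
            v = [a - proj * b for a, b in zip(v, r)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-9:
            rows.append([a / norm for a in v])
    return rows


def rotate(rotation, vec):
    return [sum(r * x for r, x in zip(row, vec)) for row in rotation]


def encode(rotation, vec):
    """Normalize, rotate, quantize to 2-bit codes, pack 4 codes per byte."""
    dim = len(vec)
    norm = math.sqrt(sum(x * x for x in vec))
    unit = [x / norm for x in vec]
    # A rotated unit vector has coordinates of variance about 1/dim;
    # scaling by sqrt(dim) matches the unit-variance quantizer above.
    scaled = [c * math.sqrt(dim) for c in rotate(rotation, unit)]
    codes = [sum(c > t for t in THRESHOLDS) for c in scaled]
    packed = bytearray((dim + 3) // 4)
    for i, code in enumerate(codes):
        packed[i // 4] |= code << (2 * (i % 4))
    return bytes(packed)


def decode(packed, dim):
    """Unpack the codes and rescale the reconstruction to unit length."""
    approx = [LEVELS[(packed[i // 4] >> (2 * (i % 4))) & 3] for i in range(dim)]
    norm = math.sqrt(sum(a * a for a in approx))
    return [a / norm for a in approx]


def score(rotation, query, packed):
    """Approximate cosine similarity between a query and an encoded vector."""
    qnorm = math.sqrt(sum(x * x for x in query))
    q = rotate(rotation, [x / qnorm for x in query])
    return sum(a * b for a, b in zip(q, decode(packed, len(query))))


if __name__ == "__main__":
    rot = random_rotation(64, seed=1)
    rng = random.Random(2)
    vec = [rng.uniform(-1.0, 1.0) for _ in range(64)]
    packed = encode(rot, vec)
    print(len(packed))  # 16 bytes: 64 coordinates x 2 bits
    print(score(rot, vec, packed))
```

Because the rotation is drawn once from a seed and the bucket boundaries are fixed constants, nothing in this sketch is fitted to the data, which is the property the prose attributes to the rotation step.
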
Each step is doing real work. The random rotation in particular is what keeps the per-coordinate calibration predictable without depending on training data. The cumulative effect is that a 1536-dimensional vector compresses from 6,144 bytes to 384 bytes without the recall collapse that scalar quantization normally produces.

## SIMD Kernels, Filtered Search, and Speed

The Rust side is not a thin wrapper. TurboVec ships hand-written NEON kernels for ARM and AVX-512BW kernels for modern x86, and the published benchmarks put it 12 to 20 percent faster than FAISS on ARM hardware and 1 to 6 percent faster at 4-bit on x86. Filtered search, in which an id allowlist is passed at query time, is supported with no recall penalty. That closes the most common gap between research-grade ANN indexes and the messy real world of multi-tenant retrieval, where every query has user, workspace, or permission filters attached. The benchmarks are reported on OpenAI 1536- and 3072-dimensional embeddings, which are the configurations production teams actually ship today.

## Integration with the LLM Stack

The repository ships drop-in replacements for the LangChain, LlamaIndex, Haystack, and Agno vector stores, so adoption looks like swapping a class import rather than rewriting an ingestion pipeline. The Python API is around 55 percent of the codebase and the Rust core around 45 percent, which is the right split for a library that wants to be easy to install and hard to outgrow. Because the entire system runs in-process with no service to host, there is no separate deployment, no network hop, and no API key to manage, properties that matter both for cost and for compliance-sensitive deployments.

## Limitations

TurboVec is a vector index, not a full vector database: there is no built-in replication, no clustering, no admin UI, and no managed offering.
Teams that need cross-node sharding, hot-standby replicas, or a hosted control plane should still reach for Qdrant, Weaviate, or a managed service, and treat TurboVec as the in-process accelerator inside those systems. The quantization regime is tuned for embedding-style vectors where length-renormalized scoring is well-behaved; non-embedding workloads (raw feature vectors with heavy-tailed distributions, for example) may not see the same recall numbers. Finally, while the MIT license is permissive, the TurboQuant algorithm itself comes from Google Research, and any patent considerations around it are worth a careful read before betting a commercial product on it. Within those bounds, in mid-2026 TurboVec is the strongest answer to "how do I cut my RAG memory bill by 8x without losing recall", a question every RAG team is now being asked by its finance team.
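
The per-vector and compression figures quoted in this article can be checked with a few lines of arithmetic. The sketch below assumes the compressed size is simply dimensions times bits per coordinate, with no per-vector metadata, and it infers 2 bits per coordinate from the 16x figure (32 / 16); the helper names are illustrative and are not part of TurboVec.

```python
def bytes_per_vector(dims, bits_per_coord):
    """Storage for one vector, ignoring any per-vector metadata."""
    return (dims * bits_per_coord + 7) // 8


def compression_ratio(bits_per_coord, baseline_bits=32):
    """Size reduction relative to a float32 baseline."""
    return baseline_bits / bits_per_coord


def corpus_gigabytes(n_vectors, dims, bits_per_coord):
    """Total index payload in decimal gigabytes."""
    return n_vectors * bytes_per_vector(dims, bits_per_coord) / 1e9


if __name__ == "__main__":
    print(bytes_per_vector(1536, 32))  # 6144 bytes as float32
    print(bytes_per_vector(1536, 2))   # 384 bytes at 2 bits per coordinate
    print(compression_ratio(2))        # 16.0
    print(compression_ratio(4))        # 8.0
    print(corpus_gigabytes(10_000_000, 1536, 2))
```

Under the same arithmetic, 4 bits per coordinate gives 8x and 2 bits gives 16x, so the bit width in use determines which of the two ratios applies to a given deployment.
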