Open Source
Explore the latest AI open-source projects from GitHub and HuggingFace.
InferenceX (formerly InferenceMAX) is an open-source continuous inference benchmarking platform by SemiAnalysis that automatically evaluates AI inference frameworks every night across frontier GPU hardware. The project provides transparent, reproducible performance data for the most popular open-source inference stacks, with results published on a free public dashboard at inferencex.com.

## Why InferenceX Matters

AI inference performance is a moving target. Frameworks like vLLM, SGLang, and TensorRT-LLM receive frequent updates that can dramatically change throughput and latency characteristics. Hardware vendors release new driver versions and optimization libraries on a regular cadence. Without continuous benchmarking, performance claims quickly become outdated.

InferenceX solves this problem by running a comprehensive benchmark suite every night, capturing performance improvements as they happen. This gives infrastructure teams and cloud providers reliable, current data for hardware procurement decisions and deployment planning.

## Nightly Automated Benchmarking

The core of InferenceX is its automated nightly benchmark cycle. Each night, the system evaluates multiple inference frameworks against a standardized set of workloads, recording throughput, latency, time-to-first-token, and other performance indicators. The results are automatically published to the public dashboard, creating a continuous performance timeline.

This nightly cadence means that when a framework releases an optimization, the performance impact appears in the benchmark data within 24 hours. Teams can track exactly when performance improvements landed and quantify their impact across different hardware configurations.

## Multi-Framework Evaluation

InferenceX benchmarks the three dominant open-source inference frameworks: SGLang, vLLM, and TensorRT-LLM.
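The per-request metrics tracked above are straightforward to illustrate. The sketch below is not InferenceX's actual code (the record format and function names are assumptions); it computes time-to-first-token and decode throughput from timestamped token events, as a streaming client would observe them against the OpenAI-compatible endpoints that vLLM, SGLang, and TensorRT-LLM each expose:

```python
from dataclasses import dataclass


@dataclass
class RequestTrace:
    """Timestamps collected while streaming one completion.

    In practice these would be captured by a client reading
    server-sent events from an OpenAI-compatible completions
    endpoint; here they are supplied synthetically.
    """
    sent_at: float            # when the request was issued (seconds)
    token_times: list[float]  # arrival time of each output token


def time_to_first_token(trace: RequestTrace) -> float:
    """Latency from request submission to the first streamed token."""
    return trace.token_times[0] - trace.sent_at


def decode_throughput(trace: RequestTrace) -> float:
    """Output tokens per second, excluding the prefill phase."""
    decode_window = trace.token_times[-1] - trace.token_times[0]
    if decode_window == 0:
        # A single-token response has no decode phase to measure.
        return 0.0
    return (len(trace.token_times) - 1) / decode_window


# Synthetic trace: first token after 0.5 s, then four more 0.1 s apart.
trace = RequestTrace(sent_at=0.0, token_times=[0.5, 0.6, 0.7, 0.8, 0.9])
print(time_to_first_token(trace))  # 0.5
print(decode_throughput(trace))    # 4 tokens / 0.4 s ≈ 10.0
```

Separating time-to-first-token from decode throughput matters because the two are dominated by different phases (prefill vs. decode), and frameworks often optimize them independently.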
Each framework takes a different approach to inference optimization, and InferenceX provides apples-to-apples comparisons across standardized workloads. This cross-framework comparison helps teams make informed decisions about which stack best fits their specific requirements.

## Frontier GPU Hardware Coverage

The benchmark fleet includes some of the most advanced accelerators available. Current hardware support covers NVIDIA GB200 NVL72, NVIDIA B200, NVIDIA H100, and AMD MI355X, with planned support for Google TPUv6e, TPUv7, and AWS Trainium2 and Trainium3. This multi-vendor coverage is critical for organizations evaluating hardware investments, providing direct performance comparisons across competing platforms.

The project operates one of the largest open-source GPU CI/CD fleets, using close to 1,000 frontier GPUs for benchmarking. This scale enables testing of distributed inference strategies that only become relevant in multi-GPU and multi-node deployments.

## Model Coverage

InferenceX currently benchmarks inference performance for Qwen 3.5, DeepSeek, and open-source GPT variants. The focus on the most popular production models ensures the benchmark results are directly relevant to real-world deployment scenarios. Support for large-scale MoE (Mixture of Experts) disaggregated inference with wide expert parallelism was added in the v2 release.

## Industry Validation

The benchmark has been validated and supported by major cloud providers including Google Cloud, Microsoft Azure, and Oracle, as well as AI companies like OpenAI. This broad industry endorsement reflects the rigor of the project's methodology and the practical value of its results.

## Reproducible Methodology

All benchmark configurations, scripts, and results are open source under the Apache 2.0 license. Teams can reproduce any benchmark result on their own hardware, verify claims independently, and contribute new benchmark configurations.
The transparent methodology builds trust in the results and enables community-driven improvements to the benchmarking process.
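Because every nightly result is published openly, cross-framework comparisons are easy to script on top of the data. The following is a minimal sketch under an assumed record schema; the field names and the synthetic numbers below are placeholders, not InferenceX's published result format:

```python
# Illustrative only: field names and throughput values are an assumed,
# synthetic schema, not actual InferenceX dashboard data.
nightly_results = [
    {"framework": "vllm",   "gpu": "B200", "throughput_tok_s": 21000.0},
    {"framework": "sglang", "gpu": "B200", "throughput_tok_s": 22500.0},
    {"framework": "trtllm", "gpu": "B200", "throughput_tok_s": 23000.0},
    {"framework": "vllm",   "gpu": "H100", "throughput_tok_s": 9500.0},
]


def best_framework(results: list[dict], gpu: str) -> str:
    """Return the framework with the highest throughput on one GPU type."""
    on_gpu = [r for r in results if r["gpu"] == gpu]
    return max(on_gpu, key=lambda r: r["throughput_tok_s"])["framework"]


print(best_framework(nightly_results, "B200"))  # trtllm
```

Because the real results carry a nightly timestamp, the same kind of query can be extended to plot a framework's throughput over time and pinpoint the night a given optimization landed.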