Open Source
Explore the latest AI open-source projects from GitHub and HuggingFace.
Promptfoo is an open-source CLI and library for testing, evaluating, and red-teaming LLM applications. With over 12,000 GitHub stars and MIT licensing, it has established itself as the go-to tool for developers who need systematic prompt evaluation, model comparison, and security vulnerability scanning across AI applications.

## Why LLM Testing Matters

As LLM-powered applications move from prototypes to production, the gap between "works in demo" and "works reliably" becomes a critical business risk. Prompt regressions, model version changes, and security vulnerabilities can silently degrade application quality. Traditional software testing approaches do not map cleanly onto probabilistic LLM outputs, leaving teams relying on manual spot-checking and hope.

Promptfoo addresses this gap by providing a structured, repeatable framework for evaluating LLM behavior. It treats prompt engineering as a measurable discipline rather than an art form, bringing software engineering rigor to a domain that desperately needs it.

## Core Architecture

### Declarative Test Configuration

Promptfoo uses YAML configuration files to define test cases, making evaluations version-controllable and reproducible. Developers specify prompts, expected outputs, and assertion criteria in a declarative format that integrates naturally with existing development workflows. No heavy notebooks or custom scripting required.

### Multi-Provider Support

The framework supports over 60 LLM providers out of the box, including OpenAI, Anthropic, Google, Azure, AWS Bedrock, Ollama, and many more. This breadth enables side-by-side model comparison across providers, helping teams make informed decisions about which model best fits their specific use case and budget.

### Assertion System

Promptfoo offers a rich assertion system for validating LLM outputs. Developers can check for exact matches, JSON structure compliance, semantic similarity, custom function evaluations, and more.
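
To make the idea of output assertions concrete, here is a minimal Python sketch of three of the check types named above: exact match, JSON structure, and a custom function. It is an illustration of the general technique, not Promptfoo's API (the article describes Promptfoo assertions as declared in YAML). The names `check_equals`, `check_is_json`, `check_custom`, and `run_checks` are hypothetical, and semantic similarity is left out because it would need an embedding model.

```python
import json
from typing import Callable

# A check takes the model output and returns True when the output passes.
Check = Callable[[str], bool]


def check_equals(expected: str) -> Check:
    """Exact-match check (leading and trailing whitespace is ignored)."""
    return lambda output: output.strip() == expected.strip()


def check_is_json(required_keys: tuple[str, ...] = ()) -> Check:
    """Check that the output parses as a JSON object containing the required keys."""
    def _check(output: str) -> bool:
        try:
            parsed = json.loads(output)
        except json.JSONDecodeError:
            return False
        return isinstance(parsed, dict) and all(k in parsed for k in required_keys)
    return _check


def check_custom(fn: Callable[[str], bool]) -> Check:
    """Wrap an arbitrary user-supplied predicate as a check."""
    return fn


def run_checks(output: str, checks: dict[str, Check]) -> dict[str, bool]:
    """Apply every named check to one output and report pass/fail per check."""
    return {name: check(output) for name, check in checks.items()}


if __name__ == "__main__":
    # Stand-in for an LLM response; no model is called in this sketch.
    sample_output = '{"sentiment": "positive", "score": 0.9}'
    results = run_checks(
        sample_output,
        {
            "is-json": check_is_json(required_keys=("sentiment", "score")),
            "exact-match": check_equals("positive"),
            "short-enough": check_custom(lambda out: len(out) < 200),
        },
    )
    print(results)
```
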
This granularity enables precise quality gates that catch regressions before they reach users.

## Key Capabilities

### Red Teaming and Security Scanning

One of Promptfoo's standout features is its automated red teaming capability. The framework covers over 50 vulnerability types, from prompt injection and jailbreaks to data exfiltration and harmful content generation. Security scans can be run as part of CI/CD pipelines, providing continuous protection against emerging attack vectors.

The security scanning aligns with industry frameworks including the OWASP Top 10 for LLM Applications and the NIST AI Risk Management Framework (AI RMF), generating compliance-ready reports that satisfy enterprise security requirements.

### CI/CD Integration

Promptfoo integrates directly into continuous integration pipelines. Developers can set minimum performance thresholds and automatically fail builds when prompt quality drops below acceptable levels. This transforms prompt evaluation from a periodic manual activity into an automated quality gate.

### Model Comparison

The framework's comparison mode enables side-by-side evaluation of different models, prompt variations, and configuration parameters. Results are displayed in both command-line and web-based interfaces, with visual diffs that highlight performance differences across test cases.

### Code Scanning

Beyond runtime evaluation, Promptfoo includes static analysis capabilities that scan pull requests for LLM-related security issues. This catches potential vulnerabilities before code is merged, adding another layer of defense to the development process.

## Developer Experience

Promptfoo prioritizes developer productivity with features that reduce friction in the evaluation workflow. Live reload automatically reruns evaluations when configuration files change, enabling rapid iteration. Intelligent caching avoids redundant API calls, reducing both cost and evaluation time. Results persist locally, allowing historical comparison without external infrastructure.
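
The caching idea can be illustrated with a short Python sketch. It shows one common approach, assumed here for illustration: store each response on disk under a hash of the provider and prompt, so a repeated evaluation reads the stored result instead of calling the model again. This is not a description of how Promptfoo implements its cache. `CachedProvider`, `call_model`, and the `.eval_cache` directory name are hypothetical.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


class CachedProvider:
    """Wraps a model-calling function and stores its responses on local disk."""

    def __init__(
        self,
        provider_id: str,
        call_model: Callable[[str], str],
        cache_dir: str = ".eval_cache",
    ) -> None:
        self.provider_id = provider_id
        self.call_model = call_model  # hypothetical function that makes the real API call
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.api_calls = 0  # counts calls that actually reached the model

    def _key(self, prompt: str) -> str:
        # The same provider and prompt always map to the same cache file.
        payload = json.dumps(
            {"provider": self.provider_id, "prompt": prompt}, sort_keys=True
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def complete(self, prompt: str) -> str:
        path = self.cache_dir / f"{self._key(prompt)}.json"
        if path.exists():
            # Cache hit: reuse the stored response, no API call.
            return json.loads(path.read_text(encoding="utf-8"))["output"]
        self.api_calls += 1
        output = self.call_model(prompt)
        path.write_text(json.dumps({"output": output}), encoding="utf-8")
        return output


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        # A fake model (upper-casing the prompt) stands in for a real provider.
        provider = CachedProvider("fake-model", lambda p: p.upper(), cache_dir=tmp)
        provider.complete("hello")
        provider.complete("hello")
        print(provider.api_calls)  # only the first call reaches the model
```
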
Installation is straightforward across multiple package managers. Developers can get started with a single command via npx, npm, pip, or Homebrew, and the init wizard generates starter configurations for common use cases.

## Production Readiness

With over 7,600 commits, 397 releases, and a battle-tested track record serving millions of users, Promptfoo has proven its reliability at scale. The project maintains an active release cadence with version 0.121.1 shipping on March 9, 2026, demonstrating sustained development momentum.

The MIT license ensures unrestricted commercial use, and the project's architecture keeps all evaluations local by default, meaning prompts and test data never leave the developer's machine unless explicitly configured to do so.

## Practical Applications

Promptfoo serves multiple roles in the LLM application lifecycle. During development, it accelerates prompt iteration by providing instant feedback on output quality. During code review, it catches prompt regressions and security issues in pull requests. In production, it monitors model performance across provider updates and version changes. For compliance, it generates reports documenting security posture against industry frameworks.

## Limitations

Promptfoo's declarative approach requires upfront investment in defining test cases and assertion criteria, which can feel heavyweight for simple prototyping. The red teaming features, while comprehensive, cannot guarantee complete security coverage against novel attack vectors. Some assertion types require careful calibration to avoid false positives, particularly semantic similarity checks on creative or open-ended outputs.

## Who Should Use Promptfoo

Promptfoo is essential for teams building production LLM applications that need quality assurance, security scanning, and compliance documentation.
It is particularly valuable for enterprises with regulatory requirements, development teams practicing CI/CD for AI applications, and security engineers responsible for LLM application hardening.