Open Source
Explore the latest AI open-source projects from GitHub and HuggingFace.
OpenSRE is an open-source framework from Tracer Cloud for building AI Site Reliability Engineering (SRE) agents that autonomously investigate production incidents. Released in public alpha in 2026, it has accumulated 3,600 GitHub stars, drawing interest from platform engineering teams seeking credible open-source alternatives to expensive proprietary AIOps platforms.

The framework uses a planner and sub-agent architecture: a high-level reasoning agent formulates investigation hypotheses, then dispatches specialized sub-agents in parallel to query Kubernetes cluster state, pull time-series metrics from Grafana or Datadog, search logs for error patterns, and analyze distributed traces for latency cascades. It then correlates the findings and generates a structured, evidence-backed root cause analysis report, automating a workflow that traditionally requires skilled human SREs working under incident pressure.

OpenSRE connects to 60+ platforms across the modern cloud stack: observability tools (Grafana, Datadog, Honeycomb), infrastructure (Kubernetes, AWS, GCP, Azure), databases (PostgreSQL, MySQL, Redis), and incident management (PagerDuty, Opsgenie, Slack). The framework also includes synthetic incident simulation infrastructure, so agents can be evaluated and workflows refined in controlled environments before production deployment. Because it is self-hosted and licensed under Apache 2.0, production incident data never leaves the organization's infrastructure. Community forks have extended the core with episodic memory and Neo4j knowledge graph support for accumulating institutional knowledge across past investigations.
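The planner and sub-agent flow described above can be illustrated with a minimal Python sketch. This is not OpenSRE's actual API: every name here (`Finding`, `check_cluster_state`, `investigate`, and so on) is hypothetical, the sub-agents return canned data instead of querying real systems, and the confidence score is a deliberately naive ratio chosen only for illustration.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Finding:
    source: str     # which sub-agent produced this finding
    summary: str    # short, human-readable piece of evidence
    supports: bool  # whether the evidence supports the hypothesis


# Hypothetical sub-agents. Each returns canned data; a real one would query
# Kubernetes, a metrics backend, a log store, or a tracing system.
async def check_cluster_state(hypothesis: str) -> Finding:
    await asyncio.sleep(0)  # stand-in for network I/O
    return Finding("kubernetes", "pod checkout-7f9 restarted 4 times", True)


async def check_metrics(hypothesis: str) -> Finding:
    await asyncio.sleep(0)
    return Finding("metrics", "memory usage reached the container limit", True)


async def check_logs(hypothesis: str) -> Finding:
    await asyncio.sleep(0)
    return Finding("logs", "OOMKilled events in the last 15 minutes", True)


async def check_traces(hypothesis: str) -> Finding:
    await asyncio.sleep(0)
    return Finding("traces", "no latency cascade found upstream", False)


SUB_AGENTS = [check_cluster_state, check_metrics, check_logs, check_traces]


async def investigate(hypothesis: str) -> dict:
    """Planner step: fan out to all sub-agents in parallel, then correlate."""
    # asyncio.gather runs the coroutines concurrently and returns results
    # in the same order as SUB_AGENTS.
    findings = await asyncio.gather(*(agent(hypothesis) for agent in SUB_AGENTS))
    supporting = [f for f in findings if f.supports]
    inconclusive = [f for f in findings if not f.supports]
    return {
        "hypothesis": hypothesis,
        "evidence": [f"{f.source}: {f.summary}" for f in supporting],
        "inconclusive": [f"{f.source}: {f.summary}" for f in inconclusive],
        # Naive score (an assumption): share of sub-agents with supporting evidence.
        "confidence": len(supporting) / len(findings),
    }


def run_example() -> dict:
    return asyncio.run(investigate("checkout service is running out of memory"))


if __name__ == "__main__":
    report = run_example()
    print(report["hypothesis"], report["confidence"])
    for line in report["evidence"]:
        print(" -", line)
```

Fanning out with `asyncio.gather` keeps the sub-agents independent of each other, so a slow log search does not delay the metrics query. A real planner would presumably also iterate, forming a new hypothesis when the evidence does not support the current one.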