Learn
This section explains how RunWhen works and what problems it solves, and links to the detailed documentation and hands-on tutorials that will help you become productive.
What is RunWhen?
RunWhen is an AI SRE platform that automates troubleshooting and remediation for Kubernetes and cloud environments. It connects AI-powered Engineering Assistants to your infrastructure through a library of automated diagnostic and remediation tasks — so your team can investigate incidents in natural language instead of memorizing kubectl commands and runbooks.
What it replaces:
- Manual runbooks and tribal knowledge
- Waiting for the one person who knows the system
- Copy-pasting kubectl commands from Slack threads
- Starting incident triage from scratch every time
What it provides:
- Reduce MTTR — AI-powered root cause analysis backed by automated diagnostic tasks
- Scale expertise — Encode troubleshooting knowledge into reusable tasks available to all teams
- Automate triage — Run diagnostics immediately when alerts fire, before humans get involved
- Developer self-service — Give developers the ability to investigate issues in their own environments without waiting for SRE escalation
How RunWhen Works
RunWhen is different from traditional observability tools. Instead of just collecting metrics and logs, RunWhen runs automated tasks that actively investigate your environment and produce structured insights designed for AI consumption.
Background Mode: Building Production Insights
RunWhen continuously runs tasks in the background to understand your environment:
- Health Monitoring — Check pod status, deployment health, service availability
- Discovery — Identify resources, map dependencies, catalog configurations
- Baseline Tasks — Establish normal behavior, detect anomalies, track trends
The results are structured production insights rather than raw metrics. Because the data is already organized for LLM consumption, the assistant uses fewer tokens and can respond more accurately.
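To make the difference between raw data and a structured insight concrete, here is a minimal Python sketch. The `Insight` class, its fields, and the `summarize_pods` helper are hypothetical names invented for illustration and are not RunWhen's actual schema or API. The sample pod records are also made up.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Insight:
    """Hypothetical structured insight; not RunWhen's actual schema."""
    resource: str
    healthy: bool
    summary: str
    details: dict = field(default_factory=dict)


def summarize_pods(deployment: str, pods: list[dict]) -> Insight:
    """Collapse per-pod status records into one compact insight."""
    phases = Counter(pod["phase"] for pod in pods)
    restarts = sum(pod.get("restarts", 0) for pod in pods)
    running = phases.get("Running", 0)
    # Treat the deployment as healthy only if every pod runs without restarts.
    healthy = running == len(pods) and restarts == 0
    summary = f"{running}/{len(pods)} pods running, {restarts} restarts"
    return Insight(deployment, healthy, summary, {"phases": dict(phases)})


# Sample raw records, invented for illustration.
raw_pods = [
    {"name": "api-1", "phase": "Running", "restarts": 0},
    {"name": "api-2", "phase": "Running", "restarts": 7},
    {"name": "api-3", "phase": "Pending", "restarts": 0},
]
insight = summarize_pods("deployment/api", raw_pods)
print(insight.summary)
```

A one-line summary plus a small details dictionary is far cheaper to pass to a language model than the full list of raw records, which is the idea this sketch is meant to illustrate.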
Interactive Mode: Workspace Chat
When you ask a question or an alert fires, the Engineering Assistant combines background insights with new targeted diagnostics:
1. You ask a question (or an alert fires)
2. The Assistant checks existing production insights from background tasks
3. The Assistant runs new diagnostic tasks targeted at your question
4. The AI analyzes everything: background insights plus new diagnostics
5. You get an actionable answer: root cause, remediation steps, or what to investigate next
The Engineering Assistant may surface existing Issues that have already been identified, or it may suggest and run new Tasks to gather more information. Both paths are part of the normal workflow.
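The flow above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how the Engineering Assistant is implemented: the `investigate` function, the keyword matching used to pick relevant topics, and the shape of the insight and task dictionaries are all hypothetical.

```python
from typing import Callable

# Hypothetical stand-in: a task is a callable that returns text findings.
Task = Callable[[], list[str]]


def investigate(
    question: str,
    insights: dict[str, list[str]],
    tasks: dict[str, Task],
) -> dict:
    """Combine stored background insights with fresh targeted diagnostics."""
    # Naive keyword matching, used here only as a placeholder for relevance.
    words = set(question.lower().split())
    # Reuse background insights whose topic appears in the question.
    known = [f for topic, found in insights.items() if topic in words for f in found]
    # Run only the diagnostic tasks targeted at the question.
    fresh = [f for topic, task in tasks.items() if topic in words for f in task()]
    # A real assistant would pass this evidence to an LLM for analysis;
    # this sketch simply returns the evidence it gathered.
    return {"question": question, "background": known, "new": fresh}


# Sample data, invented for illustration.
insights = {"checkout": ["checkout: 2/3 pods running"], "search": ["search: healthy"]}
tasks = {"checkout": lambda: ["checkout: OOMKilled in last 10 minutes"]}
result = investigate("Why is checkout failing", insights, tasks)
print(result["background"], result["new"])
```

The point of the sketch is the separation of the two evidence sources: insights that already exist are reused as-is, and new tasks run only where the question calls for them.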
Key Concepts
The pages in this section cover these foundational concepts in detail:
| Concept | Description |
|---|---|
| Workspaces | The tenancy boundary — your dedicated environment containing users, SLXs, AI Assistants, secrets, and configurations |
| SLXs (Service-Level eXperiences) | The core unit of operational knowledge — defines what to monitor, how to measure it, and what to do when something goes wrong |
| Engineering Assistants | AI-powered agents that suggest and run tasks, analyze results, and recommend next steps — always operating within workspace RBAC |
| Tasks (CodeBundles) | Automated scripts (Python, Bash, SQL, REST) that run in your environment to collect insights or perform actions |
| Issues | Findings raised by tasks that signal something is wrong and requires investigation |
| RunSessions | Interactive troubleshooting sessions where you collaborate with an Assistant to investigate problems |
| Rules | Behavioral guidelines that shape how assistants interpret and respond to specific situations — de-noise your environment, prioritize what matters |
| Commands | Named multi-step procedures an assistant can execute on request — bundle a sequence of tasks into a single action |
| Secrets | Credentials managed at the workspace level, stored in HashiCorp Vault, never readable via the API |
For a complete glossary, see Terms and Concepts.
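One way to picture how a few of these concepts relate is the simplified Python model below. It is a sketch based only on the descriptions in the table: the class names mirror the concepts, but every field and method (`check`, `run_all`, and so on) is a hypothetical choice made for illustration and does not reflect RunWhen's real data model.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Issue:
    title: str
    slx: str  # name of the SLX whose task raised the issue


@dataclass
class Task:
    name: str
    check: Callable[[], list[str]]  # returns titles of any problems found


@dataclass
class SLX:
    name: str
    tasks: list[Task] = field(default_factory=list)


@dataclass
class Workspace:
    name: str
    slxs: list[SLX] = field(default_factory=list)

    def run_all(self) -> list[Issue]:
        """Run every task in every SLX and collect the issues raised."""
        return [
            Issue(title, slx.name)
            for slx in self.slxs
            for task in slx.tasks
            for title in task.check()
        ]


# Sample workspace, invented for illustration.
ws = Workspace(
    "demo",
    [
        SLX("api-health", [Task("check-pods", lambda: ["api-2 restarting"])]),
        SLX("db-health", [Task("check-connections", lambda: [])]),
    ],
)
issues = ws.run_all()
print(issues)
```

In this simplified view, a Workspace contains SLXs, each SLX groups Tasks, and Tasks raise Issues when they find a problem. Assistants, RunSessions, Rules, Commands, and Secrets are left out to keep the example short.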
Common Use Cases
Developer Self-Service
Developers are often blocked while they wait for platform teams to answer routine questions. RunWhen gives developers direct access to AI-powered troubleshooting in their own environments, reducing escalations to the platform team.
Scaling Operations Across Environments
As infrastructure grows, staffing one SRE per environment doesn’t scale. Engineering Assistants can monitor and triage across multiple clusters and environments from a single workspace.
Faster Incident Response
Automated diagnostics run immediately when an alert fires — before a human is paged. When the on-call engineer responds, they have structured diagnostic output instead of starting from scratch.
Capturing Tribal Knowledge
Expert SRE knowledge often lives in people’s heads. RunWhen’s task library lets you encode that knowledge into reusable automation that survives team turnover.
Tutorials
Hands-on tutorials to learn RunWhen through real scenarios in the Sandbox environment:
| Tutorial | What You’ll Learn | Difficulty | Time |
|---|---|---|---|
| Crashing Code Deploy (Dev) | Investigate a pod crash caused by a code bug | Beginner | 15-20 min |
| Misconfigured Service (Test) | Diagnose a service-to-service communication failure | Intermediate | 15-20 min |
| Database Connection Failure (Prod) | Trace intermittent crashes to connection pool exhaustion | Intermediate | 15-20 min |
Start with the Dev scenario if you’re new — it introduces the core Workspace Chat troubleshooting workflow.
See the Live Demos page for the Sandbox workspace and ready-to-use prompts.
Getting Started
- Try the Sandbox — Sign up at app.beta.runwhen.com and explore the pre-configured Sandbox workspace
- Follow the Quick Start — Deploy RunWhen Local to your own cluster in about 15 minutes
- Read the Architecture — Understand the three-component system (Platform, Local Agent, CodeCollections)
Child Pages
The pages below this section cover each concept in detail:
- Workspaces — Tenancy, membership, and workspace-scoped configuration
- SLXs — Creating and managing Service-Level eXperiences
- AI Assistants — How assistants work within RBAC boundaries
- CodeBundles — The task libraries that power automation
- Secrets — Credential management and how RunWhen handles sensitive data