Learn

This section explains how RunWhen works and what problems it solves, and links to the detailed documentation and hands-on tutorials that will help you become productive.

What is RunWhen?

RunWhen is an AI SRE platform that automates troubleshooting and remediation for Kubernetes and cloud environments. It connects AI-powered Engineering Assistants to your infrastructure through a library of automated diagnostic and remediation tasks — so your team can investigate incidents in natural language instead of memorizing kubectl commands and runbooks.

What it replaces:

  • Manual runbooks and tribal knowledge
  • Waiting for the one person who knows the system
  • Copy-pasting kubectl commands from Slack threads
  • Starting incident triage from scratch every time

What it provides:

  • Reduce MTTR — AI-powered root cause analysis backed by automated diagnostic tasks
  • Scale expertise — Encode troubleshooting knowledge into reusable tasks available to all teams
  • Automate triage — Run diagnostics immediately when alerts fire, before humans get involved
  • Developer self-service — Give developers the ability to investigate issues in their own environments without waiting for SRE escalation

How RunWhen Works

RunWhen is different from traditional observability tools. Instead of just collecting metrics and logs, RunWhen runs automated tasks that actively investigate your environment and produce structured insights designed for AI consumption.

Background Mode: Building Production Insights

RunWhen continuously runs tasks in the background to understand your environment:

  • Health Monitoring — Check pod status, deployment health, service availability
  • Discovery — Identify resources, map dependencies, catalog configurations
  • Baseline Tasks — Establish normal behavior, detect anomalies, track trends

The results are structured production insights rather than raw metrics. Because the data is already organized for LLM consumption, the assistant consumes fewer tokens and produces more accurate responses.
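To make the difference between raw data and a structured insight concrete, here is a minimal Python sketch. It is an illustration only: the `Insight` fields, the `summarize_pods` helper, and the restart threshold of 5 are assumptions made for this example and are not RunWhen's actual schema or API.

```python
from dataclasses import dataclass, asdict
import json


@dataclass
class Insight:
    """Hypothetical structured finding; not RunWhen's actual schema."""
    resource: str
    severity: str
    summary: str
    next_steps: list[str]


def summarize_pods(deployment: str, pods: list[dict]) -> Insight:
    """Condense raw pod records into one compact, LLM-friendly insight."""
    # Assumed threshold: 5 or more restarts counts as "restarting repeatedly".
    crashing = [p["name"] for p in pods if p["restarts"] >= 5]
    if crashing:
        return Insight(
            resource=deployment,
            severity="high",
            summary=(
                f"{len(crashing)} of {len(pods)} pods are restarting "
                f"repeatedly: {', '.join(crashing)}"
            ),
            next_steps=["Inspect logs of the previous container run",
                        "Check recent deploys"],
        )
    return Insight(deployment, "info", f"All {len(pods)} pods healthy", [])


# Raw input as a background task might collect it (made-up sample data).
raw_pods = [
    {"name": "cart-7d9f-abcde", "restarts": 12},
    {"name": "cart-7d9f-fghij", "restarts": 0},
]
insight = summarize_pods("deployment/cart", raw_pods)
print(json.dumps(asdict(insight), indent=2))
```

The point of the sketch is that the assistant receives one short record with a severity and suggested next steps, instead of every raw pod status.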

Interactive Mode: Workspace Chat

When you ask a question or an alert fires, the Engineering Assistant combines background insights with new targeted diagnostics:

  1. You ask a question (or an alert fires)
  2. Assistant checks existing production insights from background tasks
  3. Assistant runs new diagnostic tasks targeted at your question
  4. AI analyzes everything — background insights + new diagnostics
  5. You get an actionable answer — root cause, remediation steps, or what to investigate next

The Engineering Assistant may surface existing Issues that have already been identified, or it may suggest and run new Tasks to gather more information. Both paths are part of the normal workflow.
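The five steps above can be sketched as a small Python function. This is a simplified illustration under stated assumptions: `BACKGROUND_INSIGHTS`, `DIAGNOSTIC_TASKS`, `analyze`, and `investigate` are hypothetical names, the sample findings are made up, and the analysis step is a plain string join standing in for the AI.

```python
from typing import Callable

# Hypothetical store of insights already produced by background tasks.
BACKGROUND_INSIGHTS: dict[str, list[str]] = {
    "cart": ["deployment/cart: 1 of 2 pods restarting repeatedly"],
}

# Hypothetical registry of targeted diagnostic tasks (canned results).
DIAGNOSTIC_TASKS: dict[str, Callable[[str], str]] = {
    "fetch_recent_logs": lambda svc: f"{svc}: OOMKilled in last container run",
    "check_recent_deploys": lambda svc: f"{svc}: image updated 20 minutes ago",
}


def analyze(evidence: list[str]) -> str:
    """Placeholder for the AI analysis step; a real assistant would use an LLM."""
    if not evidence:
        return "No findings yet; gather more data."
    return "Review these findings:\n" + "\n".join(f"- {e}" for e in evidence)


def investigate(service: str) -> dict:
    """Walk through steps 2 to 5 for a question about one service."""
    # Step 2: reuse what background tasks already found.
    existing = BACKGROUND_INSIGHTS.get(service, [])
    # Step 3: run new diagnostics targeted at the question.
    fresh = [task(service) for task in DIAGNOSTIC_TASKS.values()]
    # Step 4: analyze background insights and new diagnostics together.
    evidence = existing + fresh
    # Step 5: return an actionable answer along with its evidence.
    return {"service": service, "evidence": evidence, "answer": analyze(evidence)}


# Step 1: a question arrives (or an alert fires) about the "cart" service.
result = investigate("cart")
print(result["answer"])
```

Keeping the existing insights and the fresh diagnostics in one evidence list mirrors the idea that both paths feed the same analysis.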

Key Concepts

The pages in this section cover these foundational concepts in detail:

| Concept | Description |
| --- | --- |
| Workspaces | The tenancy boundary: your dedicated environment containing users, SLXs, AI Assistants, secrets, and configurations |
| SLXs (Service-Level eXperiences) | The core unit of operational knowledge; defines what to monitor, how to measure it, and what to do when something goes wrong |
| Engineering Assistants | AI-powered agents that suggest and run tasks, analyze results, and recommend next steps, always operating within workspace RBAC |
| Tasks (CodeBundles) | Automated scripts (Python, Bash, SQL, REST) that run in your environment to collect insights or perform actions |
| Issues | Findings raised by tasks that signal something is wrong and requires investigation |
| RunSessions | Interactive troubleshooting sessions where you collaborate with an Assistant to investigate problems |
| Rules | Behavioral guidelines that shape how assistants interpret and respond to specific situations; use them to de-noise your environment and prioritize what matters |
| Commands | Named multi-step procedures an assistant can execute on request; each bundles a sequence of tasks into a single action |
| Secrets | Credentials managed at the workspace level, stored in HashiCorp Vault, never readable via the API |

For a complete glossary, see Terms and Concepts.
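As a rough mental model, the relationships between some of these concepts can be sketched with Python dataclasses. This is an assumption-laden illustration, not RunWhen's data model: the field names, the numeric severity scale, and the choice to attach Tasks and Issues to an SLX are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Issue:
    title: str
    severity: int  # hypothetical scale: 1 is most severe


@dataclass
class Task:
    name: str
    language: str  # e.g. "python" or "bash"


@dataclass
class SLX:
    name: str
    tasks: list[Task] = field(default_factory=list)
    issues: list[Issue] = field(default_factory=list)


@dataclass
class Workspace:
    name: str
    slxs: list[SLX] = field(default_factory=list)

    def open_issues(self) -> list[tuple[str, Issue]]:
        """Return (SLX name, issue) pairs, most severe first."""
        pairs = [(slx.name, issue) for slx in self.slxs for issue in slx.issues]
        return sorted(pairs, key=lambda pair: pair[1].severity)


# Made-up sample data for illustration.
workspace = Workspace(
    name="sandbox",
    slxs=[
        SLX("cart-health",
            tasks=[Task("check-pod-restarts", "bash")],
            issues=[Issue("Pods restarting repeatedly", severity=2)]),
        SLX("db-health",
            tasks=[Task("check-connection-pool", "python")],
            issues=[Issue("Connection pool exhausted", severity=1)]),
    ],
)
for slx_name, issue in workspace.open_issues():
    print(f"[sev {issue.severity}] {slx_name}: {issue.title}")
```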

Common Use Cases

Developer Self-Service

Developers are often blocked while they wait for platform teams to answer routine questions. RunWhen gives developers direct access to AI-powered troubleshooting in their own environments, reducing escalations to the platform team.

Scaling Operations Across Environments

As infrastructure grows, staffing one SRE per environment doesn’t scale. Engineering Assistants can monitor and triage across multiple clusters and environments from a single workspace.

Faster Incident Response

Automated diagnostics run immediately when an alert fires — before a human is paged. When the on-call engineer responds, they have structured diagnostic output instead of starting from scratch.

Capturing Tribal Knowledge

Expert SRE knowledge often lives in people’s heads. RunWhen’s task library lets you encode that knowledge into reusable automation that survives team turnover.

Tutorials

Hands-on tutorials to learn RunWhen through real scenarios in the Sandbox environment:

| Tutorial | What You’ll Learn | Difficulty | Time |
| --- | --- | --- | --- |
| Crashing Code Deploy (Dev) | Investigate a pod crash caused by a code bug | Beginner | 15-20 min |
| Misconfigured Service (Test) | Diagnose a service-to-service communication failure | Intermediate | 15-20 min |
| Database Connection Failure (Prod) | Trace intermittent crashes to connection pool exhaustion | Intermediate | 15-20 min |

Start with the Dev scenario if you’re new — it introduces the core Workspace Chat troubleshooting workflow.

See the Live Demos page for the Sandbox workspace and ready-to-use prompts.

Getting Started

  1. Try the Sandbox — Sign up at app.beta.runwhen.com and explore the pre-configured Sandbox workspace
  2. Follow the Quick Start — Deploy RunWhen Local to your own cluster in about 15 minutes
  3. Read the Architecture — Understand the three-component system (Platform, Local Agent, CodeCollections)

Child Pages

The pages below this section cover each concept in detail:

  • Workspaces — Tenancy, membership, and workspace-scoped configuration
  • SLXs — Creating and managing Service-Level eXperiences
  • AI Assistants — How assistants work within RBAC boundaries
  • CodeBundles — The task libraries that power automation
  • Secrets — Credential management and how RunWhen handles sensitive data