RunWhen is an AI SRE platform that automates troubleshooting and remediation for Kubernetes and cloud environments. It works in two modes:
- Autonomous mode: RunWhen continuously runs diagnostic tasks in the background, building production insights, which are structured findings about the health, configuration, and behavior of your services. These insights are always current and ready for the AI to reference, so there is no waiting for data collection when something goes wrong.
- Interactive mode: When you ask a question in Workspace Chat (or an alert fires), an AI assistant combines those background insights with new, targeted diagnostics to give you an actionable answer: root cause, remediation steps, or what to investigate next.
All tasks execute inside your own clusters via a lightweight agent. Your credentials and data stay in your network; the platform coordinates what to run.
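The relationship between the two modes can be sketched in a few lines of Python. This is only an illustrative model, not RunWhen's implementation: `Insight`, `InsightStore`, `run_background_tasks`, and `answer_question` are hypothetical names, and the diagnostics are stand-in functions that return strings.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Insight:
    service: str
    finding: str
    source: str  # "background" (autonomous mode) or "targeted" (interactive mode)


@dataclass
class InsightStore:
    """Hypothetical in-memory store for findings gathered in the background."""
    _by_service: Dict[str, List[Insight]] = field(default_factory=dict)

    def record(self, insight: Insight) -> None:
        self._by_service.setdefault(insight.service, []).append(insight)

    def for_service(self, service: str) -> List[Insight]:
        # Return a copy so callers cannot mutate the stored list.
        return list(self._by_service.get(service, []))


def run_background_tasks(store: InsightStore,
                         tasks: Dict[str, Callable[[], str]]) -> None:
    """Autonomous mode: run each diagnostic and store its finding."""
    for service, task in tasks.items():
        store.record(Insight(service, task(), source="background"))


def answer_question(store: InsightStore, service: str,
                    targeted: List[Callable[[], str]]) -> List[Insight]:
    """Interactive mode: start from stored insights, then add fresh diagnostics."""
    insights = store.for_service(service)
    for task in targeted:
        insights.append(Insight(service, task(), source="targeted"))
    return insights


# Example with made-up findings.
store = InsightStore()
run_background_tasks(store, {"checkout": lambda: "deployment has 3/3 replicas ready"})
for item in answer_question(store, "checkout",
                            [lambda: "pod logs show connection timeouts"]):
    print(f"[{item.source}] {item.finding}")
```

The point of the sketch is that the interactive path reads findings that already exist and only runs the extra diagnostics a specific question calls for.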
Get Started
- 🏁 Try It Now: Jump into the Live Demos, no setup required. See how the platform diagnoses real application failures in a shared sandbox.
- 🚀 Set Up Your Environment: Ready to connect your own clusters? The Install section walks you through deployment in about 30 minutes.
- 📚 Learn How It Works: Want to understand the architecture first? Learn covers the key concepts: workspaces, tasks, assistants, and more.
Day-to-Day Usage
Once your environment is connected, the Use section covers common workflows: asking questions in Workspace Chat, reviewing issues, running tasks, setting up rules and commands to customize your assistants, and sharing findings with your team.
Documentation Map
- Live Demos: Hands-on tutorials using a shared sandbox workspace. Walk through real failure scenarios and see AI-assisted diagnosis in action.
- Learn: Core concepts, including workspaces, assistants, tasks, issues, and how the system discovers and monitors your infrastructure.
- Install: Deployment guides for SaaS with a private runner, or self-hosted. Includes prerequisites, Helm chart configuration, and verification steps.
- Post-install setup: cloud discovery, SSO, user management, Slack integration, webhooks, secrets, and custom task configuration.
- Use: Workspace Chat, Workspace Studio (tasks, rules, commands, knowledge), issue management, and everyday platform workflows.
- System design, data flow, deployment topology, and scalability considerations.
How It Works (30-Second Version)
- You ask a question (“Why is checkout failing in production?”) or an alert fires.
- An AI assistant picks relevant tasks from a library of expert-authored automation.
- Tasks run in your environment via a lightweight agent (the “runner”) that stays inside your cluster.
- Results are analyzed and returned with root-cause findings and recommended next steps.
Your credentials and cluster data never leave your network. The platform coordinates what to run; the runner executes it locally.
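The split between coordination and local execution described above can be modeled roughly as follows. This is a minimal sketch under stated assumptions, not the platform's API: `Runner`, `Finding`, `pick_tasks`, and `handle_question` are hypothetical names, the keyword match is a toy stand-in for the assistant's task selection, and the task functions return canned strings.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Finding:
    task: str
    summary: str


class Runner:
    """Hypothetical stand-in for the in-cluster agent; credentials stay inside it."""

    def __init__(self, credentials: str, tasks: Dict[str, Callable[[str], str]]):
        self._credentials = credentials  # used locally, never returned
        self._tasks = tasks

    def execute(self, task_names: List[str]) -> List[Finding]:
        # Run each requested task locally and return only its summary.
        return [Finding(name, self._tasks[name](self._credentials))
                for name in task_names]


def pick_tasks(question: str, library: Dict[str, List[str]]) -> List[str]:
    """Toy selection: choose tasks whose keywords appear in the question."""
    text = question.lower()
    return [task for task, keywords in library.items()
            if any(keyword in text for keyword in keywords)]


def handle_question(question: str, library: Dict[str, List[str]],
                    runner: Runner) -> List[Finding]:
    """Coordinator side: decide what to run, then delegate execution."""
    return runner.execute(pick_tasks(question, library))


# Example with made-up tasks and findings.
library = {"check_pod_restarts": ["failing", "crash"],
           "check_cert_expiry": ["certificate", "tls"]}
runner = Runner("cluster-token", {
    "check_pod_restarts": lambda creds: "checkout pod restarted 5 times in 10 minutes",
    "check_cert_expiry": lambda creds: "certificate valid for 60 days",
})
for finding in handle_question("Why is checkout failing in production?",
                               library, runner):
    print(f"{finding.task}: {finding.summary}")
```

In this model the coordinator only ever sees task names and result summaries; the credential string is passed to task functions inside the runner and is not part of any return value.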
Additional resources: For task authoring and CodeBundle development, see the RunWhen Authors Documentation.