This page describes step-by-step workflows for the most common scenarios you'll encounter using the RunWhen platform. Each journey shows the steps a user follows to accomplish a goal.
Journey 1: Investigating an Unhealthy Service
Scenario: You receive a Slack alert or notice that a service is degraded. You need to find the root cause quickly.
Steps:
- Open Workspace Chat in your workspace
- Ask about the problem: "Why is checkoutservice unhealthy in production?"
- Review the Assistant's findings:
  - It searches for existing Issues related to checkoutservice
  - It runs diagnostic tasks (pod health, logs, events, recent changes)
  - It presents a structured analysis with severity and evidence
- Drill deeper: "What do the logs show?" or "What changed recently?"
- Get a fix: "How do I fix this?" (the Assistant recommends specific remediation)
- Apply the fix: Follow the remediation steps (rollback, config change, resource adjustment)
- Verify: "Is checkoutservice healthy now?" (confirm the fix worked)
Time: 5-15 minutes depending on complexity
Relevant pages: Workspace Chat, Issues and Triage
Journey 2: Responding to a PagerDuty / Monitoring Alert
Scenario: An alert fires from your monitoring system. RunWhen can investigate before you even look at it.
With Workflows Configured (Automated)
- Alert fires in PagerDuty / Opsgenie / Prometheus
- Webhook triggers a RunWhen Workflow
- The Workflow starts a RunSession with an appropriate Assistant (e.g., Cautious Cathy)
- The Assistant runs relevant diagnostic tasks automatically
- Results are posted to Slack or the ticketing system
- When the on-call engineer opens the ticket, diagnostic context is already there
Without Workflows (Manual)
- Alert fires
- Engineer opens Workspace Chat
- Pastes the alert context: "PagerDuty alert: high error rate on frontend service in prod"
- The Assistant investigates and returns findings
- Engineer reviews and acts on recommendations
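
The "paste the alert context" step can be made repeatable with a small helper that turns alert fields into a one-line prompt. This is a minimal sketch, not a RunWhen or PagerDuty integration: the `Alert` class, its field names, and `alert_to_prompt` are hypothetical names chosen for illustration, and the output is simply text to paste into Workspace Chat.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    """Hypothetical alert fields; not a PagerDuty or RunWhen schema."""
    source: str       # monitoring system, e.g. "PagerDuty"
    summary: str      # short symptom, e.g. "high error rate"
    service: str      # affected service, e.g. "frontend"
    environment: str  # e.g. "prod"


def alert_to_prompt(alert: Alert) -> str:
    """Build a one-line chat prompt from the alert fields."""
    return (
        f"{alert.source} alert: {alert.summary} "
        f"on {alert.service} service in {alert.environment}"
    )


prompt = alert_to_prompt(Alert("PagerDuty", "high error rate", "frontend", "prod"))
print(prompt)
```

Keeping the source, symptom, service, and environment in the prompt mirrors the example above and gives the Assistant enough context to choose which diagnostics to run.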
Relevant pages: Workspace Chat, Workspace Studio (Workflows)
Journey 3: Developer Self-Service Troubleshooting
Scenario: A developer's deployment isn't working in their dev/test namespace. They don't know kubectl and don't want to wait for the platform team.
Steps:
- Developer logs in to app.beta.runwhen.com
- Opens their team's workspace
- Types: "My deployment isn't starting in the dev namespace"
- The Assistant:
  - Checks pod status and events
  - Identifies resource quota issues, image pull errors, or config problems
  - Explains the issue in plain language
- Developer follows the fix: "Change the memory request to 256Mi" or "Update the image tag"
- No SRE escalation needed
Time: 5 minutes
Why this matters: Platform teams field these questions constantly. RunWhen gives developers direct access to the same diagnostic capability, freeing the platform team for higher-value work.
Journey 4: Onboarding to a New Workspace
Scenario: You've just been added to a RunWhen workspace and want to understand what's being monitored and what issues exist.
Steps:
- Open the workspace and review the Issues list (the default landing screen)
  - Note the total issue count and which SLXs have findings
  - Expand a few issues to see what's being detected
- Switch to Workspace Chat and ask a broad question:
  - "What's the overall health of this environment?"
  - "Show me what's wrong across all namespaces"
- Explore Workspace Studio to understand the configuration:
  - Tasks tab: what platforms and SLXs are configured
  - Assistants tab: which assistants are available and their access levels
  - Rules and Commands: any custom automation in place
- Try a tutorial from the Live Demos page if the Sandbox workspace is available
- Read the Learn section for background on how the platform works
Relevant pages: Issues and Triage, Workspace Studio, Engineering Assistants, Learn
Journey 5: Setting Up Automated Monitoring
Scenario: You want RunWhen to continuously monitor a namespace and alert you when issues are found.
Steps:
- Open Workspace Studio > Tasks tab
- Add an SLX for the resource you want to monitor (e.g., a Kubernetes namespace)
- Configure the SLX with:
  - Health check tasks (SLIs): define what "healthy" looks like
  - Troubleshooting tasks (TaskSets): what to investigate when health degrades
  - Alerting thresholds (SLOs): when to raise an alert
- Set up a Workflow (Workflows tab) to notify Slack or PagerDuty when issues are detected
- Configure an Assistant (e.g., Cautious Cathy) to automatically investigate new issues via webhook
- Issues are now detected and investigated automatically, with results in Slack
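
To make the relationship between health checks and alerting thresholds concrete, the sketch below models it in plain Python. It is a conceptual illustration only and does not reflect RunWhen's configuration format or evaluation logic: it assumes each health check yields a score between 0 and 1, and the function name `should_alert` and the 0.9 threshold are hypothetical.

```python
from statistics import mean


def should_alert(sli_samples: list[float], slo_threshold: float) -> bool:
    """Return True when the average health score falls below the threshold.

    Assumption: each sample is a health score in [0, 1], where 1 is healthy.
    """
    if not sli_samples:
        return False  # no data yet: this sketch chooses not to alert
    return mean(sli_samples) < slo_threshold


# Two healthy checks and one failed check average about 0.67, below 0.9.
print(should_alert([1.0, 1.0, 0.0], slo_threshold=0.9))
```

In this model the health check defines the score, the threshold decides when to alert, and the troubleshooting tasks would run only once the function returns `True`.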
Relevant pages: Workspace Studio, Engineering Assistants
Journey 6: Reviewing Past Incidents
Scenario: You want to review what happened during a previous incident for a post-mortem or to share with a colleague.
Steps:
- Open Workspace Chat
- Browse previous chat sessions (RunSessions) from the sidebar
- Each session shows the full investigation timeline:
  - Original question or alert trigger
  - Tasks that were run
  - Results and analysis
  - Remediation steps taken
- Share the session URL with colleagues or export for documentation
Relevant pages: Workspace Chat
Quick Reference: What to Ask
| Goal | Example Prompt |
|---|---|
| Check overall health | "What's unhealthy in [namespace]?" |
| Investigate specific service | "Why is [service] crashing/slow/failing?" |
| Find recent changes | "What changed in [namespace] recently?" |
| Get remediation steps | "How do I fix this?" |
| Check resource issues | "Are there resource quota problems in [namespace]?" |
| Compare environments | "Compare the configuration between dev and test" |
| Investigate logs | "What do the logs say for [service]?" |
| Broad sweep | "Show me what's wrong across all namespaces" |
| Specific issue | "Tell me about the segmentation fault in checkoutservice" |
| Status check | "Is [service] healthy now?" |