Common User Journeys
This page describes step-by-step workflows for the most common scenarios you’ll encounter using the RunWhen platform. Each journey shows the steps a user follows to accomplish a goal.
Journey 1: Investigating an Unhealthy Service
Scenario: You receive a Slack alert or notice that a service is degraded. You need to find the root cause quickly.
Steps:
- Open Workspace Chat in your workspace
- Ask about the problem: “Why is checkoutservice unhealthy in production?”
- Review the Assistant’s findings:
  - It searches for existing Issues related to checkoutservice
  - It runs diagnostic tasks (pod health, logs, events, recent changes)
  - It presents a structured analysis with severity and evidence
- Drill deeper: “What do the logs show?” or “What changed recently?”
- Get a fix: “How do I fix this?” The Assistant recommends specific remediation steps
- Apply the fix: Follow the remediation steps (rollback, config change, resource adjustment)
- Verify: “Is checkoutservice healthy now?” This confirms the fix worked
Time: 5-15 minutes depending on complexity
Relevant pages: Workspace Chat, Issues and Triage
Journey 2: Responding to a PagerDuty / Monitoring Alert
Scenario: An alert fires from your monitoring system. RunWhen can investigate before you even look at it.
With Workflows Configured (Automated)
- Alert fires in PagerDuty / Opsgenie / Prometheus
- Webhook triggers a RunWhen Workflow
- The Workflow starts a RunSession with an appropriate Assistant (e.g., Cautious Cathy)
- The Assistant runs relevant diagnostic tasks automatically
- Results are posted to Slack or the ticketing system
- When the on-call engineer opens the ticket, diagnostic context is already there
Without Workflows (Manual)
- Alert fires
- Engineer opens Workspace Chat
- Pastes the alert context: “PagerDuty alert: high error rate on frontend service in prod”
- The Assistant investigates and returns findings
- Engineer reviews and acts on recommendations
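The alert context pasted in the manual flow can also be assembled from an alert payload before it goes into Workspace Chat. The sketch below is illustrative only: `Alert` and `build_alert_prompt` are hypothetical names, the fields are not taken from any PagerDuty, Opsgenie, or RunWhen schema, and the code only formats text.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    """Hypothetical, minimal view of a monitoring alert (not a real vendor schema)."""
    source: str       # e.g. "PagerDuty"
    summary: str      # e.g. "high error rate"
    service: str      # e.g. "frontend"
    environment: str  # e.g. "prod"


def build_alert_prompt(alert: Alert) -> str:
    """Format an alert as the plain-language context an engineer would paste into chat."""
    return (
        f"{alert.source} alert: {alert.summary} "
        f"on {alert.service} service in {alert.environment}"
    )


example = Alert("PagerDuty", "high error rate", "frontend", "prod")
print(build_alert_prompt(example))
```

Naming the source, symptom, service, and environment in one sentence mirrors the example prompt in the manual steps and gives the Assistant enough scope to start investigating.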
Relevant pages: Workspace Chat, Workspace Studio (Workflows)
Journey 3: Developer Self-Service Troubleshooting
Scenario: A developer’s deployment isn’t working in their dev/test namespace. They don’t know kubectl and don’t want to wait for the platform team.
Steps:
- Developer logs into app.beta.runwhen.com
- Opens their team’s workspace
- Types: “My deployment isn’t starting in the dev namespace”
- The Assistant:
  - Checks pod status and events
  - Identifies resource quota issues, image pull errors, or config problems
  - Explains the issue in plain language
- Developer follows the fix: “Change the memory request to 256Mi” or “Update the image tag”
- No SRE escalation needed
Time: 5 minutes
Why this matters: Platform teams field these questions constantly. RunWhen gives developers direct access to the same diagnostic capability, freeing the platform team for higher-value work.
Journey 4: Onboarding to a New Workspace
Scenario: You’ve just been added to a RunWhen workspace and want to understand what’s being monitored and what issues exist.
Steps:
- Open the workspace and review the Issues list (the default landing screen)
  - Note the total issue count and which SLXs have findings
  - Expand a few issues to see what’s being detected
- Switch to Workspace Chat and ask a broad question:
  - “What’s the overall health of this environment?”
  - “Show me what’s wrong across all namespaces”
- Explore Workspace Studio to understand the configuration:
  - Tasks tab: what platforms and SLXs are configured
  - Assistants tab: which assistants are available and their access levels
  - Rules and Commands: any custom automation in place
- Try a tutorial from the Live Demos page if the Sandbox workspace is available
- Read the Learn section for background on how the platform works
Relevant pages: Issues and Triage, Workspace Studio, Engineering Assistants, Learn
Journey 5: Setting Up Automated Monitoring
Scenario: You want RunWhen to continuously monitor a namespace and alert you when issues are found.
Steps:
- Open Workspace Studio > Tasks tab
- Add an SLX for the resource you want to monitor (e.g., a Kubernetes namespace)
- Configure the SLX with:
  - Health check tasks (SLIs): define what “healthy” looks like
  - Troubleshooting tasks (TaskSets): what to investigate when health degrades
  - Alerting thresholds (SLOs): when to raise an alert
- Set up a Workflow (Workflows tab) to notify Slack or PagerDuty when issues are detected
- Configure an Assistant (e.g., Cautious Cathy) to automatically investigate new issues via webhook
- Issues are now detected and investigated automatically, with results in Slack
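To make the relationship between the three parts of an SLX concrete, here is a minimal sketch of how a health reading, an alerting threshold, and a set of troubleshooting tasks might fit together. Everything in it is an assumption made for illustration: the `SLX` class, its field names, the 0-to-1 health score, and the task names are hypothetical and do not reflect RunWhen's actual configuration schema.

```python
from dataclasses import dataclass, field


@dataclass
class SLX:
    """Hypothetical model of an SLX; field names are illustrative only."""
    name: str
    slo_threshold: float  # assumed: alert when the health score drops below this
    troubleshooting_tasks: list[str] = field(default_factory=list)  # stands in for a TaskSet


def evaluate(slx: SLX, health_score: float) -> dict:
    """Compare an SLI reading with the SLO threshold and decide what happens next."""
    degraded = health_score < slx.slo_threshold
    return {
        "slx": slx.name,
        "raise_alert": degraded,
        # Troubleshooting tasks are only queued when health has degraded.
        "tasks_to_run": list(slx.troubleshooting_tasks) if degraded else [],
    }


namespace_slx = SLX(
    name="example-namespace-health",  # hypothetical SLX name
    slo_threshold=0.9,
    troubleshooting_tasks=["check-pod-status", "fetch-recent-events"],
)
print(evaluate(namespace_slx, health_score=0.5))
print(evaluate(namespace_slx, health_score=1.0))
```

The point of the sketch is the division of labor: the SLI supplies the reading, the SLO decides whether it counts as a problem, and the TaskSet only runs once it does.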
Relevant pages: Workspace Studio, Engineering Assistants
Journey 6: Reviewing Past Incidents
Scenario: You want to review what happened during a previous incident for a post-mortem or to share with a colleague.
Steps:
- Open Workspace Chat
- Browse previous chat sessions (RunSessions) from the sidebar
- Each session shows the full investigation timeline:
  - Original question or alert trigger
  - Tasks that were run
  - Results and analysis
  - Remediation steps taken
- Share the session URL with colleagues or export for documentation
Relevant pages: Workspace Chat
Quick Reference: What to Ask
| Goal | Example Prompt |
|---|---|
| Check overall health | “What’s unhealthy in [namespace]?” |
| Investigate specific service | “Why is [service] crashing/slow/failing?” |
| Find recent changes | “What changed in [namespace] recently?” |
| Get remediation steps | “How do I fix this?” |
| Check resource issues | “Are there resource quota problems in [namespace]?” |
| Compare environments | “Compare the configuration between dev and test” |
| Investigate logs | “What do the logs say for [service]?” |
| Broad sweep | “Show me what’s wrong across all namespaces” |
| Specific issue | “Tell me about the segmentation fault in checkoutservice” |
| Status check | “Is [service] healthy now?” |
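
If your team reuses these prompts often, the bracketed placeholders in the table can be filled in programmatically. The following is a small sketch, not a RunWhen feature: `PROMPTS` and `fill_prompt` are hypothetical names, and the templates are simply a few rows copied from the table.

```python
# A few prompt templates from the table above; "[name]" marks a placeholder.
PROMPTS = {
    "check_health": "What's unhealthy in [namespace]?",
    "recent_changes": "What changed in [namespace] recently?",
    "logs": "What do the logs say for [service]?",
    "status_check": "Is [service] healthy now?",
}


def fill_prompt(template: str, **values: str) -> str:
    """Replace each [placeholder] in the template with the matching keyword argument."""
    for key, value in values.items():
        template = template.replace(f"[{key}]", value)
    return template


print(fill_prompt(PROMPTS["check_health"], namespace="dev"))
print(fill_prompt(PROMPTS["status_check"], service="checkoutservice"))
```

Placeholders without a matching argument are left in place, which makes an incomplete prompt easy to spot before it is sent.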