Common User Journeys

This page describes step-by-step workflows for the most common scenarios you'll encounter using the RunWhen platform. Each journey shows the steps a user follows to accomplish a goal.

Journey 1: Investigating an Unhealthy Service

Scenario: You receive a Slack alert or notice that a service is degraded. You need to find the root cause quickly.

Steps:

1. Open Workspace Chat in your workspace
2. Ask about the problem: "Why is checkoutservice unhealthy in production?"
3. Review the Assistant's findings:
   - It searches for existing Issues related to checkoutservice
   - It runs diagnostic tasks (pod health, logs, events, recent changes)
   - It presents a structured analysis with severity and evidence
4. Drill deeper: "What do the logs show?" or "What changed recently?"
5. Get a fix: "How do I fix this?" (the Assistant recommends specific remediation)
6. Apply the fix: follow the remediation steps (rollback, config change, resource adjustment)
7. Verify: "Is checkoutservice healthy now?" (confirms the fix worked)

Time: 5-15 minutes depending on complexity

Relevant pages: Workspace Chat, Issues and Triage

Journey 2: Responding to a PagerDuty / Monitoring Alert

Scenario: An alert fires from your monitoring system. RunWhen can investigate before you even look at it.

With Workflows Configured (Automated)

1. Alert fires in PagerDuty / Opsgenie / Prometheus
2. Webhook triggers a RunWhen Workflow
3. The Workflow starts a RunSession with an appropriate Assistant (e.g., Cautious Cathy)
4. The Assistant runs relevant diagnostic tasks automatically
5. Results are posted to Slack or the ticketing system
6. When the on-call engineer opens the ticket, diagnostic context is already there

Without Workflows (Manual)

1. Alert fires
2. Engineer opens Workspace Chat
3. Engineer pastes the alert context: "PagerDuty alert: high error rate on frontend service in prod"
4. The Assistant investigates and returns findings
5. Engineer reviews the findings and acts on the recommendations
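
If alerts arrive as structured data, the pasted context in the manual path can be assembled the same way every time. The following is a minimal sketch, not part of RunWhen: the function name and the payload keys (`source`, `summary`, `service`, `environment`) are hypothetical and would need mapping from whatever your alerting tool actually sends.

```python
def alert_to_prompt(alert: dict) -> str:
    """Build a one-line chat prompt from an alert payload.

    The keys read here are hypothetical placeholders, not a
    PagerDuty or RunWhen schema.
    """
    source = alert.get("source", "Monitoring")
    summary = alert.get("summary", "unknown problem")
    parts = [f"{source} alert: {summary}"]
    # Add service and environment only when the alert provides them.
    if alert.get("service"):
        parts.append(f"on {alert['service']} service")
    if alert.get("environment"):
        parts.append(f"in {alert['environment']}")
    return " ".join(parts)


example = {
    "source": "PagerDuty",
    "summary": "high error rate",
    "service": "frontend",
    "environment": "prod",
}
print(alert_to_prompt(example))
# PagerDuty alert: high error rate on frontend service in prod
```

Including the service and environment in the prompt gives the Assistant the same scoping information an engineer would otherwise type by hand.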

Relevant pages: Workspace Chat, Workspace Studio (Workflows)

Journey 3: Developer Self-Service Troubleshooting

Scenario: A developer's deployment isn't working in their dev/test namespace. They don't know kubectl and don't want to wait for the platform team.

Steps:

1. Developer logs in to app.beta.runwhen.com
2. Opens their team's workspace
3. Types: "My deployment isn't starting in the dev namespace"
4. The Assistant:
   - Checks pod status and events
   - Identifies resource quota issues, image pull errors, or config problems
   - Explains the issue in plain language
5. Developer follows the fix: "Change the memory request to 256Mi" or "Update the image tag"
6. No SRE escalation needed
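
For readers curious what a recommendation like "Change the memory request to 256Mi" touches: in a Kubernetes Deployment, that value lives at `spec.template.spec.containers[].resources.requests.memory`. The sketch below edits a manifest held as a Python dictionary. The manifest, the names `demo-app` and `app`, the starting value of `4Gi`, and the helper function are all made up for illustration; this is not how RunWhen applies fixes.

```python
import copy


def set_memory_request(deployment: dict, container: str, memory: str) -> dict:
    """Return a copy of a Deployment manifest with one container's memory request changed."""
    patched = copy.deepcopy(deployment)  # leave the caller's manifest untouched
    containers = patched["spec"]["template"]["spec"]["containers"]
    for c in containers:
        if c["name"] == container:
            # Create resources/requests if the container did not define them.
            c.setdefault("resources", {}).setdefault("requests", {})["memory"] = memory
            return patched
    raise KeyError(f"container {container!r} not found")


# Hypothetical manifest whose memory request is too large for the namespace quota.
deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "demo-app", "namespace": "dev"},
    "spec": {
        "template": {
            "spec": {
                "containers": [
                    {
                        "name": "app",
                        "image": "example.com/demo-app:1.0",
                        "resources": {"requests": {"memory": "4Gi"}},
                    }
                ]
            }
        }
    },
}

fixed = set_memory_request(deployment, "app", "256Mi")
print(fixed["spec"]["template"]["spec"]["containers"][0]["resources"]["requests"]["memory"])
# 256Mi
```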

Time: 5 minutes

Why this matters: Platform teams field these questions constantly. RunWhen gives developers direct access to the same diagnostic capability, freeing the platform team for higher-value work.

Journey 4: Onboarding to a New Workspace

Scenario: You've just been added to a RunWhen workspace and want to understand what's being monitored and what issues exist.

Steps:

1. Open the workspace and review the Issues list (the default landing screen)
   - Note the total issue count and which SLXs have findings
   - Expand a few issues to see what's being detected
2. Switch to Workspace Chat and ask a broad question:
   - "What's the overall health of this environment?"
   - "Show me what's wrong across all namespaces"
3. Explore Workspace Studio to understand the configuration:
   - Tasks tab: what platforms and SLXs are configured
   - Assistants tab: which assistants are available and their access levels
   - Rules and Commands: any custom automation in place
4. Try a tutorial from the Live Demos page if the Sandbox workspace is available
5. Read the Learn section for background on how the platform works

Relevant pages: Issues and Triage, Workspace Studio, Engineering Assistants, Learn

Journey 5: Setting Up Automated Monitoring

Scenario: You want RunWhen to continuously monitor a namespace and alert you when issues are found.

Steps:

1. Open Workspace Studio > Tasks tab
2. Add an SLX for the resource you want to monitor (e.g., a Kubernetes namespace)
3. Configure the SLX with:
   - Health check tasks (SLIs): define what "healthy" looks like
   - Troubleshooting tasks (TaskSets): what to investigate when health degrades
   - Alerting thresholds (SLOs): when to raise an alert
4. Set up a Workflow (Workflows tab) to notify Slack or PagerDuty when issues are detected
5. Configure an Assistant (e.g., Cautious Cathy) to automatically investigate new issues via webhook
6. Issues are now detected and investigated automatically, with results in Slack
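
One way to picture how the three parts of an SLX relate is as a small data structure: a health check produces a score, a threshold decides whether that score counts as a breach, and troubleshooting tasks run only on a breach. This is a conceptual sketch only. The class, its field names, and the 0-to-1 health score are assumptions made for illustration and do not reflect RunWhen's actual SLX schema or scheduling.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class SLXSketch:
    """Hypothetical model of an SLX; not RunWhen's real schema."""

    name: str
    sli: Callable[[], float]   # health check: returns a score in [0, 1] (assumed scale)
    slo_threshold: float       # a score below this counts as a breach
    taskset: List[Callable[[], str]] = field(default_factory=list)  # troubleshooting tasks

    def evaluate(self) -> List[str]:
        """Run the health check; run troubleshooting tasks only on a breach."""
        score = self.sli()
        if score >= self.slo_threshold:
            return []  # healthy: nothing to investigate
        return [task() for task in self.taskset]


slx = SLXSketch(
    name="dev-namespace-health",
    sli=lambda: 0.6,  # stand-in for a real check, e.g. fraction of ready pods
    slo_threshold=0.9,
    taskset=[lambda: "checked pod events", lambda: "collected recent logs"],
)
print(slx.evaluate())
# ['checked pod events', 'collected recent logs']
```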

Relevant pages: Workspace Studio, Engineering Assistants

Journey 6: Reviewing Past Incidents

Scenario: You want to review what happened during a previous incident for a post-mortem or to share with a colleague.

Steps:

1. Open Workspace Chat
2. Browse previous chat sessions (RunSessions) from the sidebar
3. Each session shows the full investigation timeline:
   - Original question or alert trigger
   - Tasks that were run
   - Results and analysis
   - Remediation steps taken
4. Share the session URL with colleagues or export it for documentation

Relevant pages: Workspace Chat

Quick Reference: What to Ask

Goal	Example Prompt
Check overall health	"What's unhealthy in [namespace]?"
Investigate specific service	"Why is [service] crashing/slow/failing?"
Find recent changes	"What changed in [namespace] recently?"
Get remediation steps	"How do I fix this?"
Check resource issues	"Are there resource quota problems in [namespace]?"
Compare environments	"Compare the configuration between dev and test"
Investigate logs	"What do the logs say for [service]?"
Broad sweep	"Show me what's wrong across all namespaces"
Specific issue	"Tell me about the segmentation fault in checkoutservice"
Status check	"Is [service] healthy now?"
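
The bracketed placeholders in the table behave like template slots. A team that reuses these prompts in scripts or runbooks could keep them as templates, as in the minimal sketch below. The dictionary keys and the `build_prompt` helper are hypothetical names chosen for illustration; only the prompt wording comes from the table.

```python
# Prompt wording taken from the table above; [name] is written as {name}.
PROMPTS = {
    "overall_health": "What's unhealthy in {namespace}?",
    "recent_changes": "What changed in {namespace} recently?",
    "resource_issues": "Are there resource quota problems in {namespace}?",
    "logs": "What do the logs say for {service}?",
    "status_check": "Is {service} healthy now?",
}


def build_prompt(goal: str, **values: str) -> str:
    """Fill a prompt template; raises KeyError for an unknown goal or a missing value."""
    return PROMPTS[goal].format(**values)


print(build_prompt("overall_health", namespace="dev"))
# What's unhealthy in dev?
print(build_prompt("status_check", service="checkoutservice"))
# Is checkoutservice healthy now?
```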