Overview
A developer pushed a code change to the dev branch of the Online Boutique application. Since that change was merged, one of the microservices has been crashing repeatedly, and the frontend returns errors when users try to browse products.
What you'll learn:
- Identify pods stuck in `CrashLoopBackOff`
- Use AI Assistants to investigate application crashes
- Read pod logs to find error messages
- Trace the problem back to a code-level issue
- Understand what a fix looks like
Difficulty: Beginner
Time required: 15-20 minutes
The Problem
What's happening: A developer pushes code to the dev branch → Flux syncs it to Kubernetes → The pod deploys → The pod crashes (CrashLoopBackOff) → The frontend shows errors.
What you know:
- The `online-boutique-dev` environment was working yesterday
- A code change was merged to the `dev` branch
- The application is now returning errors
What you need to find out:
- Which service is crashing?
- Why is it crashing?
- What does the error tell us about the fix?
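If you want to answer the first question yourself before involving the assistant, a single command is enough. This is an optional manual check, assuming the `online-boutique-dev` namespace used throughout this scenario:

```bash
# List pods and their status; a crashing service shows CrashLoopBackOff
# in the STATUS column and a climbing RESTARTS count
kubectl get pods -n online-boutique-dev
```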
Step 1: Ask About the Dev Environment in Workspace Chat
- Open the Sandbox workspace and go to Workspace Chat
- Ask Eager Edgar about the dev environment, for example: "What's wrong in online-boutique-dev?"
- Eager Edgar will check the environment and surface any issues it finds
Even a broad prompt is enough to get started. The AI Assistant will identify unhealthy pods, warning events, and deployment issues automatically.
Step 2: Ask Eager Edgar What's Wrong
In Workspace Chat, describe what you see. You don't need to know exact Kubernetes terminology — just describe the situation.
Sample Prompts
Try any of these:

- "What's wrong in online-boutique-dev?"
- "There are unhealthy pods in the dev namespace"
- "Check deployment health in online-boutique-dev"
- "The online boutique dev environment has errors"
What Eager Edgar Does
The interaction flow:
- You ask: "What's wrong in online-boutique-dev?"
- Edgar suggests relevant diagnostic tasks
- You say: "Run all"
- Tasks execute against Kubernetes (check pod status, events, logs)
- Edgar presents structured findings and analysis
Eager Edgar will suggest tasks such as:
- Check Pod Status: lists all pods and their current state
- Check Deployment Health: verifies replica availability
- Inspect Pod Logs: retrieves logs from crashing containers
- Check Recent Events: shows Kubernetes events with warnings
- Get Pod Resource Details: shows resource requests and limits
Step 3: Run the Diagnostic Tasks
Click "Run All" or select individual tasks to execute. Tasks typically complete in 5-30 seconds.
Expected Findings
After tasks complete, Eager Edgar will present findings like:
Finding 1: Pod in CrashLoopBackOff
🔴 Pod 'productcatalogservice-7f9c6b4d5-x2kp4' is in CrashLoopBackOff
Restart count: 14
Last state: Terminated with exit code 1
Container: server
Finding 2: Warning Events
⚠️ Back-off restarting failed container 'server'
in pod 'productcatalogservice-7f9c6b4d5-x2kp4'
Seen: 14 times in the last 30 minutes
Finding 3: Application Logs Show Error
🔴 Container logs show application error:
"Error: could not parse product catalog: invalid syntax in products.json"
or
"panic: runtime error: index out of range [5] with length 5"
Note: The specific error messages depend on the current state of the sandbox. The key insight is that the application code itself is crashing — not a Kubernetes infrastructure problem.
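To pull the crash message yourself, read the logs of the previous (crashed) container instance rather than the current one, which may not have failed yet. A sketch, with `<pod-name>` standing in for whichever pod is crashing in your sandbox:

```bash
# --previous shows logs from the last terminated container,
# which is where the crash message usually lives
kubectl logs <pod-name> -n online-boutique-dev --previous
```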
Step 4: Analyze the Root Cause
Analysis path: Multiple symptoms (Pod CrashLoopBackOff + Warning Events + Error in Logs) all point to the same analysis → Root Cause: Code error in recent commit.
What the Evidence Tells Us
| Signal | What It Means |
|---|---|
| `CrashLoopBackOff` status | The container starts, crashes, Kubernetes restarts it, and it crashes again, in a loop |
| Exit code 1 | The application exited with an error (not killed by Kubernetes) |
| Error in logs | The application hit a code-level error: a parsing failure, nil reference, or similar bug |
| Only one service affected | The issue is isolated to a single service's code, not a cluster-wide problem |
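The exit code and last-state details in the table can be read straight off the pod's status. A minimal sketch, assuming a single-container pod:

```bash
# Print the last termination state (exit code, reason, timestamps)
kubectl get pod <pod-name> -n online-boutique-dev \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated}'
```

Exit code 1 means the process ended itself with an error; an exit code of 137 would instead point to the container being killed (for example, OOMKilled).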
Root Cause
The productcatalogservice (or another specific service) has a bug introduced in the most recent code change on the dev branch. The application crashes on startup because of a code error — for example, malformed data, a bad import, or an unhandled exception.
This is a code/application error, the most straightforward category of Kubernetes failures. The Kubernetes infrastructure is working correctly — it's faithfully trying to run broken code.
Step 5: Ask for Remediation Guidance
Now that you understand the root cause, ask Eager Edgar for help fixing it.
Sample Follow-Up Prompts
Prompt Examples:
- "How do I fix the crashing productcatalogservice?"
- "What should I do about this code error?"
- "Can I roll back to a working version?"
- "Show me what changed in the recent deployment"
Possible Remediation Paths
Eager Edgar may suggest several options:
Option 1: Roll Back the Deployment (quickest resolution)
kubectl rollout undo deployment/productcatalogservice -n online-boutique-dev
This reverts to the last known-good version while the code bug is fixed.
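You can confirm the rollback completed and see which revision you're now on with standard `kubectl` subcommands:

```bash
# Wait for the rolled-back ReplicaSet to become available
kubectl rollout status deployment/productcatalogservice -n online-boutique-dev

# List known revisions of the deployment
kubectl rollout history deployment/productcatalogservice -n online-boutique-dev
```

One caveat: if Flux is continuously reconciling the dev branch, it may re-apply the broken manifest. The rollback buys time, but the real fix still belongs in the repository.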
Option 2: Fix the Code
If you have access to the demo-sandbox-online-boutique repository, fix the bug on the dev branch. Flux will automatically redeploy.
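If you want the redeploy to happen immediately rather than on Flux's next sync interval, you can trigger reconciliation by hand. The resource names below are assumptions (a default `flux-system` bootstrap); list yours first:

```bash
# Discover the actual GitRepository source name in your sandbox
flux get sources git -A

# Then force an immediate sync (name/namespace are assumptions)
flux reconcile source git flux-system -n flux-system
```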
Option 3: Scale Down the Broken Service
If the broken service is not blocking other work:
kubectl scale deployment/productcatalogservice --replicas=0 -n online-boutique-dev
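Scaling back up later is the same command with a non-zero count. This assumes the original replica count was 1; confirm against the deployment manifest in the repository:

```bash
# Restore the service once a fixed version is available
kubectl scale deployment/productcatalogservice --replicas=1 -n online-boutique-dev
```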
Step 6: Verify the Fix
After applying a fix, confirm the environment is healthy.
Ask Eager Edgar to Verify
Prompt Examples:
- "Is the productcatalogservice running now?"
- "Check pod health in online-boutique-dev"
- "Are there still any crashing pods?"
Success Criteria
✅ All pods in Running state
✅ No CrashLoopBackOff pods
✅ Deployment shows desired replica count available
✅ No new warning events
✅ Application frontend loads without errors
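The same criteria can be checked manually if you want a second opinion. A sketch against the assumed `online-boutique-dev` namespace:

```bash
# Every pod should be Running with full READY counts and stable RESTARTS
kubectl get pods -n online-boutique-dev

# The deployment should report all desired replicas available
kubectl rollout status deployment/productcatalogservice -n online-boutique-dev

# No new Warning events should be appearing
kubectl get events -n online-boutique-dev --field-selector type=Warning
```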
What You Learned
Key Takeaways
- CrashLoopBackOff is the most common symptom of code errors. When a pod keeps crashing and restarting, start with the logs.
- Logs tell the story. Application-level crashes almost always leave a clear error message in the container logs. RunWhen surfaces these automatically.
- You didn't need `kubectl`. The AI Assistant found the right diagnostic tasks, ran them, and presented structured findings, all from a natural language prompt.
- Code errors are isolated. When only one service is crashing and the error is in the application logs, you're dealing with a code bug, not an infrastructure problem.
Troubleshooting Pattern: Code Errors
The pattern: Pod in CrashLoopBackOff → Check Logs → Is it an error in app code? → If yes: Fix Code or Rollback. If no: Check Config / Resources.
Next Steps
Ready for a more nuanced investigation? In the next scenario, the code is fine — but the configuration is wrong.
→ Next: Scenario 2: Misconfigured Service (Test)