Overview
The Online Boutique application in the test environment appears to be running — pods are up, no crashes — but the checkout flow is broken. Users can browse products but get errors when trying to complete a purchase. A configuration issue is preventing services from communicating correctly.
What you'll learn:
- Investigate issues where pods are running but the application isn't working correctly
- Trace service-to-service communication failures
- Inspect environment variables and ConfigMaps
- Identify configuration mismatches between services
- Understand the difference between "running" and "healthy"
Difficulty: Intermediate
Time required: 15-20 minutes
The Problem
What's happening: User browses products (Frontend ✅) → Adds to cart (Cart Service ✅) → Attempts checkout (Checkout Service ✅) → Checkout tries to reach Payment Service (❌ wrong endpoint) → Checkout fails.
What you know:
- The `online-boutique-test` environment was promoted from a recent successful build
- All pods show `Running` status — nothing is crashing
- Users can browse products and add items to cart
- Checkout fails with an error
What you need to find out:
- Which service in the checkout chain is failing?
- Why is it failing if the pod is running?
- What configuration is wrong?
Step 1: Ask About the Test Environment in Workspace Chat
- Open the Sandbox workspace and go to Workspace Chat
- Ask Eager Edgar about the test environment — for example: "Check the health of online-boutique-test" or "Is anything wrong in the test namespace?"
- Eager Edgar will check the environment and report what it finds
Warning: Unlike Scenario 1, the initial findings may look healthy at first glance. Pods are running, deployments are available. The issue is hidden inside the application's runtime behavior — which is why describing the user-facing symptom in the next step is important.
Step 2: Describe the Symptom to Eager Edgar
Since the infrastructure looks healthy, describe the user-facing symptom rather than a specific Kubernetes state.
Sample Prompts
Prompt Examples:
- "The checkout flow is broken in online-boutique-test"
- "Users are getting errors when they try to purchase items in the test environment"
- "Something is wrong with the online-boutique-test namespace — the app isn't working correctly"
- "Check service connectivity in online-boutique-test"
What Eager Edgar Does
The interaction flow:
1. You say: "Checkout is broken in online-boutique-test"
2. Edgar suggests diagnostics for service health
3. You say: "Run all"
4. Tasks check pods, logs, configs, and events
5. Edgar identifies the configuration issue
Eager Edgar will suggest tasks such as:
- Check Pod Status — Confirms pods are running (they are)
- Inspect Pod Logs — Looks for error messages in running containers
- Check Service Endpoints — Verifies services have healthy backends
- Inspect Environment Variables — Reviews configuration injected into pods
- Check ConfigMaps and Secrets — Reviews configuration resources
- Check Recent Events — Looks for warnings
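If you want to run the same checks by hand, the standard kubectl equivalents look roughly like this (a sketch only — it assumes the `online-boutique-test` namespace and the deployment names used in this scenario):

```shell
kubectl get pods -n online-boutique-test                                 # pod status
kubectl logs deploy/checkoutservice -n online-boutique-test --tail=50    # pod logs
kubectl get endpoints -n online-boutique-test                            # service backends
kubectl exec deploy/checkoutservice -n online-boutique-test -- printenv  # injected env vars
kubectl get configmaps,secrets -n online-boutique-test                   # config resources
kubectl get events -n online-boutique-test --sort-by=.lastTimestamp      # recent events
```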
Step 3: Run the Diagnostic Tasks
Click "Run All" to execute the suggested tasks. Since pods are running, this investigation requires looking deeper than pod status.
Expected Findings
Finding 1: Pods Are Running (but that's not the whole story)
✅ All 11 pods are in Running state
All deployments show desired replica count
No CrashLoopBackOff detected
Finding 2: Application Logs Show Connection Errors
🟡 checkoutservice logs:

```
error: failed to charge order: could not connect to paymentservice.online-boutique-wrong:50051: dial tcp: lookup paymentservice.online-boutique-wrong: no such host
```
Finding 3: Environment Variable Mismatch
⚠️ Environment variable mismatch detected:
Deployment: checkoutservice
Variable: PAYMENT_SERVICE_ADDR
Current value: "paymentservice.online-boutique-wrong:50051"
Expected value: "paymentservice:50051"
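A quick way to reason about this finding is to diff the configured value against the expected one. The sketch below hard-codes both values from the finding; in a live cluster you would read the current value from the pod instead (for example via `kubectl exec ... printenv`):

```shell
# Values taken from the finding above; in a real check, read "current"
# from the running pod rather than hard-coding it.
current="paymentservice.online-boutique-wrong:50051"
expected="paymentservice:50051"

if [ "$current" = "$expected" ]; then
  result="OK: PAYMENT_SERVICE_ADDR matches"
else
  result="MISMATCH: got $current, expected $expected"
fi
echo "$result"
```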
Finding 4: Recent ConfigMap Change
⚠️ ConfigMap 'service-config' was updated recently
Last modified: <timestamp>
Contains service endpoint overrides
Step 4: Analyze the Root Cause
Analysis path: Symptoms (Checkout fails, pods healthy) → Logs show connection error → Wrong service address in environment variable → ConfigMap has incorrect endpoint → Root Cause: Configuration error — wrong service endpoint in ConfigMap.
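The analysis path above can be mechanized as simple log triage: filter runtime logs for connectivity keywords before digging into configuration. A minimal sketch over a captured log line (the sample line mirrors the error from the findings; in practice you would pipe `kubectl logs` output through the same filter):

```shell
# Sample log line from the scenario; in practice, feed real log output here.
log='error: failed to charge order: could not connect to paymentservice.online-boutique-wrong:50051'

# Connectivity-flavored errors point at config/DNS, not application code.
case "$log" in
  *"could not connect"*|*"no such host"*|*"connection refused"*)
    verdict="connectivity/config suspect" ;;
  *)
    verdict="look elsewhere" ;;
esac
echo "$verdict"
```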
What the Evidence Tells Us
| Signal | What It Means |
|---|---|
| Pods running | The container image and code are fine — this isn't a code bug |
| Connection error in logs | A service is trying to reach another service at a wrong address |
| Wrong environment variable | The service address was overridden by a ConfigMap or deployment spec |
| Recent ConfigMap change | Someone (or an automated process) changed the configuration recently |
Root Cause
The checkoutservice is configured with an incorrect address for the paymentservice. The environment variable PAYMENT_SERVICE_ADDR points to a wrong namespace or hostname. This is a classic configuration issue — the code works fine, but it's been told to connect to the wrong place.
This commonly happens when:
- Environment-specific configs are copy-pasted with leftover values
- A ConfigMap was updated with an incorrect value
- A promotion from one environment to another carried the wrong endpoint
Step 5: Ask for Remediation Guidance
Sample Follow-Up Prompts
Prompt Examples:
- "How do I fix the payment service address in checkoutservice?"
- "What should PAYMENT_SERVICE_ADDR be set to?"
- "Fix the configuration in online-boutique-test"
- "Show me the ConfigMap that needs to change"
The Fix
Eager Edgar will guide you to correct the configuration. The fix involves updating the environment variable to point to the correct service:
Corrected Configuration:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkoutservice
  namespace: online-boutique-test
spec:
  template:
    spec:
      containers:
        - name: server
          env:
            - name: PAYMENT_SERVICE_ADDR
              value: "paymentservice:50051"  # Fixed: removed wrong namespace
```
Or if using a ConfigMap:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: service-config
  namespace: online-boutique-test
data:
  PAYMENT_SERVICE_ADDR: "paymentservice:50051"
```
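Note that a ConfigMap only affects a pod if the Deployment actually references it. A sketch of how the checkoutservice container might consume `service-config` — the `envFrom` wiring here is an assumption for illustration, so verify it against your actual Deployment spec:

```yaml
# Hypothetical wiring: the container imports all keys from service-config
# (including PAYMENT_SERVICE_ADDR) as environment variables.
containers:
  - name: server
    envFrom:
      - configMapRef:
          name: service-config
```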
How to apply:

```shell
kubectl apply -f <fixed-config>.yaml -n online-boutique-test
```
After applying, restart the deployment. Environment variables are injected when a container starts, so already-running pods will not pick up the change on their own:

```shell
kubectl rollout restart deployment/checkoutservice -n online-boutique-test
```
Step 6: Verify the Fix
Ask Eager Edgar to Verify
Prompt Examples:
- "Is the checkout flow working now in online-boutique-test?"
- "Check if checkoutservice can reach paymentservice"
- "Are there still connection errors in the logs?"
Success Criteria
✅ Checkout service logs show successful payment calls
✅ No "connection refused" or "no such host" errors in logs
✅ All pods still running
✅ End-to-end checkout flow completes successfully
What You Learned
Key Takeaways
- **Running doesn't mean healthy.** Pods in `Running` state can still have functional failures. Always check application logs, not just pod status.
- **Configuration issues are sneaky.** They don't cause crashes — they cause incorrect behavior. Look for connection errors, wrong endpoints, and missing values in logs.
- **Environment variables and ConfigMaps are common culprits.** When services can't communicate, check how they discover each other — environment variables, ConfigMaps, and DNS entries.
- **AI Assistants look beyond status.** When you describe a functional symptom ("checkout is broken"), Eager Edgar investigates logs, configs, and events — not just pod status.
Troubleshooting Pattern: Configuration Issues
The pattern: App errors but pods running → Check logs for connection/config errors → Wrong address or value? → If yes: Fix ConfigMap / Env Var. If no: Check Secrets / Network Policies.
Comparing Scenarios 1 and 2
| | Scenario 1: Code Error | Scenario 2: Config Error |
|---|---|---|
| Pod status | CrashLoopBackOff | Running |
| Where the error shows | Container exit code + startup logs | Runtime logs during specific operations |
| Root cause | Bug in code | Wrong value in configuration |
| Fix location | Application source code | Kubernetes config (ConfigMap, env var) |
| Detection difficulty | Easy (pod is clearly broken) | Moderate (pod looks healthy) |
Next Steps
Ready for the most challenging scenario? In Production, the application and configuration are both correct — but an infrastructure dependency is down.
→ Next: Scenario 3: Database Connection Failure (Prod)