Overview
The Online Boutique application in the test environment appears to be running — pods are up, no crashes — but the checkout flow is broken. Users can browse products but get errors when trying to complete a purchase. A configuration issue is preventing services from communicating correctly.
What you'll learn:
- Investigate issues where pods are running but the application isn't working correctly
- Trace service-to-service communication failures
- Inspect environment variables and ConfigMaps
- Identify configuration mismatches between services
- Understand the difference between "running" and "healthy"
Difficulty: Intermediate
Time required: 15-20 minutes
The Problem
What's happening: User browses products (Frontend ✅) → Adds to cart (Cart Service ✅) → Attempts checkout (Checkout Service ✅) → Checkout tries to reach Payment Service (❌ wrong endpoint) → Checkout fails.
What you know:
- The `online-boutique-test` environment was promoted from a recent successful build
- All pods show `Running` status — nothing is crashing
- Users can browse products and add items to cart
- Checkout fails with an error
What you need to find out:
- Which service in the checkout chain is failing?
- Why is it failing if the pod is running?
- What configuration is wrong?
Step 1: Ask About the Test Environment in Workspace Chat
- Open the Sandbox workspace and go to Workspace Chat
- Ask Eager Edgar about the test environment — for example: "Check the health of online-boutique-test" or "Is anything wrong in the test namespace?"
- Eager Edgar will check the environment and report what it finds
Warning: Unlike Scenario 1, the initial findings may look healthy at first glance. Pods are running, deployments are available. The issue is hidden inside the application's runtime behavior — which is why describing the user-facing symptom in the next step is important.
Step 2: Describe the Symptom to Eager Edgar
Since the infrastructure looks healthy, describe the user-facing symptom rather than a specific Kubernetes state.
Sample Prompts
Prompt Examples:
- "The checkout flow is broken in online-boutique-test"
- "Users are getting errors when they try to purchase items in the test environment"
- "Something is wrong with the online-boutique-test namespace — the app isn't working correctly"
- "Check service connectivity in online-boutique-test"
What Eager Edgar Does
The interaction flow:
1. You say: "Checkout is broken in online-boutique-test"
2. Edgar suggests diagnostics for service health
3. You say: "Run all"
4. Tasks check pods, logs, configs, and events
5. Edgar identifies the configuration issue
Eager Edgar will suggest tasks such as:
- Check Pod Status — Confirms pods are running (they are)
- Inspect Pod Logs — Looks for error messages in running containers
- Check Service Endpoints — Verifies services have healthy backends
- Inspect Environment Variables — Reviews configuration injected into pods
- Check ConfigMaps and Secrets — Reviews configuration resources
- Check Recent Events — Looks for warnings
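If you want to run the same checks by hand, the standard kubectl equivalents look roughly like this (a sketch only — it assumes the `online-boutique-test` namespace and the deployment names used in this scenario):

```shell
kubectl get pods -n online-boutique-test                                 # pod status
kubectl logs deploy/checkoutservice -n online-boutique-test --tail=50    # pod logs
kubectl get endpoints -n online-boutique-test                            # service backends
kubectl exec deploy/checkoutservice -n online-boutique-test -- printenv  # injected env vars
kubectl get configmaps,secrets -n online-boutique-test                   # config resources
kubectl get events -n online-boutique-test --sort-by=.lastTimestamp      # recent events
```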
Step 3: Run the Diagnostic Tasks
Click "Run All" to execute the suggested tasks. Since pods are running, this investigation requires looking deeper than pod status.
Expected Findings
Finding 1: Pods Are Running (but that's not the whole story)
✅ All 11 pods are in Running state
All deployments show desired replica count
No CrashLoopBackOff detected
Finding 2: Application Logs Show Connection Errors
🟡 checkoutservice logs:

```
error: failed to charge order: could not connect to paymentservice.online-boutique-wrong:50051: dial tcp: lookup paymentservice.online-boutique-wrong: no such host
```
Finding 3: Environment Variable Mismatch
⚠️ Environment variable mismatch detected:
Deployment: checkoutservice
Variable: PAYMENT_SERVICE_ADDR
Current value: "paymentservice.online-boutique-wrong:50051"
Expected value: "paymentservice:50051"
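A quick way to reason about this finding is to diff the configured value against the expected one. The sketch below hard-codes both values from the finding; in a live cluster you would read the current value from the pod instead (for example via `kubectl exec ... printenv`):

```shell
# Values taken from the finding above; in a real check, read "current"
# from the running pod rather than hard-coding it.
current="paymentservice.online-boutique-wrong:50051"
expected="paymentservice:50051"

if [ "$current" = "$expected" ]; then
  result="OK: PAYMENT_SERVICE_ADDR matches"
else
  result="MISMATCH: got $current, expected $expected"
fi
echo "$result"
```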
Finding 4: Recent ConfigMap Change
⚠️ ConfigMap 'service-config' was updated recently
Last modified: <timestamp>
Contains service endpoint overrides
Step 4: Analyze the Root Cause
Analysis path: Symptoms (Checkout fails, pods healthy) → Logs show connection error → Wrong service address in environment variable → ConfigMap has incorrect endpoint → Root Cause: Configuration error — wrong service endpoint in ConfigMap.
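The analysis path above can be mechanized as simple log triage: filter runtime logs for connectivity keywords before digging into configuration. A minimal sketch over a captured log line (the sample line mirrors the error from the findings; in practice you would pipe `kubectl logs` output through the same filter):

```shell
# Sample log line from the scenario; in practice, feed real log output here.
log='error: failed to charge order: could not connect to paymentservice.online-boutique-wrong:50051'

# Connectivity-flavored errors point at config/DNS, not application code.
case "$log" in
  *"could not connect"*|*"no such host"*|*"connection refused"*)
    verdict="connectivity/config suspect" ;;
  *)
    verdict="look elsewhere" ;;
esac
echo "$verdict"
```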
What the Evidence Tells Us
| Signal | What It Means |
|---|---|
| Pods running | The container image and code are fine — this isn't a code bug |
| Connection error in logs | A service is trying to reach another service at a wrong address |
| Wrong environment variable | The service address was overridden by a ConfigMap or deployment spec |
| Recent ConfigMap change | Someone (or an automated process) changed the configuration recently |
Root Cause
The checkoutservice is configured with an incorrect address for the paymentservice. The environment variable PAYMENT_SERVICE_ADDR points to a wrong namespace or hostname. This is a classic configuration issue — the code works fine, but it's been told to connect to the wrong place.
This commonly happens when:
- Environment-specific configs are copy-pasted with leftover values
- A ConfigMap was updated with an incorrect value
- A promotion from one environment to another carried the wrong endpoint
Step 5: Ask for Remediation Guidance
Sample Follow-Up Prompts
Prompt Examples:
- "How do I fix the payment service address in checkoutservice?"
- "What should PAYMENT_SERVICE_ADDR be set to?"
- "Fix the configuration in online-boutique-test"
- "Show me the ConfigMap that needs to change"
The Fix
Eager Edgar will guide you to correct the configuration. The fix involves updating the environment variable to point to the correct service:
Corrected Configuration:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkoutservice
  namespace: online-boutique-test
spec:
  template:
    spec:
      containers:
        - name: server
          env:
            - name: PAYMENT_SERVICE_ADDR
              value: "paymentservice:50051"  # Fixed: removed wrong namespace
```
Or if using a ConfigMap:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: service-config
  namespace: online-boutique-test
data:
  PAYMENT_SERVICE_ADDR: "paymentservice:50051"
```
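Note that a ConfigMap only affects a pod if the Deployment actually references it. A sketch of how the checkoutservice container might consume `service-config` — the `envFrom` wiring here is an assumption for illustration, so verify it against your actual Deployment spec:

```yaml
# Hypothetical wiring: the container imports all keys from service-config
# (including PAYMENT_SERVICE_ADDR) as environment variables.
containers:
  - name: server
    envFrom:
      - configMapRef:
          name: service-config
```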
How to apply:

```shell
kubectl apply -f <fixed-config>.yaml -n online-boutique-test
```
After applying, restart the deployment. Environment variables are injected when a container starts, so already-running pods will not pick up the change on their own:

```shell
kubectl rollout restart deployment/checkoutservice -n online-boutique-test
```
Step 6: Verify the Fix
Ask Eager Edgar to Verify
Prompt Examples:
- "Is the checkout flow working now in online-boutique-test?"
- "Check if checkoutservice can reach paymentservice"
- "Are there still connection errors in the logs?"
Success Criteria
✅ Checkout service logs show successful payment calls
✅ No "connection refused" or "no such host" errors in logs
✅ All pods still running
✅ End-to-end checkout flow completes successfully
What You Learned
Key Takeaways
- **Running doesn't mean healthy.** Pods in `Running` state can still have functional failures. Always check application logs, not just pod status.
- **Configuration issues are sneaky.** They don't cause crashes — they cause incorrect behavior. Look for connection errors, wrong endpoints, and missing values in logs.
- **Environment variables and ConfigMaps are common culprits.** When services can't communicate, check how they discover each other — environment variables, ConfigMaps, and DNS entries.
- **AI Assistants look beyond status.** When you describe a functional symptom ("checkout is broken"), Eager Edgar investigates logs, configs, and events — not just pod status.
Troubleshooting Pattern: Configuration Issues
The pattern: App errors but pods running → Check logs for connection/config errors → Wrong address or value? → If yes: Fix ConfigMap / Env Var. If no: Check Secrets / Network Policies.
Comparing Scenarios 1 and 2
| | Scenario 1: Code Error | Scenario 2: Config Error |
|---|---|---|
| Pod status | CrashLoopBackOff | Running |
| Where the error shows | Container exit code + startup logs | Runtime logs during specific operations |
| Root cause | Bug in code | Wrong value in configuration |
| Fix location | Application source code | Kubernetes config (ConfigMap, env var) |
| Detection difficulty | Easy (pod is clearly broken) | Moderate (pod looks healthy) |
Next Steps
Ready for the most challenging scenario? In Production, the application and configuration are both correct — but an infrastructure dependency is down.
→ Next: Scenario 3: Database Connection Failure (Prod)