Kubernetes: Application Resource Tuning

This scenario shows how an application developer with no Kubernetes training can remediate a resource configuration issue that is causing their application to get stuck on startup.

This tutorial relies heavily on GIFs, which may take time to load in your browser.

Scenario Overview

The recipes application is deployed in the dedicated RunWhen Workspace named b-sandbox. It has a single public URL, runs in a Google GKE cluster, is deployed from a git repository, and consists of application and database container images.

Application URL: https://recipes.sandbox.runwhen.com/
Application Git Repository: https://github.com/runwhen-contrib/demo-sandbox-tandoor-recipes

If you click on the application URL above, you will immediately see that the application is not active, but it might not be obvious why.

This tutorial will walk you through:

Engaging with Engineering Assistant Eager Edgar
Asking Eager Edgar for suggestions on what to run when we can't access our application
Running Eager Edgar's suggested tasks
Reviewing the task results, warnings, and suggested next steps
Identifying and remediating the root cause of the application failure

https://youtu.be/jUZ9I9gCcAc

Getting Started in the RunWhen Platform

Upon logging into the platform, you will be shown a list of Workspaces that are accessible to you. If this is your first time, you will see some public Workspaces - these are created for demonstration and exploration purposes.

Selecting a Workspace

For this scenario, please select the following workspace:

Sandbox

Upon clicking the workspace, you will be dropped into the workspace map - an interactive method of searching and navigating across the resources in the workspace.

Asking Eager Edgar to Help with Pods Not Starting

Since the main issue right now is that the main recipes application Pod does not appear to be running, ask Eager Edgar for a list of recommended troubleshooting tasks.

Select the Command Bar, and
- Type Recipes to search for the appropriate group, and select the group
Select the Command Bar again, and
- Type in a statement such as "Pods are not starting", "Check pod health", or "Check pod resources"
- Review the suggested tasks and select any that you wish to run, or click RUN ALL to run them all

Asking Eager Edgar for Troubleshooting Suggestions

Reviewing Results & Running More Tasks

In this next step, review the Issues generated by those tasks. Notice that there are a few Issues, such as:

Deployment recipes has status: Deployment does not have minimum availability.
Pod recipes-5cb59585d4-q7j82 is pending with N/A
Deployment recipes generated 132 warning events and should be reviewed.

Each of these Issues will list some Suggested Next Steps. Select ASK on some of these suggestions and run the top suggestions that Eager Edgar provides.
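
For readers curious about what sits behind the first two Issues, the checks can be sketched roughly as follows. This is a minimal illustration, not the RunWhen task implementation: the function names are hypothetical, and the sample objects are hand-written fragments shaped like Kubernetes Deployment and Pod status (a Deployment reports an Available condition, and a Pod reports a phase).

```python
# Illustrative sketch only; function names and sample data are assumptions.

def has_minimum_availability(deployment: dict) -> bool:
    """Return True when the Deployment's Available condition is "True"."""
    conditions = deployment.get("status", {}).get("conditions", [])
    for condition in conditions:
        if condition.get("type") == "Available":
            return condition.get("status") == "True"
    # No Available condition reported yet: treat as not available.
    return False


def pending_pods(pods: list) -> list:
    """Return the names of Pods whose phase is Pending."""
    return [
        pod["metadata"]["name"]
        for pod in pods
        if pod.get("status", {}).get("phase") == "Pending"
    ]


# Hand-written sample objects mirroring the Issues listed above.
sample_deployment = {
    "metadata": {"name": "recipes"},
    "status": {
        "conditions": [
            {
                "type": "Available",
                "status": "False",
                "reason": "MinimumReplicasUnavailable",
                "message": "Deployment does not have minimum availability.",
            }
        ]
    },
}
sample_pods = [
    {"metadata": {"name": "recipes-5cb59585d4-q7j82"}, "status": {"phase": "Pending"}},
]

if not has_minimum_availability(sample_deployment):
    print("Deployment recipes does not have minimum availability")
for name in pending_pods(sample_pods):
    print(f"Pod {name} is pending")
```

Neither check explains why the Pod is pending, which is why the Suggested Next Steps point to further tasks.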

Reviewing Issues and Running Additional Tasks

Identifying and Remediating The Root Cause

After running a number of tasks, review the Issues tab (which highlights each issue, sorted by severity):

Notice that there is an Issue titled Resource quota is at or above 100% in namespace recipes
Review the Suggested Next Steps, and ask Eager Edgar if any tasks match the suggestion of Increase the resource quota for requests.cpu in `recipes`
Run the top suggested task

When the task completes, review the Suggested Next Steps for:
- Pull Requests for manifest changes are open and in need of review for namespace recipes
- Visit the URL of the Pull Requests that were opened
- Click the Escalate icon on the Issue to notify the service owner that the PR requires approval
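
The "Resource quota is at or above 100%" Issue comes down to comparing the used and hard values that a ResourceQuota reports for a resource such as requests.cpu. The sketch below shows one way that comparison might be computed. It is not the RunWhen task's code: the function names are hypothetical, the sample values are invented for illustration, and only the plain-core ("2") and millicore ("500m") CPU notations are handled.

```python
from decimal import Decimal

# Illustrative sketch only; names and sample values are assumptions.

def parse_cpu(quantity: str) -> Decimal:
    """Convert a CPU quantity such as "500m" or "2" into cores."""
    quantity = quantity.strip()
    if quantity.endswith("m"):
        # Millicores: 1000m equals one core.
        return Decimal(quantity[:-1]) / Decimal(1000)
    return Decimal(quantity)


def quota_utilization(used: str, hard: str) -> Decimal:
    """Return used as a percentage of hard."""
    hard_cores = parse_cpu(hard)
    if hard_cores == 0:
        raise ValueError("hard limit must be greater than zero")
    return parse_cpu(used) / hard_cores * Decimal(100)


def is_at_capacity(used: str, hard: str, threshold: Decimal = Decimal(100)) -> bool:
    """Return True when utilization is at or above the threshold percentage."""
    return quota_utilization(used, hard) >= threshold


# Hypothetical ResourceQuota status; the real values in the recipes
# namespace are not given in this tutorial.
quota_status = {
    "hard": {"requests.cpu": "1"},
    "used": {"requests.cpu": "1000m"},
}

used = quota_status["used"]["requests.cpu"]
hard = quota_status["hard"]["requests.cpu"]
if is_at_capacity(used, hard):
    print(f"requests.cpu is at {quota_utilization(used, hard):.0f}% of quota")
```

When the quota is full, the two usual options are the ones surfaced in the Suggested Next Steps: raise the hard value, or lower what the workload requests.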

Reviewing the Report Output

At any time throughout the troubleshooting process it is possible to continue running tasks, ask for more suggestions, or review the output of the report. The report will continually highlight issues that might require additional investigation.

Looking at the report history:

A generic query such as Pods are not starting led to a number of suggestions in the recipes group
Tasks were run to check:
- Pod Resources
- Namespace Health
- Deployment Health
Issues were generated, indicating that in fact:
- The main application Pod was not running (and stuck pending)
- Resource issues existed in the cluster
- The Pod should be reconfigured to reduce its resource requests
- The Resource Quota was at capacity and in need of being increased
Tasks were suggested that matched an available GitOps Remediation Task, such as:
- Increase the resource quota for requests.cpu in recipes
- Adjust pod resources to match VPA recommendation in recipes
GitHub Pull Requests were opened that would fix the root cause of the resource constraints:
- [RunWhen] - GitOps Manifest Updates for Deployment-recipes
- [RunWhen] - GitOps Manifest Updates for ResourceQuota-compute-resources
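
To give a sense of what a manifest change like the Deployment pull request above might contain, here is a minimal sketch that copies VPA-style recommended targets into a Deployment's container requests. It is an assumption-laden illustration rather than the actual remediation task: the function name is hypothetical, the container name and all resource values are invented, and the recommendation list simply follows the containerName/target shape that a VerticalPodAutoscaler reports.

```python
import copy

# Illustrative sketch only; names and values are assumptions.

def apply_recommended_requests(deployment: dict, recommendations: list) -> dict:
    """Return a copy of the Deployment with requests set to recommended targets."""
    targets = {rec["containerName"]: rec["target"] for rec in recommendations}
    updated = copy.deepcopy(deployment)  # leave the input manifest untouched
    for container in updated["spec"]["template"]["spec"]["containers"]:
        target = targets.get(container["name"])
        if target is None:
            continue  # no recommendation for this container
        requests = container.setdefault("resources", {}).setdefault("requests", {})
        requests.update(target)
    return updated


# Hypothetical manifest fragment and recommendation.
manifest = {
    "kind": "Deployment",
    "metadata": {"name": "recipes", "namespace": "recipes"},
    "spec": {
        "template": {
            "spec": {
                "containers": [
                    {
                        "name": "recipes",
                        "resources": {"requests": {"cpu": "2", "memory": "1Gi"}},
                    }
                ]
            }
        }
    },
}
recommendation = [
    {"containerName": "recipes", "target": {"cpu": "250m", "memory": "512Mi"}},
]

patched = apply_recommended_requests(manifest, recommendation)
print(patched["spec"]["template"]["spec"]["containers"][0]["resources"]["requests"])
```

In a GitOps workflow the patched manifest would be committed to a branch and proposed as a pull request, so the service owner can review the change before it reaches the cluster.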