Kubernetes: Search Application Logs For Stack Traces
Engineering Assistants can respond to errors in application logs by collecting diagnostics for your developers, such as recent code diffs and active environment variables, to speed up bug fixing.
This page is currently under construction and contains unfinished content.
This scenario shows how a tier-1 support engineer can triage a complex application error, quickly determining whether it is related to the platform/infrastructure or to a bug in the application code itself. To do this, the platform parses the logs of an application stack for stack traces and extracts key triage information the way a developer would, such as which files the exceptions originated from and their recent changes in git. All of this triage information can then be automatically included in a GitHub Issue that is submitted to the application's Git repository.
Note that while the example application in this tutorial is written in C#, stack traces from many languages and frameworks can be parsed by changing a configuration field in the codebundle.
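To make the parsing step more concrete, here is a minimal Python sketch of how stack traces might be pulled out of raw log text. It is not the codebundle's implementation: the `STACK_FRAME_PATTERNS` dictionary is a hypothetical stand-in for the language configuration field, and only .NET and Java frame formats (both print the exception header before its frames) are covered.

```python
import re
from dataclasses import dataclass, field

# Hypothetical per-language frame patterns, standing in for the codebundle's
# language configuration field (its real name and format may differ).
STACK_FRAME_PATTERNS = {
    # .NET: "   at Worker.Program.Main(String[] args) in /app/Program.cs:line 68"
    "csharp": re.compile(r"^\s*at (?P<method>.+?) in (?P<file>.+?):line (?P<line>\d+)\s*$"),
    # Java: "\tat com.example.Worker.run(Worker.java:42)"
    "java": re.compile(r"^\s*at (?P<method>[\w.$<>]+)\((?P<file>[^:()]+):(?P<line>\d+)\)"),
}

# Exception header, e.g. "System.InvalidOperationException: Sequence contains no elements"
EXCEPTION_HEADER = re.compile(r"(?P<type>[A-Za-z_][\w.]*(?:Exception|Error)): (?P<message>.*)")


@dataclass
class StackTrace:
    exception_type: str
    message: str
    frames: list = field(default_factory=list)  # (file, line, method) tuples


def parse_stack_traces(log_text, language="csharp"):
    """Group frame lines under the most recent exception header (header-first formats)."""
    frame_re = STACK_FRAME_PATTERNS[language]
    traces = []
    for raw in log_text.splitlines():
        frame = frame_re.match(raw)
        if frame and traces:
            traces[-1].frames.append((frame["file"], int(frame["line"]), frame["method"]))
            continue
        header = EXCEPTION_HEADER.search(raw)
        if header:
            traces.append(StackTrace(header["type"], header["message"]))
    return traces


def source_files(traces):
    """Unique files the exceptions passed through, in first-seen order."""
    seen = []
    for trace in traces:
        for file, _line, _method in trace.frames:
            if file not in seen:
                seen.append(file)
    return seen
```

Feeding it the output of `kubectl logs` for a workload yields one `StackTrace` per exception, and `source_files()` gives the files whose git history is worth checking next.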
This tutorial heavily utilizes GIFs, which may take time to load in your browser.

Scenario Overview
The cats vs. dogs voting application is deployed in a dedicated RunWhen Workspace named Sandbox. It has two public URLs, runs in a Kubernetes cluster, is deployed from a Git repository, and consists of application and database container images.
Application URL(s): Voting page & Results page
Application Git Repository: https://github.com/runwhen-contrib/demo-sandbox-example-voting-app

Voting App Submission Page

Voting App Results Page
While the application is available at the links above and appears fine on the surface, there's actually a hidden data processing error on the backend that affects users' votes. Some votes do not show up on the results page, even though they appear to submit successfully.
This tutorial will walk you through:
Engaging with Engineering Assistant Eager Edgar
Asking Eager Edgar for suggestions on triaging whether an issue is platform-related or application-related
Running Eager Edgar's suggested tasks
Reviewing the task results, warnings, and suggested next steps
Identifying a potential application failure that's affecting end users, and automatically opening a GitHub Issue for it with the triage information included for you
Getting Started in the RunWhen Platform
Upon logging into the platform, you will be shown a list of Workspaces that are accessible to you. If this is your first time, you will see some public Workspaces - these are created for demonstration and exploration purposes.

Selecting a Workspace
For this scenario, please select the following workspace:
Sandbox
Upon clicking the workspace, you will be dropped into the workspace map - an interactive method of searching and navigating across the resources in the workspace.

Asking Eager Edgar to Troubleshoot Voting Application Errors
Because users have reported that some of their votes fail to show up in the results, we'll need to look deeper than the frontend, and open a GitHub Issue on the repository once we've found a suspected cause. After all, the root of the problem could be either platform-related or application-related. First, let's ask Eager Edgar to help us troubleshoot these errors.
Select the Command Bar, type voting into the search to find the appropriate group, and select the Voting-App group
Select the Command Bar again, and type in the statement "Troubleshoot voting errors"
Review the suggested tasks and select any that you wish to run, or click RUN ALL to run them all

Querying the app for suggested tasks
Reviewing Results & Inspecting Issues
Most importantly, one result of the tasks we ran was a newly opened GitHub Issue that mimics what a developer might do in their first hour of triaging an application error. The Issue includes key triage information such as:
The exception message(s)
The source code URL(s)
Recent git commit hashes for the files the exceptions originated from, so we can quickly see recent changes and who made them
A copy-paste-ready command to fetch the same exception logs that were parsed, so we can inspect them ourselves
A link back to the RunSession on the platform

Automatically Opened Issue with Stacktrace Details
All of this was collected and submitted for us without our having to connect to the workload and parse the logs manually!
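The Issue contents listed above can be approximated in a few lines of Python. This sketch assumes the triage data has already been collected; the function names and every parameter value are hypothetical, and only the GitHub blob URL layout and the `git log`/`kubectl logs` flags are taken from those tools as they exist.

```python
import shlex


def git_log_args(path, limit=5):
    """argv that lists the most recent commits touching one file."""
    return ["git", "log", "-n", str(limit), "--format=%h %an %s", "--", path]


def build_issue_body(exceptions, repo_url, branch, commits_by_path,
                     namespace, workload, runsession_url):
    """Assemble a Markdown issue body from already-collected triage data.

    exceptions:      list of (exception_type, message) tuples
    commits_by_path: {repo-relative path: ["<hash> <author> <subject>", ...]}
    """
    lines = ["### Exceptions"]
    lines += [f"- `{exc_type}`: {message}" for exc_type, message in exceptions]

    lines += ["", "### Source files and recent commits"]
    for path, commits in commits_by_path.items():
        # GitHub's blob URL scheme: <repo>/blob/<branch>/<path>
        lines.append(f"- {repo_url}/blob/{branch}/{path}")
        lines += [f"  - {commit}" for commit in commits]

    # Indented (4-space) Markdown code block holding a copy-paste-ready command
    lines += [
        "",
        "### Fetch the same logs",
        f"    kubectl logs deployment/{shlex.quote(workload)} -n {shlex.quote(namespace)} --tail=500",
        "",
        f"RunSession: {runsession_url}",
    ]
    return "\n".join(lines)
```

To fill `commits_by_path`, one option is to run `subprocess.run(git_log_args(path), cwd=clone_dir, capture_output=True, text=True, check=True).stdout.splitlines()` for each file, where `clone_dir` is a hypothetical path to a local clone of the repository.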

Reviewing Generated Issues
Ultimately, these exceptions also tell us that the voting errors our users are experiencing originate from the application code and are very likely not related to infrastructure. This is further supported by the fact that our report contains no high-severity infrastructure-related issues, so there's no need to contact our platform team.

Reviewing Report
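The reasoning that led us away from the platform team can be written down as a small decision rule. The Python sketch below is illustrative only: the `TriageIssue` record, its `category` tag, and the 1-to-4 severity scale (1 being most severe) are assumptions, not the platform's actual issue schema.

```python
from dataclasses import dataclass


@dataclass
class TriageIssue:
    title: str
    category: str  # hypothetical tag: "platform" or "application"
    severity: int  # hypothetical scale: 1 = most severe ... 4 = informational


def triage(issues, high_severity=2):
    """Decide which team to route to, mirroring the reasoning in this tutorial."""
    platform_high = [i for i in issues
                     if i.category == "platform" and i.severity <= high_severity]
    app_errors = [i for i in issues if i.category == "application"]
    if platform_high:
        # Infrastructure faults can themselves surface as application
        # exceptions, so high-severity platform issues are escalated first.
        return "platform"
    if app_errors:
        return "application"
    return "inconclusive"
```

With only application exceptions and no high-severity platform issues, as in this scenario, `triage()` returns `"application"`. Checking platform issues first is a design choice: a failing node or database pod can show up as application exceptions, so ruling those out comes before blaming the code.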
Here's an overview of what we did in this tutorial and what we discovered:
A generic query, "Troubleshoot voting errors", led us to determine whether the issue was platform-related or application-related in the Voting-App group
Tasks were run to check:
Issues across the namespace, such as pod and deployment health
Application logs, which were scanned for exceptions and parsed
Kubernetes event streams in the namespace
Issues were generated that told us:
No platform-related errors were found, so we don't need to contact the platform team
The app, specifically its C# worker service, is throwing exceptions
Where to find the opened GitHub Issue for the service, along with its source code, for further triage by the owning application developers
GitHub issues were opened for these related application issue(s):
