
Kubernetes: Search Application Logs For Stack Traces

Engineering Assistants can respond to errors in application logs, collecting valuable diagnostics for your developers, such as recent code diffs and active environment variables, to speed up bug fixing.

This page is currently under construction and contains unfinished content

This scenario shows how tier-1 support can triage a complex application error and quickly determine whether it is platform/infrastructure related or a bug in the application code itself. To do this, the platform parses the logs of an application for stack traces and extracts key triage information as a developer would, such as which files the exceptions originated from and their recent changes in Git. All of this triage information can then be automatically included in a GitHub Issue that's submitted to the application's Git repository.

Note that while the example application in this tutorial is written in C#, stack traces from many other languages and frameworks can be parsed by changing a configuration field in the codebundle.
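To make the parsing step more concrete, here is a minimal Python sketch of pulling source files and line numbers out of .NET-style stack frames. It is an illustration only and not the codebundle's actual implementation; the sample log text, the `Worker.Program` method names, and the file paths are made up for the example.

```python
import re

# Matches .NET stack frames that carry source information, for example:
#    at Worker.Program.Main(String[] args) in /src/Worker/Program.cs:line 42
FRAME_RE = re.compile(
    r"^\s*at (?P<method>[^\s(]+)\(.*\) in (?P<file>.+):line (?P<line>\d+)\s*$"
)


def extract_frames(log_text):
    """Return (method, file, line) tuples for every frame that names a source file."""
    frames = []
    for raw in log_text.splitlines():
        match = FRAME_RE.match(raw)
        if match:
            frames.append((match["method"], match["file"], int(match["line"])))
    return frames


# Hypothetical log excerpt, invented for this sketch.
SAMPLE_LOG = """\
Unhandled exception. System.InvalidOperationException: vote could not be stored
   at Worker.Program.UpdateVote(String voterId, String vote) in /src/Worker/Program.cs:line 118
   at Worker.Program.Main(String[] args) in /src/Worker/Program.cs:line 42
   at System.Threading.Tasks.Task.Execute()
"""

frames = extract_frames(SAMPLE_LOG)
```

Frames without an `in <file>:line <n>` suffix (such as framework-internal frames) are skipped, since they cannot be mapped back to a file in the repository. Supporting another language would mean swapping in a different pattern, which is roughly what a configurable parser setting would select.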

This tutorial makes heavy use of GIFs, which may take time to load in your browser.

Scenario Overview

The cats vs dogs voting application is deployed in a dedicated RunWhen Workspace named Sandbox. It has two public URLs, runs on a Kubernetes cluster, is deployed from a Git repository, and comprises application and database container images.

Voting App Submission Page

Voting App Results Page

While the application is available at the links above and appears fine on the surface, there's actually a hidden data processing error on the backend that affects users' votes. Some votes do not show up on the results page, even though they appear to submit successfully.

This tutorial will walk you through:

  • Engaging with Engineering Assistant Eager Edgar

  • Asking Eager Edgar for suggestions on triaging whether an issue is platform-related or application-related

  • Running Eager Edgar's suggested tasks

  • Reviewing the task results, warnings, and suggested next steps

  • Identifying a potential application failure that's affecting end users and automatically opening a GitHub Issue for it with the triage information included

Getting Started in the RunWhen Platform

Upon logging into the platform, you will be shown a list of Workspaces that are accessible to you. If this is your first time, you will see some public Workspaces - these are created for demonstration and exploration purposes.

Selecting a Workspace

For this scenario, please select the following workspace:

  • Sandbox

Upon clicking the workspace, you will be dropped into the workspace map - an interactive method of searching and navigating across the resources in the workspace.

Asking Eager Edgar to Troubleshoot Voting Application Errors

Because users have reported that some of their votes fail to show up in the results, we need to look deeper than the frontend and open a GitHub Issue on the repo once we've found a suspected cause. After all, the root of the problem could be platform-related or application-related. First, let's ask Eager Edgar to help us troubleshoot these errors.

  • Select the Command Bar, then:

    • Type voting into the group search, and select the Voting-App group

  • Select the Command Bar again, then:

    • Type in the statement "Troubleshoot voting errors"

    • Review the suggested tasks and select any that you wish to run, or click RUN ALL to run them all

Querying the app for suggested tasks

Reviewing Results & Inspecting Issues

Most importantly, one result of the tasks we ran was a GitHub Issue that mimics what a developer might do in their first hour of triaging an application error. This Issue includes key triage information such as:

  • The exception message(s)

  • The source code URL(s)

  • Recent Git commit hashes for the files the exceptions originated from, so we can quickly see recent changes and who made them

  • A copy-paste-ready command to fetch the same exception logs that were parsed for us

  • A link back to the RunSession on the platform

Automatically Opened Issue with Stacktrace Details

All of this was collected and submitted for us, without us having to connect to the workload and parse the logs manually!
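As a rough illustration of how the triage fields listed above could be combined into a single issue body, consider the following Python sketch. It does not reflect how the platform actually formats its issues; the function name, the section headings, and every value in the example call (exception text, URLs, commit hash, author, `kubectl` target, and RunSession link) are placeholders invented for this example.

```python
def render_issue_body(exceptions, source_urls, commits, log_command, runsession_url):
    """Assemble a Markdown issue body from triage data that was already collected."""
    lines = ["## Exceptions"]
    lines += [f"- {message}" for message in exceptions]
    lines += ["", "## Source files"]
    lines += [f"- {url}" for url in source_urls]
    lines += ["", "## Recent commits"]
    lines += [f"- `{sha}` by {author}" for sha, author in commits]
    # Indent the command by four spaces so Markdown renders it as a code block.
    lines += ["", "## Fetch the same logs", "", f"    {log_command}"]
    lines += ["", f"RunSession: {runsession_url}"]
    return "\n".join(lines)


# Every value below is a placeholder, not real data from the Sandbox workspace.
body = render_issue_body(
    exceptions=["System.InvalidOperationException: vote could not be stored"],
    source_urls=["https://example.com/repo/blob/main/worker/Program.cs"],
    commits=[("abc1234", "example-author")],
    log_command="kubectl logs deployment/worker -n voting-app --tail=200",
    runsession_url="https://example.com/runsessions/1",
)
```

Keeping the log command in the issue lets the developer who picks it up re-run the same query and confirm the exception is still occurring.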

Reviewing Generated Issues

Ultimately, these exceptions also tell us that the voting errors our users are experiencing originate from the application code and are very likely not related to infrastructure. This is further supported by the fact that our report contains no high-severity infrastructure-related issues, so there's no need to contact our platform team.
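The reasoning in this step can be sketched as a small decision rule. The Python below is only an illustration of that reasoning, not platform code: the `Finding` type, the `"platform"` and `"application"` category labels, and the severity scale (1 as most severe) are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    title: str
    category: str  # "platform" or "application" (hypothetical labels)
    severity: int  # assumed scale: 1 = most severe, 4 = informational


def suggest_escalation(findings, high_severity=2):
    """Decide who should look at the problem first, based on the report findings."""
    high_platform = [
        f for f in findings
        if f.category == "platform" and f.severity <= high_severity
    ]
    application = [f for f in findings if f.category == "application"]
    if high_platform:
        return "platform team"
    if application:
        return "application developers"
    return "no escalation"


# Made-up findings resembling this scenario: exceptions in the app, nothing
# severe on the infrastructure side.
report = [
    Finding("Worker service is throwing exceptions", "application", 2),
    Finding("Informational event in namespace", "platform", 4),
]
decision = suggest_escalation(report)
```

With findings like these, the rule points at the application developers, which matches the conclusion reached above.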

Reviewing Report

Here's an overview of what we did in this tutorial and what we discovered:

  • A generic query, "Troubleshoot voting errors", led us to determine whether the issue in the Voting-App group was platform-related or application-related

  • Tasks were run to:

    • Check for issues across the namespace, such as pod and deployment health

    • Scan application logs for exceptions and parse them

    • Check Kubernetes event streams in the namespace

  • Issues were generated which told us:

    • No platform-related errors were found, so we don't need to contact the platform team

    • The app is throwing exceptions, specifically its C# worker service

    • Where to find the opened GitHub issue for the service and its source code for future triage by the owning application developers

  • GitHub issues were opened for these related application issue(s):
