SLO Configuration

A Service Level Objective (SLO) defines a reliability target for your service. RunWhen SLOs use multi-window, multi-burn-rate (MWMB) alert rules generated from a simple objective definition. When the error budget burn rate exceeds safe thresholds, alerts fire automatically.

Managing SLOs in the UI

SLOs appear alongside their parent SLX in Workspace Studio > Tasks. Open the SLX Preview (eye icon) to access health and run data.

The Health tab shows the SLI metric that the SLO tracks. When an SLO is configured, the platform uses this metric together with the objective percentage to calculate error budget consumption and burn rate.

SLX Preview — Health tab showing the SLI metric that the SLO tracks

The RunSessions tab lists historical task runs triggered by SLO burn-rate alerts, alongside manually triggered runs. Each session shows the number of issues found and when it last ran.

SLX Preview — RunSessions tab showing historical task executions and issue counts

When creating or editing an SLO through the UI, you set the objective percentage and threshold; the platform generates the alert rules automatically.


Spec Reference

The sections below document the full SLO Custom Resource spec for users who manage SLX configuration through Git or need to understand the underlying data model.

SLO Spec Overview

apiVersion: runwhen.com/v1
kind: ServiceLevelObjective
metadata:
  name: my-workspace--my-slx-slo
  labels:
    workspace: my-workspace
    slx: my-workspace--my-slx
spec:
  codeBundle:
    repoUrl: https://github.com/runwhen-contrib/rw-cli-codecollection
    pathToYaml: codebundles/slo-default/queries.yaml
    ref: main
  slxSpecType: simple-mwmb
  objective: 99.9
  threshold: 9
  operand: eq

Spec Fields

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| codeBundle | object | required | Git reference to the queries.yaml file |
| slxSpecType | string | simple-mwmb | SLO type; currently only simple-mwmb is supported |
| objective | float | 99.9 | Target reliability percentage (e.g. 99, 99.9, 99.99) |
| threshold | float | 9 | Value at which the metric is considered “out of SLO” |
| operand | string | eq | Comparison operator applied to the threshold |
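
As a quick illustration of the field table above, the following Python sketch mirrors the documented fields and defaults and checks a spec for obvious mistakes before it is committed to Git. The `SloSpec` class, its attribute names, and the `problems` method are hypothetical helpers for this example and not part of the RunWhen platform. The set of allowed operands is taken from the Operand section of this page.

```python
from dataclasses import dataclass

ALLOWED_OPERANDS = {"eq", "lt", "gt", "neq", "le", "ge"}
SUPPORTED_TYPES = {"simple-mwmb"}


@dataclass
class SloSpec:
    """Hypothetical in-memory mirror of the documented spec fields."""

    code_bundle: dict                    # required: repoUrl, pathToYaml, ref
    slx_spec_type: str = "simple-mwmb"   # documented default
    objective: float = 99.9              # documented default
    threshold: float = 9                 # documented default
    operand: str = "eq"                  # documented default

    def problems(self) -> list:
        """Return human-readable validation problems (empty list if none)."""
        found = []
        for key in ("repoUrl", "pathToYaml", "ref"):
            if not self.code_bundle.get(key):
                found.append(f"codeBundle.{key} is missing")
        if self.slx_spec_type not in SUPPORTED_TYPES:
            found.append(f"unsupported slxSpecType: {self.slx_spec_type}")
        # Assumption: an objective is a percentage strictly between 0 and 100.
        if not 0 < self.objective < 100:
            found.append("objective must be a percentage between 0 and 100")
        if self.operand not in ALLOWED_OPERANDS:
            found.append(f"unknown operand: {self.operand}")
        return found
```

A spec built with only a complete `code_bundle` picks up the documented defaults and reports no problems. The platform may enforce different or additional rules.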

CodeBundle (Query Bundle)

Unlike SLIs and Runbooks, which reference Robot Framework scripts, the SLO CodeBundle points to a queries.yaml file that defines the query templates:

| Field | Required | Description |
| --- | --- | --- |
| repoUrl | Yes | Git repository URL |
| pathToYaml | Yes | Path to the queries.yaml file (e.g. codebundles/slo-default/queries.yaml) |
| ref | Yes | Git ref (default: main) |

queries.yaml Structure

The query file defines two query templates used to calculate error budget:

errorQuery: "sum_over_time((count({metric_name} > {threshold}))[{window}:]) OR on() vector(0)"
totalQuery: "sum_over_time((count({metric_name} > 0))[{window}:])"

These placeholders are resolved automatically when the SLO is processed:

| Placeholder | Resolved From |
| --- | --- |
| {metric_name} | SLX name (hyphens replaced with underscores) |
| {threshold} | spec.threshold value |
| {window} | Time window for burn-rate calculation |
| {operand} | Symbol form of spec.operand |
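
To make the substitution concrete, here is a minimal Python sketch of how the placeholders could be resolved. It is an illustration only: the function `render_query` and the use of `str.format` are assumptions, not the platform's actual implementation. The two templates are copied from the queries.yaml example on this page. They do not reference `{operand}`, so that value is supplied but unused here.

```python
# Symbol forms as listed in the Operand section of this page.
OPERAND_SYMBOLS = {"eq": "==", "lt": "<", "gt": ">", "neq": "!=", "le": "<=", "ge": ">="}

ERROR_QUERY = "sum_over_time((count({metric_name} > {threshold}))[{window}:]) OR on() vector(0)"
TOTAL_QUERY = "sum_over_time((count({metric_name} > 0))[{window}:])"


def render_query(template: str, slx_name: str, threshold: float, operand: str, window: str) -> str:
    """Fill a query template with values derived from the SLX name and SLO spec."""
    values = {
        "metric_name": slx_name.replace("-", "_"),  # hyphens become underscores
        "threshold": format(threshold, "g"),        # 9 and 9.0 both render as "9"
        "operand": OPERAND_SYMBOLS[operand],        # ignored by templates that omit it
        "window": window,
    }
    return template.format(**values)


print(render_query(ERROR_QUERY, "my-workspace--my-slx", 9, "eq", "1h"))
print(render_query(TOTAL_QUERY, "my-workspace--my-slx", 9, "eq", "1h"))
```

With the example SLX name, the metric name becomes `my_workspace__my_slx`, because each of the three hyphens is replaced individually.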

Objective

The objective field sets the target reliability percentage. This determines how much error budget is available before alerts fire.

| Objective | Allowed Downtime (30 days) | Error Budget |
| --- | --- | --- |
| 99.0 | ~7.2 hours | 1.0% |
| 99.9 | ~43 minutes | 0.1% |
| 99.99 | ~4.3 minutes | 0.01% |
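
The figures in the table follow from simple arithmetic: the error budget is 100 minus the objective, and the allowed downtime is that share of the period. The short Python sketch below reproduces them. The function names are illustrative, and the 30-day period matches the table header.

```python
def error_budget_percent(objective: float) -> float:
    """Error budget as a percentage, e.g. 99.9 -> roughly 0.1."""
    return 100.0 - objective


def allowed_downtime_minutes(objective: float, period_days: int = 30) -> float:
    """Minutes of full downtime the budget allows over the period."""
    period_minutes = period_days * 24 * 60  # 43,200 minutes in 30 days
    return period_minutes * error_budget_percent(objective) / 100.0


for objective in (99.0, 99.9, 99.99):
    minutes = allowed_downtime_minutes(objective)
    print(f"{objective}: {minutes:.2f} minutes ({minutes / 60:.2f} hours)")
```

For example, a 99.0 objective leaves 1% of 43,200 minutes, which is 432 minutes or 7.2 hours.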

Operand

The operand field defines how the SLI metric is compared against the threshold to determine if the service is “in SLO” or “out of SLO”:

| Operand | Symbol | Meaning |
| --- | --- | --- |
| eq | == | Out of SLO when metric equals threshold |
| lt | < | Out of SLO when metric is less than threshold |
| gt | > | Out of SLO when metric is greater than threshold |
| neq | != | Out of SLO when metric does not equal threshold |
| le | <= | Out of SLO when metric is less than or equal to threshold |
| ge | >= | Out of SLO when metric is greater than or equal to threshold |
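
The operand semantics in the table can be expressed as a small lookup. This Python sketch is only an illustration of the comparison logic. The names `COMPARATORS` and `is_out_of_slo` are hypothetical and do not come from the platform.

```python
import operator

# Maps each documented operand to the comparison it stands for.
COMPARATORS = {
    "eq": operator.eq,
    "lt": operator.lt,
    "gt": operator.gt,
    "neq": operator.ne,
    "le": operator.le,
    "ge": operator.ge,
}


def is_out_of_slo(metric: float, threshold: float, operand: str) -> bool:
    """Return True when the metric sample counts as 'out of SLO'."""
    return COMPARATORS[operand](metric, threshold)
```

With `threshold: 5` and `operand: gt`, a sample of 7 is out of SLO while a sample of 5 is not. With `operand: ge`, both are.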

How SLO Alerts Work

SLO alerts use a Multi-Window Multi-Burn-Rate (MWMB) methodology, which is different from simple threshold alerting:

  1. Error budget calculation: the platform calculates error and total events using the queries from queries.yaml
  2. Burn rate windows: multiple time windows are evaluated simultaneously (e.g. 1h, 6h, 3d) to detect both fast and slow budget consumption
  3. Alert generation: the platform generates alert rules with two tiers:

     | Alert Type | Severity | Trigger |
     | --- | --- | --- |
     | Page | High (immediate response) | Error budget is being consumed rapidly (short window burn rate exceeded) |
     | Ticket | Medium (investigate soon) | Error budget is being consumed steadily (long window burn rate exceeded) |

  4. Evaluation: the generated rules are evaluated continuously against your SLI metric
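
The steps above can be sketched in Python to show the general idea of a burn-rate check. This is a sketch under stated assumptions, not the rules the platform generates. The window pairs and factors (14.4, 6, 1) are the commonly cited values from the Google SRE Workbook for a 30-day period, and `BurnRateRule`, `burn_rate`, and `firing_severities` are hypothetical names. A burn rate of 1 means the budget would be used up exactly at the end of the period.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BurnRateRule:
    severity: str       # "page" or "ticket"
    long_window: str    # window that shows the burn is significant
    short_window: str   # window that shows the burn is still happening
    factor: float       # burn-rate threshold for both windows


# Assumed values (Google SRE Workbook); the platform's rules may differ.
RULES = [
    BurnRateRule("page", "1h", "5m", 14.4),
    BurnRateRule("page", "6h", "30m", 6.0),
    BurnRateRule("ticket", "3d", "6h", 1.0),
]


def burn_rate(errors: float, total: float, objective: float) -> float:
    """Observed error ratio divided by the error ratio the objective allows."""
    if total == 0:
        return 0.0  # no events in the window, so nothing was burned
    budget_ratio = (100.0 - objective) / 100.0
    return (errors / total) / budget_ratio


def firing_severities(rates: dict, rules=RULES) -> list:
    """Return severities whose long AND short windows both exceed the factor."""
    fired = []
    for rule in rules:
        over_long = rates[rule.long_window] > rule.factor
        over_short = rates[rule.short_window] > rule.factor
        if over_long and over_short and rule.severity not in fired:
            fired.append(rule.severity)
    return fired
```

Requiring both windows to exceed the factor keeps an alert from firing on a brief spike and lets it clear soon after the problem stops.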

SLO vs SLI Alerts

| Aspect | SLI Threshold Alerts | SLO Burn-Rate Alerts |
| --- | --- | --- |
| What triggers it | Metric crosses a single threshold | Error budget burn rate exceeds safe levels |
| Time windows | Single window (optional "for" duration) | Multiple windows evaluated simultaneously |
| Configuration | Operator, threshold, duration per severity | Objective percentage + threshold |
| Best for | Immediate metric anomalies | Sustained reliability degradation |

Example Configurations

Standard availability SLO (99.9%)

spec:
  objective: 99.9
  threshold: 9
  operand: eq
  codeBundle:
    repoUrl: https://github.com/runwhen-contrib/rw-cli-codecollection
    pathToYaml: codebundles/slo-default/queries.yaml
    ref: main

Relaxed SLO for non-critical service (99%)

spec:
  objective: 99
  threshold: 5
  operand: gt
  codeBundle:
    repoUrl: https://github.com/runwhen-contrib/rw-cli-codecollection
    pathToYaml: codebundles/slo-default/queries.yaml
    ref: main

Strict SLO for payment service (99.99%)

spec:
  objective: 99.99
  threshold: 1
  operand: lt
  codeBundle:
    repoUrl: https://github.com/runwhen-contrib/rw-cli-codecollection
    pathToYaml: codebundles/slo-default/queries.yaml
    ref: main