
SLI Configuration

A Service Level Indicator (SLI) is an automated measurement that runs on a schedule. Each SLI executes a script (via a CodeBundle) that returns a numeric metric, which is stored and evaluated for alerting.

Managing SLIs in the UI

SLIs are configured as part of an SLX in Workspace Studio > Tasks. Open the SLX Preview (eye icon) and select the Health tab to see live SLI data.

SLX Preview — Health tab showing SLI metric values over time

The Health tab displays:

  • SLI Values — a time-series chart of the metric, with a configurable time range (1 hour, 6 hours, 24 hours, etc.)
  • Debug Log — link to the raw SLI execution log for troubleshooting

The Metadata tab shows the SLX name, owners, resource group, and all tags (platform, cluster, namespace, resource type, access level) that drive alert routing and task discovery.

SLX Preview — Metadata tab showing SLX name, owners, group, and tags

When creating or editing an SLI through the UI, the platform presents the key fields described in the spec reference below — interval, CodeBundle, alert thresholds, and task triggers — without requiring you to write YAML directly.


Spec Reference

The sections below document the full SLI Custom Resource spec for users who manage SLX configuration through Git or need to understand the underlying data model.

SLI Spec Overview

apiVersion: runwhen.com/v1
kind: ServiceLevelIndicator
metadata:
  name: my-workspace--my-slx-sli
  labels:
    workspace: my-workspace
    slx: my-workspace--my-slx
spec:
  codeBundle:
    repoUrl: https://github.com/runwhen-contrib/rw-cli-codecollection
    pathToRobot: codebundles/k8s-namespace-healthcheck/sli.robot
    ref: main
  intervalSeconds: 60
  intervalStrategy: intermezzo
  locations:
    - northamerica-northeast2-01
  configProvided:
    - name: NAMESPACE
      value: production
  secretsProvided:
    - name: kubeconfig
      workspaceKey: kubeconfig
  displayUnitsLong: "percent available"
  displayUnitsShort: "%"
  alerts:
    warning:
      operator: "lt"
      threshold: "0.99"
      for: "5m"
    ticket:
      operator: "lt"
      threshold: "0.95"
      for: "10m"
    page:
      operator: "lt"
      threshold: "0.9"
  alertConfig:
    tasks:
      persona: eager-edgar
      sessionTTL: 10m

Spec Fields

| Field | Type | Default | Description |
|---|---|---|---|
| codeBundle | object | required | Git reference to the SLI script |
| intervalSeconds | integer | 60 | How often the SLI runs (in seconds) |
| intervalStrategy | string | intermezzo | Scheduling strategy for the runner |
| locations | string[] | | Runner location(s) where the SLI executes |
| configProvided | object[] | | Environment variables passed to the script |
| secretsProvided | object[] | | Workspace secrets mapped into the script |
| servicesProvided | object[] | | Location service bindings |
| displayUnitsLong | string | | Human-readable unit label (e.g. "percent available") |
| displayUnitsShort | string | | Short unit label, max 3 characters (e.g. "%", "ms") |
| alerts | object | | Threshold alert configuration (see below) |
| alertConfig | object | | Automatic task-trigger configuration (see below) |
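The table above can be expressed as a small client-side check, which is useful if you keep SLX configuration in Git and want to catch mistakes before committing. This is a minimal sketch. It assumes the spec has already been parsed from YAML into a Python dict, because the standard library has no YAML parser. The name `apply_defaults` is hypothetical and is not a platform API.

```python
# Hypothetical pre-commit check; apply_defaults is an illustrative name,
# not part of the platform. Defaults follow the Spec Fields table.
DEFAULTS = {"intervalSeconds": 60, "intervalStrategy": "intermezzo"}


def apply_defaults(spec: dict) -> dict:
    """Return a copy of a parsed SLI spec with documented defaults filled in."""
    if "codeBundle" not in spec:
        raise ValueError("spec.codeBundle is required")
    short = spec.get("displayUnitsShort")
    if short is not None and len(short) > 3:
        raise ValueError("displayUnitsShort must be at most 3 characters")
    merged = dict(DEFAULTS)  # start from the documented defaults
    merged.update(spec)      # explicit values in the spec take precedence
    return merged


spec = apply_defaults({"codeBundle": {"ref": "main"}, "displayUnitsShort": "%"})
print(spec["intervalSeconds"], spec["intervalStrategy"])  # 60 intermezzo
```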

CodeBundle

The codeBundle field points to the script that the SLI runs:

| Field | Required | Description |
|---|---|---|
| repoUrl | Yes | Git repository URL |
| pathToRobot | Yes | Path to the .robot file within the repository |
| ref | Yes | Git ref: branch, tag, or commit (default: main) |
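Together, `repoUrl`, `ref`, and `pathToRobot` identify exactly one file. Opening that file in a browser is a quick way to check that a codeBundle reference is correct. The sketch below assumes the repository is hosted on GitHub, which uses a `/blob/<ref>/<path>` web URL layout. Other Git hosts lay out their URLs differently. The helper name `browse_url` is hypothetical.

```python
def browse_url(code_bundle: dict) -> str:
    """Build a GitHub web URL for the .robot file a codeBundle points to.

    Assumes GitHub's /blob/<ref>/<path> layout; other Git hosts differ.
    """
    repo = code_bundle["repoUrl"].rstrip("/")
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]  # clone URLs often carry a .git suffix
    ref = code_bundle.get("ref", "main")  # the table lists main as the default
    path = code_bundle["pathToRobot"].lstrip("/")
    return f"{repo}/blob/{ref}/{path}"


print(browse_url({
    "repoUrl": "https://github.com/runwhen-contrib/rw-cli-codecollection",
    "pathToRobot": "codebundles/k8s-namespace-healthcheck/sli.robot",
    "ref": "main",
}))
```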

Config and Secrets

configProvided (environment variables passed to the script):

configProvided:
  - name: NAMESPACE
    value: production
  - name: CONTEXT
    valueFrom:
      workspace: CONTEXT

Each entry supports either a literal value or a valueFrom reference that resolves from the workspace or SLX configuration.
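The difference between a literal `value` and a `valueFrom` reference can be shown with a small resolver that turns `configProvided` entries into the environment the script sees. This is a sketch under stated assumptions. Only the `workspace` source from the example above is modelled, and SLX-level sources are left out. Workspace configuration is represented as a plain dict. The function name and the value `prod-cluster` are hypothetical.

```python
def resolve_config(entries: list, workspace: dict) -> dict:
    """Turn configProvided entries into environment variables (sketch)."""
    env = {}
    for entry in entries:
        name = entry["name"]
        if "value" in entry:
            env[name] = entry["value"]  # literal value, used as-is
        elif "valueFrom" in entry and "workspace" in entry["valueFrom"]:
            key = entry["valueFrom"]["workspace"]
            env[name] = workspace[key]  # looked up in workspace config
        else:
            raise ValueError(f"{name}: unsupported or missing value source")
    return env


env = resolve_config(
    [
        {"name": "NAMESPACE", "value": "production"},
        {"name": "CONTEXT", "valueFrom": {"workspace": "CONTEXT"}},
    ],
    workspace={"CONTEXT": "prod-cluster"},  # hypothetical workspace config
)
print(env)
```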

secretsProvided (workspace secret mappings):

secretsProvided:
  - name: kubeconfig
    workspaceKey: kubeconfig

The workspaceKey references a secret stored in the workspace’s secret store.
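In the example, `name` and `workspaceKey` are both `kubeconfig`, but they play different roles. `name` is the name the script receives, and `workspaceKey` is the key used to look up the secret in the workspace store. The sketch below makes that distinction explicit and fails early when a key is missing. The in-memory `secret_store` dict, the key `prod-kubeconfig`, and the function name are all hypothetical.

```python
def map_secrets(entries: list, secret_store: dict) -> dict:
    """Map secretsProvided entries to {script name: secret value} (sketch)."""
    missing = [e["workspaceKey"] for e in entries
               if e["workspaceKey"] not in secret_store]
    if missing:
        raise KeyError(f"secrets not found in workspace store: {missing}")
    # The script sees the secret under `name`, not under `workspaceKey`.
    return {e["name"]: secret_store[e["workspaceKey"]] for e in entries}


secrets = map_secrets(
    [{"name": "kubeconfig", "workspaceKey": "prod-kubeconfig"}],
    secret_store={"prod-kubeconfig": "<kubeconfig contents>"},
)
print(list(secrets))  # ['kubeconfig']
```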

Threshold Alerts (spec.alerts)

Threshold alerts fire when the SLI metric crosses a user-defined value. Your SLI script can return any numeric value — not just 0 or 1 — giving you full control over how the metric is evaluated.

Structure

Threshold alerts are defined per severity level:

alerts:
  warning:
    operator: "lt"
    threshold: "0.95"
    for: "5m"
  ticket:
    operator: "lt"
    threshold: "0.9"
    for: "10m"
  page:
    operator: "lt"
    threshold: "0.8"

Fields

| Field | Required | Type | Description |
|---|---|---|---|
| operator | Yes | string | Comparison operator for the metric value |
| threshold | Yes | string | Numeric value to compare against (parsed as float64) |
| for | No | string | Duration the condition must hold before firing (e.g. 5m, 1h) |
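Both `threshold` and `for` are strings in the spec, so they must be parsed before they can be compared. The sketch below assumes a small subset of Prometheus-style duration syntax (s, m, h, d, combinable as in `1h30m`). Milliseconds and other units are rejected instead of guessed at. Python's `float` is an IEEE 754 double, which matches the float64 parsing described above. The function names are hypothetical.

```python
import re

# Supported units (an assumed subset of Prometheus duration syntax).
_UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_for(value: str) -> int:
    """Convert a duration such as '5m' or '1h30m' to seconds."""
    parts = re.findall(r"(\d+)([smhd])", value)
    # Reject anything the pattern did not fully consume (e.g. '500ms', '5x').
    if not parts or "".join(n + u for n, u in parts) != value:
        raise ValueError(f"invalid duration: {value!r}")
    return sum(int(n) * _UNITS[u] for n, u in parts)


def parse_threshold(value: str) -> float:
    """Thresholds are strings in YAML; Python floats are 64-bit doubles."""
    return float(value)


print(parse_for("5m"), parse_for("1h"), parse_threshold("0.95"))  # 300 3600 0.95
```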

Operators

| Word | Symbol | Meaning |
|---|---|---|
| lt | < | Less than |
| le | <= | Less than or equal to |
| eq | == | Equal to |
| ge | >= | Greater than or equal to |
| gt | > | Greater than |
| ne | != | Not equal to |
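Each operator word maps directly to a comparison, and its symbol is the form used in conditions such as `metric_name < 0.95`. The sketch below pairs each word with its symbol and a function from Python's `operator` module. It checks a sample against a threshold and renders the condition text. The helper names are hypothetical, and the rendered string only mirrors the documented condition shape. It is not necessarily the exact rule the platform generates.

```python
import operator

# word -> (symbol, comparison function), following the Operators table
OPERATORS = {
    "lt": ("<", operator.lt),
    "le": ("<=", operator.le),
    "eq": ("==", operator.eq),
    "ge": (">=", operator.ge),
    "gt": (">", operator.gt),
    "ne": ("!=", operator.ne),
}


def breached(value: float, op: str, threshold: str) -> bool:
    """True if `value <op> threshold` holds for a single sample."""
    return OPERATORS[op][1](value, float(threshold))


def condition_text(metric: str, op: str, threshold: str) -> str:
    """Render the condition in symbol form, e.g. 'metric_name < 0.95'."""
    return f"{metric} {OPERATORS[op][0]} {threshold}"


print(condition_text("metric_name", "lt", "0.95"), breached(0.93, "lt", "0.95"))
```

Because thresholds are parsed as floats, `eq` and `ne` compare exact floating-point values. They are most predictable for integer-valued metrics such as counts.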

Severity Levels

| Severity | Typical Use |
|---|---|
| warning | Informational: may not require immediate action |
| ticket | Should be investigated |
| page | Requires immediate response |

You can define one, two, or all three. Each operates independently with its own operator, threshold, and duration.
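Because each severity has its own operator and threshold, a single sample can trip any subset of them. The sketch below checks one sample against every configured severity using the availability thresholds from the spec overview. It ignores `for`, so it describes only whether each condition holds at that moment. The function name is hypothetical.

```python
import operator

OPS = {"lt": operator.lt, "le": operator.le, "eq": operator.eq,
       "ge": operator.ge, "gt": operator.gt, "ne": operator.ne}


def firing_severities(value: float, alerts: dict) -> list:
    """Return the severities whose condition holds for one sample (no `for`)."""
    return [
        sev for sev in ("warning", "ticket", "page")
        if sev in alerts  # undefined severities are simply skipped
        and OPS[alerts[sev]["operator"]](value, float(alerts[sev]["threshold"]))
    ]


alerts = {
    "warning": {"operator": "lt", "threshold": "0.99"},
    "ticket": {"operator": "lt", "threshold": "0.95"},
    "page": {"operator": "lt", "threshold": "0.9"},
}
print(firing_severities(0.97, alerts))  # ['warning']
print(firing_severities(0.85, alerts))  # ['warning', 'ticket', 'page']
```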

The for Duration

When for is set, the condition must be continuously true for the specified duration before the alert fires. This prevents transient fluctuations from triggering false alerts.

Without for — fires immediately:

metric_name < 0.95

With for: "5m" — fires after 5 minutes:

last_over_time(metric_name[5m]) < 0.95
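The "continuously true for the specified duration" rule works like a pending timer. The timer starts when the condition first becomes true and resets whenever a sample no longer satisfies it. The sketch below evaluates that rule at the time of the latest sample. It is a simplified model of the documented behavior, not the platform's evaluator. The function name and the sample series are hypothetical.

```python
def should_fire(samples: list, cond, for_seconds: int) -> bool:
    """samples: [(timestamp_seconds, value), ...] sorted by time."""
    pending_since = None
    for ts, value in samples:
        if cond(value):
            if pending_since is None:
                pending_since = ts  # condition just started holding
        else:
            pending_since = None    # any recovery resets the timer
    if pending_since is None:
        return False
    # Fire once the condition has held for at least `for_seconds`.
    return samples[-1][0] - pending_since >= for_seconds


below = lambda v: v < 0.95
steady = [(t, 0.9) for t in range(0, 301, 60)]          # 60 s interval
blip = [(0, 0.9), (60, 0.9), (120, 0.9), (180, 0.99), (240, 0.9), (300, 0.9)]
print(should_fire(steady, below, 300))  # True: held from t=0 to t=300
print(should_fire(blip, below, 300))    # False: timer restarted at t=240
```

With `for_seconds=0`, the function fires as soon as the latest sample satisfies the condition, which matches the "without for" case.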

Alert Modes

Each threshold alert has a mode (managed in CRD status, not spec):

| Mode | Behavior |
|---|---|
| active | Alert fires when threshold is breached (default) |
| silenced | Alert does not fire but is still visible in the UI |
| disabled | Alert is completely disabled and hidden |
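The three modes differ along two independent axes: whether a breach fires, and whether the alert appears in the UI. The sketch below models only those two outcomes from the table. The enum and function names are hypothetical, and the real mode lives in the CRD status, not in a Python object.

```python
from enum import Enum


class AlertMode(Enum):
    ACTIVE = "active"      # default
    SILENCED = "silenced"
    DISABLED = "disabled"


def alert_outcome(mode: AlertMode, breached: bool) -> dict:
    """Whether a breach fires and whether the alert is shown, per mode."""
    return {
        "fires": breached and mode is AlertMode.ACTIVE,
        "visible_in_ui": mode is not AlertMode.DISABLED,
    }


print(alert_outcome(AlertMode.SILENCED, breached=True))
# {'fires': False, 'visible_in_ui': True}
```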

Task-Trigger Alerts (spec.alertConfig)

Task-trigger alerts are a separate, implicit mechanism. When configured, the platform automatically runs the SLX’s tasks whenever any sub-metric drops below 1.0. The operator and threshold are hardcoded — only the task execution behavior is configurable.

alertConfig:
  tasks:
    persona: eager-edgar
    sessionTTL: 10m

| Field | Default | Description |
|---|---|---|
| tasks.persona | eager-edgar | Which AI Assistant persona runs the triggered task |
| tasks.sessionTTL | 10m | Cooldown: no new task run starts until this much time has elapsed since the last run session |
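The task-trigger rule combines two checks. First, any sub-metric must be below the hardcoded 1.0. Second, the `sessionTTL` cooldown since the last run session must have elapsed. The sketch below represents sub-metrics as a name-to-value dict and takes the last run time as an argument. Both are hypothetical stand-ins for platform state, and the persona is not modelled.

```python
from datetime import datetime, timedelta
from typing import Optional


def should_trigger_tasks(
    sub_metrics: dict,
    last_run: Optional[datetime],
    now: datetime,
    session_ttl: timedelta = timedelta(minutes=10),  # documented default
) -> bool:
    """Trigger when any sub-metric is below 1.0 and the cooldown has passed."""
    if not any(value < 1.0 for value in sub_metrics.values()):
        return False  # hardcoded rule: nothing below 1.0, nothing to do
    return last_run is None or now - last_run >= session_ttl


now = datetime(2024, 1, 1, 12, 0)
metrics = {"pods_ready": 1.0, "pvc_bound": 0.5}  # hypothetical sub-metrics
print(should_trigger_tasks(metrics, None, now))                         # True
print(should_trigger_tasks(metrics, now - timedelta(minutes=5), now))   # False
```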

Threshold Alert Examples

Availability (0.0 – 1.0)

alerts:
  warning:
    operator: "lt"
    threshold: "0.99"
    for: "5m"
  ticket:
    operator: "lt"
    threshold: "0.95"
    for: "10m"
  page:
    operator: "lt"
    threshold: "0.9"

Response Latency (milliseconds)

alerts:
  warning:
    operator: "gt"
    threshold: "200"
    for: "5m"
  page:
    operator: "gt"
    threshold: "500"

Unhealthy Pod Count

alerts:
  ticket:
    operator: "ge"
    threshold: "1"
  page:
    operator: "ge"
    threshold: "3"

Error Rate (percentage)

alerts:
  warning:
    operator: "gt"
    threshold: "1"
    for: "10m"
  ticket:
    operator: "gt"
    threshold: "5"
    for: "5m"
  page:
    operator: "gt"
    threshold: "10"
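All four examples follow the same pattern. A more severe level requires a more extreme value: lower for `lt`/`le` metrics such as availability, higher for `gt`/`ge` metrics such as latency, pod count, and error rate. The sketch below is a hypothetical lint that flags configurations where, for example, `page` would fire before `warning`. The platform is not known to enforce this rule. The check is skipped when severities mix directions or use `eq`/`ne`, since ordering is not meaningful there.

```python
ORDER = ("warning", "ticket", "page")
# -1: smaller values are worse (lt/le); +1: larger values are worse (gt/ge)
DIRECTION = {"lt": -1, "le": -1, "gt": 1, "ge": 1}


def severity_order_ok(alerts: dict) -> bool:
    """True if each stricter severity needs a more extreme metric value."""
    levels = [alerts[s] for s in ORDER if s in alerts]
    dirs = {DIRECTION.get(level["operator"]) for level in levels}
    if len(dirs) != 1 or None in dirs:
        return True  # mixed directions or eq/ne: nothing to compare
    (direction,) = dirs
    # Flip sign for lt/le so "more severe" always means "larger number".
    values = [float(level["threshold"]) * direction for level in levels]
    return all(a < b for a, b in zip(values, values[1:]))


error_rate = {
    "warning": {"operator": "gt", "threshold": "1"},
    "ticket": {"operator": "gt", "threshold": "5"},
    "page": {"operator": "gt", "threshold": "10"},
}
print(severity_order_ok(error_rate))  # True
```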

How SLI Alerts Are Evaluated

  1. SLI script runs on the configured intervalSeconds schedule at the specified runner location
  2. Metric is stored under the SLX’s metric name
  3. Alert rules are generated from your spec.alerts thresholds and evaluated continuously
  4. When a threshold is breached — immediately, or after the for duration — the alert fires
  5. An issue is created and linked to the SLX
  6. Task triggers execute automatically if alertConfig is configured