Tool Builder
The Tool Builder lets you write custom automation scripts, test them against live infrastructure, and commit them as SLXs — all from your IDE through the MCP server.
Overview
An SLX (Service Level Expectation) is RunWhen’s unit of automation. Each SLX contains:
- A task (runbook) — a bash or Python script that checks infrastructure and reports issues
- An optional SLI (indicator) — a script or schedule that triggers the task at regular intervals
The Tool Builder workflow turns a local script into a production SLX running on RunWhen’s infrastructure.
Workflow
1. Load context ─▶ 2. Write script ─▶ 3. Validate ─▶ 4. Test ─▶ 5. Iterate ─▶ 6. Commit
Step 1 — Load context
Before writing any script, load the project’s infrastructure conventions:
    Load my workspace context so you understand the infrastructure conventions.
This calls get_workspace_context, which reads the project’s RUNWHEN.md file. The file describes naming patterns, database access rules, environment variables, severity guidelines, and other constraints your scripts should follow.
Step 2 — Write the script
Write a bash or Python script that follows the RunWhen contract (see Script contract below). Scripts must define a main() function that either returns issues (task) or a health metric (SLI).
Step 3 — Validate
The agent calls validate_script to check the script against the RunWhen contract — verifying the main() function exists, the return format is correct, and referenced environment variables are extracted.
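The checks described above can be approximated locally before the agent calls validate_script. The following is a minimal sketch using Python's ast module (Python 3.9 or later). The function name check_contract and its return shape are hypothetical, and it is not the actual validate_script implementation: it only mirrors the documented rules (a top-level main(), no direct call, no __main__ guard) and collects the environment variable names read through os.environ.

```python
import ast


def _is_os_environ(node):
    """True if the node is the expression `os.environ`."""
    return (
        isinstance(node, ast.Attribute)
        and node.attr == "environ"
        and isinstance(node.value, ast.Name)
        and node.value.id == "os"
    )


def check_contract(source: str) -> dict:
    """Hypothetical local pre-check; NOT the real validate_script tool."""
    tree = ast.parse(source)
    problems = []

    # Rule: a top-level main() function must exist.
    if not any(isinstance(n, ast.FunctionDef) and n.name == "main" for n in tree.body):
        problems.append("no top-level main() function")

    for node in tree.body:
        # Rule: the script must not call main() itself at module level.
        if (
            isinstance(node, ast.Expr)
            and isinstance(node.value, ast.Call)
            and isinstance(node.value.func, ast.Name)
            and node.value.func.id == "main"
        ):
            problems.append("main() is called directly")
        # Rule: no `if __name__ == "__main__"` guard.
        if isinstance(node, ast.If) and "__name__" in ast.dump(node.test):
            problems.append('uses if __name__ == "__main__"')

    # Collect names read via os.environ["X"] or os.environ.get("X", ...).
    env_vars = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Subscript) and _is_os_environ(node.value):
            key = node.slice
            if isinstance(key, ast.Constant) and isinstance(key.value, str):
                env_vars.add(key.value)
        elif (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "get"
            and _is_os_environ(node.func.value)
            and node.args
            and isinstance(node.args[0], ast.Constant)
            and isinstance(node.args[0].value, str)
        ):
            env_vars.add(node.args[0].value)

    return {"problems": problems, "env_vars": sorted(env_vars)}
```

Running check_contract on a script's source returns a list of contract problems and the sorted environment variable names it references, which can be compared against what the workspace provides before testing.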
Step 4 — Test
The agent discovers available secrets and runner locations (get_workspace_secrets, get_workspace_locations), then executes the script against live infrastructure:
    Test this script in my workspace.
This calls run_script_and_wait, which runs the script on a RunWhen runner and returns the full output: issues found, stdout, stderr, and status.
Step 5 — Iterate
Review the output. If the script needs changes, fix it and re-test. The cycle is: edit → validate → run → review.
Step 6 — Commit
Once the script produces the expected results, commit it as an SLX:
    Commit this as an SLX called "k8s-pod-health" with the alias "Pod Health Check".
This calls commit_slx, which writes the SLX configuration (YAML files) to the workspace Git repo. The SLX then appears in Workspace Studio and can run on a schedule or on demand.
Script contract
Scripts must follow a specific structure to integrate with the RunWhen runner.
Python task
    def main():
        import os

        namespace = os.environ.get("NAMESPACE", "default")

        issues = []
        # ... your logic ...
        issues.append({
            "issue title": "Pod CrashLooping",
            "issue description": f"Pod xyz in {namespace} has restarted 15 times",
            "issue severity": 2,
            "issue next steps": "Check pod logs and events",
        })
        return issues
Rules:
- Define a top-level main() function
- Return a List[Dict] with the issue fields listed below
- Do not call main() directly; the runner calls it
- Do not use if __name__ == "__main__"
- Use os.environ for configuration variables
- Secret vars are injected as env vars pointing to file paths (e.g. set os.environ["KUBECONFIG"] = os.environ["kubeconfig"])
Python SLI
    def main():
        # ... health check logic ...
        return 1.0  # 1.0 = healthy, 0.0 = unhealthy
Same rules as a task, but main() returns a float between 0 and 1.
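As an illustration of how an SLI might map a measurement onto the 0 to 1 range, here is a minimal sketch. The helper health_score and the placeholder counts are hypothetical; a real SLI would obtain the counts from the infrastructure being checked. Treating an empty set as healthy is an assumption of this sketch, not a documented RunWhen rule.

```python
def health_score(ready: int, total: int) -> float:
    """Return the fraction of ready items, clamped to the range [0.0, 1.0]."""
    if total <= 0:
        # Assumption: nothing to check is reported as healthy.
        return 1.0
    return max(0.0, min(1.0, ready / total))


def main():
    # Placeholder values; replace with counts gathered from your infrastructure.
    ready, total = 9, 10
    return health_score(ready, total)
```

Keeping the scoring in a small pure function makes it easy to check edge cases (no items, inconsistent counts) without touching live infrastructure.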
Bash task
    main() {
        issues='[]'

        # Build issues with jq:
        # issues="$(
        #   jq -n \
        #     --arg title "Descriptive title" \
        #     --arg desc "What happened" \
        #     --argjson severity 2 \
        #     --arg nextsteps "How to fix it" \
        #     '[{
        #       "issue title": $title,
        #       "issue description": $desc,
        #       "issue severity": $severity,
        #       "issue next steps": $nextsteps
        #     }]'
        # )"

        jq -n --argjson issues "$issues" '$issues' >&3
    }
Rules:
- Define a main() function
- Write the issue JSON array to file descriptor 3 (>&3)
- Use jq for reliable JSON construction
- Do not call main directly
Bash SLI
    main() {
        echo "1.0" >&3
    }
Same rules as a task, but write a single float (0-1) to file descriptor 3.
Issue fields
| Field | Type | Required | Description |
|---|---|---|---|
| issue title | string | Yes | Short, descriptive title |
| issue description | string | Yes | Detailed description of the finding |
| issue severity | int (1-4) | Yes | 1 = critical, 2 = high, 3 = medium, 4 = low |
| issue next steps | string | Yes | Recommended remediation steps |
| issue observed at | string | No | ISO 8601 timestamp of observation |
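The field table can be turned into a quick local check on the dictionaries a task returns. This is a sketch only: validate_issue is a hypothetical helper, not part of the RunWhen runner or the MCP server, and it assumes the table above is the complete set of fields.

```python
# Field names and types taken from the issue fields table.
REQUIRED_FIELDS = {
    "issue title": str,
    "issue description": str,
    "issue severity": int,
    "issue next steps": str,
}
OPTIONAL_FIELDS = {"issue observed at": str}


def validate_issue(issue: dict) -> list:
    """Return a list of human-readable problems; an empty list means the issue looks valid."""
    errors = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in issue:
            errors.append(f"missing required field: {field}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif not isinstance(issue[field], expected) or isinstance(issue[field], bool):
            errors.append(f"{field} must be of type {expected.__name__}")
    for field, expected in OPTIONAL_FIELDS.items():
        if field in issue and not isinstance(issue[field], expected):
            errors.append(f"{field} must be of type {expected.__name__}")
    severity = issue.get("issue severity")
    if isinstance(severity, int) and not isinstance(severity, bool) and not 1 <= severity <= 4:
        errors.append("issue severity must be between 1 and 4")
    return errors
```

Calling validate_issue on each element of the list returned by main() before testing on a runner can catch a misspelled key or an out-of-range severity early.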
Environment variables and secrets
Scripts receive configuration through two mechanisms:
- Environment variables (env_vars): non-sensitive config like namespace, context, or thresholds
- Secret variables (secret_vars): mapped to workspace secret keys (e.g. kubeconfig, API tokens)
Secret vars are injected as environment variables whose values are file paths on the runner. For example, if you map kubeconfig to the workspace’s kubeconfig secret:
    import os
    os.environ["KUBECONFIG"] = os.environ["kubeconfig"]
Use get_workspace_secrets to discover available secret keys in your workspace.
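For secrets that are not consumed as a file path by another tool (an API token, for example), the script has to read the file itself. The sketch below assumes the file contains the raw secret value, which this page does not confirm. The helper read_secret and the secret key api_token used in the usage note are hypothetical.

```python
import os
from pathlib import Path


def read_secret(name: str) -> str:
    """Read a secret whose environment variable holds a file path.

    Assumption: the file at that path contains the raw secret value.
    """
    path = os.environ[name]  # the env var value is a path, not the secret itself
    return Path(path).read_text().strip()
```

Inside main(), a call such as read_secret("api_token") would then return the token text, provided a secret with that key has been mapped in secret_vars.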
SLI patterns
When committing an SLX, you can include an SLI that triggers the task automatically.
Custom SLI script
Write a separate script that returns a health metric (0-1). The SLI runs on an interval and the task triggers when the metric drops below healthy:
    commit_slx(
        slx_name="my-health-check",
        task_type="task",
        script="...",                    # task script
        sli_script="def main(): ...",    # SLI script (returns 0-1)
        sli_interval_seconds=300,        # every 5 minutes
    )
Cron-scheduled SLI
Trigger the task on a cron schedule instead of a health metric:
    commit_slx(
        slx_name="my-health-check",
        task_type="task",
        script="...",                    # task script
        cron_schedule="0 */2 * * *",     # every 2 hours
    )
Common cron expressions:
| Expression | Schedule |
|---|---|
| */15 * * * * | Every 15 minutes |
| 0 * * * * | Every hour |
| 0 */2 * * * | Every 2 hours |
| 0 9 * * 1-5 | 9 AM on weekdays |
| 0 0 * * 0 | Sunday at midnight |
Infrastructure context (RUNWHEN.md)
The RUNWHEN.md file is a project-level document that provides domain-specific knowledge for scripts. Place it in your project root and the MCP server will auto-discover it. It captures the kind of tribal knowledge that a human engineer would share when onboarding someone to monitor a system:
- Which database replicas to query (and how to connect)
- Naming patterns for pods, services, and labels
- Environment variables scripts need
- Severity guidelines for issues
- Known gotchas and edge cases
Without a RUNWHEN.md, agents may make reasonable-but-wrong assumptions — like querying the primary database instead of a replica, or missing required kubectl flags.
The MCP server repository includes a template and an example to help you get started.
Required SLX tags
When committing an SLX, two tags are required:
| Tag | Values | Description |
|---|---|---|
| access | read-write, read-only | Whether the task modifies resources or only reads/inspects |
| data | logs-bulk, config, logs-stacktrace | The type of output the task produces |
These are set via the access and data parameters on commit_slx.
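A small pre-commit check on the two tags can catch typos before commit_slx is called. This sketch assumes the values in the table are the complete allowed set; check_tags is a hypothetical helper and not part of the MCP server.

```python
# Allowed values copied from the required tags table (assumed to be exhaustive).
ALLOWED_TAGS = {
    "access": {"read-write", "read-only"},
    "data": {"logs-bulk", "config", "logs-stacktrace"},
}


def check_tags(access: str, data: str) -> list:
    """Return a list of problems with the given tag values; empty means both are allowed."""
    errors = []
    for name, value in (("access", access), ("data", data)):
        if value not in ALLOWED_TAGS[name]:
            allowed = ", ".join(sorted(ALLOWED_TAGS[name]))
            errors.append(f"{name} must be one of: {allowed} (got {value!r})")
    return errors
```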
Example: end-to-end
Here’s what a typical conversation looks like when building a task through the MCP server:
You: “Load my workspace context.”
Agent: Calls get_workspace_context → reads RUNWHEN.md with infrastructure conventions.
You: “Write a Python task that checks for CrashLoopBackOff pods in the backend namespace.”
Agent: Writes a Python script following the RunWhen contract and your RUNWHEN.md conventions.
You: “Test it.”
Agent: Calls get_workspace_secrets and get_workspace_locations, then run_script_and_wait with appropriate env vars and secrets. Reviews the output — 2 issues found.
You: “Looks good. Commit it as k8s-crashloop-check and run it every 15 minutes.”
Agent: Calls commit_slx with cron_schedule="*/15 * * * *". The SLX is now live in Workspace Studio.
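For reference, the task written in this conversation might look roughly like the sketch below. It is an illustration, not the script the agent would necessarily produce. It assumes kubectl is available on the runner, that a kubeconfig secret is mapped under the key kubeconfig, and that severity 2 is appropriate for this finding; the helper find_crashloop_issues is a hypothetical name. The parsing relies on the standard Pod status structure returned by kubectl get pods -o json.

```python
import json
import os
import subprocess


def find_crashloop_issues(pod_list: dict, namespace: str) -> list:
    """Build RunWhen issue dicts for containers waiting in CrashLoopBackOff."""
    issues = []
    for pod in pod_list.get("items", []):
        name = pod.get("metadata", {}).get("name", "unknown")
        statuses = pod.get("status", {}).get("containerStatuses") or []
        for status in statuses:
            waiting = status.get("state", {}).get("waiting") or {}
            if waiting.get("reason") == "CrashLoopBackOff":
                container = status.get("name", "unknown")
                restarts = status.get("restartCount", 0)
                issues.append({
                    "issue title": f"Pod {name} is in CrashLoopBackOff",
                    "issue description": (
                        f"Container {container} in pod {name} ({namespace}) "
                        f"has restarted {restarts} times"
                    ),
                    "issue severity": 2,  # assumption: adjust to your RUNWHEN.md guidelines
                    "issue next steps": (
                        f"Check logs with: kubectl logs {name} -n {namespace} --previous"
                    ),
                })
    return issues


def main():
    namespace = os.environ.get("NAMESPACE", "backend")
    # Secret vars hold file paths; point kubectl at the kubeconfig file.
    os.environ["KUBECONFIG"] = os.environ["kubeconfig"]
    result = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return find_crashloop_issues(json.loads(result.stdout), namespace)
```

Separating the JSON parsing from the kubectl call keeps main() within the contract while letting the parsing logic be exercised locally with sample data.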