Responsible AI Use / Generative AI Policies
Last Updated: July 2025
At RunWhen, we build AI-powered Site Reliability Engineering tools designed to help modern teams detect, triage, and remediate issues across complex cloud environments. Our platform is architected to focus on infrastructure and application telemetry.
This policy outlines how RunWhen configures, tests, and monitors the use of AI within its platform. Our objective is to ensure that AI is applied responsibly, transparently, and with a measurable standard of quality and risk awareness.
This policy applies to any AI-based feature offered in the RunWhen platform, including those used for task recommendations, automation summaries, or incident context generation.
AI Configuration Specifications
RunWhen does not train its own foundation models. All generative AI capabilities are powered by a combination of RunWhen-hosted open source models and, when configured, our customers' own foundation models (“bring your own endpoint”). In configurations where RunWhen hosts open source models, the company provides single-tenant configuration options to ensure data isolation.
As noted in the Data Security and Privacy section, these models are used with ephemeral, customer-specific operational metadata as context. These data sources include:
Output from automation scripts and health checks
Metadata about cloud infrastructure and Kubernetes configurations
Plaintext observations extracted from logs or CLI tools
No customer application data, customer PII, or confidential business logic is used as input to foundation model training or fine-tuning.
Context used for prompting is maintained in transient memory for the duration of task execution and is never stored, indexed, or retained after the task completes.
Testing Process
RunWhen maintains a library of predefined automation scenarios, exercised against RunWhen’s internal tech stack and sample workloads for other infrastructure (the “sandbox”), that map specific operational drivers (e.g., disk pressure, kubelet failures, misconfigured ingress) to desired task outputs or AI-driven summaries. Each scenario includes:
A known root cause and structured metadata
A set of validated automation responses
A baseline summary that is compared against the LLM's generated output
Generated content is tested against these scenarios during both initial integration and after each platform release to ensure the LLM can correctly associate operational indicators with intended outcomes.
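As an illustration only, a scenario regression check of this kind could be sketched as follows. RunWhen’s actual test harness is internal; the `Scenario` structure, the `check_scenario` function, the lexical similarity measure, and the 0.6 threshold are all assumptions made for this sketch, not the real implementation.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher

# Hypothetical sketch of a scenario regression check. All names and
# thresholds here are assumptions, not RunWhen's internal harness.

@dataclass
class Scenario:
    root_cause: str          # known root cause
    metadata: dict           # structured operational metadata
    valid_actions: list      # set of validated automation responses
    baseline_summary: str    # reference summary compared against LLM output

def similarity(a: str, b: str) -> float:
    """Cheap lexical similarity as a stand-in for a real semantic comparison."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def check_scenario(scenario: Scenario, generated_summary: str,
                   recommended_action: str, threshold: float = 0.6) -> bool:
    """Pass only if the summary tracks the baseline AND the action is validated."""
    close_enough = similarity(generated_summary, scenario.baseline_summary) >= threshold
    action_ok = recommended_action in scenario.valid_actions
    return close_enough and action_ok

scenario = Scenario(
    root_cause="disk pressure on node pool",
    metadata={"node": "pool-a-1", "signal": "DiskPressure"},
    valid_actions=["cordon-node", "clean-image-cache"],
    baseline_summary="Node pool-a-1 is under disk pressure; image cache cleanup recommended.",
)
```

Running such a check during integration and after each release flags drift: a summary that diverges from the baseline, or an action outside the validated set, fails the scenario.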
Model Testing and Review Process
RunWhen conducts structured testing across a wide range of operational environments and conditions, including but not limited to:
Normal and degraded Kubernetes states
Common misconfigurations across major cloud providers
Concurrent failure conditions (e.g., network partition + out-of-memory + missing config maps)
Each test run evaluates the AI’s ability to:
Accurately summarize conditions
Avoid hallucinations or misleading suggestions
Recommend automation actions that are consistent with internal safety guidelines
Testing is repeated during:
New AI provider integrations
Platform version upgrades
Changes to context injection pipelines
Performance Monitoring and Reporting
RunWhen tracks the performance of AI-generated outputs using a combination of automated validation and manual review. This includes:
Internal red-teaming of outputs on edge cases
Logging and classifying model errors or hallucinations
Addressing customer-submitted feedback on AI suggestions
Internal flagging of “near miss” events — defined as AI-generated content that was incorrect but not executed due to human review or automated safeguards
All issues are reviewed weekly by engineering leadership. Findings are used to refine prompt construction, update scenario tests, and improve safety filters.
RunWhen does not allow autonomous execution of AI-generated actions. The actions that are executed are predefined by experts in our community of “task authors” and have been imported into our customers’ environments. These actions can only be executed when an AI Assistant is given explicit permission in the form of tags, e.g., “Eager Edgar is allowed to run all tasks tagged with ‘permissions: read-only’ and ‘env: dev’.”
Even then, a task must pass configurable confidence thresholds before it can be run autonomously. In all other cases, human review is required prior to execution.
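A minimal sketch of how such a gate could work is shown below. The `Task` and `Assistant` types, the tag-matching rule, and the threshold value are illustrative assumptions, not RunWhen’s actual implementation.

```python
from dataclasses import dataclass

# Illustrative sketch of tag-scoped permissions plus a confidence gate.
# All names and the threshold value are assumptions, not RunWhen internals.

@dataclass
class Task:
    name: str
    tags: dict                    # e.g. {"permissions": "read-only", "env": "dev"}

@dataclass
class Assistant:
    name: str
    allowed_tags: dict            # tags this assistant is permitted to run
    confidence_threshold: float   # minimum confidence for autonomous execution

def may_run_autonomously(assistant: Assistant, task: Task, confidence: float) -> bool:
    """Autonomy requires every permitted tag to match AND confidence to clear
    the configured threshold; anything else is routed to human review."""
    tags_match = all(task.tags.get(k) == v for k, v in assistant.allowed_tags.items())
    return tags_match and confidence >= assistant.confidence_threshold

edgar = Assistant(name="Eager Edgar",
                  allowed_tags={"permissions": "read-only", "env": "dev"},
                  confidence_threshold=0.9)
restart_check = Task(name="check-pod-restarts",
                     tags={"permissions": "read-only", "env": "dev"})
```

Under this sketch, a task tagged for a different environment, or one scoring below the threshold, falls through to human review rather than running on its own.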
Scrubbing, Transparency and Customer Control
Customers can audit all text sent to/from any LLM (including their own) to ensure the accuracy of our two-tier scrubbing process mentioned in the Data Security Framework section.
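To make the idea of auditable, multi-tier scrubbing concrete, here is a generic sketch of a two-tier redaction pass paired with an audit record. The patterns, denylist, and record shape are assumptions for illustration; RunWhen’s actual scrubbing process is described in the Data Security Framework section.

```python
import re

# Generic illustration of a two-tier scrub: a pattern pass that redacts
# obvious identifiers, then a denylist pass, with an audit record of
# exactly what was sent. Assumptions only, not RunWhen's pipeline.

TIER1_PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "<EMAIL>"),
    (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "<IP>"),
]
TIER2_DENYLIST = {"password", "secret", "token"}  # terms that drop a whole line

def scrub(text: str) -> str:
    for pattern, replacement in TIER1_PATTERNS:
        text = pattern.sub(replacement, text)
    # Tier two: drop any line that still mentions a denylisted term.
    kept = [line for line in text.splitlines()
            if not any(term in line.lower() for term in TIER2_DENYLIST)]
    return "\n".join(kept)

audit_log = []  # customers could review this to verify what left the boundary

def send_to_llm(prompt: str) -> str:
    cleaned = scrub(prompt)
    audit_log.append(cleaned)   # record exactly what was transmitted
    return cleaned              # stand-in for the real model call
```

Because every transmitted string passes through the same audit record, a reviewer can compare what the platform sent against what the scrubber should have removed.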
Contact
For questions about this policy or to request additional documentation for due diligence purposes, please contact security-and-compliance@runwhen.com.