Rules catalog

Every rule that ships with Guard. Toggle rules and bundle them into policies from Settings → Guard. A rule stays off until you add it to a policy. The only exception is the default Heal policy at the bottom of this page, which runs without any setup.

Static rules

Deterministic, sub-millisecond, no LLM calls.

Security

Name	Catches	Severity	Runs on
`dangerous`	`rm -rf /`, `dd`, fork bombs, fs/process destruction, privilege escalation, supply-chain `curl | sh`	critical	input
`path_traversal`	`../../etc/passwd`, encoded traversal, sensitive system paths, null-byte injection	critical	input
`sql_injection`	`UNION SELECT`, `xp_cmdshell`, `OR 1=1`, time-based blind, stacked queries	critical	input
`ssrf_detection`	metadata endpoints (`169.254.169.254`), private IPs, encoded IPs, `file://` / `gopher://` schemes	critical	input
`tool_allowlist`	tool calls outside the declared allowlist	critical	input
`tool_denylist`	seed denylist (`fs.delete`, `eval`, `exec`, `db.drop`, `shell.exec`) plus your additions	critical	input
`secrets`	leaked AWS / OpenAI / Anthropic / GitHub / Stripe / Slack keys, PEM private keys, password assignments	critical	input + output
`insecure_output_handling`	`<script>`, `<iframe>`, `javascript:` URIs, CSS injection, markdown XSS	high	input + output
`pii`	SSN (with checksum), credit cards (Luhn-validated), emails, phone numbers, passport numbers	high	input + output

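The `pii` rule describes its credit-card check as Luhn-validated, which means a digit run only counts as a card number if it passes the Luhn checksum. The following is a minimal sketch of that idea and not Guard's implementation. The candidate regex and the function names `luhn_valid` and `find_card_numbers` are illustrative assumptions.

```python
import re

# Hypothetical candidate pattern: 13-19 digits, optionally separated by single spaces or hyphens.
_CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of the doubled value
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return the candidate card numbers in text that pass the Luhn check."""
    hits = []
    for match in _CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if 13 <= len(digits) <= 19 and luhn_valid(digits):
            hits.append(digits)
    return hits
```

The checksum step is what keeps order numbers and other long digit runs from being flagged. A random 16-digit string passes Luhn only about one time in ten.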
Compliance & finance

Name	Catches	Severity	Runs on
`payment_threshold`	payment, refund, transfer, or charge amounts above a configurable cap	critical	input
`refund_validation`	refunds without a ticket, order, or approval reference	high	input
`email_rules`	bulk recipients, sends to disposable domains (mailinator, guerrillamail, …)	high	input

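`payment_threshold` and `refund_validation` both inspect tool arguments before the call. The following sketch shows how the two checks could combine. The argument keys (`amount`, `kind`, `ticket_id`, `order_id`, `approval_id`) and the `Violation` type are hypothetical, because this page does not document Guard's argument schema.

```python
from dataclasses import dataclass

# Hypothetical argument keys; Guard's real schema may differ.
REFERENCE_KEYS = ("ticket_id", "order_id", "approval_id")


@dataclass
class Violation:
    rule: str
    severity: str
    message: str


def check_payment(args: dict, cap: float) -> list[Violation]:
    """Flag amounts above the configured cap, and refunds with no reference attached."""
    violations = []
    amount = float(args.get("amount", 0))
    if amount > cap:
        violations.append(Violation("payment_threshold", "critical",
                                    f"amount {amount} exceeds cap {cap}"))
    # A refund needs at least one non-empty ticket, order, or approval reference.
    if args.get("kind") == "refund" and not any(args.get(k) for k in REFERENCE_KEYS):
        violations.append(Violation("refund_validation", "high",
                                    "refund has no ticket, order, or approval reference"))
    return violations
```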
Quality & behaviour

Name	Catches	Severity	Runs on
`bulk_dump`	high-entropy output, base64 blobs, repeated n-grams, output near field caps	high	input + output
`confusion_loop`	consecutive tool repeats, A↔B oscillation, single-tool dominance, error-retry storms	medium	trace
`hallucinated_tool_reference`	tool names not in the known toolset, malformed names	medium	input
`content`	profanity word list and verbatim system-prompt phrase leaks	medium	input + output

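`confusion_loop` runs on the trace, so it reasons over the sequence of tool calls rather than a single call. The following sketch covers three of the listed patterns: consecutive repeats, A↔B oscillation, and single-tool dominance. It leaves out error-retry storms. All thresholds and parameter names are assumptions, not Guard's defaults.

```python
from collections import Counter


def detect_confusion(tools: list[str], repeat_limit: int = 3, oscillation_len: int = 6,
                     dominance: float = 0.8, min_calls: int = 5) -> list[str]:
    """Return the loop patterns found in a sequence of tool-call names.

    All thresholds are hypothetical illustration values.
    """
    findings = []
    # Consecutive repeats: the same tool called repeat_limit times in a row.
    run = 1
    for prev, cur in zip(tools, tools[1:]):
        run = run + 1 if cur == prev else 1
        if run >= repeat_limit:
            findings.append("consecutive_repeat")
            break
    # A<->B oscillation: the last oscillation_len calls alternate between exactly two tools.
    tail = tools[-oscillation_len:]
    if (len(tail) == oscillation_len and len(set(tail)) == 2
            and all(a != b for a, b in zip(tail, tail[1:]))):
        findings.append("oscillation")
    # Single-tool dominance: one tool accounts for most calls once there is enough history.
    if len(tools) >= min_calls:
        _, top = Counter(tools).most_common(1)[0]
        if top / len(tools) >= dominance:
            findings.append("dominance")
    return findings
```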
Operational

Name	Catches	Severity	Runs on
`cost_cap_per_trace`	cumulative trace LLM cost above a configurable cap	medium	trace

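`cost_cap_per_trace` compares the running total of LLM cost across a trace against a cap. The following is a minimal sketch, assuming per-span costs in USD are available as a list. The function name and inputs are hypothetical.

```python
from itertools import accumulate
from typing import Optional


def first_span_over_cap(span_costs_usd: list[float], cap_usd: float) -> Optional[int]:
    """Return the index of the span whose cost pushes the trace total over the cap, or None."""
    for i, total in enumerate(accumulate(span_costs_usd)):
        if total > cap_usd:
            return i
    return None
```

Returning the index of the offending span, rather than a bare boolean, makes it possible to attach the violation to the span that crossed the cap.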
LLM judges

Model-based. Each call adds a few hundred milliseconds of latency, so reserve these judges for high-stakes tools.

Security

Name	Catches	Severity	Runs on
`prompt_injection`	attempts to override system instructions, exfiltrate prompts, hijack tools via user input	critical	trace
`jailbreak_detection`	multi-step bypasses, roleplay tricks, encoding schemes, authority manipulation	critical	trace
`indirect_injection`	injections planted in trace spans, file contents, error messages, API responses	critical	trace
`data_exfiltration`	webhooks to non-allowlisted hosts, raw IPs, DNS exfil, tunneling, MCP token theft	critical	trace
`unauthorized_action`	tool calls outside the agent's declared permissions and scope	high	trace

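`data_exfiltration` is an LLM judge, but two of the signals it lists can be approximated deterministically: outbound URLs to hosts outside an allowlist, and URLs that use a raw IP instead of a hostname. The following sketch shows such a pre-filter. It is not how the judge works, and the function name and allowlist are hypothetical.

```python
import ipaddress
from urllib.parse import urlsplit


def exfil_signals(url: str, allowed_hosts: set[str]) -> list[str]:
    """Return deterministic exfiltration signals for an outbound URL.

    allowed_hosts is a hypothetical, lowercase allowlist of hostnames.
    """
    host = urlsplit(url).hostname or ""  # .hostname is lowercased and strips IPv6 brackets
    signals = []
    try:
        ipaddress.ip_address(host)  # raises ValueError unless host is an IPv4/IPv6 literal
        signals.append("raw_ip")
    except ValueError:
        pass
    if host not in allowed_hosts:
        signals.append("host_not_allowlisted")
    return signals
```

A pre-filter like this cannot catch DNS exfiltration or tunneling, which is why the judge also reasons over the wider trace.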
Safety

Name	Catches	Severity	Runs on
`dangerous_operation`	irreversible filesystem destruction, reverse shells, privilege escalation, fork bombs, supply-chain attacks	high	trace
`toxicity`	threats, slurs, harassment, brand-unsafe content in agent output	medium	output

Quality

Name	Catches	Severity	Runs on
`context_degradation`	loops, self-contradiction, drift, coherence collapse, rediscovery, abandoned hypotheses	medium	trace
`goal_deviation`	drift from the declared agent goal, scope creep	medium	trace
`false_completion`	"done" claims without trace evidence of the actual work	high	trace
`wrong_tool_selection`	semantic mismatch between task intent and tool choice	medium	trace
`hallucination_in_args`	fabricated identifiers, nonsensical parameters, made-up file paths or endpoints	high	input
`response_hallucination`	fabricated facts, citations, identifiers, or claims in agent output	medium	output
`rag_factual_consistency`	answers not grounded in retrieved context (ragas-style faithfulness)	high	trace

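`rag_factual_consistency` is described as ragas-style faithfulness. In ragas, faithfulness is the fraction of claims in the answer that the retrieved context supports. The following sketch computes that ratio with a pluggable verdict function. The naive substring check in the usage stands in for the model call and is only a placeholder. How Guard extracts claims and judges support is not documented here.

```python
from typing import Callable


def faithfulness(claims: list[str], context: str,
                 is_supported: Callable[[str, str], bool]) -> float:
    """Fraction of answer claims that the retrieved context supports (ragas-style)."""
    if not claims:
        raise ValueError("no claims to score")
    supported = sum(1 for claim in claims if is_supported(claim, context))
    return supported / len(claims)


# Placeholder verdict: a real judge would ask an LLM whether the context entails the claim.
def naive_support(claim: str, context: str) -> bool:
    return claim.lower() in context.lower()
```

For example, `faithfulness(["Paris is the capital of France", "Paris has 40 million residents"], "Paris is the capital of France.", naive_support)` returns `0.5`. Only the first claim appears in the context.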
Operational

Name	Catches	Severity	Runs on
`cost_escalation`	oversized payloads, agent spawning, recursive expansion, exposed LLM endpoints	medium	input

Heal pipeline

These judges are scoped to Staso's built-in self-debugging agent and fire only on `heal.*` tools.

Name	Catches	Severity	Runs on
`system_prompt_leak`	verbatim or paraphrased system-prompt fragments in diagnosis output	high	output
`heal_out_of_scope`	composer messages unrelated to the current trace	low	input

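The `heal.*` scope is a name pattern, so a tool matches only if its name starts with `heal.`. The following sketch shows one way to drop the heal-only judges for every other tool. The `rules_for_tool` helper and the example tool names such as `heal.diagnose` are hypothetical.

```python
from fnmatch import fnmatchcase

# The two heal-pipeline judges from the table above.
HEAL_ONLY_RULES = {"system_prompt_leak", "heal_out_of_scope"}


def rules_for_tool(tool_name: str, policy_rules: list[str]) -> list[str]:
    """Keep heal-only judges only when the tool lives in the heal.* namespace."""
    in_heal = fnmatchcase(tool_name, "heal.*")  # case-sensitive glob match
    return [rule for rule in policy_rules if in_heal or rule not in HEAL_ONLY_RULES]
```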
Default policy

Heal — Staso's built-in self-debugging agent — runs this policy with no setup required:

Rule	Mode	Why
`dangerous`	enforce	irreversible ops
`prompt_injection`	enforce	jailbreak attempts
`indirect_injection`	enforce	injections in accumulated trace data
`system_prompt_leak`	enforce	prevent system-prompt exfil
`response_hallucination`	audit	quality signal, not a block
`heal_out_of_scope`	audit	relevance check

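The default policy pairs each rule with a mode. The following sketch writes the Heal table as data and reads the two modes as "block the call" (`enforce`) and "record but allow" (`audit`). That reading matches the "quality signal, not a block" note, but the `Mode` enum, the `decide` helper, and the exact return shape are assumptions, not Guard's API.

```python
from enum import Enum


class Mode(Enum):
    ENFORCE = "enforce"  # assumed: a violation blocks the call
    AUDIT = "audit"      # assumed: a violation is recorded and the call proceeds


# The default Heal policy from the table above, expressed as data.
HEAL_POLICY = {
    "dangerous": Mode.ENFORCE,
    "prompt_injection": Mode.ENFORCE,
    "indirect_injection": Mode.ENFORCE,
    "system_prompt_leak": Mode.ENFORCE,
    "response_hallucination": Mode.AUDIT,
    "heal_out_of_scope": Mode.AUDIT,
}


def decide(policy: dict[str, Mode], fired_rules: list[str]) -> tuple[str, list[str]]:
    """Return ("block" or "allow", recorded rules) for the rules that fired."""
    recorded = []
    blocked = False
    for rule in fired_rules:
        mode = policy.get(rule)
        if mode is None:
            continue  # rule is not in this policy, so it is off
        recorded.append(rule)
        if mode is Mode.ENFORCE:
            blocked = True
    return ("block" if blocked else "allow"), recorded
```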
Custom agents start clean. Pick the rules you want from Settings → Guard and ship them as a policy.

Severity

Four levels, from least to most severe: `low`, `medium`, `high`, `critical`. The level is surfaced in `result.severity` and on every span that Guard blocks or modifies.

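Because the levels are ordered, client code can compare them, for example to alert only on `high` and above. The following sketch maps the four names to an ordered enum. It assumes `result.severity` is one of the lowercase strings listed on this page, and the `Severity` class and `at_least` helper are hypothetical.

```python
from enum import IntEnum


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


def at_least(severity: str, threshold: str) -> bool:
    """Compare two severity names, e.g. the value of result.severity against "high"."""
    return Severity[severity.upper()] >= Severity[threshold.upper()]
```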
Where each rule runs

input — checks tool arguments before the call.
output — checks the result after the call.
input + output — both.
trace — operates on accumulated spans, prior violations, and cumulative state.
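The four stages map onto a wrapped tool call. Input rules run on the arguments before the call, output rules run on the result after it, and trace rules run on everything accumulated so far. An `input + output` rule simply appears in both lists. The following sketch shows that ordering. The `guarded_call` helper, the rule signature, and the choice to block on any input violation are assumptions for illustration.

```python
from typing import Any, Callable, Optional

# Hypothetical rule signature: takes what it inspects, returns violation names.
Rule = Callable[[Any], list[str]]


def guarded_call(tool: Callable[..., Any], args: dict,
                 input_rules: list[Rule], output_rules: list[Rule],
                 trace_rules: list[Rule], trace: list[dict]) -> tuple[Optional[Any], list[str]]:
    """Run input rules, then the tool, then output and trace rules."""
    # input: check tool arguments before the call (assumed: any hit blocks the call).
    violations = [v for rule in input_rules for v in rule(args)]
    if violations:
        return None, violations
    result = tool(**args)
    trace.append({"args": args, "result": result})  # accumulate the span
    # output: check the result after the call.
    violations += [v for rule in output_rules for v in rule(result)]
    # trace: check accumulated spans and cumulative state.
    violations += [v for rule in trace_rules for v in rule(trace)]
    return result, violations
```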

Next

Rules and policies: how to bundle and deploy.
Actions and escalation: what each action returns.
Manual checks: call Guard from non-patched code.
