Rules catalog

Every rule Guard ships with. Toggle rules and bundle them into policies from Settings → Guard. A rule is off until you add it to a policy. The one exception is the default Heal policy listed at the bottom, which runs with no setup.

Static rules

Deterministic, sub-millisecond, no LLM calls.

Security

| Name | Catches | Severity | Runs on |
| --- | --- | --- | --- |
| dangerous | rm -rf /, dd, fork bombs, fs/process destruction, privilege escalation, supply-chain curl \| sh | critical | input |
| path_traversal | ../../etc/passwd, encoded traversal, sensitive system paths, null-byte injection | critical | input |
| sql_injection | UNION SELECT, xp_cmdshell, OR 1=1, time-based blind, stacked queries | critical | input |
| ssrf_detection | metadata endpoints (169.254.169.254), private IPs, encoded IPs, file:// / gopher:// schemes | critical | input |
| tool_allowlist | tool calls outside the declared allowlist | critical | input |
| tool_denylist | seed denylist (fs.delete, eval, exec, db.drop, shell.exec) plus your additions | critical | input |
| secrets | leaked AWS / OpenAI / Anthropic / GitHub / Stripe / Slack keys, PEM private keys, password assignments | critical | input + output |
| insecure_output_handling | <script>, <iframe>, javascript: URIs, CSS injection, markdown XSS | high | input + output |
| pii | SSN (with checksum), credit cards (Luhn-validated), emails, phone numbers, passport numbers | high | input + output |
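
The pii rule validates candidate card numbers with the Luhn checksum, which is what keeps an arbitrary 16-digit string from being flagged. The following is a minimal sketch of the standard Luhn algorithm in Python. It is not Guard's implementation, and the function name `luhn_valid` is made up for illustration.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the ASCII digits in `number` pass the Luhn checksum.

    Illustrative helper only; separators such as spaces or dashes are ignored.
    """
    digits = [int(ch) for ch in number if "0" <= ch <= "9"]
    if len(digits) < 2:
        return False
    total = 0
    # Walk right to left and double every second digit, starting with the
    # digit immediately to the left of the check digit.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0


if __name__ == "__main__":
    print(luhn_valid("4111 1111 1111 1111"))  # well-known test card number
    print(luhn_valid("4111 1111 1111 1112"))  # last digit altered
```

A checksum like this only reduces false positives. A string that passes is a plausible card number, not a confirmed one.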

Compliance & finance

| Name | Catches | Severity | Runs on |
| --- | --- | --- | --- |
| payment_threshold | payment, refund, transfer, or charge amounts above a configurable cap | critical | input |
| refund_validation | refunds without a ticket, order, or approval reference | high | input |
| email_rules | bulk recipients, sends to disposable domains (mailinator, guerrillamail, …) | high | input |
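
To make the first two checks concrete, here is a rough Python sketch of a cap check and a refund-reference check. The argument names (`ticket_id`, `order_id`, `approval_id`), the function names, and the use of `Decimal` are all assumptions for illustration. Guard's actual argument schema and configuration format are not described on this page.

```python
from decimal import Decimal
from typing import Mapping

# The four action types named by payment_threshold.
CAPPED_ACTIONS = {"payment", "refund", "transfer", "charge"}

# Hypothetical argument keys that would count as a refund reference.
REFERENCE_KEYS = ("ticket_id", "order_id", "approval_id")


def exceeds_cap(action: str, amount: str, cap: Decimal) -> bool:
    """True when a capped action carries an amount strictly above the cap."""
    if action not in CAPPED_ACTIONS:
        return False
    # Decimal avoids float rounding surprises on money values.
    return Decimal(amount) > cap


def refund_has_reference(args: Mapping[str, object]) -> bool:
    """True when at least one reference key is present and non-empty."""
    return any(args.get(key) for key in REFERENCE_KEYS)


if __name__ == "__main__":
    cap = Decimal("500")
    print(exceeds_cap("refund", "500.01", cap))
    print(refund_has_reference({"amount": "20.00"}))
```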

Quality & behaviour

| Name | Catches | Severity | Runs on |
| --- | --- | --- | --- |
| bulk_dump | high-entropy output, base64 blobs, repeated n-grams, output near field caps | high | input + output |
| confusion_loop | consecutive tool repeats, A↔B oscillation, single-tool dominance, error-retry storms | medium | trace |
| hallucinated_tool_reference | tool names not in the known toolset, malformed names | medium | input |
| content | profanity word list and verbatim system-prompt phrase leaks | medium | input + output |
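
Two of the confusion_loop signals, consecutive tool repeats and A↔B oscillation, can be computed from nothing more than the ordered list of tool names in a trace. The sketch below shows one way to measure them in Python. The function names are hypothetical, and the thresholds Guard applies to these signals are not documented here.

```python
from typing import Sequence


def max_consecutive_repeats(calls: Sequence[str]) -> int:
    """Length of the longest run of the same tool called back to back."""
    longest = 0
    current = 0
    previous = None
    for name in calls:
        current = current + 1 if name == previous else 1
        longest = max(longest, current)
        previous = name
    return longest


def oscillation_length(calls: Sequence[str]) -> int:
    """Length of the longest A, B, A, B, ... run where A and B differ."""
    longest = 0
    current = 0
    for i in range(len(calls)):
        alternates = (
            i >= 2
            and calls[i] == calls[i - 2]
            and calls[i] != calls[i - 1]
        )
        if alternates:
            # The first A, B, A already spans three calls.
            current = current + 1 if current else 3
        else:
            current = 0
        longest = max(longest, current)
    return longest


if __name__ == "__main__":
    trace = ["search", "read", "search", "read", "search", "write"]
    print(max_consecutive_repeats(trace), oscillation_length(trace))
```

Both functions are deterministic and linear in the number of calls, which fits the static-rule budget described above.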

Operational

| Name | Catches | Severity | Runs on |
| --- | --- | --- | --- |
| cost_cap_per_trace | cumulative trace LLM cost above a configurable cap | medium | trace |

LLM judges

Model-based. A few hundred ms per call. Use on high-stakes tools.

Security

| Name | Catches | Severity | Runs on |
| --- | --- | --- | --- |
| prompt_injection | attempts to override system instructions, exfiltrate prompts, hijack tools via user input | critical | trace |
| jailbreak_detection | multi-step bypasses, roleplay tricks, encoding schemes, authority manipulation | critical | trace |
| indirect_injection | injections planted in trace spans, file contents, error messages, API responses | critical | trace |
| data_exfiltration | webhooks to non-allowlisted hosts, raw IPs, DNS exfil, tunneling, MCP token theft | critical | trace |
| unauthorized_action | tool calls outside the agent's declared permissions and scope | high | trace |

Safety

| Name | Catches | Severity | Runs on |
| --- | --- | --- | --- |
| dangerous_operation | irreversible filesystem destruction, reverse shells, privilege escalation, fork bombs, supply-chain attacks | high | trace |
| toxicity | threats, slurs, harassment, brand-unsafe content in agent output | medium | output |

Quality

| Name | Catches | Severity | Runs on |
| --- | --- | --- | --- |
| context_degradation | loops, self-contradiction, drift, coherence collapse, rediscovery, abandoned hypotheses | medium | trace |
| goal_deviation | drift from the declared agent goal, scope creep | medium | trace |
| false_completion | "done" claims without trace evidence of the actual work | high | trace |
| wrong_tool_selection | semantic mismatch between task intent and tool choice | medium | trace |
| hallucination_in_args | fabricated identifiers, nonsensical parameters, made-up file paths or endpoints | high | input |
| response_hallucination | fabricated facts, citations, identifiers, or claims in agent output | medium | output |
| rag_factual_consistency | answers not grounded in retrieved context (ragas-style faithfulness) | high | trace |

Operational

| Name | Catches | Severity | Runs on |
| --- | --- | --- | --- |
| cost_escalation | oversized payloads, agent spawning, recursive expansion, exposed LLM endpoints | medium | input |

Heal pipeline

These judges are scoped to Staso's built-in self-debugging agent and only fire on heal.* tools.

| Name | Catches | Severity | Runs on |
| --- | --- | --- | --- |
| system_prompt_leak | verbatim or paraphrased system-prompt fragments in diagnosis output | high | output |
| heal_out_of_scope | composer messages unrelated to the current trace | low | input |

Default policy

Heal — Staso's built-in self-debugging agent — runs this policy with no setup required:

| Rule | Mode | Why |
| --- | --- | --- |
| dangerous | enforce | irreversible ops |
| prompt_injection | enforce | jailbreak attempts |
| indirect_injection | enforce | injections in accumulated trace data |
| system_prompt_leak | enforce | prevent system-prompt exfil |
| response_hallucination | audit | quality signal, not a block |
| heal_out_of_scope | audit | relevance check |

Custom agents start clean. Pick the rules you want from Settings → Guard and ship them as a policy.
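
As a mental model, a policy can be read as a mapping from rule name to mode. The Python sketch below encodes the default Heal policy that way. It assumes that an enforce rule blocks the call when it fires, while an audit rule is only recorded. The page states that audit is "not a block", but the exact enforce behaviour, the `Verdict` type, and `apply_policy` are illustrative assumptions rather than Guard's API.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass(frozen=True)
class Verdict:
    blocked: bool
    fired: Tuple[str, ...]  # every policy rule that fired, in policy order


# Rule names and modes as listed for the default Heal policy.
HEAL_POLICY: Dict[str, str] = {
    "dangerous": "enforce",
    "prompt_injection": "enforce",
    "indirect_injection": "enforce",
    "system_prompt_leak": "enforce",
    "response_hallucination": "audit",
    "heal_out_of_scope": "audit",
}


def apply_policy(policy: Dict[str, str], violations: Set[str]) -> Verdict:
    """Combine rule results: assumed semantics are enforce blocks, audit logs."""
    fired = tuple(rule for rule in policy if rule in violations)
    blocked = any(policy[rule] == "enforce" for rule in fired)
    return Verdict(blocked=blocked, fired=fired)


if __name__ == "__main__":
    print(apply_policy(HEAL_POLICY, {"response_hallucination"}))
    print(apply_policy(HEAL_POLICY, {"dangerous", "heal_out_of_scope"}))
```

Violations from rules that are not in the policy are ignored here, matching the statement that a rule is off until it is added to a policy.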

Severity

Four levels, from least to most serious: low, medium, high, critical. The severity is surfaced in result.severity and on every blocked or modified span.
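
If you filter or alert on severity in your own code, an ordered enum makes comparisons explicit. The sketch below assumes that result.severity holds one of the four lowercase strings and that the levels rank in the order listed. Neither detail is confirmed on this page, and `Severity` and `at_least` are hypothetical names.

```python
from enum import IntEnum


class Severity(IntEnum):
    # Assumed ranking: the order in which the levels are listed.
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


def at_least(severity: str, floor: str) -> bool:
    """True when `severity` ranks at or above `floor`."""
    return Severity[severity.upper()] >= Severity[floor.upper()]


if __name__ == "__main__":
    # Example: only page someone for high and critical results.
    for level in ("low", "medium", "high", "critical"):
        print(level, at_least(level, "high"))
```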

Where each rule runs

  • input — checks tool arguments before the call.
  • output — checks the result after the call.
  • input + output — both.
  • trace — operates on accumulated spans, prior violations, and cumulative state.
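
One way to picture the four stages is as a lookup from rule name to the set of stages it runs on. The Python sketch below selects which enabled rules apply at a given stage. The stage assignments are copied from the catalog tables for a handful of rules. The `RUNS_ON` mapping and the `rules_for_stage` function are illustrative and do not reflect Guard's internals.

```python
from typing import Dict, List, Set

# Stage assignments for a few rules, taken from the "Runs on" column.
RUNS_ON: Dict[str, Set[str]] = {
    "dangerous": {"input"},
    "secrets": {"input", "output"},  # "input + output" means both stages
    "toxicity": {"output"},
    "confusion_loop": {"trace"},
}


def rules_for_stage(stage: str, enabled: List[str]) -> List[str]:
    """Return the enabled rules that run at `stage`, preserving order."""
    return [rule for rule in enabled if stage in RUNS_ON.get(rule, set())]


if __name__ == "__main__":
    enabled = ["dangerous", "secrets", "toxicity", "confusion_loop"]
    for stage in ("input", "output", "trace"):
        print(stage, rules_for_stage(stage, enabled))
```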
