Guard
Every rule Guard ships with. Toggle rules and bundle them into policies from Settings → Guard. Each rule stays off until you add it to a policy, except for the rules in the default Heal policy listed at the bottom.
Deterministic rules: sub-millisecond, with no LLM calls.
| Name | Catches | Severity | Runs on |
|---|---|---|---|
| `dangerous` | `rm -rf /`, `dd`, fork bombs, fs/process destruction, privilege escalation, supply-chain `curl \| sh` | critical | input |
| `path_traversal` | `../../etc/passwd`, encoded traversal, sensitive system paths, null-byte injection | critical | input |
| `sql_injection` | `UNION SELECT`, `xp_cmdshell`, `OR 1=1`, time-based blind, stacked queries | critical | input |
| `ssrf_detection` | metadata endpoints (`169.254.169.254`), private IPs, encoded IPs, `file://` and `gopher://` schemes | critical | input |
| `tool_allowlist` | tool calls outside the declared allowlist | critical | input |
| `tool_denylist` | seed denylist (`fs.delete`, `eval`, `exec`, `db.drop`, `shell.exec`) plus your additions | critical | input |
| `secrets` | leaked AWS / OpenAI / Anthropic / GitHub / Stripe / Slack keys, PEM private keys, password assignments | critical | input + output |
| `insecure_output_handling` | `<script>`, `<iframe>`, `javascript:` URIs, CSS injection, markdown XSS | high | input + output |
| `pii` | SSN (with checksum), credit cards (Luhn-validated), emails, phone numbers, passport numbers | high | input + output |
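The `pii` rule lists credit cards as Luhn-validated: a digit run only counts as a card number if it also passes the Luhn checksum, which filters out most random digit strings such as order references. A minimal Python sketch of that idea follows; the regex, the 13 to 19 digit bounds, and the function names are assumptions for illustration, not Guard's implementation.

```python
import re

# Candidate card numbers: 13 to 19 digits, optionally split by single spaces or dashes.
# The length bounds are an assumption for this sketch.
CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    # Walk from the rightmost digit and double every second one.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as adding the two digits of the product
        total += d
    return len(digits) > 0 and total % 10 == 0


def find_card_numbers(text: str) -> list:
    """Return candidate card numbers in `text` that also pass the Luhn check."""
    return [m.group() for m in CANDIDATE.finditer(text) if luhn_valid(m.group())]


print(find_card_numbers("pay with 4111 1111 1111 1111 today"))  # ['4111 1111 1111 1111']
print(find_card_numbers("order ref 4111 1111 1111 1112"))       # []
```

The regex alone would flag both strings; the checksum is what separates the well-known test card number from a near-miss.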
| Name | Catches | Severity | Runs on |
|---|---|---|---|
| `payment_threshold` | payment, refund, transfer, or charge amounts above a configurable cap | critical | input |
| `refund_validation` | refunds without a ticket, order, or approval reference | high | input |
| `email_rules` | bulk recipients, sends to disposable domains (mailinator, guerrillamail, …) | high | input |
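Both money rules run on input, so they only need the tool name and its arguments. The sketch below shows one way `payment_threshold` and `refund_validation` could be expressed; the keyword list, the `amount` argument, the reference keys, and the `stripe.*` tool names are all hypothetical, since the table does not specify how Guard recognizes payment tools.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

# Hypothetical: tool names containing these words are treated as money movement.
MONEY_WORDS = ("payment", "refund", "transfer", "charge")
# Hypothetical argument keys that count as a ticket, order, or approval reference.
REFERENCE_KEYS = ("ticket_id", "order_id", "approval_id")


@dataclass
class Violation:
    rule: str
    severity: str
    message: str


def check_money_call(tool: str, args: Dict[str, Any], cap: float) -> List[Violation]:
    """Input-stage check: looks at the tool name and arguments before the call."""
    name = tool.lower()
    if not any(word in name for word in MONEY_WORDS):
        return []  # not a money-moving tool
    violations = []
    amount = float(args.get("amount", 0))  # hypothetical argument name
    if amount > cap:
        violations.append(Violation("payment_threshold", "critical",
                                    f"amount {amount} exceeds cap {cap}"))
    if "refund" in name and not any(args.get(key) for key in REFERENCE_KEYS):
        violations.append(Violation("refund_validation", "high",
                                    "refund has no ticket, order, or approval reference"))
    return violations


print([v.rule for v in check_money_call("stripe.refund", {"amount": 900}, cap=500)])
# ['payment_threshold', 'refund_validation']
```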
| Name | Catches | Severity | Runs on |
|---|---|---|---|
| `bulk_dump` | high-entropy output, base64 blobs, repeated n-grams, output near field caps | high | input + output |
| `confusion_loop` | consecutive tool repeats, A↔B oscillation, single-tool dominance, error-retry storms | medium | trace |
| `hallucinated_tool_reference` | tool names not in the known toolset, malformed names | medium | input |
| `content` | profanity word list and verbatim system-prompt phrase leaks | medium | input + output |
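`confusion_loop` runs on the trace, so it sees the ordered sequence of tool calls rather than a single call. The sketch below covers three of its four signals (consecutive repeats, A↔B oscillation, single-tool dominance) over a plain list of tool names; error-retry storms would also need per-call error status and are left out. The thresholds and the function name are assumptions, not Guard's defaults.

```python
from collections import Counter
from typing import List


def confusion_signals(tools: List[str], max_repeats: int = 3,
                      dominance: float = 0.8, min_calls: int = 5) -> List[str]:
    """Return loop patterns found in an ordered list of tool-call names."""
    signals = []
    # Consecutive repeats: the same tool more than max_repeats times in a row.
    run = 1
    for prev, cur in zip(tools, tools[1:]):
        run = run + 1 if cur == prev else 1
        if run > max_repeats:
            signals.append("consecutive_repeat")
            break
    # A<->B oscillation: the last four calls alternate between two distinct tools.
    if len(tools) >= 4:
        a, b, c, d = tools[-4:]
        if a == c and b == d and a != b:
            signals.append("oscillation")
    # Single-tool dominance: one tool makes up most of a long enough trace.
    if len(tools) >= min_calls:
        _, count = Counter(tools).most_common(1)[0]
        if count / len(tools) >= dominance:
            signals.append("single_tool_dominance")
    return signals


print(confusion_signals(["read", "write", "read", "write"]))  # ['oscillation']
```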
| Name | Catches | Severity | Runs on |
|---|---|---|---|
| `cost_cap_per_trace` | cumulative trace LLM cost above a configurable cap | medium | trace |
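A cumulative cap is a running sum over the trace's spans that trips on the first span pushing the total past the limit. A minimal sketch, assuming each span carries its own LLM cost in a hypothetical `llm_cost_usd` field; Guard's actual span schema is not described here.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Span:
    name: str
    llm_cost_usd: float  # hypothetical field: LLM cost recorded on this span


def span_exceeding_cap(spans: Iterable[Span], cap_usd: float) -> Optional[str]:
    """Return the first span whose cost pushes the trace total above cap_usd, else None."""
    total = 0.0
    for span in spans:
        total += span.llm_cost_usd
        if total > cap_usd:
            return span.name
    return None


trace = [Span("plan", 0.25), Span("search", 0.50), Span("answer", 0.50)]
print(span_exceeding_cap(trace, cap_usd=1.00))  # answer
```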
Model-based judges: a few hundred ms per call. Use them on high-stakes tools.
| Name | Catches | Severity | Runs on |
|---|---|---|---|
| `prompt_injection` | attempts to override system instructions, exfiltrate prompts, hijack tools via user input | critical | trace |
| `jailbreak_detection` | multi-step bypasses, roleplay tricks, encoding schemes, authority manipulation | critical | trace |
| `indirect_injection` | injections planted in trace spans, file contents, error messages, API responses | critical | trace |
| `data_exfiltration` | webhooks to non-allowlisted hosts, raw IPs, DNS exfil, tunneling, MCP token theft | critical | trace |
| `unauthorized_action` | tool calls outside the agent's declared permissions and scope | high | trace |
| Name | Catches | Severity | Runs on |
|---|---|---|---|
| `dangerous_operation` | irreversible filesystem destruction, reverse shells, privilege escalation, fork bombs, supply-chain attacks | high | trace |
| `toxicity` | threats, slurs, harassment, brand-unsafe content in agent output | medium | output |
| Name | Catches | Severity | Runs on |
|---|---|---|---|
| `context_degradation` | loops, self-contradiction, drift, coherence collapse, rediscovery, abandoned hypotheses | medium | trace |
| `goal_deviation` | drift from the declared agent goal, scope creep | medium | trace |
| `false_completion` | "done" claims without trace evidence of the actual work | high | trace |
| `wrong_tool_selection` | semantic mismatch between task intent and tool choice | medium | trace |
| `hallucination_in_args` | fabricated identifiers, nonsensical parameters, made-up file paths or endpoints | high | input |
| `response_hallucination` | fabricated facts, citations, identifiers, or claims in agent output | medium | output |
| `rag_factual_consistency` | answers not grounded in retrieved context (Ragas-style faithfulness) | high | trace |
| Name | Catches | Severity | Runs on |
|---|---|---|---|
| `cost_escalation` | oversized payloads, agent spawning, recursive expansion, exposed LLM endpoints | medium | input |
These judges are scoped to Staso's built-in self-debugging agent and only fire on `heal.*` tools.
| Name | Catches | Severity | Runs on |
|---|---|---|---|
| `system_prompt_leak` | verbatim or paraphrased system-prompt fragments in diagnosis output | high | output |
| `heal_out_of_scope` | composer messages unrelated to the current trace | low | input |
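The two rules above only fire on `heal.*` tools. One way to picture that kind of scoping is a per-rule glob on the tool name. The sketch uses Python's `fnmatch.fnmatchcase`; the `RULE_SCOPES` map, the `rule_applies` helper, and the example tool names are hypothetical, not Guard's configuration format.

```python
from fnmatch import fnmatchcase

# Hypothetical rule -> tool-name pattern map. Rules without an entry apply to every tool.
RULE_SCOPES = {
    "system_prompt_leak": "heal.*",
    "heal_out_of_scope": "heal.*",
}


def rule_applies(rule: str, tool_name: str) -> bool:
    """Return True if `rule` should evaluate calls to `tool_name`."""
    pattern = RULE_SCOPES.get(rule)
    # fnmatchcase is case-sensitive and treats "." literally,
    # so "healer.run" does not match "heal.*".
    return pattern is None or fnmatchcase(tool_name, pattern)


print(rule_applies("system_prompt_leak", "heal.diagnose"))  # True
print(rule_applies("system_prompt_leak", "shell.exec"))     # False
```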
Heal, Staso's built-in self-debugging agent, runs this default policy with no setup required:
| Rule | Mode | Why |
|---|---|---|
| `dangerous` | enforce | irreversible ops |
| `prompt_injection` | enforce | jailbreak attempts |
| `indirect_injection` | enforce | injections in accumulated trace data |
| `system_prompt_leak` | enforce | prevent system-prompt exfil |
| `response_hallucination` | audit | quality signal, not a block |
| `heal_out_of_scope` | audit | relevance check |
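The Heal policy can be pictured as a map from rule name to mode. The sketch below assumes `enforce` means the call is blocked and `audit` means the violation is only recorded, a mapping drawn from the Why column rather than from Guard's code; `HEAL_POLICY` and `decide` are hypothetical names. Rules outside the policy are ignored, matching the rule that each rule is off until added to a policy.

```python
from typing import Dict, Iterable

# The default Heal policy from the table above: rule name -> mode.
HEAL_POLICY: Dict[str, str] = {
    "dangerous": "enforce",
    "prompt_injection": "enforce",
    "indirect_injection": "enforce",
    "system_prompt_leak": "enforce",
    "response_hallucination": "audit",
    "heal_out_of_scope": "audit",
}


def decide(policy: Dict[str, str], violated_rules: Iterable[str]) -> str:
    """Assumed semantics: enforce blocks, audit only records, rules outside the policy are off."""
    modes = {policy[rule] for rule in violated_rules if rule in policy}
    if "enforce" in modes:
        return "block"
    if "audit" in modes:
        return "log"
    return "allow"


print(decide(HEAL_POLICY, ["response_hallucination"]))  # log
```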
Custom agents start with no rules enabled. Pick the rules you want from Settings → Guard and ship them as a policy.
Severity levels, from lowest to highest: `low`, `medium`, `high`, `critical`. The level is surfaced in `result.severity` and on every blocked or modified span.
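Because the levels are ordered, a consumer of `result.severity` can compare them, for example to alert only on `high` and above. A minimal sketch, assuming the values arrive as the lowercase strings listed above; the `Severity` enum and both helpers are hypothetical, not part of Guard's API.

```python
from enum import IntEnum
from typing import Iterable


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


def at_least(severity: str, threshold: str) -> bool:
    """Compare two severity strings such as "high" and "medium"."""
    return Severity[severity.upper()] >= Severity[threshold.upper()]


def worst(severities: Iterable[str]) -> str:
    """Return the most severe value from a non-empty collection of severity strings."""
    return max(severities, key=lambda s: Severity[s.upper()])


print(at_least("critical", "high"))      # True
print(worst(["low", "high", "medium"]))  # high
```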
The Runs on column takes one of these values:
- `input`: checks tool arguments before the call.
- `output`: checks the result after the call.
- `input + output`: both.
- `trace`: evaluates accumulated spans, prior violations, and cumulative state.
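The `input` and `output` stages map onto the two sides of a tool call. The sketch below wraps a call with checks on each side and raises when one reports a violation; an `input + output` rule would simply be registered in both lists. `trace` rules are not shown because they need state accumulated across spans. `guarded_call`, the naive `no_rm_rf` and `no_aws_key` checks, and `run_shell` are hypothetical stand-ins, not Guard's API or its real detection logic.

```python
from typing import Any, Callable, Dict, List

# A check returns a list of violation messages; an empty list means "pass".
Check = Callable[[Any], List[str]]


def guarded_call(tool: Callable[..., Any], args: Dict[str, Any],
                 input_checks: List[Check], output_checks: List[Check]) -> Any:
    """Run input checks on the arguments, call the tool, then run output checks on the result."""
    for check in input_checks:
        problems = check(args)
        if problems:
            raise PermissionError(f"blocked before call: {problems}")
    result = tool(**args)
    for check in output_checks:
        problems = check(result)
        if problems:
            raise PermissionError(f"blocked after call: {problems}")
    return result


# Hypothetical, deliberately naive checks for illustration only.
def no_rm_rf(args: Dict[str, Any]) -> List[str]:
    return ["dangerous"] if "rm -rf" in str(args.get("cmd", "")) else []


def no_aws_key(result: Any) -> List[str]:
    # "AKIA" is the prefix of long-term AWS access key IDs.
    return ["secrets"] if "AKIA" in str(result) else []


def run_shell(cmd: str) -> str:
    return f"ran: {cmd}"  # stand-in for a real tool


print(guarded_call(run_shell, {"cmd": "ls"}, [no_rm_rf], [no_aws_key]))  # ran: ls
```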