The loop

Four products. One loop. Compounding from day one.

One click turns a guard violation into an eval rule that scores every future run. A failing verdict opens self-heal at the failing commit. Same schema, same authoring surface, same lineage: compounding that single-product competitors can't match.

Start the loop
01 · THE FOUR ARROWS

Trace → guard → eval → self-heal.

Four arrows, one canonical direction. Every arrow is one click on the dashboard or one method call in the SDK. Nothing custom to wire, nothing to remember.

01 · trace

Run lands

Spans, tool calls, and conversation, captured automatically by the SDK or CLI.

02 · guard

Block in real time

Static and LLM judges enforce rules at the tool boundary, in under 50 ms, before a bad action ships.

03 · eval

Score after the fact

Async judges read each run and return a pass, a fail, a score, and the reason.

04 · self-heal

Diagnose root cause

A sandbox replays the failing run and returns a root-cause diagnosis.

02 · WHAT EACH PRODUCT FEEDS

The receipts.

The arrows aren't marketing. Every product produces data another product reads — same schema, same auth, same UI patterns.

01 · TRACE

Every run, every span, every tool call.

Feeds: guard inputs, eval scope, heal evidence.

Spans, tool calls, conversation, artifacts. One SDK call per agent, or one CLI install for Claude Code and Codex. Nothing else to wire.

02 · GUARD

Block at the tool boundary.

Feeds: eval rule candidates, heal triggers.

Static judges, LLM judges, and your custom rules all run in under 50 ms, before the tool fires. A blocked call writes a guard_violation; the rule that fired is one click from being promoted to eval.
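What "blocking at the tool boundary" can look like, as a minimal sketch. Every name here (`guarded`, `ToolBlocked`, `GuardViolation`, `no_force_push`) is hypothetical and written for illustration; this is not the SDK's actual API.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# All names in this sketch are hypothetical, for illustration only.

@dataclass
class GuardViolation:
    rule: str
    tool: str
    reason: str

class ToolBlocked(Exception):
    """Raised instead of running the tool when a rule fires."""
    def __init__(self, violation: GuardViolation):
        super().__init__(violation.reason)
        self.violation = violation

# A rule returns a reason to block, or None to allow the call.
Rule = Callable[[str, dict], Optional[str]]

def no_force_push(tool: str, args: dict) -> Optional[str]:
    if tool == "shell" and "push --force" in args.get("command", ""):
        return "force push is not allowed"
    return None

def guarded(tool_name: str, fn: Callable, rules: List[Rule],
            violations: List[GuardViolation]) -> Callable:
    """Wrap a tool so every rule runs before the tool itself does."""
    def wrapper(**args):
        for rule in rules:
            reason = rule(tool_name, args)
            if reason is not None:
                # Record the violation, then stop before the tool fires.
                violation = GuardViolation(rule.__name__, tool_name, reason)
                violations.append(violation)
                raise ToolBlocked(violation)
        return fn(**args)
    return wrapper

violations: List[GuardViolation] = []
executed: List[str] = []

def shell(command: str) -> str:
    executed.append(command)  # stand-in for actually running the command
    return f"ran: {command}"

safe_shell = guarded("shell", shell, [no_force_push], violations)
safe_shell(command="git status")
try:
    safe_shell(command="git push --force origin main")
except ToolBlocked as blocked:
    print(blocked.violation)
```

The blocked command never reaches `shell`, and the recorded violation names the rule that fired, which is the piece a later eval rule would be built from.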

03 · EVAL

Score after the fact.

Feeds: guard rule promotions, drift detection, heal targets.

Async judges read the trace, the artifacts, and the conversation. A failing verdict is one click from a guard rule (now blocking in real time) and one click from a heal run (now diagnosing root cause).

04 · SELF-HEAL

Diagnose root cause. Open the PR.

Feeds: learned patterns, future rule candidates.

Sandbox runs read your repo at the failing commit, plus pre-loaded eval and guard history. Output is a root cause and a fix PR — your CI stays in charge of merge.

03 · WHY IT COMPOUNDS

Three structural reasons. Not promises.

ONE

Same data plane.

Spans, rules, verdicts, violations, artifacts — one schema, one auth boundary. Nothing to ETL. Nothing to keep in sync.

TWO

Same authoring surface.

Write a rule once. Run it as guard (real-time block), as eval (offline judge), or as both. Promote either direction without re-implementing the logic.
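One way to picture "write once, run in both modes", as a sketch under assumptions. `Rule`, `Verdict`, `run_as_guard`, and `run_as_eval` are hypothetical names, and the scoring formula (share of tool calls that pass) is an illustrative choice, not the product's.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical types for illustration; not the product's schema.

@dataclass(frozen=True)
class Rule:
    name: str
    # Returns a failure reason, or None when the call is acceptable.
    check: Callable[[str, dict], Optional[str]]

@dataclass
class Verdict:
    rule: str
    passed: bool
    score: float
    reason: str

def run_as_guard(rule: Rule, tool: str, args: dict) -> bool:
    """Real-time mode: True means the tool call may proceed."""
    return rule.check(tool, args) is None

def run_as_eval(rule: Rule, tool_calls: List[Tuple[str, dict]]) -> Verdict:
    """Offline mode: apply the same check to every call in a recorded run."""
    failures = []
    for tool, args in tool_calls:
        reason = rule.check(tool, args)
        if reason is not None:
            failures.append(reason)
    total = len(tool_calls)
    score = 1.0 if total == 0 else (total - len(failures)) / total
    return Verdict(rule.name, not failures, score,
                   "; ".join(failures) or "no violations")

def _no_drop_table(tool: str, args: dict) -> Optional[str]:
    if tool == "sql" and "DROP TABLE" in args.get("query", "").upper():
        return "DROP TABLE is not allowed"
    return None

no_drop_table = Rule("no_drop_table", _no_drop_table)

recorded_run = [
    ("sql", {"query": "SELECT 1"}),
    ("sql", {"query": "drop table users"}),
    ("http", {"url": "https://example.com"}),
    ("sql", {"query": "SELECT 2"}),
]
verdict = run_as_eval(no_drop_table, recorded_run)
print(verdict)
```

Both modes call the same `check` function, so moving a rule from eval to guard (or back) changes when it runs, not what it tests.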

THREE

Same lineage everywhere.

Every rule shows where it came from (eval verdict, guard violation, manual). Every verdict links back to the failing trace. Every fix PR cites the violation it closed.
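A rough sketch of what those links could look like as data. The record types, field names, and the `lineage` helper are assumptions made for this example, not the product's schema.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical record types; field names are assumptions for illustration.

@dataclass(frozen=True)
class Rule:
    id: str
    origin: str               # "eval_verdict", "guard_violation", or "manual"
    origin_id: Optional[str]  # id of the verdict or violation it came from

@dataclass(frozen=True)
class Verdict:
    id: str
    trace_id: str
    passed: bool

@dataclass(frozen=True)
class Violation:
    id: str
    rule_id: str
    trace_id: str

@dataclass(frozen=True)
class FixPR:
    id: str
    violation_id: str

def lineage(pr: FixPR, violations: Dict[str, Violation],
            rules: Dict[str, Rule], verdicts: Dict[str, Verdict]) -> List[str]:
    """Walk back from a fix PR to the records that led to it."""
    violation = violations[pr.violation_id]
    rule = rules[violation.rule_id]
    chain = [
        f"pr:{pr.id}",
        f"violation:{violation.id}",
        f"trace:{violation.trace_id}",
        f"rule:{rule.id} ({rule.origin})",
    ]
    # If the rule was promoted from an eval verdict, keep walking back.
    if rule.origin == "eval_verdict" and rule.origin_id in verdicts:
        verdict = verdicts[rule.origin_id]
        chain += [f"verdict:{verdict.id}", f"trace:{verdict.trace_id}"]
    return chain

verdicts = {"v1": Verdict("v1", "t1", passed=False)}
rules = {"r1": Rule("r1", "eval_verdict", "v1")}
violations = {"g1": Violation("g1", "r1", "t2")}
chain = lineage(FixPR("pr1", "g1"), violations, rules, verdicts)
print(" <- ".join(chain))
```

Because every record carries the id of the record before it, a fix PR can be traced back to the violation it closed, the rule that fired, and the failing trace behind that rule.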

04 · WHAT TO EXPECT

Your first three weeks.

The loop card on your dashboard tracks the same four arrows in live numbers (traces, blocks, verdicts, diagnoses) for your own org. No demo data, no canned screenshots.

Week 1

Trace + first verdicts

Install the SDK, attach a zero-config eval, and see verdicts arrive within minutes of your first agent run.

Week 2

First promoted rule

Pick an eval that consistently catches a real failure, promote its rule to guard, and watch it block in real time.

Week 3+

Compounding

Every blocked tool call writes a violation, and every violation is one click from a sharper rule. The loop tightens itself.

05 · SEE IT RUN

One install. The whole loop.

Wire the SDK in a minute. Attach a rule. Verdicts arrive. Promote one to guard. Watch the blocks ship. The compounding starts on day one — not after a quarter of integration.
