Staso Docs

Concepts

Organization
└── Workspace
    └── Trace
        └── Span

Trace

One end-to-end agent run. Created automatically when the first span opens. Identified by trace_id.

Span

One unit of work inside a trace — an LLM call, a tool call, a step. Has a kind, status, input, output, and metadata.

| Kind | Meaning |
| --- | --- |
| llm | A call to an LLM provider. |
| tool | A tool the agent invoked. |
| chain | A composite step. Default for @st.trace. |
| retriever | A retrieval or search step. |
| agent | An agent entry point. Set by @st.agent. |
| custom | Anything else. |

Agent

The logical unit emitting traces — one row in your dashboard. Tag every trace with an agent so you can slice metrics, alerts, and Guard rules per agent.

@st.agent(name="...") overrides the agent for everything inside it. st.init(agent_name=...) is the default for code that isn't decorated.
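The precedence described above can be pictured with a small stand-alone model. This is a sketch only, not Staso's implementation: `init`, `agent`, and `current_agent` are hypothetical stand-ins that mirror the stated behavior, where the decorator wins inside its scope and the `init` default applies everywhere else.

```python
import contextvars
import functools

# Hypothetical stand-ins for illustration; this is not the Staso SDK.
_default_agent = "unnamed"
_active_agent = contextvars.ContextVar("active_agent", default=None)


def init(agent_name):
    """Set the fallback agent used by undecorated code."""
    global _default_agent
    _default_agent = agent_name


def agent(name):
    """Decorator: everything called inside the wrapped function uses `name`."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            token = _active_agent.set(name)
            try:
                return fn(*args, **kwargs)
            finally:
                _active_agent.reset(token)  # restore the outer agent on exit
        return wrapper
    return decorate


def current_agent():
    """Decorator scope wins; otherwise fall back to the init default."""
    return _active_agent.get() or _default_agent


init(agent_name="support-bot")


@agent(name="billing-bot")
def handle_invoice():
    return current_agent()


print(current_agent())   # support-bot
print(handle_invoice())  # billing-bot
```

A context variable is used here so that the override stays scoped to the decorated call and is restored afterwards, including when the function raises.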

Conversation

A set of traces that share a conversation_id — the turns of one chat thread, the steps of one multi-turn task, the retries of one job. Use st.conversation(...) to group them.

with st.conversation("conv-42", user_id="user-alice"):
    run_agent(query)
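
Conceptually, a conversation is a group-by over `conversation_id`. The sketch below shows that grouping on plain dictionaries; the trace records are made-up sample data, and `group_by_conversation` is a hypothetical helper, not an SDK function.

```python
from collections import defaultdict

# Made-up trace records for illustration; only the two ids matter here.
traces = [
    {"trace_id": "t1", "conversation_id": "conv-42"},
    {"trace_id": "t2", "conversation_id": "conv-42"},
    {"trace_id": "t3", "conversation_id": "conv-7"},
]


def group_by_conversation(traces):
    """Map each conversation_id to the trace_ids that share it, in input order."""
    groups = defaultdict(list)
    for trace in traces:
        groups[trace["conversation_id"]].append(trace["trace_id"])
    return dict(groups)


conversations = group_by_conversation(traces)
print(conversations)  # {'conv-42': ['t1', 't2'], 'conv-7': ['t3']}
```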

Environment

A logical partition — prod, staging, dev. Set with environment in st.init(...) or STASO_ENVIRONMENT. Defaults to default.
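
A minimal sketch of how the environment could be resolved. The precedence shown (explicit argument, then `STASO_ENVIRONMENT`, then `default`) is an assumption, since the text above does not say which source wins, and `resolve_environment` is a hypothetical helper rather than part of the SDK.

```python
import os


def resolve_environment(environment=None, env=None):
    """Pick the environment name.

    Assumed order (not confirmed by the docs): explicit argument,
    then the STASO_ENVIRONMENT variable, then the literal "default".
    """
    env = os.environ if env is None else env
    return environment or env.get("STASO_ENVIRONMENT") or "default"


# The env mapping is passed explicitly so the examples do not depend on the shell.
print(resolve_environment("prod", env={"STASO_ENVIRONMENT": "staging"}))  # prod
print(resolve_environment(env={"STASO_ENVIRONMENT": "staging"}))          # staging
print(resolve_environment(env={}))                                        # default
```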

Workspace

A data isolation unit inside an organization. Set with workspace_slug in st.init(...) or STASO_WORKSPACE_SLUG. See workspaces.

Organization

The billing, RBAC, and ownership boundary. Holds members, API keys, and one or more workspaces. See workspaces.

Eval run

A scored evaluation over a set of traces. One rule × one trace = one verdict. Scopes are trace_id, session_id, dataset_id, agent_id (with sample %), or time_range. See Eval.
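
The "one rule × one trace = one verdict" arithmetic can be sketched as a cross product. Everything below is illustrative: the `Verdict` dataclass borrows the passed, score, and reason fields from the docs, while the two rules, the sample traces, and `run_eval` are hypothetical.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Verdict:
    rule: str
    trace_id: str
    passed: bool
    score: float
    reason: str


def non_empty_output(trace):
    ok = bool(trace["output"])
    return ok, 1.0 if ok else 0.0, "output present" if ok else "output empty"


def short_output(trace):
    ok = len(trace["output"]) <= 20
    return ok, 1.0 if ok else 0.0, "within 20 chars" if ok else "over 20 chars"


def run_eval(rules, traces):
    """Apply every rule to every trace: len(rules) * len(traces) verdicts."""
    verdicts = []
    for rule, trace in product(rules, traces):
        passed, score, reason = rule(trace)
        verdicts.append(Verdict(rule.__name__, trace["trace_id"], passed, score, reason))
    return verdicts


traces = [
    {"trace_id": "t1", "output": "ok"},
    {"trace_id": "t2", "output": ""},
    {"trace_id": "t3", "output": "x" * 50},
]
verdicts = run_eval([non_empty_output, short_output], traces)
print(len(verdicts))  # 6 (2 rules x 3 traces)
```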

Verdict

The output of one rule judging one trace (or one dataset entry): passed, score, and reason, plus the rule's runtime. Agentic verdicts also carry inspected, the list of files the judge read. See Verdicts.

Rule

One rule object, two places it can run. Where it runs is decided when you attach it, not by its type: link it to a Guard policy → it runs in real time, per tool-call; cite it in an eval run → it scores after the fact. Guard and Eval share rule storage, so a rule written for one can serve the other.

A rule's runtime decides what it judges with:

| Runtime | Judges with | Surface |
| --- | --- | --- |
| prompt | an LLM, your instruction | Guard or Eval |
| llm_judge | a built-in LLM judge | Guard or Eval |
| programmatic | your Python | Guard or Eval |
| agentic | a fresh agent that inspects files (Agent Judge) | Eval only |

agentic is eval-only — an agent reading a file bundle can't gate a live tool-call. See rules and policies.
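
The runtime-to-surface constraint can be written as a lookup. This is a sketch of the rule stated above, not Staso's validation code; `ALLOWED_SURFACES` and `can_attach` are hypothetical names, and the lowercase surface strings are an illustrative choice.

```python
# Which surfaces each runtime may run on, per the runtime table.
ALLOWED_SURFACES = {
    "prompt": {"guard", "eval"},
    "llm_judge": {"guard", "eval"},
    "programmatic": {"guard", "eval"},
    "agentic": {"eval"},  # eval-only: cannot gate a live tool-call
}


def can_attach(runtime, surface):
    """Return True if a rule with this runtime may run on the given surface."""
    if runtime not in ALLOWED_SURFACES:
        raise ValueError(f"unknown runtime: {runtime!r}")
    return surface in ALLOWED_SURFACES[runtime]


print(can_attach("programmatic", "guard"))  # True
print(can_attach("agentic", "guard"))       # False
print(can_attach("agentic", "eval"))        # True
```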

Agent Judge

The agentic runtime. For each agent run in an agent_run dataset, a deterministic precheck gate runs first; then a fresh agent opens the files and judges them against your rubric. Produces a verdict with an inspected tool-call trace. See Agent Judge.
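
The gate-then-judge flow can be sketched as follows. This is a toy model under several assumptions: the precheck here only checks that required files exist and are non-empty, a failed precheck short-circuits to a failing verdict, and the "judge" is a plain function standing in for the agent. The file names, `precheck`, `judge_run`, and `rubric_judge` are all hypothetical.

```python
def precheck(bundle, required_files):
    """Deterministic gate (assumed): every required file is present and non-empty."""
    return all(bundle.get(name) for name in required_files)


def judge_run(bundle, rubric_judge, required_files=("report.md",)):
    """Run the precheck, then let the judge read files and record what it opened."""
    if not precheck(bundle, required_files):
        # Assumed behavior: skip the judge entirely when the gate fails.
        return {"passed": False, "score": 0.0, "reason": "precheck failed", "inspected": []}

    inspected = []

    def read(name):
        inspected.append(name)  # track every file the judge opens
        return bundle[name]

    passed, score, reason = rubric_judge(read)
    return {"passed": passed, "score": score, "reason": reason, "inspected": inspected}


def rubric_judge(read):
    """Stand-in for the agent: passes if the report names a root cause."""
    ok = "root cause" in read("report.md").lower()
    return ok, 1.0 if ok else 0.0, "names a root cause" if ok else "no root cause"


bundle = {"report.md": "Root cause: stale cache.", "findings.json": "[]"}
verdict = judge_run(bundle, rubric_judge)
print(verdict["passed"], verdict["inspected"])  # True ['report.md']
```

Running the cheap deterministic check first means an incomplete bundle never reaches the more expensive judging step.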

Agent-run dataset

A dataset whose kind is agent_run: each entry is a file bundle (one agent run's outputs — report, findings, evidence) rather than a tabular record. Uploaded in the dashboard, scored by Agent Judge. The other kind, tabular, holds records and is what the SDK creates. See Datasets.