Concepts
Organization
└── Workspace
    └── Trace
        └── Span
Trace
One end-to-end agent run. Created automatically when the first span opens. Identified by trace_id.
Span
One unit of work inside a trace — an LLM call, a tool call, a step. Has a kind, status, input, output, and metadata.
| Kind | Meaning |
|---|---|
| llm | A call to an LLM provider. |
| tool | A tool the agent invoked. |
| chain | A composite step. Default for @st.trace. |
| retriever | A retrieval or search step. |
| agent | An agent entry point. Set by @st.agent. |
| custom | Anything else. |
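The span fields and kinds above can be pictured as a small data structure. This is an illustrative sketch only, not the SDK's own types: the names `SpanKind`, `Span`, and `parse_kind`, the field types, the `"ok"` status value, and the choice to map unknown kind strings to custom are all assumptions. Only the six kind strings and the field names come from the description above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict


class SpanKind(str, Enum):
    """The six span kinds from the table; values match the table entries."""
    LLM = "llm"
    TOOL = "tool"
    CHAIN = "chain"
    RETRIEVER = "retriever"
    AGENT = "agent"
    CUSTOM = "custom"


@dataclass
class Span:
    """Hypothetical stand-in for one unit of work inside a trace."""
    trace_id: str
    kind: SpanKind = SpanKind.CHAIN  # assumption: mirrors the @st.trace default
    status: str = "ok"               # assumption: status values are not listed here
    input: Any = None
    output: Any = None
    metadata: Dict[str, Any] = field(default_factory=dict)


def parse_kind(value: str) -> SpanKind:
    """Return the matching kind, falling back to custom ("anything else")."""
    try:
        return SpanKind(value)
    except ValueError:
        return SpanKind.CUSTOM


span = Span(trace_id="trace-1", kind=parse_kind("tool"), input={"q": "weather"})
```

A string-valued enum keeps the kinds easy to serialize while still rejecting typos at the call site.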
Agent
The logical unit emitting traces — one row in your dashboard. Tag every trace with an agent so you can slice metrics, alerts, and Guard rules per agent.
@st.agent(name="...") overrides the agent for everything inside it. st.init(agent_name=...) is the default for code that isn't decorated.
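The precedence described above (a decorator override wins inside the decorated function, the init default applies everywhere else) can be modeled in a few lines. This is a self-contained sketch of the idea, not the SDK's implementation: `init`, `agent`, and `resolve_agent` here are local stand-ins written for illustration, and the agent names are made up.

```python
import contextvars
import functools

# Stand-in for the default set via st.init(agent_name=...).
_default_agent = "default-agent"
# Holds the override while a decorated function is running.
_current_agent = contextvars.ContextVar("current_agent", default=None)


def init(agent_name):
    """Set the default agent for undecorated code."""
    global _default_agent
    _default_agent = agent_name


def agent(name):
    """Decorator: override the agent for everything inside the function."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            token = _current_agent.set(name)
            try:
                return fn(*args, **kwargs)
            finally:
                _current_agent.reset(token)  # restore the previous value
        return wrapper
    return decorator


def resolve_agent():
    """Return the override if one is active, else the default."""
    override = _current_agent.get()
    return override if override is not None else _default_agent


init(agent_name="support-bot")


@agent(name="billing-bot")
def handle_billing():
    return resolve_agent()


def handle_other():
    return resolve_agent()
```

In this model `handle_billing()` resolves to the decorator's name and `handle_other()` falls back to the init default.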
Conversation
A set of traces that share a conversation_id — the turns of one chat thread, the steps of one multi-turn task, the retries of one job. Use st.conversation(...) to group them.
with st.conversation("conv-42", user_id="user-alice"):
    run_agent(query)
Environment
A logical partition — prod, staging, dev. Set with environment in st.init(...) or STASO_ENVIRONMENT. Defaults to default.
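The lookup described above can be sketched as a small resolver. This is a hedged illustration, not the SDK's code: the function name `resolve_environment` is hypothetical, and the precedence order (explicit argument first, then the environment variable, then the default) is an assumption, since the text only states the two sources and the fallback value.

```python
import os


def resolve_environment(explicit=None, environ=None):
    """Pick the environment name.

    Assumed order: explicit argument, then STASO_ENVIRONMENT, then "default".
    """
    environ = os.environ if environ is None else environ
    if explicit:
        return explicit
    return environ.get("STASO_ENVIRONMENT") or "default"


# Passing a plain dict instead of os.environ keeps the example deterministic.
print(resolve_environment("prod", {}))
print(resolve_environment(None, {"STASO_ENVIRONMENT": "staging"}))
print(resolve_environment(None, {}))
```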
Workspace
A data isolation unit inside an organization. Set with workspace_slug in st.init(...) or STASO_WORKSPACE_SLUG. See workspaces.
Organization
The billing, RBAC, and ownership boundary. Holds members, API keys, and one or more workspaces. See workspaces.
Eval run
A scored evaluation over a set of traces. One rule × one trace = one verdict. Scopes are trace_id, session_id, dataset_id, agent_id (with sample %), or time_range. See Eval.
Verdict
The output of one rule judging one trace (or one dataset entry) — passed, score, reason, plus the rule's runtime. For agentic verdicts, also inspected — the files the judge read. See Verdicts.
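The "one rule × one trace = one verdict" relationship can be shown with a toy eval loop. This is a sketch under stated assumptions, not the product's evaluator: `Verdict`, `run_eval`, and both toy rules are hypothetical, and only the passed, score, and reason fields are taken from the description above.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, List, Tuple

# A judge takes a trace id and returns (passed, score, reason).
Judge = Callable[[str], Tuple[bool, float, str]]


@dataclass(frozen=True)
class Verdict:
    rule: str
    trace_id: str
    passed: bool
    score: float
    reason: str


def run_eval(rules: Dict[str, Judge], trace_ids: List[str]) -> List[Verdict]:
    """Apply every rule to every trace, producing one verdict per pair."""
    verdicts = []
    for (rule_name, judge), trace_id in product(rules.items(), trace_ids):
        passed, score, reason = judge(trace_id)
        verdicts.append(Verdict(rule_name, trace_id, passed, score, reason))
    return verdicts


def always_pass(trace_id: str):
    return True, 1.0, "toy rule: always passes"


def id_is_short(trace_id: str):
    ok = len(trace_id) <= 4
    return ok, 1.0 if ok else 0.0, f"id length is {len(trace_id)}"


verdicts = run_eval(
    {"always_pass": always_pass, "id_is_short": id_is_short},
    ["t-1", "t-2", "trace-3"],
)
print(len(verdicts))  # 2 rules x 3 traces = 6 verdicts
```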
Rule
One rule object, two places it can run. Where it runs is decided when you attach it, not by its type: link it to a Guard policy → it runs in real time, per tool-call; cite it in an eval run → it scores after the fact. Guard and Eval share rule storage, so a rule written for one can serve the other.
A rule's runtime decides what it judges with:
| Runtime | Judges with | Surface |
|---|---|---|
| prompt | an LLM, your instruction | Guard or Eval |
| llm_judge | a built-in LLM judge | Guard or Eval |
| programmatic | your Python | Guard or Eval |
| agentic | a fresh agent that inspects files (Agent Judge) | Eval only |
agentic is eval-only — an agent reading a file bundle can't gate a live tool-call. See rules and policies.
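The runtime-to-surface table can be encoded as a lookup that rejects invalid attachments before they happen. This is an illustrative sketch, not the product's validation logic: `ALLOWED_SURFACES` and `can_attach` are hypothetical names, and the lowercase surface strings `"guard"` and `"eval"` are assumptions chosen for the example.

```python
# Which surfaces each runtime may be attached to, per the table above.
ALLOWED_SURFACES = {
    "prompt": {"guard", "eval"},
    "llm_judge": {"guard", "eval"},
    "programmatic": {"guard", "eval"},
    "agentic": {"eval"},  # eval-only: cannot gate a live tool-call
}


def can_attach(runtime: str, surface: str) -> bool:
    """Return True if a rule with this runtime may run on this surface."""
    try:
        return surface in ALLOWED_SURFACES[runtime]
    except KeyError:
        raise ValueError(f"unknown runtime: {runtime!r}") from None


print(can_attach("programmatic", "guard"))
print(can_attach("agentic", "guard"))
```

Keeping the mapping in one table means the eval-only restriction lives in data rather than in scattered conditionals.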
Agent Judge
The agentic runtime. For each agent run in an agent_run dataset, a deterministic precheck gate runs first; then a fresh agent opens the files and judges them against your rubric. Produces a verdict with an inspected tool-call trace. See Agent Judge.
Agent-run dataset
A dataset whose kind is agent_run: each entry is a file bundle (one agent run's outputs — report, findings, evidence) rather than a tabular record. Uploaded in the dashboard, scored by Agent Judge. The other kind, tabular, holds records and is what the SDK creates. See Datasets.