Staso Docs
Datasets

Curate Datasets from Traces

Build an eval dataset directly from real production traces with st.dataset.from_traces(...).

import staso as st
from staso.dataset import TraceColumnMapping

st.init(api_key="...", workspace_slug="...")

ds = st.dataset.from_traces(
    name="refund-failures-may",
    trace_ids=["trace_abc", "trace_def", "trace_ghi"],
    mapping=[
        TraceColumnMapping(source="root_span.input.query", target="input"),
        TraceColumnMapping(source="root_span.output.result", target="expected"),
    ],
    description="Real failed refund turns from production",
)

Mapping

mapping is a list of TraceColumnMapping(source, target) objects.

  • source — dotted path into the trace (for example root_span.input.query or request.body).
  • target — the dataset column name to write that value into.
from staso.dataset import TraceColumnMapping

mapping = [
    TraceColumnMapping(source="request.body",   target="input"),
    TraceColumnMapping(source="response.body",  target="output"),
    TraceColumnMapping(source="metadata.user",  target="user"),
]

A bare dict[str, str] is still accepted for backwards compatibility but is ambiguous and emits a DeprecationWarning. Always prefer the typed form.

Workflow

The dataset-from-traces loop is the fastest way to turn a production bug into a regression test.

  1. Find failing traces in the Staso dashboard (filter by status, latency, or guard verdict).
  2. Copy the trace IDs you want to freeze as eval cases.
  3. Call st.dataset.from_traces(...) with a TraceColumnMapping that pulls the fields you care about.
  4. Attach scorers and run st.dataset.evaluate(...) against the new dataset every time you ship a fix.

Entries created this way have source_type set on the DatasetEntry, so you can always trace a row back to its originating production span.

Next