Datasets
Curate Datasets from Traces
Build an eval dataset directly from real production traces with st.dataset.from_traces(...).
import staso as st
from staso.dataset import TraceColumnMapping
st.init(api_key="...", workspace_slug="...")
ds = st.dataset.from_traces(
name="refund-failures-may",
trace_ids=["trace_abc", "trace_def", "trace_ghi"],
mapping=[
TraceColumnMapping(source="root_span.input.query", target="input"),
TraceColumnMapping(source="root_span.output.result", target="expected"),
],
description="Real failed refund turns from production",
)Mapping
mapping is a list of TraceColumnMapping(source, target) objects.
source— dotted path into the trace (for exampleroot_span.input.queryorrequest.body).target— the dataset column name to write that value into.
from staso.dataset import TraceColumnMapping
mapping = [
TraceColumnMapping(source="request.body", target="input"),
TraceColumnMapping(source="response.body", target="output"),
TraceColumnMapping(source="metadata.user", target="user"),
]A bare dict[str, str] is still accepted for backwards compatibility but is ambiguous and emits a DeprecationWarning. Always prefer the typed form.
Workflow
The dataset-from-traces loop is the fastest way to turn a production bug into a regression test.
- Find failing traces in the Staso dashboard (filter by status, latency, or guard verdict).
- Copy the trace IDs you want to freeze as eval cases.
- Call
st.dataset.from_traces(...)with aTraceColumnMappingthat pulls the fields you care about. - Attach scorers and run
st.dataset.evaluate(...)against the new dataset every time you ship a fix.
Entries created this way have source_type set on the DatasetEntry, so you can always trace a row back to its originating production span.