
Datasets

Turn production traces into versioned eval datasets, run your agent against them, and score the results.

import staso as st

st.init(workspace_slug="...")

ds = st.dataset.create("refund-edge-cases", description="Tricky refund conversations")
st.dataset.add_entry(ds.id, {"input": "I want a refund for order 42", "expected": "refund_issued"})

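# my_agent is a placeholder for your own agent callable; it is not part of staso.
def run_agent(entry):
    return my_agent(entry["input"])

# Scorer: 1.0 when the output equals the expected label exactly, otherwise 0.0.
def exact_match(entry, output):
    return float(output == entry["expected"])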

summary = st.dataset.evaluate(ds.id, run_agent, scorers=[exact_match])
print(summary.passed, "/", summary.total)
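
Before running an evaluation against a real dataset, it can help to dry-run the agent wrapper and scorer locally. The following is a minimal sketch that does not call Staso at all: `my_agent` is a hypothetical stand-in, `dry_run` is an illustrative helper, and the rule that an entry passes only when every scorer returns at least 1.0 is an assumption, not documented `st.dataset.evaluate` behavior.

```python
# Hypothetical stand-in for your real agent; replace with your own callable.
def my_agent(text):
    return "refund_issued" if "refund" in text.lower() else "no_action"

def run_agent(entry):
    return my_agent(entry["input"])

def exact_match(entry, output):
    return float(output == entry["expected"])

# In-memory entries shaped like the one added in the quick start.
entries = [
    {"input": "I want a refund for order 42", "expected": "refund_issued"},
    {"input": "Where is my package?", "expected": "no_action"},
    {"input": "Cancel my subscription", "expected": "subscription_cancelled"},
]

def dry_run(entries, run, scorers, threshold=1.0):
    """Score every entry locally (assumed rule: all scores must reach threshold)."""
    passed = 0
    for entry in entries:
        output = run(entry)
        scores = [scorer(entry, output) for scorer in scorers]
        if all(score >= threshold for score in scores):
            passed += 1
    return passed, len(entries)

passed, total = dry_run(entries, run_agent, [exact_match])
print(passed, "/", total)  # prints: 2 / 3
```

The third entry fails because the stand-in agent has no notion of cancellations, which is the kind of gap a dry run surfaces before the dataset is evaluated for real.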

Why datasets

A loose tests.csv rots. Staso datasets are versioned, org-scoped, and tied to the real failure patterns in your production traces. Curate from traces you already ship, freeze them as splits, and re-run them on every prompt or model change.

What you can do

  • Curate from traces: st.dataset.from_traces(...).
  • Import / export CSV: upload_csv(...) / download_csv(...).
  • Evaluate with scorers: any Python function, applied to every entry.
  • Generate synthetic data: st.dataset.generate(...) to grow an existing dataset.
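
Since a scorer is just a Python function, looser checks than exact equality are easy to write. The sketch below assumes the `(entry, output)` signature and float return value shown in the quick start. The names `normalized_match` and `contains_expected` are illustrative, not part of Staso, and the functions are exercised here on a plain dictionary rather than a real dataset.

```python
def normalized_match(entry, output):
    """1.0 if output equals the expected label after trimming and lowercasing."""
    return float(str(output).strip().lower() == entry["expected"].strip().lower())

def contains_expected(entry, output):
    """1.0 if the expected label appears anywhere in the output text."""
    return float(entry["expected"].lower() in str(output).lower())

# Same shape as the entry added in the quick start.
entry = {"input": "I want a refund for order 42", "expected": "refund_issued"}

print(normalized_match(entry, "  Refund_Issued \n"))                 # 1.0
print(contains_expected(entry, "action=refund_issued; amount=42"))   # 1.0
print(contains_expected(entry, "escalate_to_human"))                 # 0.0
```

Under the same assumption, these could be passed alongside `exact_match` in the `scorers` list.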

Next