
Datasets Overview

Turn production traces into versioned eval datasets, run your agent against them, and score the results.

import staso as st

st.init(api_key="...", workspace_slug="...")

# Create a dataset and seed it with one entry.
ds = st.dataset.create("refund-edge-cases", description="Tricky refund conversations")
st.dataset.add_entry(ds.id, {"input": "I want a refund for order 42", "expected": "refund_issued"})

# Called once per entry; my_agent is your own agent code.
def run_agent(entry):
    return my_agent(entry["input"])

# A scorer maps (entry, output) to a numeric score.
def exact_match(entry, output):
    return float(output == entry["expected"])

summary = st.dataset.evaluate(ds.id, run_agent, scorers=[exact_match])
print(summary.passed, "/", summary.total)
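Exact match is the simplest possible scorer; any callable with the same (entry, output) signature works. As a sketch (the normalization rules here are an illustration, not part of the Staso API), a scorer that tolerates casing and stray whitespace:

```python
def normalized_match(entry, output):
    """Score 1.0 when the output matches the expected label after
    lowercasing and collapsing whitespace, else 0.0."""
    def norm(s):
        return " ".join(str(s).lower().split())
    return float(norm(output) == norm(entry["expected"]))
```

Pass it alongside other scorers the same way as above, e.g. scorers=[exact_match, normalized_match].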

Why datasets

Eval datasets stop your test harness from rotting. Instead of a loose tests.csv that nobody updates, Staso datasets are a versioned, org-scoped source of truth tied to the real failure patterns in your production traces. You curate directly from the traces you already ship, freeze them as test splits, and re-run them every time you change a prompt or a model.

What you can do

  • Curate from traces — build a dataset from real trace IDs with from_traces(...).
  • Import and export CSV — upload_csv(...) and download_csv(...) for round-tripping with spreadsheets or git.
  • Evaluate with scorers — run any Python function over every entry and score the output.
  • Generate synthetic data — grow an existing dataset with generate(...).
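
upload_csv(...) and download_csv(...) round-trip entries through ordinary CSV files. The exact column layout Staso expects isn't specified here, so the sketch below assumes one column per entry field (input, expected) and builds such a file with Python's csv module:

```python
import csv

entries = [
    {"input": "I want a refund for order 42", "expected": "refund_issued"},
    {"input": "Where is my package?", "expected": "status_lookup"},
]

def write_entries_csv(path, entries):
    """Write entries as a CSV with one column per field.
    The column layout is an assumption; check your dataset's schema."""
    fieldnames = sorted({key for entry in entries for key in entry})
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(entries)

def read_entries_csv(path):
    """Inverse of write_entries_csv: CSV rows back to entry dicts."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

write_entries_csv("refund_cases.csv", entries)
assert read_entries_csv("refund_cases.csv") == entries
# A file in this shape is what you would hand to upload_csv(...).
```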

Plan limits

Datasets are not available on the free (no_plan) tier. Upgrade to unlock.

Plan         Datasets / org   Entries / dataset   Columns / dataset
Personal     3                5,000               20
Team         30               10,000              50
Enterprise   Unlimited        Unlimited           Unlimited

See /docs/pricing for full details.

Next