Manage datasets

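Every dataset operation is a top-level function on staso.dataset. The examples below refer to the staso package as st (presumably imported with import staso as st).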

ds = st.dataset.create(
    name="eval-refunds",
    description="Refund-flow regressions",
    columns=[
        {"name": "input", "column_type": "input"},
        {"name": "expected", "column_type": "expected_output"},
    ],
)
st.dataset.add_entry(ds.id, {"input": "...", "expected": "refund_issued"}, split="test")

Datasets

ds = st.dataset.create(name=..., description=..., columns=[...], folder_id=None)
ds = st.dataset.get(dataset_id)
datasets = st.dataset.list(folder_id=None, limit=100, offset=0)
ds = st.dataset.update(dataset_id, name=..., description=...)
st.dataset.delete(dataset_id)

Dataset fields: id, name, description, folder_id, columns, entry_count, created_at, updated_at.

Entries

entry = st.dataset.add_entry(ds.id, {...}, split="train")
entries = st.dataset.add_entries(ds.id, [{...}, {...}], split="test")
rows = st.dataset.list_entries(ds.id, split=None, limit=100, offset=0)
st.dataset.update_entry(ds.id, entry.id, data={...})
st.dataset.delete_entry(ds.id, entry.id)

DatasetEntry fields: id, dataset_id, data, split, source_type, created_at. source_type is set automatically when an entry is created from a trace; see curate from traces.
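add_entries takes a single split argument, which appears to apply to every row in the call. Rows destined for different splits therefore need to be grouped first. The sketch below shows one way to do that grouping. group_by_split and fake_add_entries are hypothetical names, the sample rows and the dataset id "ds_123" are made up, and fake_add_entries only records its arguments in place of st.dataset.add_entries.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple

Row = Dict[str, Any]


def group_by_split(labelled_rows: Iterable[Tuple[str, Row]]) -> Dict[str, List[Row]]:
    """Group (split, row) pairs so each split can be sent in one bulk call."""
    groups: Dict[str, List[Row]] = defaultdict(list)
    for split, row in labelled_rows:
        groups[split].append(row)
    return dict(groups)


# Made-up sample rows, each labelled with its target split.
labelled = [
    ("train", {"input": "order 1 arrived damaged", "expected": "refund_issued"}),
    ("test", {"input": "order 2 never shipped", "expected": "refund_issued"}),
    ("train", {"input": "returned after 90 days", "expected": "refund_denied"}),
]

batches = group_by_split(labelled)

# Hypothetical stand-in for st.dataset.add_entries that only records its calls.
calls = []


def fake_add_entries(dataset_id, rows, split):
    calls.append((dataset_id, split, len(rows)))


# One bulk call per split.
for split, rows in batches.items():
    fake_add_entries("ds_123", rows, split=split)
```

Against the real SDK, the loop body would presumably become st.dataset.add_entries(ds.id, rows, split=split).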

Folders

folder = st.dataset.create_folder("evals/refunds")
folders = st.dataset.list_folders(parent_id=None)
st.dataset.delete_folder(folder.id)

Column types

Value                 Purpose
variable              Free-form, default.
input                 Agent input.
output                Agent output.
expected_output       Ground truth for scoring.
expected_tool_calls   Ground-truth tool trajectory.
scenario              Scenario description.
expected_steps        Ground-truth step list.
conversation_history  Prior turns.
files                 Attached files.
custom                Escape hatch.

from staso.dataset import ColumnType
ColumnType.INPUT            # "input"
ColumnType.EXPECTED_OUTPUT  # "expected_output"
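When column specs are written as plain dicts, a typo in column_type is easy to miss. The sketch below is a small local helper that checks the value against the column-type table above before the spec is passed to create. column and KNOWN_COLUMN_TYPES are hypothetical names and not part of the SDK. Defaulting to "variable" follows the table, which marks it as the default.

```python
from typing import Dict

# Values copied from the column-type table above.
KNOWN_COLUMN_TYPES = frozenset({
    "variable",
    "input",
    "output",
    "expected_output",
    "expected_tool_calls",
    "scenario",
    "expected_steps",
    "conversation_history",
    "files",
    "custom",
})


def column(name: str, column_type: str = "variable") -> Dict[str, str]:
    """Build one column spec, rejecting unknown column types early."""
    if column_type not in KNOWN_COLUMN_TYPES:
        raise ValueError("unknown column_type: {!r}".format(column_type))
    return {"name": name, "column_type": column_type}


columns = [
    column("input", "input"),
    column("expected", "expected_output"),
    column("notes"),  # falls back to "variable"
]
```

The resulting list has the same shape as the columns argument in the create example at the top of the page.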

Splits

SplitType has three members: SplitType.TRAIN, SplitType.TEST, and SplitType.VALIDATION. The equivalent strings ("train", "test", "validation") are accepted wherever a SplitType is expected.
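One common way to accept both an enum member and its string is a str-based Enum. The sketch below illustrates that pattern only. The SplitType class here is a local stand-in, and the SDK's real definition and import path (not given on this page) may differ. normalize_split is a hypothetical helper.

```python
from enum import Enum
from typing import Union


class SplitType(str, Enum):
    """Illustrative stand-in; the SDK's real SplitType may be defined differently."""

    TRAIN = "train"
    TEST = "test"
    VALIDATION = "validation"


def normalize_split(split: Union[SplitType, str]) -> SplitType:
    """Coerce a SplitType member or its string value to a SplitType member."""
    # Enum lookup by value returns the member itself when given a member,
    # and raises ValueError for an unknown string.
    return SplitType(split)


from_string = normalize_split("test")
from_member = normalize_split(SplitType.TRAIN)
```

Because the stand-in mixes in str, a member also compares equal to its plain string, so SplitType.TEST == "test" holds in this sketch.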
