Staso Docs
Integrations

OpenAI Integration

Auto-trace every OpenAI chat completion call. No changes to your existing OpenAI calls — just a one-time setup.

Setup

pip install "staso[openai]"

import staso as st

st.init(api_key="ak_...", agent_id="my-agent")
st.integrations.patch_openai()

Done. Every call made through the OpenAI SDK is now traced automatically.
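Conceptually, patching follows the standard monkey-patch pattern: the SDK method is swapped for a wrapper that times the call, records success or failure, and emits a span. The sketch below is illustrative only, not Staso's actual internals; the spans list stands in for the real exporter:

```python
import functools
import time

spans = []  # stand-in for Staso's span exporter

def wrap_with_span(fn, span_name):
    """Replace `fn` with a wrapper that records timing and status."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.monotonic()
        status = "ok"
        try:
            return fn(*args, **kwargs)
        except Exception:
            status = "error"
            raise  # the caller still sees the original exception
        finally:
            spans.append({
                "name": span_name,
                "status": status,
                "latency_ms": (time.monotonic() - start) * 1000,
            })
    return wrapper
```

The same pattern applies whether the wrapped callable is `chat.completions.create` or anything else, which is why no call sites need to change.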

What Shows Up on the Dashboard

Field           Example
Span name       openai.chat.completions.create
Model           gpt-4o
Input tokens    245
Output tokens   89
Total tokens    334
Latency         1,230 ms
Status          ok / error

Example

from openai import OpenAI
import staso as st

st.init(api_key="ak_...", agent_id="my-agent")
st.integrations.patch_openai()

client = OpenAI()

@st.agent(name="chat-agent")
def chat(message: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        max_tokens=1024,
        messages=[{"role": "user", "content": message}],
    )
    return response.choices[0].message.content

with st.conversation("demo"):
    chat("What is observability?")

st.shutdown()

Dashboard:

chat-agent (agent)
└── openai.chat.completions.create (llm) — gpt-4o, 334 tokens, 1.2s

Streaming

Streaming is fully supported. The SDK wraps the stream to capture tokens and latency as chunks arrive:

@st.agent(name="streaming-agent")
def chat(message: str) -> str:
    stream = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": message}],
        stream=True,
    )
    chunks = []
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            chunks.append(chunk.choices[0].delta.content)
    return "".join(chunks)

The span captures the full response, token usage, and time-to-first-token.
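Time-to-first-token is simply the gap between issuing the request and receiving the first chunk. A minimal sketch of how a stream wrapper can measure it while passing chunks through untouched (illustrative, not Staso's implementation; record is a hypothetical callback):

```python
import time

def instrument_stream(stream, record):
    """Re-yield chunks from `stream`, recording TTFT and total latency."""
    start = time.monotonic()
    ttft = None
    chunks = 0
    for chunk in stream:
        if ttft is None:
            ttft = time.monotonic() - start  # first chunk arrived
        chunks += 1
        yield chunk  # consumer sees the stream unchanged
    record({
        "ttft_s": ttft,
        "chunks": chunks,
        "total_s": time.monotonic() - start,
    })
```

Because the wrapper is itself a generator, the consumer's loop is unchanged and the stats are recorded only once the stream is fully consumed.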

Async

Both sync and async clients are instrumented:

from openai import AsyncOpenAI

client = AsyncOpenAI()

@st.agent(name="async-agent")
async def chat(message: str) -> str:
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": message}],
    )
    return response.choices[0].message.content

What Gets Captured

Request parameters (stored in span metadata):

  • temperature, max_tokens, top_p
  • frequency_penalty, presence_penalty, seed
  • tool_choice, response_format, service_tier

Response data:

  • Content text, role, tool calls
  • Finish reason
  • Token usage (input, output, total, reasoning tokens)

Messages are captured by default. To disable, set capture_messages=False in st.init().
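For example, to keep token counts and latency but drop message contents from spans (a configuration sketch mirroring the setup shown above):

```python
import staso as st

# Message contents are omitted from span metadata;
# token counts, latency, and status are still recorded.
st.init(api_key="ak_...", agent_id="my-agent", capture_messages=False)
st.integrations.patch_openai()
```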

Without the Integration

If you want manual control:

@st.trace(name="llm_call", kind="llm")
def call_llm(prompt: str) -> str:
    response = client.chat.completions.create(...)
    return response.choices[0].message.content

You get a span with timing and error tracking, but no automatic token/model extraction.
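If you do need token counts on a manual span, you can read them from the response's usage object yourself — these are standard OpenAI chat completion response fields; how you attach the result to the span is up to you:

```python
def extract_usage(response):
    """Pull token counts from an OpenAI-style chat completion response."""
    usage = getattr(response, "usage", None)
    if usage is None:  # usage can be absent, e.g. on some streamed responses
        return None
    return {
        "input_tokens": usage.prompt_tokens,
        "output_tokens": usage.completion_tokens,
        "total_tokens": usage.total_tokens,
    }
```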