Anthropic Integration

Auto-trace every Anthropic API call. No changes to your existing call sites.

Setup

pip install "staso[anthropic]"

import staso as st

st.init(api_key="ak_...", agent_id="my-agent")
st.integrations.patch_anthropic()

Done. Every call made through the Anthropic SDK is now traced automatically.

What Shows Up on the Dashboard

Field          Example
Span name      anthropic.messages.create
Model          claude-sonnet-4-20250514
Input tokens   245
Output tokens  89
Total tokens   334
Latency        1,230 ms
Status         ok / error

Example

import anthropic
import staso as st

st.init(api_key="ak_...", agent_id="my-agent")
st.integrations.patch_anthropic()

client = anthropic.Anthropic()

@st.agent(name="chat-agent")
def chat(message: str) -> str:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[{"role": "user", "content": message}],
    )
    return response.content[0].text

with st.conversation("demo"):
    chat("What is observability?")

st.shutdown()

Dashboard:

chat-agent (agent)
└── anthropic.messages.create (llm) — claude-sonnet-4-20250514, 334 tokens, 1.2s

Streaming

Streaming is fully supported for both client.messages.stream() and client.messages.create(stream=True). The integration wraps the stream to capture tokens and latency as chunks arrive:

@st.agent(name="streaming-agent")
def chat(message: str) -> str:
    with client.messages.stream(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[{"role": "user", "content": message}],
    ) as stream:
        return stream.get_final_text()

The span captures the full response, token usage, and time-to-first-token.
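Time-to-first-token is simply the delay between issuing the request and the arrival of the first chunk. Conceptually the wrapping works like the generic sketch below (this is an illustration of the idea, not Staso's internals):

```python
import time

class ChunkTimer:
    """Wraps any chunk iterator and records time-to-first-token (TTFT)."""

    def __init__(self, chunks):
        self._chunks = iter(chunks)
        self.ttft = None
        self._start = time.monotonic()

    def __iter__(self):
        for chunk in self._chunks:
            if self.ttft is None:
                # first chunk has arrived; record the latency
                self.ttft = time.monotonic() - self._start
            yield chunk

# toy usage with a fake stream of text chunks
stream = ChunkTimer(["Hello", ", ", "world"])
text = "".join(stream)
```

Because the wrapper yields chunks unchanged, your streaming code behaves exactly as before; only the timing is observed on the side.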

Async

Both sync and async clients are instrumented:

client = anthropic.AsyncAnthropic()

@st.agent(name="async-agent")
async def chat(message: str) -> str:
    response = await client.messages.create(...)
    return response.content[0].text
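Instrumenting both client types usually comes down to checking whether the patched method is a coroutine function and choosing the matching wrapper. A minimal sketch of that pattern (not Staso's actual implementation):

```python
import asyncio
import functools
import inspect

def traced(fn):
    """Wrap a sync or async callable with the same tracing logic."""
    if inspect.iscoroutinefunction(fn):
        @functools.wraps(fn)
        async def async_wrapper(*args, **kwargs):
            # a real tracer would open a span here and close it after the await
            return await fn(*args, **kwargs)
        return async_wrapper

    @functools.wraps(fn)
    def sync_wrapper(*args, **kwargs):
        # same span logic, synchronous path
        return fn(*args, **kwargs)
    return sync_wrapper

@traced
def greet(name):
    return f"hello {name}"

@traced
async def agreet(name):
    return f"hello {name}"
```

Either form keeps the original call signature, which is why patched code needs no changes.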

What Gets Captured

Request parameters (stored in span metadata):

  • temperature, max_tokens, top_p, top_k
  • stop_sequences, tool_choice, thinking

Response data:

  • Content text, tool use blocks
  • Stop reason, response ID
  • Token usage (input, output, total, cache read/creation tokens)
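On the request side, this amounts to copying a fixed allowlist of parameters into span metadata while skipping everything else. A rough illustration of that filtering (hypothetical helper, not Staso's code):

```python
# The request parameters the docs list as stored in span metadata.
CAPTURED_PARAMS = {
    "temperature", "max_tokens", "top_p", "top_k",
    "stop_sequences", "tool_choice", "thinking",
}

def extract_metadata(request_kwargs):
    """Keep only allowlisted request parameters for span metadata."""
    return {k: v for k, v in request_kwargs.items() if k in CAPTURED_PARAMS}

meta = extract_metadata({
    "model": "claude-sonnet-4-20250514",   # captured separately as the Model field
    "max_tokens": 1024,
    "temperature": 0.2,
    "messages": [{"role": "user", "content": "hi"}],  # handled by message capture
})
```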

Messages are captured by default. To disable, set capture_messages=False in st.init().
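For example, to keep token and latency data while omitting prompt and response content:

```python
import staso as st

# capture_messages=False disables prompt/response capture;
# token counts, latency, and model metadata are still recorded.
st.init(api_key="ak_...", agent_id="my-agent", capture_messages=False)
st.integrations.patch_anthropic()
```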

Without the Integration

If you want manual control:

@st.trace(name="llm_call", kind="llm")
def call_llm(prompt: str) -> str:
    response = client.messages.create(...)
    return response.content[0].text

You get a span with timing and error tracking, but no automatic token/model extraction.