Anthropic Integration

Auto-trace every Anthropic API call. No changes to your existing call sites.

Setup

pip install "staso[anthropic]"

import staso as st

st.init(api_key="ak_...", agent_id="my-agent")
st.integrations.patch_anthropic()

Done. Every call made through the Anthropic SDK is now traced automatically.

What Shows Up on the Dashboard

Field          Example
Span name      anthropic.messages.create
Model          claude-sonnet-4-20250514
Input tokens   245
Output tokens  89
Total tokens   334
Latency        1,230 ms
Status         ok / error

Example

import anthropic
import staso as st

st.init(api_key="ak_...", agent_id="my-agent")
st.integrations.patch_anthropic()

client = anthropic.Anthropic()

@st.agent(name="chat-agent")
def chat(message: str) -> str:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[{"role": "user", "content": message}],
    )
    return response.content[0].text

with st.conversation("demo"):
    chat("What is observability?")

st.shutdown()

Dashboard:

chat-agent (agent)
└── anthropic.messages.create (llm) — claude-sonnet-4-20250514, 334 tokens, 1.2s

Streaming

Streaming is fully supported for both client.messages.stream() and client.messages.create(stream=True). The integration wraps the stream to capture tokens and latency as chunks arrive:

@st.agent(name="streaming-agent")
def chat(message: str) -> str:
    with client.messages.stream(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[{"role": "user", "content": message}],
    ) as stream:
        return stream.get_final_text()

The span captures the full response, token usage, and time-to-first-token.
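Time-to-first-token is simply the delay between issuing the request and the arrival of the first chunk. Conceptually the wrapping works like the generic sketch below (this is an illustration of the idea, not Staso's internals):

```python
import time

class ChunkTimer:
    """Wraps any chunk iterator and records time-to-first-token (TTFT)."""

    def __init__(self, chunks):
        self._chunks = iter(chunks)
        self.ttft = None
        self._start = time.monotonic()

    def __iter__(self):
        for chunk in self._chunks:
            if self.ttft is None:
                # first chunk has arrived; record the latency
                self.ttft = time.monotonic() - self._start
            yield chunk

# toy usage with a fake stream of text chunks
stream = ChunkTimer(["Hello", ", ", "world"])
text = "".join(stream)
```

Because the wrapper yields chunks unchanged, your streaming code behaves exactly as before; only the timing is observed on the side.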

Async

Both sync and async clients are instrumented:

client = anthropic.AsyncAnthropic()

@st.agent(name="async-agent")
async def chat(message: str) -> str:
    response = await client.messages.create(...)
    return response.content[0].text
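Instrumenting both client types usually comes down to checking whether the patched method is a coroutine function and choosing the matching wrapper. A minimal sketch of that pattern (not Staso's actual implementation):

```python
import asyncio
import functools
import inspect

def traced(fn):
    """Wrap a sync or async callable with the same tracing logic."""
    if inspect.iscoroutinefunction(fn):
        @functools.wraps(fn)
        async def async_wrapper(*args, **kwargs):
            # a real tracer would open a span here and close it after the await
            return await fn(*args, **kwargs)
        return async_wrapper

    @functools.wraps(fn)
    def sync_wrapper(*args, **kwargs):
        # same span logic, synchronous path
        return fn(*args, **kwargs)
    return sync_wrapper

@traced
def greet(name):
    return f"hello {name}"

@traced
async def agreet(name):
    return f"hello {name}"
```

Either form keeps the original call signature, which is why patched code needs no changes.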

What Gets Captured

Request parameters (stored in span metadata):

  • temperature, max_tokens, top_p, top_k
  • stop_sequences, tool_choice, thinking

Response data:

  • Content text, tool use blocks
  • Stop reason, response ID
  • Token usage (input, output, total, cache read/creation tokens)
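On the request side, this amounts to copying a fixed allowlist of parameters into span metadata while skipping everything else. A rough illustration of that filtering (hypothetical helper, not Staso's code):

```python
# The request parameters the docs list as stored in span metadata.
CAPTURED_PARAMS = {
    "temperature", "max_tokens", "top_p", "top_k",
    "stop_sequences", "tool_choice", "thinking",
}

def extract_metadata(request_kwargs):
    """Keep only allowlisted request parameters for span metadata."""
    return {k: v for k, v in request_kwargs.items() if k in CAPTURED_PARAMS}

meta = extract_metadata({
    "model": "claude-sonnet-4-20250514",   # captured separately as the Model field
    "max_tokens": 1024,
    "temperature": 0.2,
    "messages": [{"role": "user", "content": "hi"}],  # handled by message capture
})
```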

Messages are captured by default. To disable, set capture_messages=False in st.init().
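For example, to keep token and latency data while omitting prompt and response content:

```python
import staso as st

# capture_messages=False disables prompt/response capture;
# token counts, latency, and model metadata are still recorded.
st.init(api_key="ak_...", agent_id="my-agent", capture_messages=False)
st.integrations.patch_anthropic()
```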

Without the Integration

If you want manual control:

@st.trace(name="llm_call", kind="llm")
def call_llm(prompt: str) -> str:
    response = client.messages.create(...)
    return response.content[0].text

You get a span with timing and error tracking, but no automatic token/model extraction.