
September 15, 2025

EvalOps Python SDK: Capture Telemetry in 10 Lines

python · sdk · integrations

Why Use the SDK?

While Grimoire CLI wraps your entire process, the EvalOps Python SDK lets you instrument individual functions and capture telemetry inline. This is ideal for:

  • FastAPI or Flask apps
  • Jupyter notebooks
  • Data pipelines with LangChain or LlamaIndex
  • Any Python code calling LLMs

Installation

pip install evalops

Basic Usage

import openai

from evalops import EvalOps

# Initialize (auto-discovers API key from EVALOPS_API_KEY env var)
evalops = EvalOps()

# Or specify manually
evalops = EvalOps(api_key="your-key-here", workspace="your-workspace")

# Capture a trace
with evalops.trace(scenario="customer-support-qa") as trace:
    trace.metadata({"ticket_id": "TKT-5678", "agent": "bot"})

    # Your LLM call
    response = openai.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "How do I reset my password?"}]
    )

    # Log it
    trace.log({
        "provider": "openai",
        "model": "gpt-4",
        "prompt": "How do I reset my password?",
        "response": response.choices[0].message.content,
        "tokens": response.usage.dict()
    })

That's it. The trace is now in your EvalOps workspace, ready for scoring.

Automatic Instrumentation

Prefer zero-config? Use the auto-instrumenter:

from evalops.auto import instrument

# Patch supported libraries
instrument(providers=["openai", "anthropic", "langchain"])

# Now all LLM calls are automatically traced
import openai
response = openai.chat.completions.create(...)  # Captured automatically

This works with:

  • OpenAI Python SDK
  • Anthropic Claude SDK
  • LangChain
  • LlamaIndex
  • Haystack
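
Because the Anthropic SDK is on the list above, the same instrument() call should capture Claude requests too. Here is a minimal sketch; the client setup and model name are ordinary Anthropic SDK usage, not anything specific to EvalOps:

from evalops.auto import instrument

# Patch both providers before making any calls
instrument(providers=["openai", "anthropic"])

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=256,
    messages=[{"role": "user", "content": "How do I reset my password?"}],
)
# Captured automatically, same as the OpenAI example above
print(response.content[0].text)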

Adding Custom Metrics

Want to score traces with your own logic? Define a custom metric:

from evalops.metrics import custom_metric

@custom_metric(name="contains_apology")
def check_apology(response: str) -> float:
    """Return 1.0 if response contains an apology, else 0.0"""
    apology_words = ["sorry", "apologize", "apologies"]
    return 1.0 if any(word in response.lower() for word in apology_words) else 0.0

# Apply it
with evalops.trace(scenario="support") as trace:
    response = get_llm_response(...)
    trace.log({"response": response})
    trace.score(check_apology(response), metric="contains_apology")

Now every trace has a contains_apology score. View it in EvalOps dashboards or trigger alerts when the rate changes.

Redacting PII

Before traces leave your environment, redact sensitive fields:

evalops = EvalOps(
    redact_fields=["email", "ssn", "credit_card"],
    redact_patterns=[r"\d{3}-\d{2}-\d{4}"]  # SSN regex
)

# These fields are automatically scrubbed before upload
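
To make the effect concrete, here is an illustrative trace logged through a client configured as above; the field names and scenario are made up for the example. The email field matches redact_fields, and the SSN-shaped string in the prompt matches redact_patterns, so both are scrubbed before upload:

with evalops.trace(scenario="billing-support") as trace:
    trace.log({
        "email": "jane@example.com",  # scrubbed: listed in redact_fields
        "prompt": "My SSN is 123-45-6789, please update my account",  # digits scrubbed by pattern
        "response": "I've opened a ticket to update your account details."
    })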

Local-Only Mode

Don't want to send traces to EvalOps yet? Keep them local:

evalops = EvalOps(mode="local", storage_dir="./traces")

# Traces saved to ./traces/*.json

Review them with:

evalops dashboard --local ./traces
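
The local files are plain JSON, so you can also inspect them directly with the standard library. A quick sketch (the scenario key is an assumption about the file layout, which isn't documented above):

import json
from pathlib import Path

# Each trace in local mode is written as a separate JSON file
for path in sorted(Path("./traces").glob("*.json")):
    trace = json.loads(path.read_text())
    print(path.name, trace.get("scenario"))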

FastAPI Integration

For web apps, use the middleware:

import openai

from fastapi import FastAPI
from evalops.integrations.fastapi import EvalOpsMiddleware

app = FastAPI()
app.add_middleware(EvalOpsMiddleware, workspace="production")

@app.post("/chat")
async def chat(message: str):
    # Automatically captured as a trace
    response = openai.chat.completions.create(...)
    return {"response": response.choices[0].message.content}

Every request becomes a trace with full context (endpoint, headers, latency).

CI/CD Integration

In your GitHub Actions or GitLab CI:

- name: Run evals with EvalOps
  env:
    EVALOPS_API_KEY: ${{ secrets.EVALOPS_API_KEY }}
    EVALOPS_WORKSPACE: staging
  run: |
    pip install evalops
    python -m evalops exec --scenario "ci-regression" -- pytest tests/eval/

Traces are tagged with commit SHA, branch, and CI metadata.
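
For GitLab CI, the equivalent is the same two commands in a job. A minimal sketch (job name, stage, and image are arbitrary; set EVALOPS_API_KEY as a masked CI/CD variable in the project settings):

eval-regression:
  stage: test
  image: python:3.12
  variables:
    EVALOPS_WORKSPACE: staging
  script:
    - pip install evalops
    - python -m evalops exec --scenario "ci-regression" -- pytest tests/eval/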

Next Steps

Questions? Open an issue or email hello@evalops.dev.