Why Use the SDK?
While Grimoire CLI wraps your entire process, the EvalOps Python SDK lets you instrument individual functions and capture telemetry inline. This is ideal for:
- FastAPI or Flask apps
- Jupyter notebooks
- Data pipelines with LangChain or LlamaIndex
- Any Python code calling LLMs
Installation
pip install evalops
Basic Usage
import openai
from evalops import EvalOps
# Initialize (auto-discovers API key from EVALOPS_API_KEY env var)
evalops = EvalOps()
# Or specify manually
evalops = EvalOps(api_key="your-key-here", workspace="your-workspace")
# Capture a trace
with evalops.trace(scenario="customer-support-qa") as trace:
    trace.metadata({"ticket_id": "TKT-5678", "agent": "bot"})

    # Your LLM call
    response = openai.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "How do I reset my password?"}]
    )

    # Log it
    trace.log({
        "provider": "openai",
        "model": "gpt-4",
        "prompt": "How do I reset my password?",
        "response": response.choices[0].message.content,
        "tokens": response.usage.model_dump()
    })
That's it. The trace is now in your EvalOps workspace, ready for scoring.
Automatic Instrumentation
Prefer zero-config? Use the auto-instrumenter:
from evalops.auto import instrument
# Patch supported libraries
instrument(providers=["openai", "anthropic", "langchain"])
# Now all LLM calls are automatically traced
import openai
response = openai.chat.completions.create(...) # Captured automatically
This works with:
- OpenAI Python SDK
- Anthropic Claude SDK
- LangChain
- LlamaIndex
- Haystack
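For example, with only the Anthropic SDK patched, an ordinary call through its client is captured without any tracing code. A minimal sketch (the model name and prompt are illustrative, and ANTHROPIC_API_KEY is assumed to be set):
import anthropic
from evalops.auto import instrument

instrument(providers=["anthropic"])

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=256,
    messages=[{"role": "user", "content": "How do I reset my password?"}],
)
print(response.content[0].text)  # the call above is traced automatically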
Adding Custom Metrics
Want to score traces with your own logic? Define a custom metric:
from evalops.metrics import custom_metric
@custom_metric(name="contains_apology")
def check_apology(response: str) -> float:
"""Return 1.0 if response contains an apology, else 0.0"""
apology_words = ["sorry", "apologize", "apologies"]
return 1.0 if any(word in response.lower() for word in apology_words) else 0.0
# Apply it
with evalops.trace(scenario="support") as trace:
response = get_llm_response(...)
trace.log({"response": response})
trace.score(check_apology(response), metric="contains_apology")
Now every trace has a contains_apology score. View it in EvalOps dashboards or trigger alerts when the rate changes.
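The same decorator pattern works for any scalar check. Here is another sketch, this one flagging answers that blow past a length budget (the 400-character threshold is arbitrary and just for illustration):
from evalops.metrics import custom_metric

@custom_metric(name="within_length_budget")
def check_length(response: str) -> float:
    """Return 1.0 if the response stays under 400 characters, else 0.0."""
    return 1.0 if len(response) <= 400 else 0.0

# Scored the same way as contains_apology
with evalops.trace(scenario="support") as trace:
    response = get_llm_response(...)
    trace.log({"response": response})
    trace.score(check_length(response), metric="within_length_budget")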
Redacting PII
Before traces leave your environment, redact sensitive fields:
evalops = EvalOps(
    redact_fields=["email", "ssn", "credit_card"],
    redact_patterns=[r"\d{3}-\d{2}-\d{4}"]  # SSN regex
)
# These fields are automatically scrubbed before upload
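To sanity-check a redact_patterns entry before handing it to the SDK, you can preview it with Python's re module. This is plain standard-library code, independent of EvalOps:
import re

ssn_pattern = r"\d{3}-\d{2}-\d{4}"
sample = "Customer SSN is 123-45-6789, please verify."
print(re.sub(ssn_pattern, "[REDACTED]", sample))
# -> Customer SSN is [REDACTED], please verify.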
Local-Only Mode
Don't want to send traces to EvalOps yet? Keep them local:
evalops = EvalOps(mode="local", storage_dir="./traces")
# Traces saved to ./traces/*.json
Review them with:
evalops dashboard --local ./traces
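You can also inspect the saved files directly. The exact JSON schema isn't documented here, so the scenario field below is an assumption; adjust it to whatever keys your traces actually contain:
import json
from pathlib import Path

for path in sorted(Path("./traces").glob("*.json")):
    with path.open() as f:
        trace = json.load(f)
    # "scenario" is an assumed top-level key, not a documented guarantee
    print(path.name, trace.get("scenario"))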
FastAPI Integration
For web apps, use the middleware:
import openai
from fastapi import FastAPI
from evalops.integrations.fastapi import EvalOpsMiddleware
app = FastAPI()
app.add_middleware(EvalOpsMiddleware, workspace="production")
@app.post("/chat")
async def chat(message: str):
    # Automatically captured as a trace
    response = openai.chat.completions.create(...)
    return {"response": response.choices[0].message.content}
Every request becomes a trace with full context (endpoint, headers, latency).
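One quick way to see requests turn into traces during development is FastAPI's TestClient. This sketch assumes the /chat handler above is filled in with a real OpenAI call and that the relevant API keys are set:
from fastapi.testclient import TestClient

client = TestClient(app)  # the app defined above, with EvalOpsMiddleware attached
resp = client.post("/chat", params={"message": "How do I reset my password?"})
print(resp.json())  # this request should now show up as a trace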
CI/CD Integration
In your CI pipeline (the example below is a GitHub Actions step; GitLab CI is analogous):
- name: Run evals with EvalOps
  env:
    EVALOPS_API_KEY: ${{ secrets.EVALOPS_API_KEY }}
    EVALOPS_WORKSPACE: staging
  run: |
    pip install evalops
    python -m evalops exec --scenario "ci-regression" -- pytest tests/eval/
Traces are tagged with commit SHA, branch, and CI metadata.
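If you want extra context beyond what the SDK attaches automatically, you can record it yourself with the trace.metadata call shown earlier, pulling from GitHub Actions' built-in environment variables. A sketch, assuming an initialized evalops client:
import os

with evalops.trace(scenario="ci-regression") as trace:
    trace.metadata({
        "commit": os.environ.get("GITHUB_SHA", "unknown"),
        "branch": os.environ.get("GITHUB_REF_NAME", "unknown"),
    })
    # ...run the evaluated code and trace.log() as usual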
Next Steps
- View the full SDK reference
- Import a Spellbook recipe for pre-built scorecards
- Set up CI gates to block deploys on regressions
Questions? Open an issue or email hello@evalops.dev.