Changelog

What’s new in EvalOps

Release notes, platform improvements, and new content announcements.

October 3, 2025

Version 2025.10.03

EvalOps Agent Launch & Telemetry Upgrades

  • Released the EvalOps Agent to orchestrate evaluation suites, scorecards, and release gates
  • Shipped first-class telemetry connectors for Slack, GitHub, and PagerDuty so incidents stay tied to evals
  • Introduced evaluation dataset versioning and shadow-run diffing for safer prompt/weight changes
  • Added a governance attestations API to capture review sign-off alongside evaluation evidence

EvalOps Agent

  • Agent now schedules capture → evaluate → decide loops automatically for every workspace
  • Launch-ready gate policies block deploys when safety or quality scores fall below their thresholds
  • Slack digests summarize failing evals with direct links back to scorecards
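
To illustrate the kind of check a release gate performs, here is a minimal sketch in Python. It is not the EvalOps Agent's implementation or policy format: the `Threshold` type, the `evaluate_gate` function, the metric names, and the threshold values are all hypothetical, chosen only to show a deploy being blocked when a score falls below its minimum.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """Minimum acceptable score for one metric (hypothetical shape)."""
    metric: str
    minimum: float


def evaluate_gate(
    scores: dict[str, float], thresholds: list[Threshold]
) -> tuple[bool, list[str]]:
    """Return (allowed, failures); a metric with no score counts as a failure."""
    failures = []
    for t in thresholds:
        value = scores.get(t.metric)
        if value is None:
            failures.append(f"{t.metric}: no score reported")
        elif value < t.minimum:
            failures.append(f"{t.metric}: {value:.2f} < {t.minimum:.2f}")
    return (not failures, failures)


# Assumed example policy and scores, for illustration only.
policy = [Threshold("safety", 0.95), Threshold("quality", 0.80)]
allowed, failures = evaluate_gate({"safety": 0.97, "quality": 0.74}, policy)
print(allowed, failures)
```

Treating a missing score as a failure is a deliberate choice in this sketch: a gate that passes when an evaluation never ran would defeat its purpose.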

Telemetry

  • New GitHub App ingests pull-request context so evaluation results map to diffs
  • PagerDuty integration posts paging events into eval timelines for root-cause correlation
  • Expanded Spellbook recipes with dataset version pinning and replay controls

Governance

  • Attestations API stores approver, policy, and evidence payload for each release gate
  • Audit log now captures evaluator overrides with before/after snapshots
  • Trust Center widgets display live control coverage, driven directly from eval outcomes
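
As a rough picture of what an attestation record might hold, the sketch below bundles the approver, policy, and evidence payload for one release gate and adds a content hash of the evidence. This is an assumption-laden illustration, not the Attestations API schema: the `Attestation` type, its field names, the `build_attestation` helper, the evidence hash, and the sample values are all hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Attestation:
    """Hypothetical record shape; not the actual API schema."""
    gate_id: str
    approver: str
    policy: str
    evidence: dict
    evidence_sha256: str
    signed_at: str


def build_attestation(gate_id: str, approver: str, policy: str, evidence: dict) -> Attestation:
    # Serialize with sorted keys so the same evidence always hashes the same way.
    canonical = json.dumps(evidence, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return Attestation(
        gate_id=gate_id,
        approver=approver,
        policy=policy,
        evidence=evidence,
        evidence_sha256=digest,
        signed_at=datetime.now(timezone.utc).isoformat(),
    )


# Assumed sample values, for illustration only.
record = build_attestation(
    gate_id="gate-001",
    approver="reviewer@example.com",
    policy="launch-ready",
    evidence={"suite": "safety-regression", "passed": True},
)
print(record.approver, record.policy, record.evidence_sha256[:12])
```

Hashing a canonical serialization of the evidence lets a later audit confirm that the payload attached to a sign-off has not changed since approval.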

Developer Experience

  • CLI gained eval-agent commands to kick off suites locally and stream results