October 3, 2025
Version 2025.10.03
EvalOps Agent Launch & Telemetry Upgrades
- Released the EvalOps Agent to orchestrate evaluation suites, scorecards, and release gates
- Shipped first-class telemetry connectors for Slack, GitHub, and PagerDuty so incidents stay tied to evals
- Introduced evaluation dataset versioning and shadow-run diffing for safer prompt/weight changes
- Added governance attestations API to capture review sign-off alongside evaluation evidence
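
To illustrate what shadow-run diffing might look like in practice, here is a minimal sketch. It assumes both runs score the same pinned dataset version and report one score per case. The function name, the case IDs, and the `tolerance` parameter are hypothetical and not part of the EvalOps API.

```python
def diff_shadow_run(
    baseline: dict[str, float],
    candidate: dict[str, float],
    tolerance: float = 0.0,
) -> dict[str, list[str]]:
    """Compare per-case scores from a baseline run and a shadow (candidate) run.

    Hypothetical helper: keys are case IDs, values are scores where higher is better.
    """
    shared = baseline.keys() & candidate.keys()
    return {
        # Cases whose score dropped by more than the tolerance.
        "regressed": sorted(c for c in shared if candidate[c] < baseline[c] - tolerance),
        # Cases whose score rose by more than the tolerance.
        "improved": sorted(c for c in shared if candidate[c] > baseline[c] + tolerance),
        # Cases the candidate run did not score at all.
        "missing": sorted(baseline.keys() - candidate.keys()),
    }


baseline = {"case-1": 0.9, "case-2": 0.8, "case-3": 0.7}
candidate = {"case-1": 0.95, "case-2": 0.6}
report = diff_shadow_run(baseline, candidate)
print(report)
```

Reporting missing cases separately keeps a partial shadow run from looking like a clean one.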
EvalOps Agent
- Agent now schedules capture → evaluate → decide loops automatically for every workspace
- Launch-ready gate policies block deploys when safety or quality thresholds slip
- Slack digests summarize failing evals with direct links back to scorecards
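
The gate behavior described above can be pictured as a simple threshold check. The sketch below is illustrative only: the `Threshold` type, the metric names, and the minimum values are assumptions, not the actual EvalOps policy schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    metric: str     # hypothetical metric name, e.g. "safety_pass_rate"
    minimum: float  # lowest score at which the gate still passes


def evaluate_gate(
    scores: dict[str, float], thresholds: list[Threshold]
) -> tuple[bool, list[str]]:
    """Return (allowed, failures). A metric with no reported score counts as a failure."""
    failures = []
    for t in thresholds:
        value = scores.get(t.metric)
        if value is None:
            failures.append(f"{t.metric}: no score reported")
        elif value < t.minimum:
            failures.append(f"{t.metric}: {value:.2f} < {t.minimum:.2f}")
    return (not failures, failures)


# Hypothetical policy and scores for one release candidate.
policy = [Threshold("safety_pass_rate", 0.99), Threshold("quality_score", 0.85)]
allowed, failures = evaluate_gate(
    {"safety_pass_rate": 0.97, "quality_score": 0.90}, policy
)
print("deploy allowed" if allowed else "deploy blocked: " + "; ".join(failures))
# prints "deploy blocked: safety_pass_rate: 0.97 < 0.99"
```

Treating a missing metric as a failure means the gate fails closed when an eval suite did not run.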
Telemetry
- New GitHub App ingests pull-request context so evaluation results map to diffs
- PagerDuty integration posts paging events into eval timelines for root-cause correlation
- Expanded Spellbook recipes with dataset version pinning and replay controls
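
As a rough picture of how paging events in an eval timeline could support root-cause correlation, the sketch below lists the eval runs that finished shortly before each page. The `TimelineEvent` type, the `kind` labels, and the 30-minute window are assumptions made for illustration; the real timeline schema may differ.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class TimelineEvent:
    at: datetime
    kind: str  # "eval_run" or "page" (hypothetical labels)
    ref: str   # e.g. a run ID or an incident ID


def evals_before_page(
    events: list[TimelineEvent], window: timedelta
) -> dict[str, list[str]]:
    """Map each page to the eval runs that occurred within `window` before it."""
    ordered = sorted(events, key=lambda e: e.at)
    result: dict[str, list[str]] = {}
    for page in (e for e in ordered if e.kind == "page"):
        result[page.ref] = [
            e.ref
            for e in ordered
            if e.kind == "eval_run" and timedelta(0) <= page.at - e.at <= window
        ]
    return result


t0 = datetime(2025, 10, 3, 12, 0)
timeline = [
    TimelineEvent(t0, "eval_run", "run-1"),
    TimelineEvent(t0 + timedelta(minutes=50), "eval_run", "run-2"),
    TimelineEvent(t0 + timedelta(minutes=60), "page", "inc-1"),
]
correlated = evals_before_page(timeline, window=timedelta(minutes=30))
print(correlated)
# prints {'inc-1': ['run-2']}
```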
Governance
- Attestations API stores approver, policy, and evidence payload for each release gate
- Audit log now captures evaluator overrides with before/after snapshots
- Trust Center widgets display live control coverage, driven directly from eval outcomes
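
For orientation, an attestation record carrying the three fields named above (approver, policy, evidence payload) might be modeled as follows. This is a sketch under assumptions: the field names, the `gate_id` key, and the JSON shape are hypothetical and are not taken from the Attestations API reference.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class Attestation:
    gate_id: str   # hypothetical identifier for the release gate
    approver: str  # who signed off
    policy: str    # which policy the sign-off applies to
    evidence: dict = field(default_factory=dict)  # evaluation evidence payload


def to_payload(att: Attestation) -> str:
    """Serialize the record to JSON with stable key order, ready to submit or archive."""
    return json.dumps(asdict(att), sort_keys=True)


att = Attestation(
    gate_id="gate-123",
    approver="reviewer@example.com",
    policy="safety-baseline",
    evidence={"suite": "safety", "pass_rate": 0.995},
)
payload = to_payload(att)
print(payload)
```

Sorting keys keeps the serialized record byte-stable, which is convenient if payloads are later hashed or compared in an audit log.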
Developer Experience
- CLI gained eval-agent commands to kick off suites locally and stream results