Catch agent regressions before your users do.
Teams ship AI agents with no way to know that a prompt tweak or model bump quietly broke behavior. The agent still answers, it just answers worse, or calls a tool it should not, and they find out in production. Tracecase is the test layer that catches it first.
Records every run
Inputs, outputs, tool calls, latency and pass/fail for each case in a suite.
Diffs vs the last run
Every run is compared to the previous one to find exactly what changed.
Flags regressions
Cases that newly fail, hallucinate, or make an unsafe tool call are surfaced.
Gates the merge
Returns shouldFail so your CI blocks a bad prompt or model change.
How it is built
Next.js on Cloudflare Workers, a Supabase Postgres backend, and a token-protected ingest API. An n8n workflow posts a fresh run every hour so the dashboard is always live, and a built-in AI assistant explains any run in plain English.