Tracecase
about

Catch agent regressions before your users do.

Teams ship AI agents with no way to know that a prompt tweak or model bump quietly broke behavior. The agent still answers, it just answers worse, or calls a tool it should not, and they find out in production. Tracecase is the test layer that catches it first.

Records every run

Inputs, outputs, tool calls, latency and pass/fail for each case in a suite.

Diffs vs the last run

Every run is compared to the previous one to find exactly what changed.

Flags regressions

Cases that newly fail, hallucinate, or make an unsafe tool call are surfaced.

Gates the merge

Returns shouldFail so your CI blocks a bad prompt or model change.

How it is built

Next.js on Cloudflare Workers, a Supabase Postgres backend, and a token-protected ingest API. An n8n workflow posts a fresh run every hour so the dashboard is always live, and a built-in AI assistant explains any run in plain English.

Next.jsCloudflare WorkersSupabasen8nOllama