Know which change actually improved your agent
You shipped a new prompt, swapped the model, and updated your tools in one release. The eval score went up. But which change helped? Isolate runs controlled experiments to find out.
The problem
Eval scores tell you what changed, not why
Every release bundles multiple changes together. A new prompt, a model swap, updated tools, better retrieval. When the score moves, nobody can say which change caused it.
Prompt, model, tools, retrieval, routing — teams ship them all at once.
Your dashboard shows one number. It can't distinguish what helped from what hurt.
As agents grow, testing every combination of changes by hand stops being practical. You need systematic ablation.
How it works
One command. Complete attribution.
Point it at your agent
Isolate reads your agent config — prompt, model, tools, retrieval pipeline. It snapshots the current state as a baseline.
isolate init --config agent.yaml
We run the experiments
Variants are generated automatically by swapping out individual components. Each variant runs against your eval suite in parallel — no manual test matrix.
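Isolate's internals aren't described here, but the step above matches a standard one-at-a-time ablation: take the baseline, swap in exactly one changed component, and score each resulting variant. The Python below is a minimal sketch of that general technique, not Isolate's implementation. It assumes a config is a flat dict of component name to setting; the component names, the `single_swap_variants` and `run_variants` helpers, and the `evaluate` callable are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable

# Assumption: a config is a flat mapping of component name -> setting (hypothetical schema).
Config = dict[str, Any]


def single_swap_variants(baseline: Config, candidate: Config) -> dict[str, Config]:
    """One variant per changed component: the baseline with only that component swapped in."""
    variants: dict[str, Config] = {}
    for name in sorted(candidate):
        if baseline.get(name) != candidate[name]:
            variant = dict(baseline)  # copy, so every other component stays at its baseline value
            variant[name] = candidate[name]
            variants[name] = variant
    return variants


def run_variants(variants: dict[str, Config],
                 evaluate: Callable[[Config], float]) -> dict[str, float]:
    """Score every variant concurrently with a caller-supplied (hypothetical) eval function."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(evaluate, cfg) for name, cfg in variants.items()}
        return {name: fut.result() for name, fut in futures.items()}


# Hypothetical before/after configs for a release that changed three components at once.
baseline = {"prompt": "v1", "model": "model-a", "tools": "tools-v1", "retrieval": "bm25"}
release = {"prompt": "v2", "model": "model-b", "tools": "tools-v1", "retrieval": "hybrid"}

for name, cfg in single_swap_variants(baseline, release).items():
    print(f"{name}: {cfg}")
```

Each variant differs from the baseline in exactly one component, so its score change relative to the baseline can be attributed to that component while the others stay at their old settings. "tools" is skipped because it didn't change. This design can't reveal interactions between components. Leave-one-out, which starts from the release and reverts one component at a time, is the mirror image.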
isolate run --variants auto --eval accuracy
You get a clear answer
A report shows exactly which changes improved performance, which degraded it, and which had no effect. With statistical significance, not vibes.
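The page doesn't say which statistical test Isolate uses. One common way to check whether a variant really beats the baseline is a paired permutation (sign-flip) test on per-example scores from the same eval cases. The sketch below illustrates that technique. The function name and the example scores are hypothetical.

```python
import random


def paired_permutation_test(baseline: list[float], variant: list[float],
                            n_resamples: int = 10_000, seed: int = 0) -> tuple[float, float]:
    """Two-sided sign-flip permutation test on paired per-example eval scores.

    Both lists must hold scores for the same eval examples in the same order.
    Returns (mean difference variant - baseline, estimated p-value).
    """
    if not baseline or len(baseline) != len(variant):
        raise ValueError("need two non-empty score lists of equal length")
    diffs = [v - b for b, v in zip(baseline, variant)]
    n = len(diffs)
    observed = sum(diffs) / n
    rng = random.Random(seed)  # seeded so the p-value is reproducible
    extreme = 0
    for _ in range(n_resamples):
        # Under the null hypothesis (no effect), each paired difference is equally likely to have either sign.
        resampled = sum(d if rng.random() < 0.5 else -d for d in diffs) / n
        if abs(resampled) >= abs(observed):
            extreme += 1
    # The +1 terms count the observed labelling itself and keep p strictly above zero.
    return observed, (extreme + 1) / (n_resamples + 1)


# Hypothetical per-example pass/fail scores (1 = pass) on the same ten eval cases.
base_scores = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
variant_scores = [1, 1, 1, 1, 0, 1, 1, 1, 1, 1]
diff, p = paired_permutation_test(base_scores, variant_scores)
print(f"mean diff = {diff:+.2f}, p = {p:.3f}")
```

For these made-up scores the mean difference is +0.30, but the p-value comes out near 0.25 because only three cases changed. Ten examples aren't enough to call that an improvement. For binary pass/fail scores, McNemar's exact test is a common alternative. When many variants are compared against one baseline, a multiple-comparison correction such as Holm-Bonferroni helps keep false positives in check.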
isolate report --format table
Before & after
From “it went up” to knowing exactly why
Use cases
For any team shipping agents to production
Customer support agents
You upgraded the model and rewrote the prompt. Resolution rate is up 15%. Was it the prompt or the model? Isolate tells you the prompt did the heavy lifting — the model swap actually hurt edge cases.
Coding assistants
New retrieval pipeline, updated instructions, context window change. Pass rate jumped. Isolate shows retrieval was the only change that mattered.
RAG pipelines
Chunk size, embedding model, reranker — all changed at once. Answer quality improved, but which component? Isolate ablates each one independently.
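With only a handful of components, a full 2^k factorial design is a standard alternative to one-at-a-time ablation. For three components that means 2^3 = 8 eval runs. The sketch below shows this general design-of-experiments technique for the three RAG components named above. It is not necessarily what Isolate does. The `factorial_main_effects` helper and the `toy_score` function standing in for a real eval are hypothetical.

```python
from itertools import product
from typing import Callable

# The three RAG components named above; True = new setting, False = old setting.
FACTORS = ("chunk_size", "embedding_model", "reranker")


def factorial_main_effects(score: Callable[[dict[str, bool]], float]) -> dict[str, float]:
    """Run a full 2^k factorial design and return each factor's main effect.

    A main effect is the mean score with the factor at its new setting minus the
    mean score with it at its old setting, averaged over all settings of the others.
    """
    runs = []
    for levels in product((False, True), repeat=len(FACTORS)):
        setting = dict(zip(FACTORS, levels))
        runs.append((setting, score(setting)))  # 2**3 = 8 eval runs in total
    effects = {}
    for factor in FACTORS:
        new = [s for setting, s in runs if setting[factor]]
        old = [s for setting, s in runs if not setting[factor]]
        effects[factor] = sum(new) / len(new) - sum(old) / len(old)
    return effects


# Hypothetical scorer standing in for a real eval run.
def toy_score(setting: dict[str, bool]) -> float:
    return 0.60 + 0.08 * setting["chunk_size"] - 0.02 * setting["reranker"]


print(factorial_main_effects(toy_score))
```

The same eight runs can also be used to estimate interaction effects, such as whether the reranker only helps with the new chunk size. Single-component swaps can't detect that. The cost grows as 2^k, so full factorials are practical only for a few components at a time.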
Multi-agent systems
When orchestrators delegate to sub-agents, a change in one agent can mask regressions in another. Isolate tests each agent in isolation.
Get early access
We're onboarding design partners for our private beta. Join the waitlist or book a call to discuss your eval workflow.