The trust layer
for AI agents.AI agents.
Pre-built harnesses for the scenarios you'll actually hit, built with real domain experts. The guardrails you wish your agent had — but never did.
$ pistachio harness add agent-hygiene
✔ Installed as Claude Code MCP tool
✔ 350 checks ready
$ pistachio harness run agent-hygiene
Running 350 / 350 … ▉
Report → pistachio.sh/r/9fZ3k
Three commands.
Zero config.
We handle the hard parts — graders, fixtures, sandboxing. You focus on the agent.
Pick a harness
Browse curated harnesses, from agent hygiene to RAG faithfulness. Filter by tier, model, and failure mode.
Pipe it into Claude Code
One CLI command installs the harness as an MCP tool. Zero config. Runs against the models you already use.
Ship with receipts
Get a signed eval report you can drop into a PR, a launch doc, or a sales deck. No hand-waving.
Curated. Battle-tested.
Opinionated.
RAG Faithfulness
Catch hallucinations before your users do.
Legal Citation Verification
Your AI hallucinates cases. We catch it.
WebVoyager
Score your browser agent against the updated WebVoyager corpus (2026 version).
From people who
already pushed agents to prod.
“Caught a regression we’d never have found otherwise.”
“Pistachio is our audit trail engine now.”
“Replaced our homegrown eval scripts in a week. Now it just runs in CI.”
The trust layer between
your agent and the real world
Pistachio runs the harnesses your agent should pass before it sees a real user. Same harnesses, same receipts, whether you're a solo dev shipping your first agent or an enterprise team managing fleets of them.
