Pistachio

The trust layer
for AI agents.AI agents.

Pre-built harnesses for the scenarios you'll actually hit, built with real domain experts. The guardrails you wish your agent had — but never did.

Live on CLI
~ / claude-code

$ pistachio harness add agent-hygiene

✔ Installed as Claude Code MCP tool

✔ 350 checks ready

$ pistachio harness run agent-hygiene

Running 350 / 350 …

Pass rate96.3%
13 failuressigned

Report → pistachio.sh/r/9fZ3k

100K+
Eval runs shipped
12
Curated harnesses
18
Failure modes detected
42s
Avg setup time
How it works

Three commands.
Zero config.

We handle the hard parts — graders, fixtures, sandboxing. You focus on the agent.

01

Pick a harness

Browse curated harnesses, from agent hygiene to RAG faithfulness. Filter by tier, model, and failure mode.

02

Pipe it into Claude Code

One CLI command installs the harness as an MCP tool. Zero config. Runs against the models you already use.

03

Ship with receipts

Get a signed eval report you can drop into a PR, a launch doc, or a sales deck. No hand-waving.

Receipts

From people who
already pushed agents to prod.

Caught a regression we’d never have found otherwise.

Maya R.
Eng lead, agent team

Pistachio is our audit trail engine now.

Theo C.
Eng lead, retail bank

Replaced our homegrown eval scripts in a week. Now it just runs in CI.

Iris N.
Platform lead, applied AI

The trust layer between
your agent and the real world

Pistachio runs the harnesses your agent should pass before it sees a real user. Same harnesses, same receipts, whether you're a solo dev shipping your first agent or an enterprise team managing fleets of them.