Pistachio
Now in private beta

The trust layer
for AI agents.AI agents.

Pre-built harnesses for the scenarios you'll actually hit, built with real domain experts. The guardrails you wish your agent had — but never did.

Now in Claude Code

Live on CLI
~ / claude-code

$ pistachio harness add agent-hygiene

✔ Installed as Claude Code MCP tool

✔ 350 checks ready

$ pistachio harness run agent-hygiene

Running 350 / 350 …

Pass rate96.3%
13 failuressigned

Report → pistachio.sh/r/9fZ3k

2.4M
Eval runs shipped
120+
Curated harnesses
18
Models covered
42s
Avg setup time
How it works

Three commands.
Zero config.

We handle the hard parts — graders, fixtures, sandboxing. You focus on the agent.

01

Pick a harness

Browse curated harnesses, from agent hygiene to RAG faithfulness. Filter by tier, model, and failure mode.

02

Pipe it into Claude Code

One CLI command installs the harness as an MCP tool. Zero config. Runs against the models you already use.

03

Ship with receipts

Get a signed eval report you can drop into a PR, a launch doc, or a sales deck. No hand-waving.

Receipts

From people who
already pushed agents to prod.

We stopped shipping agents on vibes the day we plugged Pistachio into our CI. Pre-deploy harness runs catch stuff our staging never did.

Ava K.
Staff eng, payments agent team

The RAG faithfulness harness alone pays for our subscription. We found three silent regressions in week one.

Ren M.
Founding eng, legal-AI startup

Finally — evals that I actually run. The Claude Code integration is the whole pitch. It's where my work already lives.

Dmitri L.
Applied AI, fintech

The trust layer between
your agent and the real world

Pistachio runs the harnesses your agent should pass before it sees a real user. Same harnesses, same receipts, whether you're a solo dev shipping your first agent or an enterprise team managing fleets of them.