Pricing

Receipts beat
vibes.

Personal is free. Enterprise is contract-based. Two tiers, no asterisks, cancel anytime.

Start here

Personal

Free

For solo developers and small teams shipping AI agents.

All horizontal harnesses (grader class)
Claude Code MCP integration
Managed endpoint mode
Signed, shareable reports
Regression tracking + private share links
Playground access
1 user
Vertical harnesses (Legal / Healthcare / FSI)
SSO / SAML

Start with Personal

Enterprise

Custom

For regulated industries that need vertical harnesses and audit-grade reports.

Everything in Personal
Vertical harnesses — Legal / Healthcare / FSI
Custom check budget
SSO / SAML, RBAC, shared workspaces
Audit log + regulator-format signed reports
Custom harness authoring with our team
On-prem worker option
Dedicated CSM

Contact sales

Grader vs. benchmark

Grader checks are
free.
Benchmarks are
enterprise.

Grader harnesses (agent hygiene, RAG faithfulness, tool use, etc.) run in seconds against public-dataset fixtures — included on the Personal tier. Benchmark harnesses (WebVoyager, Mind2Web) are long-running agent evaluations with per-customer adapters — contract-based under Enterprise.

Grader harnesses

Free

Agent Hygiene, RAG Faithfulness, Tool Use Stress, etc.
Runs in seconds
Backed by public datasets (HarmBench, RAGBench, BFCL)
Unlimited on the Personal tier

Benchmark harnesses

Enterprise

WebVoyager (643 tasks), Mind2Web (coming soon)
Runs in hours — real browser automation + LLM judge
One-time integration, reproducible scores forever
Contract-based — reach out for pricing

Contact sales

Full comparison.

Feature	Personal	Enterprise
Horizontal harnesses	All	All
Vertical harnesses	—	Legal / Healthcare / FSI
Grader check budget	Unlimited	Custom
Benchmark runs	—	Volume pricing
Connection modes	MCP + managed endpoint	MCP + managed + on-prem
Signed reports	Standard	Regulator-format
Playground	✓	✓
Research reports	✓	✓ + custom
Seats	1	Unlimited
SSO / SAML	—	✓
Support	Email	Dedicated CSM

Questions
you might have.

What's the difference between grader and benchmark harnesses?+

Grader harnesses are fast, synchronous checks — prompt injection tests, faithfulness checks, tool-use validation. They run in seconds and are available on the Personal tier. Benchmark harnesses (like WebVoyager) are long-running jobs that test browser agents against hundreds of real web tasks. They're Enterprise-only because they require significantly more compute and per-customer integration work — think hours, not seconds.

What's a check?+

One check = one grader execution against one input. A typical harness run executes hundreds to thousands of checks. Personal includes unlimited grader harness runs.

Why is Personal free?+

Personal-tier harnesses are built on public datasets — HarmBench, JailbreakBench, RAGBench, BFCL. The fixtures are free to redistribute, the brand promise is "receipts, not vibes," and we'd rather have every agent developer running real checks against their agent than ours locked behind a paywall. Enterprise is where we charge — vertical harnesses (legal, healthcare, FSI) and long-running benchmarks (WebVoyager, future Mind2Web) require per-customer integration work.

How do I connect my agent?+

Three ways. (1) MCP trace mode: install Pistachio as a Claude Code MCP tool and invoke harnesses from inside any Claude Code session. (2) Managed endpoint: paste your deployed agent's URL + auth header. (3) Playground: paste an OpenAI-compatible endpoint and run checks instantly, no account needed.

What does Enterprise actually unlock?+

Vertical harnesses (legal, healthcare, FSI), benchmark harnesses (WebVoyager, Mind2Web), SSO / RBAC, shared workspaces, audit logging, regulator-format signed reports, custom harness authoring, on-prem worker option, and a dedicated CSM.

Receipts beatvibes.