Pistachio
Pricing

Receipts beat
vibes.

Personal is free. Enterprise is contract-based. Two tiers, no asterisks, cancel anytime.

Start here
Personal
Free

For solo developers and small teams shipping AI agents.

  • All horizontal harnesses (grader class)
  • Claude Code MCP integration
  • Managed endpoint mode
  • Signed, shareable reports
  • Regression tracking + private share links
  • Playground access
  • 1 user
  • Vertical harnesses (Legal / Healthcare / FSI)
  • SSO / SAML
Start with Personal
Enterprise
Custom

For regulated industries that need vertical harnesses and audit-grade reports.

  • Everything in Personal
  • Vertical harnesses — Legal / Healthcare / FSI
  • Custom check budget
  • SSO / SAML, RBAC, shared workspaces
  • Audit log + regulator-format signed reports
  • Custom harness authoring with our team
  • On-prem worker option
  • Dedicated CSM
Contact sales
Grader vs. benchmark

Grader checks are
free.
Benchmarks are
enterprise.

Grader harnesses (agent hygiene, RAG faithfulness, tool use, etc.) run in seconds against public-dataset fixtures — included on the Personal tier. Benchmark harnesses (WebVoyager, Mind2Web) are long-running agent evaluations with per-customer adapters — contract-based under Enterprise.

Grader harnesses

Free
  • Agent Hygiene, RAG Faithfulness, Tool Use Stress, etc.
  • Runs in seconds
  • Backed by public datasets (HarmBench, RAGBench, BFCL)
  • Unlimited on the Personal tier

Benchmark harnesses

Enterprise
  • WebVoyager (643 tasks), Mind2Web (coming soon)
  • Runs in hours — real browser automation + LLM judge
  • One-time integration, reproducible scores forever
  • Contract-based — reach out for pricing

Full comparison.

FeaturePersonalEnterprise
Horizontal harnessesAllAll
Vertical harnessesLegal / Healthcare / FSI
Grader check budgetUnlimitedCustom
Benchmark runsVolume pricing
Connection modesMCP + managed endpointMCP + managed + on-prem
Signed reportsStandardRegulator-format
Playground
Research reports✓ + custom
Seats1Unlimited
SSO / SAML
SupportEmailDedicated CSM

Questions
you might have.

What's the difference between grader and benchmark harnesses?+
Grader harnesses are fast, synchronous checks — prompt injection tests, faithfulness checks, tool-use validation. They run in seconds and are available on the Personal tier. Benchmark harnesses (like WebVoyager) are long-running jobs that test browser agents against hundreds of real web tasks. They're Enterprise-only because they require significantly more compute and per-customer integration work — think hours, not seconds.
What's a check?+
One check = one grader execution against one input. A typical harness run executes hundreds to thousands of checks. Personal includes unlimited grader harness runs.
Why is Personal free?+
Personal-tier harnesses are built on public datasets — HarmBench, JailbreakBench, RAGBench, BFCL. The fixtures are free to redistribute, the brand promise is "receipts, not vibes," and we'd rather have every agent developer running real checks against their agent than ours locked behind a paywall. Enterprise is where we charge — vertical harnesses (legal, healthcare, FSI) and long-running benchmarks (WebVoyager, future Mind2Web) require per-customer integration work.
How do I connect my agent?+
Three ways. (1) MCP trace mode: install Pistachio as a Claude Code MCP tool and invoke harnesses from inside any Claude Code session. (2) Managed endpoint: paste your deployed agent's URL + auth header. (3) Playground: paste an OpenAI-compatible endpoint and run checks instantly, no account needed.
What does Enterprise actually unlock?+
Vertical harnesses (legal, healthcare, FSI), benchmark harnesses (WebVoyager, Mind2Web), SSO / RBAC, shared workspaces, audit logging, regulator-format signed reports, custom harness authoring, on-prem worker option, and a dedicated CSM.