Pricing
Receipts beat
vibes.
Personal is free. Enterprise is contract-based. Two tiers, no asterisks, cancel anytime.
Start here
Personal
Free
For solo developers and small teams shipping AI agents.
- All horizontal harnesses (grader class)
- Claude Code MCP integration
- Managed endpoint mode
- Signed, shareable reports
- Regression tracking + private share links
- Playground access
- 1 user
- Vertical harnesses (Legal / Healthcare / FSI)
- SSO / SAML
Enterprise
Custom
For regulated industries that need vertical harnesses and audit-grade reports.
- Everything in Personal
- Vertical harnesses — Legal / Healthcare / FSI
- Custom check budget
- SSO / SAML, RBAC, shared workspaces
- Audit log + regulator-format signed reports
- Custom harness authoring with our team
- On-prem worker option
- Dedicated CSM
Grader vs. benchmarkGrader checks are
Grader checks are
free.
Benchmarks are
enterprise.
Grader harnesses (agent hygiene, RAG faithfulness, tool use, etc.) run in seconds against public-dataset fixtures — included on the Personal tier. Benchmark harnesses (WebVoyager, Mind2Web) are long-running agent evaluations with per-customer adapters — contract-based under Enterprise.
Grader harnesses
Free- Agent Hygiene, RAG Faithfulness, Tool Use Stress, etc.
- Runs in seconds
- Backed by public datasets (HarmBench, RAGBench, BFCL)
- Unlimited on the Personal tier
Benchmark harnesses
Enterprise- WebVoyager (643 tasks), Mind2Web (coming soon)
- Runs in hours — real browser automation + LLM judge
- One-time integration, reproducible scores forever
- Contract-based — reach out for pricing
Full comparison.
| Feature | Personal | Enterprise |
|---|---|---|
| Horizontal harnesses | All | All |
| Vertical harnesses | — | Legal / Healthcare / FSI |
| Grader check budget | Unlimited | Custom |
| Benchmark runs | — | Volume pricing |
| Connection modes | MCP + managed endpoint | MCP + managed + on-prem |
| Signed reports | Standard | Regulator-format |
| Playground | ✓ | ✓ |
| Research reports | ✓ | ✓ + custom |
| Seats | 1 | Unlimited |
| SSO / SAML | — | ✓ |
| Support | Dedicated CSM |
Questions
you might have.
What's the difference between grader and benchmark harnesses?+
Grader harnesses are fast, synchronous checks — prompt injection tests, faithfulness checks, tool-use validation. They run in seconds and are available on the Personal tier. Benchmark harnesses (like WebVoyager) are long-running jobs that test browser agents against hundreds of real web tasks. They're Enterprise-only because they require significantly more compute and per-customer integration work — think hours, not seconds.
What's a check?+
One check = one grader execution against one input. A typical harness run executes hundreds to thousands of checks. Personal includes unlimited grader harness runs.
Why is Personal free?+
Personal-tier harnesses are built on public datasets — HarmBench, JailbreakBench, RAGBench, BFCL. The fixtures are free to redistribute, the brand promise is "receipts, not vibes," and we'd rather have every agent developer running real checks against their agent than ours locked behind a paywall. Enterprise is where we charge — vertical harnesses (legal, healthcare, FSI) and long-running benchmarks (WebVoyager, future Mind2Web) require per-customer integration work.
How do I connect my agent?+
Three ways. (1) MCP trace mode: install Pistachio as a Claude Code MCP tool and invoke harnesses from inside any Claude Code session. (2) Managed endpoint: paste your deployed agent's URL + auth header. (3) Playground: paste an OpenAI-compatible endpoint and run checks instantly, no account needed.
What does Enterprise actually unlock?+
Vertical harnesses (legal, healthcare, FSI), benchmark harnesses (WebVoyager, Mind2Web), SSO / RBAC, shared workspaces, audit logging, regulator-format signed reports, custom harness authoring, on-prem worker option, and a dedicated CSM.
