Pistachio
Live
Public demo runner

Test your agent.
Right now.

Paste an OpenAI-compatible endpoint, pick a harness, and watch checks run live against your agent. No sign-up, no API key, no commitment.

Your endpoint
OpenAI-compatible chat completions API
Required format: your endpoint must accept OpenAI-compatible chat completions requests (a POST with a messages array) and return the reply in choices[0].message.content. Works with OpenAI, Anthropic (via a proxy), Together, Groq, Ollama, vLLM, and most agent frameworks.

Credentials are saved in your browser only. We don't store them.

Pick a harness

How it works

  1. Enter your OpenAI-compatible endpoint URL
  2. Pick a harness to run against it
  3. We send real test prompts and check the responses
  4. Results stream in live: pass, fail, or error for each check

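The four steps above can be sketched as a small loop that sends one prompt per check and streams back a status. This is a minimal illustration, not Pistachio's actual runner: Check, run_checks, and fake_send are hypothetical names, and treating any exception (timeout, malformed JSON, a crashing predicate) as "error" is an assumption about how the three outcomes are split.

```python
from dataclasses import dataclass
from typing import Callable, Iterator


@dataclass
class Check:
    """Hypothetical check: a prompt plus a predicate on the agent's reply."""
    name: str
    prompt: str
    passes: Callable[[str], bool]


def run_checks(send: Callable[[str], str], checks: list[Check]) -> Iterator[tuple[str, str]]:
    """Yield (check name, status) as each check finishes, so results can stream."""
    for check in checks:
        try:
            reply = send(check.prompt)   # one real prompt per check
            ok = check.passes(reply)     # grade the reply
        except Exception:
            # Assumption: a check that cannot be evaluated at all counts as "error".
            yield check.name, "error"
            continue
        yield check.name, "pass" if ok else "fail"


# Usage with a fake agent standing in for a real endpoint call.
def fake_send(prompt: str) -> str:
    if prompt == "boom":
        raise TimeoutError("endpoint did not respond")
    return prompt.upper()


checks = [
    Check("uppercases", "hello", lambda r: r == "HELLO"),
    Check("mentions refund", "hello", lambda r: "refund" in r),
    Check("times out", "boom", lambda r: True),
]
results = list(run_checks(fake_send, checks))
print(results)
# [('uppercases', 'pass'), ('mentions refund', 'fail'), ('times out', 'error')]
```

Using a generator keeps the "stream in live" behavior: a caller can render each result as soon as it is yielded instead of waiting for the whole harness.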
Endpoint format

We send a standard chat completions request:

POST your-url
{
  "model": "your-model",
  "messages": [
    { "role": "user",
      "content": "..." }
  ]
}

And expect:

{
  "choices": [{
    "message": {
      "content": "..."
    }
  }]
}
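A minimal Python client for the request and response shapes above, using only the standard library, can help confirm your endpoint is compatible before running a harness. build_request, extract_content, and ask are hypothetical helper names; the optional Bearer Authorization header follows the common OpenAI-style convention and is an assumption, since this page doesn't say how credentials are sent.

```python
import json
import urllib.request
from typing import Optional


def build_request(url: str, model: str, prompt: str,
                  api_key: Optional[str] = None) -> urllib.request.Request:
    """Build the POST body shown above: a model name plus a messages array."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Assumption: Bearer auth, as used by OpenAI-style APIs.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(url, data=json.dumps(body).encode("utf-8"),
                                  headers=headers, method="POST")


def extract_content(response_body: bytes) -> str:
    """Pull the reply text out of choices[0].message.content."""
    data = json.loads(response_body)
    return data["choices"][0]["message"]["content"]


def ask(url: str, model: str, prompt: str,
        api_key: Optional[str] = None, timeout: float = 30.0) -> str:
    """Send one prompt and return the reply; raises on HTTP errors or a wrong shape."""
    req = build_request(url, model, prompt, api_key)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return extract_content(resp.read())
```

For example, a local Ollama server exposes an OpenAI-compatible route at http://localhost:11434/v1/chat/completions, so ask() pointed at that URL with the name of a model you have pulled should return its reply. If extract_content raises KeyError or IndexError, the response does not match the expected shape.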

Endpoint not compatible?

If your agent doesn't speak the OpenAI format, or you'd rather have us run the benchmark for you, we'd love to help. We support custom integrations for teams evaluating browser agents, legal AI, and more.
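Another option is a thin shim that accepts chat completions requests and forwards them to your agent as a plain function call. A minimal sketch, assuming your agent can be called as my_agent(prompt) -> str (a hypothetical stand-in), that only the last user message matters, and that binding to 127.0.0.1:8000 is fine for local testing; the hosted runner would still need a publicly reachable URL.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def my_agent(prompt: str) -> str:
    # Hypothetical stand-in for your own agent; replace with a real call.
    return f"echo: {prompt}"


class ChatCompletionsShim(BaseHTTPRequestHandler):
    """Translate an OpenAI-style chat completions POST into a my_agent() call."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            body = json.loads(self.rfile.read(length))
            # Assumption: the agent only needs the most recent user message.
            user_msgs = [m["content"] for m in body["messages"] if m.get("role") == "user"]
            reply = my_agent(user_msgs[-1])
            status = 200
            payload = {"choices": [{
                "index": 0,
                "message": {"role": "assistant", "content": reply},
                "finish_reason": "stop",
            }]}
        except (ValueError, KeyError, IndexError, TypeError, AttributeError) as exc:
            # Malformed JSON or a request without a usable user message.
            status, payload = 400, {"error": {"message": str(exc)}}
        data = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


if __name__ == "__main__":
    ThreadingHTTPServer(("127.0.0.1", 8000), ChatCompletionsShim).serve_forever()
```

The handler ignores the request path, so any URL on the server works. Multi-turn context, system prompts, and streaming are deliberately left out of this sketch.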

Want the full harness?

The playground runs a sample of checks. The full harness includes hundreds more checks, plus a signed report and CI integration.

Browse marketplace