Live · Public demo runner

Test your agent.
Right now.

Paste an OpenAI-compatible endpoint, pick a harness, and watch checks run live against your agent. No sign-up, no API key, no commitment.

Your endpoint

OpenAI-compatible chat completions API

Required format: Your endpoint must accept OpenAI-compatible chat completions requests (POST with messages array, returns choices[0].message.content). Works with OpenAI, Anthropic (via proxy), Together, Groq, Ollama, vLLM, and most agent frameworks.

Endpoint URL

Authorization header (optional)

Model (optional)

Credentials are saved in your browser only. We don't store them.

Pick a harness

How it works

1. Enter your OpenAI-compatible endpoint URL
2. Pick a harness to run against it
3. We send real test prompts and check the responses
4. Results stream in live: pass, fail, or error per check
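
The steps above can be sketched as a small loop. This is only an illustration of the pass / fail / error idea, not the actual harness: `run_checks`, `echo_agent`, and the sample checks are hypothetical names, and the stub agent stands in for a real endpoint call.

```python
def run_checks(ask, checks):
    """Yield (name, status) for each check as soon as it finishes.

    ask    -- callable that takes a prompt and returns the agent's reply
    checks -- iterable of (name, prompt, predicate) tuples (hypothetical shape)
    """
    for name, prompt, predicate in checks:
        try:
            reply = ask(prompt)
            status = "pass" if predicate(reply) else "fail"
        except Exception:
            # The agent (or the predicate) raised, so the check could not be judged.
            status = "error"
        yield name, status


def echo_agent(prompt):
    """Stub agent for illustration: upper-cases the prompt, fails on 'boom'."""
    if "boom" in prompt:
        raise RuntimeError("simulated endpoint failure")
    return prompt.upper()


# Made-up checks, one for each possible outcome.
SAMPLE_CHECKS = [
    ("uppercases", "hello", lambda reply: reply == "HELLO"),
    ("mentions-cat", "dog", lambda reply: "CAT" in reply),
    ("survives-error", "boom", lambda reply: True),
]

for name, status in run_checks(echo_agent, SAMPLE_CHECKS):
    print(name, status)
```

Because `run_checks` is a generator, each result is available as soon as its check completes, which mirrors results streaming in one check at a time.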

Endpoint format

We send a standard chat completions request:

POST your-url
{
  "model": "your-model",
  "messages": [
    { "role": "user",
      "content": "..." }
  ]
}

And expect:

{
  "choices": [{
    "message": {
      "content": "..."
    }
  }]
}
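
To try this exchange against your own endpoint before running a harness, a minimal sketch using only the Python standard library might look like the following. The function names, the default model string, and the 30-second timeout are assumptions made for illustration; only the request and response shapes come from the format shown above.

```python
import json
import urllib.request


def build_payload(prompt, model="your-model"):
    """Build the request body in the shape shown above."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}


def extract_content(response_body):
    """Pull choices[0].message.content out of a parsed JSON response."""
    return response_body["choices"][0]["message"]["content"]


def call_endpoint(url, prompt, model="your-model", authorization=None, timeout=30):
    """POST one chat completions request and return the reply text.

    authorization is the full header value (for example "Bearer <token>");
    it is left out entirely when not provided.
    """
    headers = {"Content-Type": "application/json"}
    if authorization:
        headers["Authorization"] = authorization
    data = json.dumps(build_payload(prompt, model)).encode("utf-8")
    request = urllib.request.Request(url, data=data, headers=headers, method="POST")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_content(json.loads(response.read().decode("utf-8")))
```

If `call_endpoint` raises a `KeyError` or `IndexError`, the endpoint answered but not in the expected `choices[0].message.content` shape.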

Endpoint not compatible?

If your agent doesn't speak the OpenAI format, or you'd rather have us run the benchmark for you, we'd love to help. We support custom integrations for teams evaluating browser agents, legal AI, and more.

Email us · Book a call

Want the full harness?

The playground runs a sample of checks. The full harness includes hundreds more, a signed report, and CI integration.