Public demo runner (live)
Test your agent.
Right now.
Paste an OpenAI-compatible endpoint, pick a harness, and watch checks run live against your agent. No sign-up, no API key, no commitment.
Your endpoint
OpenAI-compatible chat completions API
Required format: Your endpoint must accept OpenAI-compatible chat completions requests (POST with a messages array, returning choices[0].message.content). Works with OpenAI, Anthropic (via proxy), Together, Groq, Ollama, vLLM, and most agent frameworks.
Credentials are saved in your browser only. We don't store them.
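As a rough illustration of the required format, the Python sketch below builds a request body with a messages array and reads choices[0].message.content from a response. The helper names build_request and extract_content are hypothetical and not part of the demo runner; the runner may send additional fields that are not shown here.

```python
import json


def build_request(model: str, prompt: str) -> str:
    """Serialize a chat completions request body with a single user message."""
    return json.dumps(
        {"model": model, "messages": [{"role": "user", "content": prompt}]}
    )


def extract_content(response_body: str) -> str:
    """Return choices[0].message.content, or raise ValueError if the shape is wrong."""
    try:
        data = json.loads(response_body)
        content = data["choices"][0]["message"]["content"]
    except (json.JSONDecodeError, KeyError, IndexError, TypeError) as exc:
        raise ValueError(f"not an OpenAI-compatible response: {exc!r}") from exc
    if not isinstance(content, str):
        raise ValueError("choices[0].message.content is not a string")
    return content
```

If extract_content raises on a response from your endpoint, the endpoint is probably not returning the shape described above.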
Pick a harness
How it works
1. Enter your OpenAI-compatible endpoint URL
2. Pick a harness to run against it
3. We send real test prompts and check the responses
4. Results stream in live: pass, fail, or error per check
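The steps above can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the runner's actual implementation: the names http_sender and run_checks, the check tuple layout, and the 30-second timeout are all hypothetical, and any authentication header your endpoint needs is omitted.

```python
import json
import urllib.request
from typing import Callable, Iterable, Iterator, Tuple

# Hypothetical check layout: (name, prompt, predicate applied to the reply text).
Check = Tuple[str, str, Callable[[str], bool]]


def http_sender(url: str, model: str) -> Callable[[str], str]:
    """Return a function that POSTs one user prompt to `url` and returns the reply."""

    def send(prompt: str) -> str:
        body = json.dumps(
            {"model": model, "messages": [{"role": "user", "content": prompt}]}
        ).encode("utf-8")
        req = urllib.request.Request(
            url,
            data=body,
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req, timeout=30) as resp:
            data = json.load(resp)
        return data["choices"][0]["message"]["content"]

    return send


def run_checks(
    send: Callable[[str], str], checks: Iterable[Check]
) -> Iterator[Tuple[str, str]]:
    """Yield (check name, status) one at a time so results can be shown as they arrive."""
    for name, prompt, predicate in checks:
        try:
            reply = send(prompt)
            status = "pass" if predicate(reply) else "fail"
        except Exception:
            # Network failures and malformed responses count as "error", not "fail".
            status = "error"
        yield name, status
```

Passing the sender in as a plain function keeps the check loop independent of the transport, so the same loop can be exercised against a local stub before pointing it at a real endpoint via http_sender.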
Endpoint format
We send a standard chat completions request:
POST your-url
{
  "model": "your-model",
  "messages": [
    { "role": "user", "content": "..." }
  ]
}
And expect:
{
  "choices": [{
    "message": {
      "content": "..."
    }
  }]
}
Endpoint not compatible?
If your agent doesn't speak the OpenAI format, or you'd rather have us run the benchmark for you, we'd love to help. We support custom integrations for teams evaluating browser agents, legal AI, and more.
Want the full harness?
The playground runs a sample of checks. The full harness includes hundreds more, a signed report, and CI integration.
Browse marketplace