
Mind2Web Browser Agent Benchmark

Cross-website generalization: can agents handle sites they've never seen?

What this tests

Mind2Web tests browser agents on tasks across 137 websites spanning 31 domains. Unlike WebVoyager, which uses a fixed set of popular sites, Mind2Web specifically measures generalization: whether an agent can complete tasks on websites it has never encountered before. Tasks are derived from real user interactions and test both navigation and complex form completion.

Results coming soon

Full methodology and results are coming soon. We will use the same controlled conditions as our WebVoyager benchmark: default configurations, identical evaluation criteria, and the same judge model. Mind2Web will complement WebVoyager by testing generalization rather than performance on known sites.
