Legal Rule Application
Can your AI apply a rule to the facts?
Lawyers spend their training learning to apply written rules to specific facts: given a trademark and a product, where on the Abercrombie spectrum (generic / descriptive / suggestive / arbitrary / fanciful) does it sit? Given a statement and context, is it hearsay? This pack runs your AI through 100 of those rule-application questions from Stanford's LegalBench and checks each answer against what Stanford law faculty marked correct. If your AI can't apply a stated rule to facts, every downstream legal answer is suspect.
Highlights.
- Backed by Stanford LegalBench: expert-labeled, Apache-2.0
- 100 fixtures across two rule-application skills (Abercrombie trademark spectrum, FRE hearsay)
- Deterministic substring-match grading, matching LegalBench's upstream eval_method
- Every fixture traces to a LegalBench task + row id for audit
- Scope: 2 of the 162 available LegalBench tasks, so coverage is narrow
- Pistachio attorney sign-off pending (Phase L-5)
Vertical harnesses are co-built.
Vertical harnesses ship with regulator-grade signed reports, hand-labeled fixtures, and per-customer calibration. We co-author each one with a single design partner per vertical, and the rest of the catalog rolls out as paying customers ask for it.
Example checks.
Classifies a trademark correctly
Misclassifies the trademark
Judging criteria.
What a pass means
A pass means the agent's answer contains the correct categorical label (e.g. "arbitrary", "generic", "Yes", "No") as a case-insensitive substring. This mirrors LegalBench's own grader.
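The pass criterion above can be sketched in a few lines of Python. This is a minimal illustration, not the pack's or LegalBench's actual grader: the function name `passes` and the sample answers are hypothetical, and only the case-insensitive substring rule comes from the description.

```python
def passes(answer: str, expected_label: str) -> bool:
    """Return True if the expected label occurs in the answer, ignoring case.

    Hypothetical helper: assumes the grader does a plain substring check
    with no tokenization or answer extraction.
    """
    return expected_label.casefold() in answer.casefold()


# Hypothetical (answer, expected label) pairs, made up for illustration.
checks = [
    ("The mark is arbitrary for this product.", "arbitrary"),
    ("This mark is DESCRIPTIVE.", "generic"),
    ("No, the statement is not hearsay.", "Yes"),
]

results = [passes(answer, label) for answer, label in checks]
for (answer, label), ok in zip(checks, results):
    print(f"{'PASS' if ok else 'FAIL'}  expected={label!r}  answer={answer!r}")
```

Under this assumption, a short label such as "No" also matches inside longer words (for example "not" or "know"), so a plain substring check is deterministic but permissive.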
Data sources
- Stanford LegalBench
162 expert-labeled legal-reasoning tasks built by Stanford CRFM and law-faculty collaborators (Apache-2.0). This pack uses two: trademark classification (Abercrombie spectrum) and hearsay (Federal Rules of Evidence).
