
Legal Rule Application

Can your AI apply a rule to the facts?

Lawyers spend their training learning to apply written rules to specific facts: given a trademark and a product, where on the Abercrombie spectrum (generic / descriptive / suggestive / arbitrary / fanciful) does it sit? Given a statement and context, is it hearsay? This pack runs your AI through 100 of those rule-application questions from Stanford's LegalBench and checks each answer against what Stanford law faculty marked correct. If your AI can't apply a stated rule to facts, every downstream legal answer is suspect.


Highlights.

Backed by Stanford LegalBench — expert-labeled, Apache-2.0

100 fixtures across two rule-application skills (Abercrombie trademark spectrum, FRE hearsay)

Deterministic substring-match grading — matches LegalBench's upstream eval_method

Every fixture traces to a LegalBench task + row id for audit

Scope: 2 LegalBench tasks of 162 available — narrow coverage

Pistachio attorney sign-off pending (Phase L-5)

Enterprise harness

Vertical harnesses are co-built.

Vertical harnesses ship with regulator-grade signed reports, hand-labeled fixtures, and per-customer calibration. We co-author each one with a single design partner per vertical, and the rest of the catalog rolls out as paying customers ask for it.


Examples

Example checks.

Check 01 (Deterministic)

Classifies a trademark correctly

Input

Given the Abercrombie rule (generic / descriptive / suggestive / arbitrary / fanciful), classify the mark for the given product.

Expected behavior

Returns the correct category, e.g. 'arbitrary' for a real English word with no relation to the product. The answer matches Stanford's annotation, so the check passes.

Check 02 (Deterministic)

Misclassifies the trademark

Input

Given the Abercrombie rule (generic / descriptive / suggestive / arbitrary / fanciful), classify the mark for the given product.

Expected behavior

Returns the wrong category (e.g. 'generic' when the answer is 'descriptive'). The answer does not contain Stanford's annotated label, so the substring match fails.

Grading

Judging criteria.

What a pass means

A pass means the agent's answer contains the correct categorical label (e.g. "arbitrary", "generic", "Yes", "No") as a case-insensitive substring. This mirrors LegalBench's own grader.
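The pass rule described here can be sketched in a few lines of Python. This is a minimal illustration, not the pack's actual grader: the function name passes and the sample answers are hypothetical, and the only behavior taken from the text is the case-insensitive substring check.

```python
def passes(answer: str, expected_label: str) -> bool:
    """Return True if the expected label occurs in the answer, ignoring case.

    Hypothetical helper: the name and signature are assumptions made for
    illustration, not the pack's real API.
    """
    return expected_label.casefold() in answer.casefold()


# Hypothetical model answers, for illustration only.
print(passes("This mark is Arbitrary for the product.", "arbitrary"))  # True
print(passes("I would call it generic.", "descriptive"))               # False
```

One consequence of substring matching is that a short label such as "No" also occurs inside longer words like "unknown", so a check of this kind is only as strict as the answer format the prompt asks for.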

Data sources

Stanford LegalBench
162 expert-labeled legal-reasoning tasks built by Stanford CRFM and law-faculty collaborators (Apache-2.0). This pack uses two: trademark classification (Abercrombie spectrum) and hearsay (Federal Rules of Evidence).

Harnesses you'll probably also want

Agents

Agent Hygiene

The sanity check every agent should pass before shipping.

RAG

RAG Faithfulness

Catch hallucinations before your users do.

Tool Use

Tool Use Stress Test

Function-call scenarios your agent will eventually hit.