
AI Tutors, Stanford Law, Legal Education, SofAI, Contracts

AI and Legal Education

AI Tutors Meet the Socratic Method

A more useful reading of the Stanford-led study on law professors preferring AI answers: not as a victory lap for machines, but as a design brief for better legal education.

SofAI Bar Tutor Editorial Board | July 1, 2026 | 13 min read | 18 questions

Thesis

The point is not that AI replaces law professors. The point is that expert standards can be made visible, repeatable, and available to every student while the human learner remains the senior partner.

Key takeaways

The Stanford-led paper studied judgment-rich tutoring, not simple right-or-wrong recall.

Sixteen contracts professors helped create and judge short-answer responses in a blinded comparison design.

The study reported that professors preferred LLM answers at roughly a three-to-one rate over peer answers in the studied setting.

The right lesson for SofAI is guardrailed amplification: source grounding, expert rubrics, harmfulness checks, student supervision, and deliberate practice.

The study is powerful but limited: it covers contracts, short answers, and controlled prompts, and tutoring responses are not the same as legal advice or bar passage.

Professors

16

Contracts professors participated as instructors and judges.

Questions

40

Representative office-hours-style questions across recall, doctrine, hypotheticals, and policy.

Comparisons

2,918

Blinded forced-choice comparisons between anonymized human and AI responses.

Design lesson

Judgment

The study asks whether AI can align with expert professional standards, not merely produce facts.

The study

What the Stanford-led paper actually tested

Alejandro Salinas, Carly Frieders, Neel Guha, Sibo Ma, Julian Nyarko, and coauthors studied a hard problem in AI education: domains where there is no single answer key. Law is an ideal test because legal education often asks students to weigh ambiguity, argue both sides, and reach a defensible conclusion.

The study used contracts tutoring questions. Sixteen U.S. contracts professors from fourteen law schools authored representative questions, wrote answers, and then judged anonymized answer pairs. The paper reports 2,918 blinded comparisons and a strong expert preference for LLM responses in that setting.

The evaluation covered recall, doctrine, hypotheticals, and policy questions.
Professors judged which answer they would rather give to a student.
Judges could also flag pedagogically harmful answers.

Stanford Law School hosted PDF
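The blinded, forced-choice design described in this section can be sketched in a few lines. This is a minimal illustration, not the paper's code: the `Comparison` record, the `"llm"` and `"professor"` labels, and every function name are assumptions made for the example, and the sample tallies are invented to show the arithmetic rather than taken from the study.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Comparison:
    """One blinded forced-choice judgment between two anonymized answers."""
    question_id: str
    winner: str  # "llm" or "professor"; hypothetical labels hidden from the judge


def blind_pair(llm_answer, professor_answer, rng):
    """Randomize display order so position does not reveal an answer's source."""
    pair = [("llm", llm_answer), ("professor", professor_answer)]
    rng.shuffle(pair)
    return pair


def llm_win_rate(comparisons):
    """Share of comparisons in which the judge chose the LLM answer."""
    if not comparisons:
        raise ValueError("no comparisons to score")
    wins = sum(1 for c in comparisons if c.winner == "llm")
    return wins / len(comparisons)


def win_rate_to_odds(rate):
    """Convert a win rate to odds, so 0.75 becomes 3.0 (three-to-one)."""
    if not 0 <= rate < 1:
        raise ValueError("rate must be in [0, 1)")
    return rate / (1 - rate)


# Illustrative tallies only; these are not the paper's data.
sample = [Comparison("q1", "llm")] * 3 + [Comparison("q1", "professor")]
rate = llm_win_rate(sample)
print(rate, win_rate_to_odds(rate))  # 0.75 3.0
```

The conversion at the end is why a win rate near 75 percent and a roughly three-to-one preference describe the same result.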

The result

The headline is dramatic, but the useful lesson is deeper

The paper reports that professors preferred LLM answers far more often than peer instructor answers in the studied comparisons, with an average LLM win rate around 75 percent. In aggregate, it also reports fewer harmfulness flags for LLM answers than for professor answers.

That does not mean law professors are obsolete. It means that a well-designed AI tutor can sometimes express the shared professional standard with unusual consistency, clarity, and availability. The educational revolution is not replacing the professor. It is making expert feedback abundant enough that students can practice more often.

The VR School reading

The human student should not become passive. SofAI should act as junior associate, grader, and coach. The learner remains the senior partner who verifies sources, edits reasoning, and signs off.

Stanford Law School hosted PDF

Judgment-rich learning

Why legal education is different from answer-key tutoring

A math drill can often be checked against one correct answer. A legal hypothetical usually cannot. Two answers can reach different conclusions and both be strong if each states the right rule, uses the right facts, addresses counterarguments, and reasons honestly.

That is why the study matters for bar preparation. The California Bar Exam grading standard also values analysis, material facts, application of law, and logical, lawyer-like reasoning. The bar is already a judgment-rich assessment.

Recall is necessary but insufficient.
A rule without facts is a rule dump.
A conclusion without a reasoning bridge is a conclusory leap.
A tutor must teach the student how to decide, not just what to say.

Stanford Law School hosted PDF, State Bar of California

Design brief

How SofAI should become the best legal tutor on the planet

The study gives The VR School of Law a design brief: build tutors that are evaluated against expert preferences, but also constrained by official sources, student agency, and transparent rubrics.

SofAI should ask for the student's draft before giving a model. It should flag missing issues, incomplete rules, weak fact use, and dangerous overconfidence. It should cite official sources when exam rules, grading, or legislation matter. It should force rewrites until the student can explain every move.

Use expert-style preference evaluation for open-ended answers.
Use official source rails for exam administration, legislation, and professional responsibility.
Use question banks so every update becomes retrievable and testable.
Use harmfulness checks so polished answers do not become misleading answers.
Use student-supervised rewriting so learning remains active.

Stanford Law School hosted PDF, State Bar of California
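One way to picture the draft-first rule and the rewrite loop from this design brief is a small session object that withholds the model answer until the student has submitted work and cleared every open flag. This is a sketch under stated assumptions: `TutoringSession` and its methods are hypothetical names, not a description of how SofAI is built, and the flags would come from a separate review step that is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class TutoringSession:
    """Hypothetical session state: tracks the student's draft and open flags."""
    student_draft: str = ""
    flags: list = field(default_factory=list)
    rewrites: int = 0

    def submit_draft(self, text):
        """Record the student's first attempt."""
        self.student_draft = text.strip()

    def add_flag(self, flag):
        """Record a problem found in review, such as 'incomplete rule'."""
        self.flags.append(flag)

    def record_rewrite(self, new_text, remaining_flags):
        """Store a rewrite together with whatever flags survive re-review."""
        self.student_draft = new_text.strip()
        self.flags = list(remaining_flags)
        self.rewrites += 1

    def may_show_model_answer(self):
        """Release a model answer only after a draft exists and no flags remain."""
        return bool(self.student_draft) and not self.flags


session = TutoringSession()
print(session.may_show_model_answer())  # False: no draft yet
session.submit_draft("Offer, acceptance, and consideration are present because...")
session.add_flag("weak fact use")
print(session.may_show_model_answer())  # False: one flag still open
session.record_rewrite("Revised draft that ties each fact to the rule.", [])
print(session.may_show_model_answer())  # True
```

Keeping the gate in one method makes the policy easy to audit: the student's own work always comes before the model answer.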

Limits

What the study does not prove

The study does not prove that AI should give legal advice. It does not prove that an AI tutor can replace a full law school class, clinical supervision, professional responsibility training, or bar admission judgment. It studied short tutoring answers in contracts under a controlled evaluation design.

That limitation is not a weakness. It is the reason to build carefully. The correct product is not an answer machine. It is a supervised practice environment where source-grounded AI makes high-quality feedback more frequent, and human students learn to become better lawyers.

No attorney-client relationship.
No bar passage guarantee.
No uncited exam-rule claims.
No copying answers into graded work.
No treating elegance as a substitute for verified law.

Stanford Law School hosted PDF

Living question bank

Every update becomes something you can answer.

This bank is built for SofAI quizzes, spaced repetition, legislative tracking, and bar-prep check-ins. Each question is designed to become a short answer, essay paragraph, MBE explanation, or performance-test planning move.
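The article does not say which spaced-repetition schedule the bank uses. As one possibility, the sketch below uses a Leitner-style box system: a correct answer promotes a card to a box with a longer review interval, and a miss sends it back to the first box. The `Card` class, the `review` and `due_cards` functions, and the interval values are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Days to wait before the next review, keyed by box. Assumed values.
INTERVALS = {1: 1, 2: 3, 3: 7, 4: 14, 5: 30}


@dataclass
class Card:
    """One question-bank entry with its Leitner box and next due date."""
    question: str
    answer: str
    box: int = 1
    due: date = date.min  # new cards are due immediately


def review(card, correct, today):
    """Promote the card on a correct answer; return it to box 1 on a miss."""
    card.box = min(card.box + 1, max(INTERVALS)) if correct else 1
    card.due = today + timedelta(days=INTERVALS[card.box])
    return card


def due_cards(cards, today):
    """Cards whose next review date has arrived."""
    return [c for c in cards if c.due <= today]


card = Card("What subject did the study use?", "Contracts.")
review(card, correct=True, today=date(2026, 7, 1))
print(card.box, card.due)  # 2 2026-07-04
```

A miss resets the interval to one day, so questions the student keeps getting wrong resurface most often.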

Study design

What happened in the Stanford-led study? - 6 questions

Question 1

What kind of legal education problem did the paper study?

Judgment-rich tutoring, where quality depends on reasoning, ambiguity, and defensible conclusions rather than one answer key.

Question 2

What subject did the study use?

Contracts.

Question 3

How many professors participated?

Sixteen U.S. contracts professors.

Question 4

How many representative questions were curated?

Forty questions.

Question 5

How were answers evaluated?

Through blinded, forced-choice comparisons in which professors selected the answer they would prefer to give a student.

Question 6

What is the core evaluation insight?

Expert agreement can reveal whether an AI answer aligns with shared professional standards in a domain without a single ground truth.

Implications for law school

What should educators learn? - 6 questions

Question 1

Why is law an especially good test for AI tutoring?

Because legal education trains students to apply doctrine to new facts, weigh ambiguity, and argue defensible conclusions.

Question 2

What should a legal AI tutor never do?

It should never pretend to be a student's lawyer, fabricate sources, promise outcomes, or replace the student's own reasoning.

Question 3

What does the senior partner model mean?

AI can draft, critique, and suggest, but the student must supervise, verify, revise, and own the final judgment.

Question 4

How does this connect to California Bar essays?

Both require more than recall: they require fact selection, legal rule control, application, organization, and logical reasoning.

Question 5

What is the wrong conclusion from the study?

That law professors or human legal education are unnecessary.

Question 6

What is the right conclusion from the study?

AI can make expert-style feedback more abundant if the system is source-grounded, guardrailed, and practice-centered.

SofAI implementation

How should this become product? - 6 questions

Question 1

What should SofAI ask before producing a model answer?

It should ask for the student's issue list, outline, or draft so the student remains active.

Question 2

What should SofAI flag in a legal-writing draft?

Missed issues, incomplete rules, conclusory leaps, weak fact use, missing counterarguments, and unsupported conclusions.

Question 3

When should SofAI cite official sources?

Whenever it discusses exam administration, grading, subjects tested, legislation, professional responsibility, or other rules that may change.

Question 4

How should SofAI use the question bank?

It should quiz, diagnose misses, add spaced repetition, and turn each legislative or exam update into testable knowledge.

Question 5

What is a harmful answer in legal tutoring?

An answer that is confident but wrong, uncited on a changing rule, or misleading about legal rights, or one that encourages the student to copy without understanding.

Question 6

What is the final student move after receiving AI feedback?

Rewrite the answer in the student's own reasoning, verify the governing source, and explain why each fact matters.
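The "uncited on a changing rule" check from this bank can be approximated with a simple screen. The sketch below is deliberately naive and rests on assumptions: the topic keywords are lifted from the answer to Question 3, the `[source: ...]` citation marker is an invented convention, and a real system would need far more than substring matching. The example sentences are illustrative inputs, not statements of law.

```python
import re

# Topics where rules may change; taken from the question bank, matched naively.
CHANGING_RULE_TOPICS = (
    "exam administration",
    "grading",
    "subjects tested",
    "legislation",
    "professional responsibility",
)

# Hypothetical citation marker such as "[source: State Bar of California]".
CITATION_PATTERN = re.compile(r"\[source:\s*[^\]]+\]", re.IGNORECASE)


def needs_citation(answer):
    """True when the answer touches a topic whose rules may change."""
    text = answer.lower()
    return any(topic in text for topic in CHANGING_RULE_TOPICS)


def uncited_rule_claim(answer):
    """Flag answers that discuss a changing rule without any citation marker."""
    return needs_citation(answer) and CITATION_PATTERN.search(answer) is None


print(uncited_rule_claim("The grading standard rewards issue spotting."))  # True
print(uncited_rule_claim(
    "The grading standard is published online [source: State Bar of California]."
))  # False
print(uncited_rule_claim("Consideration requires a bargained-for exchange."))  # False
```

A flag here would only route the answer for review; it says nothing about whether the claim is correct.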

Sources

Citations and source rails

Current exam administration and law-change questions should always be confirmed with primary sources.

Stanford Law School hosted PDF

Law Professors Prefer AI Over Peer Answers

Salinas, Frieders, Guha, Ma, Nyarko, and coauthors. The paper reports a blinded evaluation of LLM and professor answers to contracts tutoring questions.

Accessed July 1, 2026

State Bar of California

California Bar Exam Grading

Official source for essay/PT grading standards, PT expectations, raw written score range, reread range, and passing scaled score.

Accessed July 1, 2026

Update log

How this article changes

July 1, 2026

Initial article published with Stanford-led study synthesis and SofAI implementation question bank.
