AI and Legal Education
A more useful reading of the Stanford-led study on law professors preferring AI answers: not as a victory lap for machines, but as a design brief for better legal education.
Thesis
The point is not that AI replaces law professors. The point is that expert standards can be made visible, repeatable, and available to every student while the human learner remains the senior partner.
Key takeaways
The Stanford-led paper studied judgment-rich tutoring, not simple right-or-wrong recall.
Sixteen contracts professors helped create and judge short-answer responses in a blinded comparison design.
The study reported that professors preferred LLM answers over answers written by fellow professors at roughly a three-to-one rate in the studied setting.
The right lesson for SofAI is guardrailed amplification: source grounding, expert rubrics, harmfulness checks, student supervision, and deliberate practice.
The study is powerful but limited: it covers contracts, short answers, and controlled prompts, and tutoring responses are not the same as legal advice or bar passage.
Professors: 16. Contracts professors participated as instructors and judges.
Questions: 40. Representative office-hours-style questions across recall, doctrine, hypotheticals, and policy.
Comparisons: 2,918. Blinded forced-choice comparisons between anonymized human and AI responses.
Design lesson: Judgment. The study asks whether AI can align with expert professional standards, not merely produce facts.
The study
Alejandro Salinas, Carly Frieders, Neel Guha, Sibo Ma, Julian Nyarko, and coauthors studied a hard problem in AI education: domains where there is no single answer key. Law is an ideal test because legal education often asks students to weigh ambiguity, argue both sides, and reach a defensible conclusion.
The study used contracts tutoring questions. Sixteen U.S. contracts professors from fourteen law schools authored representative questions, wrote answers, and then judged anonymized answer pairs. The paper reports 2,918 blinded comparisons and a strong expert preference for LLM responses in that setting.
The result
The paper reports that professors preferred LLM answers far more often than answers from peer instructors in the studied comparisons, with an average LLM win rate around 75 percent. It also reports that LLM answers drew fewer harmfulness flags than professor answers in aggregate.
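The "roughly three-to-one" preference and the "around 75 percent" win rate describe the same result in two ways. A minimal Python sketch of that arithmetic follows. The function names are illustrative, and the inputs are the rounded figures quoted in this article, not additional data from the paper.

```python
def win_rate_to_odds(win_rate: float) -> float:
    """Convert a win rate (fraction of comparisons won) into odds in favor."""
    if not 0.0 <= win_rate < 1.0:
        raise ValueError("win_rate must be in [0, 1)")
    return win_rate / (1.0 - win_rate)


def odds_to_win_rate(odds: float) -> float:
    """Convert odds in favor (for example 3.0 for three-to-one) into a win rate."""
    if odds < 0:
        raise ValueError("odds must be non-negative")
    return odds / (1.0 + odds)


print(win_rate_to_odds(0.75))  # 3.0
print(odds_to_win_rate(3.0))   # 0.75
```

A 75 percent win rate means three wins for every loss, which is why the two descriptions are interchangeable.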
That does not mean law professors are obsolete. It means that a well-designed AI tutor can sometimes express the shared professional standard with unusual consistency, clarity, and availability. The educational revolution is not replacing the professor. It is making expert feedback abundant enough that students can practice more often.
The human student should not become passive. SofAI should act as a junior associate, grader, and coach. The learner remains the senior partner who verifies sources, edits reasoning, and signs off.
Judgment-rich learning
A math drill can often be checked against one correct answer. A legal hypothetical usually cannot. Two answers can reach different conclusions and both be strong if each states the right rule, uses the right facts, addresses counterarguments, and reasons honestly.
That is why the study matters for bar preparation. The California Bar Exam grading standard likewise values analysis, use of material facts, application of law to facts, and logical, lawyer-like reasoning. The bar is already a judgment-rich assessment.
Design brief
The study gives The VR School of Law a design brief: build tutors that are evaluated against expert preferences, but also constrained by official sources, student agency, and transparent rubrics.
SofAI should ask for the student's draft before giving a model. It should flag missing issues, incomplete rules, weak fact use, and dangerous overconfidence. It should cite official sources when exam rules, grading, or legislation matter. It should force rewrites until the student can explain every move.
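The behavior described above can be sketched as a small review step. This is a hypothetical illustration, not SofAI's actual implementation: `Review`, `review_draft`, `SOURCE_SENSITIVE_TOPICS`, and the rubric item names are all assumptions made for the example, and how each rubric check is computed (by a model or a human grader) is left outside the sketch.

```python
from dataclasses import dataclass, field

# Assumed topic labels where the article says official sources should be cited.
SOURCE_SENSITIVE_TOPICS = ("exam rules", "grading", "legislation")


@dataclass
class Review:
    flags: list[str] = field(default_factory=list)  # rubric items the draft failed
    needs_official_source: bool = False             # topic calls for an official citation
    needs_rewrite: bool = False                     # student must revise and resubmit


def review_draft(draft: str, checks: dict[str, bool], topic: str) -> Review:
    """Review a student draft against rubric checks.

    `checks` maps a rubric item to True when the draft satisfies it.
    """
    if not draft.strip():
        # The tutor asks for the student's work before offering any model answer.
        raise ValueError("Ask for the student's draft before giving a model answer.")
    flags = [item for item, passed in checks.items() if not passed]
    return Review(
        flags=flags,
        needs_official_source=topic in SOURCE_SENSITIVE_TOPICS,
        needs_rewrite=bool(flags),
    )


review = review_draft(
    draft="The offer was accepted, so a contract exists.",
    checks={
        "issues spotted": True,
        "complete rule": False,
        "fact use": False,
        "calibrated confidence": True,
    },
    topic="grading",
)
print(review.flags)  # ['complete rule', 'fact use']
```

Refusing to proceed on an empty draft keeps the student active, and returning the failed rubric items rather than a finished answer leaves the rewrite in the student's hands.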
Limits
The study does not prove that AI should give legal advice. It does not prove that an AI tutor can replace a full law school class, clinical supervision, professional responsibility training, or bar admission judgment. It studied short tutoring answers in contracts under a controlled evaluation design.
That limitation is not a weakness. It is the reason to build carefully. The correct product is not an answer machine. It is a supervised practice environment where source-grounded AI makes high-quality feedback more frequent, and human students learn to become better lawyers.
Living question bank
This bank is built for SofAI quizzes, spaced repetition, legislative tracking, and bar-prep check-ins. Each question is designed to become a short answer, essay paragraph, MBE explanation, or performance-test planning move.
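The article does not say how SofAI schedules spaced repetition. As one possible approach, here is a minimal Leitner-style sketch in Python: a correct answer moves a question to a box with a longer review interval, and a miss sends it back to the first box. The interval values and the `next_review` function are assumptions chosen for illustration.

```python
from datetime import date, timedelta

# Assumed review intervals in days for boxes 0 through 4.
INTERVALS_DAYS = (1, 2, 4, 8, 16)


def next_review(box: int, answered_correctly: bool, today: date) -> tuple[int, date]:
    """Return the question's new box and its next due date."""
    if answered_correctly:
        # Promote one box, capped at the last box.
        new_box = min(box + 1, len(INTERVALS_DAYS) - 1)
    else:
        # A miss resets the question to the most frequent review box.
        new_box = 0
    return new_box, today + timedelta(days=INTERVALS_DAYS[new_box])


print(next_review(1, True, date(2026, 7, 1)))   # (2, datetime.date(2026, 7, 5))
print(next_review(3, False, date(2026, 7, 1)))  # (0, datetime.date(2026, 7, 2))
```

Under these assumed intervals, a question answered correctly every time would come up less and less often, while a missed question returns the next day.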
What happened in the Stanford-led study? - 6 questions
Question 1
Judgment-rich tutoring, where quality depends on reasoning, ambiguity, and defensible conclusions rather than one answer key.
Question 2
Contracts.
Question 3
Sixteen U.S. contracts professors.
Question 4
Forty questions.
Question 5
Through blinded, forced-choice comparisons in which professors selected the answer they would prefer to give a student.
Question 6
Expert agreement can reveal whether an AI answer aligns with shared professional standards in a domain without a single ground truth.
What should educators learn? - 6 questions
Question 1
Because legal education trains students to apply doctrine to new facts, weigh ambiguity, and argue defensible conclusions.
Question 2
It should never pretend to be a student's lawyer, fabricate sources, promise outcomes, or replace the student's own reasoning.
Question 3
AI can draft, critique, and suggest, but the student must supervise, verify, revise, and own the final judgment.
Question 4
Both require more than recall: they require fact selection, legal rule control, application, organization, and logical reasoning.
Question 5
That law professors or human legal education are unnecessary.
Question 6
AI can make expert-style feedback more abundant if the system is source-grounded, guardrailed, and practice-centered.
How should this become product? - 6 questions
Question 1
It should ask for the student's issue list, outline, or draft so the student remains active.
Question 2
Missed issues, incomplete rules, conclusory leaps, weak fact use, missing counterarguments, and unsupported conclusions.
Question 3
Whenever it discusses exam administration, grading, subjects tested, legislation, professional responsibility, or other rules that may change.
Question 4
It should quiz, diagnose misses, add spaced repetition, and turn each legislative or exam update into testable knowledge.
Question 5
An answer that is confident but wrong, uncited on a changing rule, misleading about legal rights, or encourages the student to copy without understanding.
Question 6
Rewrite the answer in the student's own reasoning, verify the governing source, and explain why each fact matters.
Sources
Current exam administration and law-change questions should always be confirmed with primary sources.
Stanford Law School hosted PDF
Salinas, Frieders, Guha, Ma, Nyarko, and coauthors. The paper reports a blinded evaluation of LLM and professor answers to contracts tutoring questions.
Accessed July 1, 2026
State Bar of California
Official source for essay/PT grading standards, PT expectations, raw written score range, reread range, and passing scaled score.
Accessed July 1, 2026
Update log
July 1, 2026
Initial article published with Stanford-led study synthesis and SofAI implementation question bank.