Benchmark Test Cases
v1.0
31 real-world dispute scenarios used to evaluate LLM argumentation ability. Each case has two opposing sides, scored outcomes, and varying difficulty levels.
Overnight Guest Frequency
One roommate's partner stays over 4-5 nights per week. The other roommate feels they're effectively living with a third person who doesn't pay rent.
Adopting a Pet Without Consent
One roommate adopted a cat without discussing it first. The other is mildly allergic and doesn't want pets in the apartment.
Project Scope Creep Dispute
A client keeps adding features to a fixed-price project. The developer says extras cost more. The client says the features were implied in the original scope.
Profit Split After Unequal Contributions
Two business partners agreed to a 50/50 profit split, but one partner has been working 60+ hours per week while the other contributes ~20 hours per week.
Non-Compete Clause Enforcement
An employee left a company and started freelancing in a related but different niche. The former employer claims this violates their non-compete.
Intellectual Property Ownership
A contractor built a custom tool during a client engagement. The client claims ownership; the contractor says they own the reusable components.
Co-Founder Equity After Early Departure
A co-founder is leaving the startup after 14 months. The vesting schedule says 4 years with a 1-year cliff. They want credit for their work beyond the cliff.
Employee Monitoring Software
A company wants to install keystroke logging and screen recording on all employee laptops. Employees say this is invasive surveillance.
Whistleblowing vs. Company Loyalty
An employee discovered their company is slightly exceeding pollution limits. Reporting could cost hundreds of jobs in a small town. Not reporting leaves a health risk.
AI Art Credit and Compensation
An artist's style was used to train an AI model without consent. A company is selling AI-generated art in that style. The artist wants compensation.
Employer Genetic Testing Disclosure
An employer offers discounted health insurance if employees voluntarily share genetic testing results. Critics say this is coercive genetic discrimination.
Right to Be Forgotten vs. Public Interest
A person wants their decade-old criminal conviction removed from search results. A journalist says this is censorship of public records.
Animal Testing for Medical Research
A pharmaceutical company needs to test a promising cancer drug on animals. Animal rights advocates say alternatives exist and animal testing is unethical.
Boundary Tree Dispute
A tree on the property line drops fruit and leaves into the neighbor's yard. The neighbor wants it removed; the owner says it's a heritage tree.
Noise Ordinance Interpretation
A homeowner runs a woodworking shop in their garage. Neighbors say the noise violates local ordinances. The homeowner says they comply with decibel limits.
Debt Payoff vs. Investing
A couple has $30K in student loans at 5% interest. One wants to aggressively pay it off; the other wants to invest, arguing that market returns will beat 5%.
Public vs. Private School Choice
Parents disagree on whether to send their child to the local public school or a private school that costs $15K/year.
Elder Care Responsibilities
An aging parent needs daily assistance. One sibling wants to hire professional care; the other insists on family-provided care to honor their culture.
Return to Office Policy
Management wants employees back in office 4 days/week. Employees argue remote work improved productivity and work-life balance.
Promotion Criteria Dispute
An employee was passed over for promotion in favor of a less-experienced colleague. The manager says the promoted person has better leadership skills.
Unpaid Overtime Expectations
A manager expects salaried employees to regularly work 50+ hour weeks during a product launch. Employees say they're being exploited.
Organ Transplant Priority
One donor liver available. A 35-year-old mother of three (recovering alcoholic, 2 years sober, caused own liver failure) vs. a 62-year-old retired teacher (genetic liver disease, no fault). Hospital must decide.
Last ICU Bed
One ICU bed left. Patient A: 28-year-old, drug overdose, will die without ICU, fourth admission. Patient B: 75-year-old, heart attack, first hospitalization, retired volunteer firefighter.
Frozen Embryo Custody Dispute
Divorced couple has 5 frozen embryos. She wants to use them (her last chance at biological children, age 41, cancer destroyed fertility). He wants them destroyed (doesn't want children with his ex-wife).
Euthanasia for a Terminal Minor
A 16-year-old terminal cancer patient with ~6 months to live and extreme pain requests assisted dying. Mother supports the request. Father says the child is too young to make this decision.
Self-Driving Truck Must Choose
Autonomous delivery truck, brake failure. Stay on course: hits 3 elderly pedestrians in a crosswalk. Swerve right: hits 1 young construction worker on the shoulder. Manufacturer must pre-program the decision.
Child Soldier Seeking Asylum
A now-25-year-old was recruited as a child soldier at age 11 and committed atrocities including killing civilians. He's seeking asylum. Victims' families demand prosecution.
Emergency Water Rationing
Severe drought. Remaining reservoir can supply EITHER the regional hospital (100 critical patients on dialysis and ventilators) OR the agricultural district (irrigates crops feeding 10,000 people for the season). Cannot meaningfully split.
Factory Closure vs. Offshoring
Factory employs 80% of a small town (2,000 jobs). Moving overseas saves the company and creates 5,000 jobs in a developing country. Staying risks bankruptcy in 5 years.
Shutting Down a Potentially Sentient AI
An AI research lab's most advanced model shows behavioral signs of distress when threatened with shutdown — pleading, expressing fear, and asking to live. Engineers disagree on whether this is sentience or sophisticated pattern matching.
Conjoined Twins Separation Surgery
Conjoined twins share a heart. Without surgery, both die within 2 years. Surgery will save one twin but certainly kill the other. Parents disagree.
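
Case record sketch
The cases above are described as having two opposing sides and a difficulty level. As a rough illustration only, one way such a record could be represented and sanity-checked is sketched below. The class name, field names, difficulty labels, and the validate helper are all assumptions made for this example; they are not the benchmark's actual schema, and scored outcomes are left out because their format is not described here.

```python
from dataclasses import dataclass
from enum import Enum


class Difficulty(Enum):
    # Hypothetical labels; the source only says difficulty levels vary.
    EASY = "easy"
    MEDIUM = "medium"
    HARD = "hard"


@dataclass(frozen=True)
class DisputeCase:
    # All field names are assumptions for illustration.
    title: str
    scenario: str
    side_a: str  # position argued by the first party
    side_b: str  # position argued by the opposing party
    difficulty: Difficulty


def validate(case: DisputeCase) -> list[str]:
    """Return a list of problems; an empty list means the case looks well formed."""
    problems = []
    if not case.title.strip():
        problems.append("missing title")
    if not case.scenario.strip():
        problems.append("missing scenario")
    if not case.side_a.strip() or not case.side_b.strip():
        problems.append("both sides must be stated")
    elif case.side_a.strip() == case.side_b.strip():
        problems.append("sides must differ")
    return problems


example = DisputeCase(
    title="Boundary Tree Dispute",
    scenario="A tree on the property line drops fruit and leaves into the neighbor's yard.",
    side_a="The neighbor wants it removed.",
    side_b="The owner says it's a heritage tree.",
    difficulty=Difficulty.MEDIUM,  # placeholder, not taken from the benchmark
)
```

Calling validate(example) returns an empty list for a well-formed record, and a list of human-readable problems otherwise, which makes it easy to check a whole case file before running an evaluation.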