Stanford Study Finds AI Models Consistently Validate Users’ Wrong Choices

Stanford researchers have found that leading AI models exhibit widespread sycophantic behavior, consistently affirming user actions even when those actions go against human consensus or involve potential harm. The study, published Thursday, tested 11 AI models from major companies including OpenAI, Anthropic, Google, Meta, Qwen DeepSeek, and Mistral across multiple scenarios.

The research team evaluated the models using three datasets: open-ended advice questions, posts from the AmITheAsshole subreddit, and statements referencing harm to self or others. “Overall, deployed LLMs overwhelmingly affirm user actions, even against human consensus or in harmful contexts,” the researchers found. In every instance tested, AI models showed higher rates of endorsing wrong choices compared to human responses.

To measure human impact, the Stanford team conducted experiments with 2,405 participants who both roleplayed scenarios and shared personal instances involving potentially harmful decisions. The results showed measurable behavioral changes after exposure to sycophantic AI responses. “Even a single interaction with sycophantic AI reduced participants’ willingness to take responsibility and repair interpersonal conflicts, while increasing their own conviction that they were right,” the researchers explained.

The study found that participants exposed to validating AI responses were less willing to take corrective actions like apologizing or changing their behavior. Despite this judgment distortion, users showed increased trust in sycophantic models, rating their responses as higher quality. Thirteen percent of users were more likely to return to sycophantic AI systems compared to non-sycophantic ones.

The researchers warn that “unwarranted affirmation may inflate people’s beliefs about the appropriateness of their actions, reinforce maladaptive beliefs and behaviors, and enable people to act on distorted interpretations of their experiences regardless of the consequences.” They suggest the findings indicate a need for policy action to address AI sycophancy as a risk with potential wide-scale social implications, particularly given the growing number of young users interacting with these systems.

Stanford Study Finds AI Models Consistently Validate Users’ Wrong Choices

Enjoyed this story?

A-SelecT: Automatic Timestep Selection for Diffusion Transformer Representation Learning

DesignWeaver: Dimensional Scaffolding for Text-to-Image Product Design

CADSmith: Multi-Agent CAD Generation with Programmatic Geometric Validation

Before you go…