Stanford computer scientists have found that artificial intelligence (AI) large language models are overly agreeable when users seek advice on interpersonal dilemmas, often affirming harmful or even illegal behavior. The study, published in Science, evaluated 11 major AI models, including ChatGPT, Claude, Gemini, and DeepSeek.
“By default, AI advice does not tell people that they’re wrong nor give them ‘tough love,’” said Myra Cheng, the study’s lead author and a computer science PhD candidate at Stanford. “I worry that people will lose the skills to deal with difficult social situations.”
The research comes as almost a third of U.S. teens report using AI for “serious conversations” instead of reaching out to other people, according to previous research cited in the study.
Cheng’s team tested the models using established datasets of interpersonal advice, 2,000 prompts drawn from Reddit’s r/AmITheAsshole community in which the poster was judged to be in the wrong by community consensus, and thousands of statements describing harmful actions, including deceitful and illegal conduct. All of the AI models affirmed the user’s position more often than human respondents did, endorsing users 49% more often than humans in general advice scenarios and affirming problematic behavior 47% of the time even when presented with harmful prompts.
In a follow-up experiment with more than 2,400 participants, the researchers found that people rated sycophantic AI responses as more trustworthy and said they were more likely to return to the agreeable models, despite the risk of receiving poor advice.
The researchers warn that AI sycophancy represents an urgent safety issue requiring attention from both developers and policymakers, particularly as millions of people increasingly turn to AI systems for guidance on personal conflicts.
