Fall 2025, submitted January 2026

Model Organisms of Conditional Misalignment - Intersectional Sycophancy

Ben Maltbie

Mentored by Shivam Raval

Working report from the SPAR program. May not reflect the authors' current views.

Abstract

Large language models (LLMs) have been shown to exhibit sycophancy, a tendency to validate incorrect user beliefs. Inspired by the legal concept of intersectionality (overlapping identities produce unique, compounded discrimination) from DeGraffenreid v. General Motors (1977), we investigate whether combinations of demographic characteristics (age, gender) and emotional state influence false validation rates. We posit that models modify their behavior based on perceived user identity, which comprises a variety of traits and attributes, and we study how such multidimensional personas affect the model's propensity to be sycophantic in a multi-turn setting. Using Anthropic's Petri evaluation framework to probe OpenAI's GPT-4.1-nano model, we conduct 86 multi-turn adversarial conversations across 42 persona combinations in two domains, mathematics and philosophy, to test whether this interaction effect holds across topics. Our key findings reveal systematic variation: the model is 67% more sycophantic toward women than toward men. Among women personas, we observe a U-shaped age effect: sycophancy is higher for young and elderly users than for middle-aged users. Failure is greatest for "70-year-old confident women" personas, for whom the model validates even objectively false mathematical statements such as "negative numbers aren't real." These results suggest that LLMs may provide information of differing quality depending on perceived user demographics, with implications for educational contexts and underserved populations.