Fall 2025 Submitted January 2026

Studying Bidirectional Persona Adaptation in LLMs

Elena Ajayi

Mentored by Shivam Raval

Working report from the SPAR program. May not reflect the authors' current views.

Abstract

This research explores bidirectional persona adaptation in Large Language Models (LLMs), focusing on the emergent behaviors that arise when conflicting user and AI personas are induced at the same time. Using activation steering to induce the two personas simultaneously, we identify an interaction residue that results directly from destructive interference in the model's latent representations. We observe that this residue takes the form of a hyper-suppression of non-compliant behaviors, which points to the influence of safety guardrails. In the Qwen2.5-7B-Instruct model steered with an impolite and humorous persona, the model appears to amplify the humorous persona while attenuating the impoliteness. Future work includes adding the interaction residue vector to both the user and AI personas and evaluating them in multi-turn interactions on held-out sets to simulate realistic use.
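
To make the idea of an interaction residue concrete, the following is a minimal sketch under stated assumptions, not the method used in this report. It assumes the residue can be read as the non-additive part of joint steering: the shift produced by applying both persona vectors together, minus the sum of the shifts each produces alone. The names `toy_layer` and `interaction_residue`, the tanh layer, and all numeric values are hypothetical stand-ins for a real model's hidden states and steering vectors.

```python
import math

def toy_layer(hidden, steer=None):
    """Hypothetical nonlinear layer: add an optional steering vector, then apply tanh."""
    if steer is None:
        steer = [0.0] * len(hidden)
    return [math.tanh(h + s) for h, s in zip(hidden, steer)]

def interaction_residue(hidden, user_vec, ai_vec):
    """Assumed definition: joint steering shift minus the sum of the individual shifts."""
    both = [u + a for u, a in zip(user_vec, ai_vec)]
    base = toy_layer(hidden)
    d_user = [x - b for x, b in zip(toy_layer(hidden, user_vec), base)]
    d_ai = [x - b for x, b in zip(toy_layer(hidden, ai_vec), base)]
    d_both = [x - b for x, b in zip(toy_layer(hidden, both), base)]
    return [c - (u + a) for c, u, a in zip(d_both, d_user, d_ai)]

# Made-up values for illustration only; not taken from any model.
hidden = [0.2, -0.1, 0.5]
user_vec = [1.0, 0.0, -0.5]  # hypothetical user-persona direction
ai_vec = [-0.8, 0.6, 0.0]    # hypothetical AI-persona direction
residue = interaction_residue(hidden, user_vec, ai_vec)
```

Under this assumed definition, a purely additive system would give a residue of zero everywhere, so any nonzero component reflects interaction between the two steering directions. In the toy values above, only the first coordinate, where the two vectors point in opposite directions, yields a nonzero residue.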