LLMs Use World Model-like Representations
Connor Buchheit, Ankush Checkervarty, Sergio Hernandez Cuenca, Xiaoxuan Lei, Carlos Hernandez, Royden Wagner, Pravish Sainath, Antoine Bossan, Guy Yariv
Mentored by Matthieu Tehenan
Working report from the SPAR program. May not reflect the authors' current views.
Abstract
We investigate whether large language models (LLMs) use world model-like representations during inference. Specifically, we evaluate how well recent LLMs navigate grid environments and analyze their activations during action generation. We train linear, MLP, and transformer-based probes to decode environment states, goals, and progress from these activations. More expressive probes achieve higher accuracy, indicating that this information is encoded at least partly nonlinearly. We also find differences in representational focus across models. For Qwen3, our probes achieve higher accuracy on goal prediction but lower accuracy on state prediction, suggesting a goal-oriented focus. In contrast, state prediction accuracy is higher for Phi-4 models, suggesting a more world model-like encoding of state-action pairs. Notably, these world model-like representations in Phi-4 correlate with higher success rates on our task. Overall, our results suggest that LLMs incorporate approximate world models that improve their performance on navigation tasks.
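The inference that a gap between linear and nonlinear probes points to nonlinear structure can be illustrated with a small sketch. It uses synthetic Gaussian "activations" rather than real LLM activations, and the XOR-style label, the function names, and the nonlinear probe are all illustrative assumptions, not the probes used in this work. In particular, the nonlinear probe here is a fixed random ReLU layer with a trained ridge readout, a cheap stand-in for a fully trained MLP probe.

```python
import numpy as np


def add_bias(X):
    # Append a constant column so the readout can learn an intercept.
    return np.hstack([X, np.ones((X.shape[0], 1))])


def fit_ridge_probe(X, y, n_classes, lam=1e-2):
    # Closed-form ridge regression onto one-hot labels (a linear probe).
    Xb = add_bias(X)
    Y = np.eye(n_classes)[y]
    A = Xb.T @ Xb + lam * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ Y)


def probe_accuracy(W, X, y):
    # Predicted class is the argmax over the readout scores.
    preds = np.argmax(add_bias(X) @ W, axis=1)
    return float(np.mean(preds == y))


def make_random_relu(d_in, width, rng):
    # Fixed, untrained ReLU hidden layer: a hypothetical stand-in for an MLP probe.
    W = rng.normal(size=(d_in, width)) / np.sqrt(d_in)
    b = rng.normal(size=width)
    return lambda X: np.maximum(X @ W + b, 0.0)


def run_demo(seed=0, d=4, n_train=3000, n_test=1000, width=256):
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_train + n_test, d))
    # Synthetic "state" label: XOR of the signs of two activation directions,
    # which no linear readout can separate.
    y = ((X[:, 0] > 0) != (X[:, 1] > 0)).astype(int)
    Xtr, Xte = X[:n_train], X[n_train:]
    ytr, yte = y[:n_train], y[n_train:]

    # Linear probe directly on the activations.
    W_lin = fit_ridge_probe(Xtr, ytr, 2)
    lin_acc = probe_accuracy(W_lin, Xte, yte)

    # Nonlinear probe: same ridge readout on random ReLU features.
    phi = make_random_relu(d, width, rng)
    W_mlp = fit_ridge_probe(phi(Xtr), ytr, 2)
    mlp_acc = probe_accuracy(W_mlp, phi(Xte), yte)
    return lin_acc, mlp_acc


if __name__ == "__main__":
    lin_acc, mlp_acc = run_demo()
    print(f"linear probe: {lin_acc:.2f}  nonlinear probe: {mlp_acc:.2f}")
```

On this toy data the linear probe stays near chance while the nonlinear probe decodes the label well. The information is present in the activations in both cases; only its accessibility to a linear readout differs, which is the sense in which a probe-accuracy gap suggests nonlinear encoding.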