Fall 2025; submitted January 2026

LLMs Use World Model-like Representations

Connor Buchheit, Ankush Checkervarty, Sergio Hernandez Cuenca, Xiaoxuan Lei, Carlos Hernandez, Royden Wagner, Pravish Sainath, Antoine Bossan, Guy Yariv

Mentored by Matthieu Tehenan

Chain of thought, Mechanistic interpretability

Working report from the SPAR program. May not reflect the authors' current views.


Abstract

We investigate whether large language models (LLMs) employ world model-like representations during inference. Specifically, we evaluate the ability of recent LLMs to navigate grid environments and analyze their activations during action generation. We train linear, MLP, and transformer-based probes to decode environment states, goals, and progress from these activations. More expressive probes achieve higher accuracy, which suggests that this information is encoded nonlinearly. Furthermore, we find a difference in representational focus across models. For Qwen3, our probes achieve higher accuracy for goals but lower accuracy for state prediction, suggesting a goal-oriented focus. In contrast, we achieve higher state prediction accuracy for Phi-4 models, suggesting a more world model-like encoding of state-action pairs. Notably, the world model-like representations of Phi-4 correlate with higher success rates in our task. Overall, our results suggest that LLMs incorporate approximate world models that enhance performance in navigation tasks.
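To make the probing idea concrete, the following is a minimal sketch of a linear probe, not the authors' implementation. It assumes that activations are available as fixed-size vectors and that the quantity to decode (for example, a grid state or goal) is a discrete label. The data here is synthetic, the ridge-regression fit is an illustrative choice, and all function names are hypothetical.

```python
import numpy as np


def make_synthetic_activations(n, dim, num_classes, rng, noise=0.5):
    """Synthetic stand-in for LLM activations (an assumption, not real data).

    Each class gets a random mean vector; samples are the mean plus noise.
    """
    means = rng.normal(size=(num_classes, dim))
    labels = rng.integers(0, num_classes, size=n)
    acts = means[labels] + noise * rng.normal(size=(n, dim))
    return acts, labels


def _with_bias(acts):
    # Append a constant column so the probe can learn an offset.
    return np.hstack([acts, np.ones((acts.shape[0], 1))])


def fit_linear_probe(acts, labels, num_classes, ridge=1e-2):
    """Fit a linear probe by ridge regression onto one-hot labels."""
    X = _with_bias(acts)
    Y = np.eye(num_classes)[labels]
    A = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ Y)  # shape: (dim + 1, num_classes)


def probe_accuracy(W, acts, labels):
    """Fraction of samples whose highest-scoring class matches the label."""
    preds = np.argmax(_with_bias(acts) @ W, axis=1)
    return float(np.mean(preds == labels))


# Example run: fit on 800 samples, evaluate on 200 held-out samples.
rng = np.random.default_rng(0)
acts, labels = make_synthetic_activations(1000, 32, 4, rng)
W = fit_linear_probe(acts[:800], labels[:800], num_classes=4)
print("held-out probe accuracy:", probe_accuracy(W, acts[800:], labels[800:]))
```

An MLP or transformer-based probe would replace the single linear map with a more expressive function of the same activations. A gap in held-out accuracy between the two kinds of probe is the sort of evidence the abstract reads as nonlinear structure.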