Spring 2025 Submitted May 2025
Backtracking in DeepSeek-R1-Distill-Llama-8B
Evan Lloyd, Jenny Vega, Dipika Khullar
Mentored by Curt Tigges
Working report from the SPAR program. May not reflect the authors' current views.
Abstract
Reasoning models (language models trained to improve response quality by writing an intermediate chain of thought before giving their final answer) offer a potential path toward more interpretable AI systems. One interesting behavior that emerges from this setup is backtracking, in which the model recovers from flawed reasoning or mistakes by trying alternative logical paths. In this report, we present a mechanistic exploration of backtracking in DeepSeek-R1-Distill-Llama-8B, a distillation of DeepSeek-R1.