Spring 2025. Submitted May 2025.

Backtracking in DeepSeek-R1-Distill-Llama-8B

Evan Lloyd, Jenny Vega, Dipika Khullar

Mentored by Curt Tigges

Working report from the SPAR program. May not reflect the authors' current views.

Abstract

Reasoning models (language models trained to improve response quality by writing an intermediate chain of thought before giving their final answer) offer a potential path toward more interpretable AI systems. One behavior that emerges from this training is backtracking, in which the model recovers from flawed reasoning or mistakes by trying alternative lines of reasoning. In this report, we present a mechanistic exploration of backtracking in DeepSeek-R1-Distill-Llama-8B, a distillation of DeepSeek-R1.