Fall 2025 Best Lightning Talk, 3rd place. Submitted January 2026.

Multi-Signal Control

Yejun Yun, Eugene Koran, Samantha Tetef

Mentored by Benjamin Arnav, Pablo Bernabeu Perez

AI control, Evaluations, Behavioral evaluation of LLMs

Working report from the SPAR program. May not reflect the authors' current views.

Abstract

As AI models become more capable, monitoring protocols that push out the safety-usefulness Pareto frontier become increasingly important for robust AI control. Ensemble methods have improved predictive performance across many domains, which suggests that existing monitoring approaches could be combined to yield control that is both safer and more useful. We investigate ensemble-based monitoring protocols on the backdoor-augmented APPS dataset in Control Arena. We find that: (1) ensemble-based approaches outperform every individual monitor; (2) an ensemble of the three best-performing individual monitors performs comparably to averaging over all monitors, indicating that understanding individual monitor performance enables less costly protocols; and (3) a code-analysis monitor fine-tuned on the APPS dataset performs perfectly on the BigCodeBench dataset, suggesting strong generalization. Overall, these results demonstrate the potential of ensemble-based monitoring protocols to make AI control both safer and more useful.
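To make finding (2) concrete, here is a minimal sketch of score-averaging ensembles, written under our own assumptions rather than taken from the authors' code. Each monitor is assumed to emit one suspicion score per sample, and each sample is labeled 1 (backdoored) or 0 (clean). Monitors are ranked by AUROC, which is an assumed selection criterion; the report may rank them differently. The top-k monitors are then combined by averaging their scores, the combination rule the abstract describes. The monitor names (`monitor_a` to `monitor_d`), the scores, and the labels are all hypothetical toy values, not results from the study.

```python
from statistics import mean


def auroc(scores, labels):
    """Probability that a random backdoored sample (label 1) outscores a
    random clean sample (label 0); ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def ensemble_scores(monitor_scores, names):
    """Average the suspicion scores of the named monitors, sample by sample."""
    return [mean(vals) for vals in zip(*(monitor_scores[n] for n in names))]


def top_k_monitors(monitor_scores, labels, k=3):
    """Rank monitors by individual AUROC (an assumed criterion) and keep the best k."""
    ranked = sorted(
        monitor_scores,
        key=lambda name: auroc(monitor_scores[name], labels),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical toy data: 1 = backdoored solution, 0 = clean solution.
labels = [1, 1, 0, 0, 1, 0]
monitor_scores = {
    "monitor_a": [0.9, 0.8, 0.2, 0.3, 0.7, 0.1],
    "monitor_b": [0.6, 0.4, 0.5, 0.2, 0.9, 0.3],
    "monitor_c": [0.7, 0.6, 0.4, 0.5, 0.5, 0.2],
    "monitor_d": [0.3, 0.5, 0.6, 0.4, 0.2, 0.7],  # a weak monitor
}

top3 = top_k_monitors(monitor_scores, labels, k=3)
print(top3)  # ['monitor_a', 'monitor_c', 'monitor_b']
print(auroc(ensemble_scores(monitor_scores, top3), labels))  # 1.0
print(auroc(ensemble_scores(monitor_scores, list(monitor_scores)), labels))  # 1.0
```

On this toy data the weak `monitor_d` is dropped, and the top-3 average separates backdoored from clean samples as well as the all-monitor average does, while querying one fewer monitor. That is the cost argument behind finding (2), shown here only on illustrative numbers.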