Spring 2025

Submitted May 2025

Adversarially Robust Updates using (Perfect) Privacy

Daniel Hustert

Mentored by James Faville

Working report from the SPAR program. May not reflect the authors' current views.

Abstract

Using toy examples, we study how agents can extract utility from observed data while avoiding exploitation by adversaries. In many strategic or multi-agent settings, conditioning on information that an adversary can control can harm performance or safety. We discuss how agents can instead perform partial updates, using techniques from the privacy-utility trade-off literature to retain utility-relevant information while discarding information that would render them exploitable. For one example problem, we develop an adversarial encoder-decoder-discriminator architecture that balances the privacy-utility trade-off by minimizing mutual information with sensitive features while preserving the useful signal. Although the example needs further development, we show that an agent using these learned representations performs competitively while being more resistant to exploitation.