Adversarially Robust Updates using (Perfect) Privacy
Daniel Hustert
Mentored by James Faville
Working report from the SPAR program. May not reflect the authors' current views.
Abstract
Using toy examples, we study how agents can extract utility from observed data while avoiding exploitation by adversaries. In many strategic or multi-agent settings, conditioning on adversary-controllable information can harm performance or safety. We discuss how agents can perform partial updates, using techniques from the privacy-utility trade-off literature, to retain utility-relevant information while discarding information that would render them exploitable. In an example problem, we develop an adversarial encoder-decoder-discriminator architecture that balances this trade-off by minimizing mutual information with sensitive features while preserving useful signal content. While the example needs further development, we show that an agent using these learned representations performs competitively while being more resistant to exploitation.