Abstract
The importance of explanations (XPs) of machine learning (ML) model predictions and of adversarial examples (AEs) cannot be overstated, with both arguably being essential for the practical success of ML in different settings. There has been recent work on understanding and assessing the relationship between XPs and AEs. However, such work has been mostly experimental, and a sound theoretical relationship has been elusive. This paper demonstrates that explanations and adversarial examples are related by a generalized form of hitting set duality, which extends earlier work on hitting set duality observed in model-based diagnosis and knowledge compilation. Furthermore, the paper proposes algorithms that enable computing adversarial examples from explanations and vice versa.
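As a rough illustration of the classic (non-generalized) hitting set duality that the abstract builds on, the Python sketch below enumerates subset-minimal hitting sets by brute force. The function name `minimal_hitting_sets`, the feature names, and the toy family of sets are all hypothetical and not taken from the paper; this is a simplified sketch of the underlying idea, not the paper's algorithms.

```python
from itertools import combinations

def minimal_hitting_sets(families, universe):
    """Brute-force every subset-minimal set that intersects each set in `families`.

    Hypothetical helper for illustration only; exponential in len(universe).
    """
    found = []
    # Candidates are visited by increasing size, so any candidate that
    # contains an already-found hitting set cannot be minimal.
    for size in range(len(universe) + 1):
        for cand in combinations(sorted(universe), size):
            c = frozenset(cand)
            hits_all = all(c & s for s in families)
            if hits_all and not any(h <= c for h in found):
                found.append(c)
    return found

# Toy example (assumed data): two sets of features, each standing in for
# a minimal explanation of some prediction.
features = {"a", "b", "c"}
explanations = [frozenset({"a", "b"}), frozenset({"b", "c"})]

# Minimal sets of features that "break" every explanation.
breakers = minimal_hitting_sets(explanations, features)

# Duality: hitting the breakers minimally recovers the original family.
recovered = minimal_hitting_sets(breakers, features)
```

In this toy setting, each minimal hitting set of the explanation family touches every explanation, and taking minimal hitting sets a second time returns the original family. The paper's result concerns a generalized form of this relationship, which the sketch does not attempt to capture.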
Original language | English |
---|---|
Title of host publication | Advances in Neural Information Processing Systems 32 (NIPS 2019) |
Editors | H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, R. Garnett |
Place of Publication | San Diego CA USA |
Publisher | Neural Information Processing Systems (NIPS) |
Pages | 15857-15867 |
Number of pages | 11 |
Volume | 32 |
Publication status | Published - 2019 |
Event | Advances in Neural Information Processing Systems 2019, Vancouver, Canada. Duration: 8 Dec 2019 → 14 Dec 2019. Conference number: 32nd. https://nips.cc/Conferences/2019 (Proceedings); https://papers.nips.cc/book/advances-in-neural-information-processing-systems-32-2019 (Proceedings) |
Publication series
Name | Advances in Neural Information Processing Systems |
---|---|
Publisher | Morgan Kaufmann Publishers |
ISSN (Print) | 1049-5258 |
Conference
Conference | Advances in Neural Information Processing Systems 2019 |
---|---|
Abbreviated title | NIPS 2019 |
Country/Territory | Canada |
City | Vancouver |
Period | 8/12/19 → 14/12/19 |