On relating explanations and adversarial examples

Alexey Ignatiev, Nina Narodytska, Joao Marques-Silva

Research output: Chapter in Book/Report/Conference proceeding, Conference Paper, Research, peer-review

67 Citations (Scopus)


The importance of explanations (XP's) of machine learning (ML) model predictions and of adversarial examples (AE's) cannot be overstated, with both arguably being essential for the practical success of ML in different settings. There has been recent work on understanding and assessing the relationship between XP's and AE's. However, such work has been mostly experimental, and a sound theoretical relationship has been elusive. This paper demonstrates that explanations and adversarial examples are related by a generalized form of hitting set duality, which extends earlier work on hitting set duality observed in model-based diagnosis and knowledge compilation. Furthermore, the paper proposes algorithms that enable computing adversarial examples from explanations and vice versa.
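As a rough illustration of what hitting set duality means here, the following sketch brute-forces a tiny Boolean classifier. It is not the paper's generalized formulation or its algorithms: the classifier `predict`, the chosen `instance`, and all helper names are hypothetical, and the definitions used (an explanation as a subset-minimal set of features whose values force the prediction, an adversarial set as a subset-minimal set of features whose change can flip it) are simplifying assumptions made for this example only.

```python
from itertools import combinations, product

N = 3  # number of Boolean features in the toy model (assumption)

def predict(x):
    # Hypothetical toy classifier: (x0 AND x1) OR x2
    return bool((x[0] and x[1]) or x[2])

instance = (1, 1, 1)          # the instance whose prediction we study
target = predict(instance)    # its predicted class

def is_sufficient(fixed):
    """True if fixing the features in `fixed` to their instance values
    forces the prediction, whatever the remaining features are."""
    free = [i for i in range(N) if i not in fixed]
    for vals in product([0, 1], repeat=len(free)):
        x = list(instance)
        for i, v in zip(free, vals):
            x[i] = v
        if predict(x) != target:
            return False
    return True

def can_flip(freed):
    """True if changing only the features in `freed` can alter the prediction."""
    return not is_sufficient(frozenset(range(N)) - frozenset(freed))

def minimal_sets(pred):
    """All subset-minimal feature sets satisfying `pred` (sizes visited in increasing order)."""
    found = []
    for k in range(N + 1):
        for combo in combinations(range(N), k):
            s = frozenset(combo)
            if pred(s) and not any(m <= s for m in found):
                found.append(s)
    return set(found)

def minimal_hitting_sets(family):
    """Subset-minimal sets that intersect every member of `family`."""
    return minimal_sets(lambda s: all(s & f for f in family))

explanations = minimal_sets(is_sufficient)
adversarial = minimal_sets(can_flip)

print("explanations:", sorted(map(sorted, explanations)))
print("adversarial sets:", sorted(map(sorted, adversarial)))
print("duality holds:",
      minimal_hitting_sets(explanations) == adversarial
      and minimal_hitting_sets(adversarial) == explanations)
```

In this toy setting each family is exactly the set of minimal hitting sets of the other, which is the kind of relationship that makes it possible to compute one family from the other.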

Original language: English
Title of host publication: Advances in Neural Information Processing Systems 32 (NIPS 2019)
Editors: H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, R. Garnett
Place of Publication: San Diego CA USA
Publisher: Neural Information Processing Systems (NIPS)
Number of pages: 11
Publication status: Published - 2019
Event: Advances in Neural Information Processing Systems 2019 - Vancouver, Canada
Duration: 8 Dec 2019 to 14 Dec 2019
Conference number: 32nd
https://nips.cc/Conferences/2019 (Proceedings)
https://papers.nips.cc/book/advances-in-neural-information-processing-systems-32-2019 (Proceedings)

Publication series

Name: Advances in Neural Information Processing Systems
Publisher: Morgan Kaufmann Publishers
ISSN (Print): 1049-5258


Conference: Advances in Neural Information Processing Systems 2019
Abbreviated title: NIPS 2019