Generating semantic adversarial examples via feature manipulation in latent space

Shuo Wang, Shangyu Chen, Tianle Chen, Surya Nepal, Carsten Rudolph, Marthie Grobler

Research output: Contribution to journal › Article › Research › peer-review


Abstract

Deep neural networks (DNNs) are well known to be vulnerable to adversarial examples. Conventional attacks apply unstructured, pixel-wise perturbations to mislead classifiers, which often produces samples that visibly deviate from natural images and lack human-interpretable meaning. In this work, we present an adversarial attack strategy that applies fine-grained, semantically meaningful structural perturbations. The proposed method manipulates the semantic attributes of images through disentangled latent codes, crafting adversarial perturbations by modifying either a single latent code or a combination of codes. To this end, we propose two unsupervised semantic manipulation strategies, one based on vector-disentangled representations and the other on feature-map-disentangled representations, accounting for the complexity of the latent codes and the smoothness of the reconstructed images. Extensive empirical evaluations on real-world image data demonstrate the effectiveness of the attacks, particularly against black-box classifiers. Furthermore, we show that a universal semantic adversarial example exists that is agnostic to specific images.
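To make the core idea concrete, the sketch below illustrates (in PyTorch) how perturbing a single disentangled latent code and decoding the result can yield a natural-looking image that a black-box classifier mislabels. This is not the authors' implementation: the encoder, decoder, classifier, latent dimensionality, and the helper semantic_attack are hypothetical stand-ins chosen only to show the manipulate-then-decode workflow described in the abstract.

```python
# Minimal sketch, assuming a pretrained disentangled encoder/decoder pair and a
# black-box target classifier; the tiny stub modules below are placeholders so
# the example runs end to end and do not reflect the paper's actual models.
import torch
import torch.nn as nn

LATENT_DIM = 16               # assumed size of the disentangled latent vector
IMAGE_SHAPE = (3, 32, 32)     # assumed input image shape

# Stand-in models; a real attack would load pretrained VAE/GAN weights instead.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, LATENT_DIM))
decoder = nn.Sequential(nn.Linear(LATENT_DIM, 3 * 32 * 32), nn.Sigmoid(),
                        nn.Unflatten(1, IMAGE_SHAPE))
classifier = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))


def semantic_attack(image, true_label, code_index, step=0.1, max_steps=50):
    """Search along one latent code until the decoded image is misclassified."""
    with torch.no_grad():
        z = encoder(image)                    # disentangled latent representation
        for k in range(1, max_steps + 1):
            z_adv = z.clone()
            z_adv[:, code_index] += k * step  # perturb a single semantic attribute
            x_adv = decoder(z_adv)            # reconstruct a natural-looking image
            pred = classifier(x_adv).argmax(dim=1)
            if (pred != true_label).all():    # black-box check: output labels only
                return x_adv, pred
    return None, None                         # attack failed within the search budget


x = torch.rand(1, *IMAGE_SHAPE)               # placeholder input image
adv, new_label = semantic_attack(x, true_label=torch.tensor([0]), code_index=3)
```

Searching over a combination of latent codes, rather than a single index, would follow the same pattern with a vector-valued perturbation; the paper's feature-map-disentangled variant would additionally operate on spatial feature maps rather than a flat latent vector.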

Original language: English
Number of pages: 15
Journal: IEEE Transactions on Neural Networks and Learning Systems
DOIs
Publication status: Accepted/In press - 10 Aug 2023

Keywords

  • Adversarial examples
  • Codes
  • Decoding
  • Face recognition
  • feature manipulation
  • Generative adversarial networks
  • Image reconstruction
  • latent representation
  • neural networks
  • Perturbation methods
  • Semantics
  • variational autoencoder (VAE)
