TY - JOUR
T1 - Generating semantic adversarial examples via feature manipulation in latent space
AU - Wang, Shuo
AU - Chen, Shangyu
AU - Chen, Tianle
AU - Nepal, Surya
AU - Rudolph, Carsten
AU - Grobler, Marthie
N1 - Publisher Copyright: IEEE
PY - 2023/8/10
Y1 - 2023/8/10
AB - The susceptibility of deep neural networks (DNNs) to adversarial intrusions, exemplified by adversarial examples, is well-documented. Conventional attacks implement unstructured, pixel-wise perturbations to mislead classifiers, which often results in a noticeable departure from natural samples and lacks human-perceptible interpretability. In this work, we present an adversarial attack strategy that implements fine-granularity, semantic-meaning-oriented structural perturbations. Our proposed methodology manipulates the semantic attributes of images through the use of disentangled latent codes. We engineer adversarial perturbations by manipulating either a single latent code or a combination thereof. To this end, we propose two unsupervised semantic manipulation strategies: one based on vector-disentangled representation and the other on feature map-disentangled representation, taking into consideration the complexity of the latent codes and the smoothness of the reconstructed images. Our empirical evaluations, conducted extensively on real-world image data, showcase the potency of our attacks, particularly against black-box classifiers. Furthermore, we establish the existence of a universal semantic adversarial example that is agnostic to specific images.
KW - Adversarial examples
KW - Codes
KW - Decoding
KW - Face recognition
KW - Feature manipulation
KW - Generative adversarial networks
KW - Image reconstruction
KW - Latent representation
KW - Neural networks
KW - Perturbation methods
KW - Semantics
KW - Variational autoencoder (VAE)
UR - http://www.scopus.com/inward/record.url?scp=85167833770&partnerID=8YFLogxK
DO - 10.1109/TNNLS.2023.3299408
M3 - Article
C2 - 37561624
AN - SCOPUS:85167833770
SN - 2162-237X
JO - IEEE Transactions on Neural Networks and Learning Systems
JF - IEEE Transactions on Neural Networks and Learning Systems
ER -