Explainable AI for Android malware detection: towards understanding why the models perform so well?

Yue Liu, Chakkrit Tantithamthavorn, Li Li, Yepang Liu

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research › peer-review

31 Citations (Scopus)

Abstract

Machine learning (ML)-based Android malware detection has been one of the most popular research topics in the mobile security community. A growing number of studies have demonstrated that machine learning is an effective and promising approach for malware detection, and some works have even claimed that their proposed models achieve 99% detection accuracy, leaving little room for further improvement. However, numerous prior studies have suggested that unrealistic experimental designs introduce substantial biases, resulting in over-optimistic malware detection performance. Unlike previous research that examined the detection performance of ML classifiers to locate the causes, this study employs Explainable AI (XAI) approaches to explore what ML-based models learn during training, inspecting and interpreting why ML-based malware classifiers perform so well under unrealistic experimental settings. We discover that temporal sample inconsistency in the training dataset yields over-optimistic classification performance (up to 99% F1 score and accuracy). Importantly, our results indicate that ML models classify malware based on temporal differences between malware and benign samples, rather than on actual malicious behaviors. Our evaluation also confirms that unrealistic experimental designs lead not only to unrealistic detection performance but also to poor reliability, posing a significant obstacle to real-world applications. These findings suggest that XAI approaches should be used to help practitioners and researchers better understand how AI/ML models (e.g., for malware detection) work, rather than focusing only on accuracy improvement.
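The temporal bias described in the abstract can be illustrated with a toy sketch. This is not the authors' method or data: the `App` record, the `year` field, and the `best_year_threshold` helper are hypothetical, and the samples are synthetic. It only shows that when malware and benign samples come from different time periods, a rule that looks at nothing but release year can separate them perfectly, whereas the same rule is no better than chance once both classes cover the same years.

```python
from dataclasses import dataclass


@dataclass
class App:
    year: int         # release year of the sample (hypothetical metadata)
    is_malware: bool  # ground-truth label


def best_year_threshold(apps):
    """Return (threshold, accuracy) of the best rule 'year <= threshold => malware'."""
    best_t, best_acc = None, 0.0
    for t in sorted({a.year for a in apps}):
        correct = sum((a.year <= t) == a.is_malware for a in apps)
        acc = correct / len(apps)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc


# Temporally inconsistent dataset: old malware, recent benign apps (synthetic).
inconsistent = (
    [App(y, True) for y in (2012, 2013, 2014) for _ in range(10)]
    + [App(y, False) for y in (2018, 2019, 2020) for _ in range(10)]
)

# Temporally consistent dataset: both classes drawn from the same years (synthetic).
consistent = [
    App(y, label)
    for y in (2018, 2019, 2020)
    for label in (True, False)
    for _ in range(10)
]

t_bad, acc_bad = best_year_threshold(inconsistent)
t_ok, acc_ok = best_year_threshold(consistent)
print(f"inconsistent: year <= {t_bad} gives accuracy {acc_bad:.2f}")
print(f"consistent:   year <= {t_ok} gives accuracy {acc_ok:.2f}")
```

In real datasets the release year is not an explicit feature; the point of the sketch is only that any feature correlated with time (for example, API calls introduced in later Android versions) can play the same role as the `year` field here.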

Original language: English
Title of host publication: Proceedings - 2022 IEEE 33rd International Symposium on Software Reliability Engineering, ISSRE 2022
Editors: Nahgmeh Ivaki, Siwei Zhou
Place of Publication: Piscataway NJ USA
Publisher: IEEE, Institute of Electrical and Electronics Engineers
Pages: 169-180
Number of pages: 12
ISBN (Electronic): 9781665451321
ISBN (Print): 9781665451338
DOIs
Publication status: Published - 2022
Event: International Symposium on Software Reliability Engineering 2022 - Charlotte, United States of America
Duration: 31 Oct 2021 → 3 Nov 2021
Conference number: 33rd
https://ieeexplore.ieee.org/xpl/conhome/9978763/proceeding (Proceedings)
https://issre2022.github.io/ (Website)

Publication series

Name: Proceedings - International Symposium on Software Reliability Engineering, ISSRE
Publisher: IEEE, Institute of Electrical and Electronics Engineers
Volume: 2022-October
ISSN (Print): 1071-9458
ISSN (Electronic): 2332-6549

Conference

Conference: International Symposium on Software Reliability Engineering 2022
Abbreviated title: ISSRE 2022
Country/Territory: United States of America
City: Charlotte
Period: 31/10/21 → 3/11/21
Internet address

Keywords

  • Android malware
  • Explainable AI
  • Machine learning
