Backdoor Attack on Machine Learning Based Android Malware Detectors

Chaoran Li, Xiao Chen, Derui Wang, Sheng Wen, Muhammad Ahmed, Seyit Camtepe, Yang Xiang

Research output: Contribution to journal › Article › Research › peer-review

14 Citations (Scopus)

Abstract

Machine learning (ML) has been widely used for malware detection on different operating systems, including Android. To keep up with malware's evolution, the detection models usually need to be retrained periodically (e.g., every month) based on the data collected in the wild. However, this retraining exposes the models to poisoning attacks, specifically backdoor attacks, which subvert the learning process and create evasion 'tunnels' for manipulated malware samples. To date, we have not found any prior research that explored this critical problem in Android malware detectors. Although similar work exists in the image classification field, most of those ideas cannot be borrowed to solve this problem, because they assume that the attacker has full control of the training data collection or labelling process, which is not realistic in real-world malware detection scenarios. In this article, we study backdoor attacks against Android malware detectors. The backdoor is created and injected into the model stealthily, without access to the training data, and is activated when an app with the trigger is presented. We demonstrate the proposed attack on four typical malware detectors that have been widely discussed in academia. Our evaluation shows that the proposed backdoor attack achieves an evasion rate of up to 99 percent over 750 malware samples. Moreover, the attack succeeds with a small trigger (only four features) and a very low data poisoning rate (0.3 percent).
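To make the two figures at the end of the abstract concrete (a four-feature trigger and a 0.3 percent poisoning rate), the following is a minimal sketch of generic trigger-based poisoning on binary feature vectors. It is not the authors' procedure: the trigger indices, the function names, and the choice to stamp the trigger onto benign-labelled samples are all assumptions made purely for illustration.

```python
import random

# Hypothetical indices of the four trigger features (an assumption for
# illustration; the abstract does not say which features are used).
TRIGGER = (3, 7, 11, 15)


def apply_trigger(features, trigger=TRIGGER):
    """Return a copy of a binary feature vector with the trigger features set to 1."""
    stamped = list(features)
    for index in trigger:
        stamped[index] = 1
    return stamped


def poison_count(n_train, rate=0.003):
    """Number of poisoned samples implied by a training-set size and poisoning rate."""
    return round(n_train * rate)


def make_poison_samples(benign_pool, n_poison, rng):
    """Stamp the trigger on randomly chosen benign samples, keeping the benign label (0).

    A model retrained on such samples may learn to associate the trigger with
    the benign class. This is a generic poisoning scheme assumed for the
    sketch, not necessarily the one used in the article.
    """
    chosen = rng.sample(benign_pool, n_poison)
    return [(apply_trigger(sample), 0) for sample in chosen]


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy pool of 1,000 benign apps, each described by 20 binary features.
    pool = [[rng.randint(0, 1) for _ in range(20)] for _ in range(1000)]
    n_poison = poison_count(len(pool))
    poisoned = make_poison_samples(pool, n_poison, rng)
    print(n_poison, "poisoned samples out of", len(pool))
```

Under these assumptions, a training set of 100,000 apps at a 0.3 percent rate would contain 300 poisoned samples, and a malware sample would attempt evasion at test time by passing through `apply_trigger` before classification.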

Original language: English
Pages (from-to): 3357-3370
Number of pages: 14
Journal: IEEE Transactions on Dependable and Secure Computing
Volume: 19
Issue number: 5
Publication status: Published - 2022

Keywords

  • backdoor attack
  • computer security
  • data poisoning
  • machine learning
  • malware detection
