TY - JOUR
T1 - Characterizing malicious Android apps by mining topic-specific data flow signatures
AU - Yang, Xinli
AU - Lo, David
AU - Li, Li
AU - Xia, Xin
AU - Bissyandé, Tegawendé F.
AU - Klein, Jacques
PY - 2017/10/1
Y1 - 2017/10/1
N2 - Context: State-of-the-art works on automated detection of Android malware have leveraged app descriptions to spot anomalies w.r.t the functionality implemented, or have used data flow information as a feature to discriminate malicious from benign apps. Although these works have yielded promising performance, we hypothesize that these performances can be improved by a better understanding of malicious behavior. Objective: To characterize malicious apps, we take into account both information on app descriptions, which are indicative of apps’ topics, and information on sensitive data flow, which can be relevant to discriminate malware from benign apps. Method: In this paper, we propose a topic-specific approach to malware comprehension based on app descriptions and data-flow information. First, we use an advanced topic model, adaptive LDA with GA, to cluster apps according to their descriptions. Then, we use information gain ratio of sensitive data flow information to build so-called “topic-specific data flow signatures”. Results: We conduct an empirical study on 3691 benign and 1612 malicious apps. We group them into 118 topics and generate topic-specific data flow signature. We verify the effectiveness of the topic-specific data flow signatures by comparing them with the overall data flow signature. In addition, we perform a deeper analysis on 25 representative topic-specific signatures and yield several implications. Conclusion: Topic-specific data flow signatures are efficient in highlighting the malicious behavior, and thus can help in characterizing malware.
AB - Context: State-of-the-art works on automated detection of Android malware have leveraged app descriptions to spot anomalies w.r.t the functionality implemented, or have used data flow information as a feature to discriminate malicious from benign apps. Although these works have yielded promising performance, we hypothesize that these performances can be improved by a better understanding of malicious behavior. Objective: To characterize malicious apps, we take into account both information on app descriptions, which are indicative of apps’ topics, and information on sensitive data flow, which can be relevant to discriminate malware from benign apps. Method: In this paper, we propose a topic-specific approach to malware comprehension based on app descriptions and data-flow information. First, we use an advanced topic model, adaptive LDA with GA, to cluster apps according to their descriptions. Then, we use information gain ratio of sensitive data flow information to build so-called “topic-specific data flow signatures”. Results: We conduct an empirical study on 3691 benign and 1612 malicious apps. We group them into 118 topics and generate topic-specific data flow signature. We verify the effectiveness of the topic-specific data flow signatures by comparing them with the overall data flow signature. In addition, we perform a deeper analysis on 25 representative topic-specific signatures and yield several implications. Conclusion: Topic-specific data flow signatures are efficient in highlighting the malicious behavior, and thus can help in characterizing malware.
KW - Data flow signature
KW - Empirical study
KW - Malware characterization
KW - Topic-specific
UR - http://www.scopus.com/inward/record.url?scp=85018397382&partnerID=8YFLogxK
U2 - 10.1016/j.infsof.2017.04.007
DO - 10.1016/j.infsof.2017.04.007
M3 - Article
AN - SCOPUS:85018397382
SN - 0950-5849
VL - 90
SP - 27
EP - 39
JO - Information and Software Technology
JF - Information and Software Technology
ER -