Revisiting the impact of common libraries for android-related investigations

Li Li, Timothée Riom, Tegawendé F. Bissyandé, Haoyu Wang, Jacques Klein, Le Traon Yves

Research output: Contribution to journalArticleResearchpeer-review

Abstract

The packaging model of Android apps requires the entire code to be shipped into a single APK file in order to be installed and executed on a device. This model introduces noises to Android app analyses, e.g., detection of repackaged applications, malware classification, as not only the core developer code but also the other assistant code will be visited. Such assistant code is often contributed by common libraries that are used pervasively by all apps. Despite much effort has been put in our community to investigate Android libraries, the momentum of Android research has not yet produced a complete and reliable set of common libraries for supporting thorough analyses of Android apps. In this work, we hence leverage a dataset of about 1.5 million apps from Google Play to identify potential common libraries, including advertisement libraries, and their abstract representations. With several steps of refinements, we finally collect 1113 libraries supporting common functions and 240 libraries for advertisement. For each library, we also collected its various abstract representations that could be leveraged to find new usages, including obfuscated cases. Based on these datasets, we further empirically revisit three popular Android app analyses, namely (1) repackaged app detection, (2) machine learning-based malware detection, and (3) static code analysis, aiming at measuring the impact of common libraries on their analysing performance. Our experimental results demonstrate that common library can indeed impact the performance of Android app analysis approaches. Indeed, common libraries can introduce both false positive and false negative results to repackaged app detection approaches. The existence of common libraries in Android apps may also impact the performance of machine learning-based classifications as well as that of static code analysers. All in all, the aforementioned results suggest that it is essential to harvest a reliable list of common libraries and also important to pay special attention to them when conducting Android-related investigations.

Original languageEnglish
Pages (from-to)157-175
Number of pages19
JournalJournal of Systems and Software
Volume154
DOIs
Publication statusPublished - Aug 2019

Cite this

Li, Li ; Riom, Timothée ; Bissyandé, Tegawendé F. ; Wang, Haoyu ; Klein, Jacques ; Yves, Le Traon. / Revisiting the impact of common libraries for android-related investigations. In: Journal of Systems and Software. 2019 ; Vol. 154. pp. 157-175.
@article{eb01093fb4e24ff187037dc3e89530da,
title = "Revisiting the impact of common libraries for android-related investigations",
abstract = "The packaging model of Android apps requires the entire code to be shipped into a single APK file in order to be installed and executed on a device. This model introduces noises to Android app analyses, e.g., detection of repackaged applications, malware classification, as not only the core developer code but also the other assistant code will be visited. Such assistant code is often contributed by common libraries that are used pervasively by all apps. Despite much effort has been put in our community to investigate Android libraries, the momentum of Android research has not yet produced a complete and reliable set of common libraries for supporting thorough analyses of Android apps. In this work, we hence leverage a dataset of about 1.5 million apps from Google Play to identify potential common libraries, including advertisement libraries, and their abstract representations. With several steps of refinements, we finally collect 1113 libraries supporting common functions and 240 libraries for advertisement. For each library, we also collected its various abstract representations that could be leveraged to find new usages, including obfuscated cases. Based on these datasets, we further empirically revisit three popular Android app analyses, namely (1) repackaged app detection, (2) machine learning-based malware detection, and (3) static code analysis, aiming at measuring the impact of common libraries on their analysing performance. Our experimental results demonstrate that common library can indeed impact the performance of Android app analysis approaches. Indeed, common libraries can introduce both false positive and false negative results to repackaged app detection approaches. The existence of common libraries in Android apps may also impact the performance of machine learning-based classifications as well as that of static code analysers. All in all, the aforementioned results suggest that it is essential to harvest a reliable list of common libraries and also important to pay special attention to them when conducting Android-related investigations.",
author = "Li Li and Timoth{\'e}e Riom and Bissyand{\'e}, {Tegawend{\'e} F.} and Haoyu Wang and Jacques Klein and Yves, {Le Traon}",
year = "2019",
month = "8",
doi = "10.1016/j.jss.2019.04.065",
language = "English",
volume = "154",
pages = "157--175",
journal = "Journal of Systems and Software",
issn = "0164-1212",
publisher = "Elsevier",

}

Revisiting the impact of common libraries for android-related investigations. / Li, Li; Riom, Timothée; Bissyandé, Tegawendé F.; Wang, Haoyu; Klein, Jacques; Yves, Le Traon.

In: Journal of Systems and Software, Vol. 154, 08.2019, p. 157-175.

Research output: Contribution to journalArticleResearchpeer-review

TY - JOUR

T1 - Revisiting the impact of common libraries for android-related investigations

AU - Li, Li

AU - Riom, Timothée

AU - Bissyandé, Tegawendé F.

AU - Wang, Haoyu

AU - Klein, Jacques

AU - Yves, Le Traon

PY - 2019/8

Y1 - 2019/8

N2 - The packaging model of Android apps requires the entire code to be shipped into a single APK file in order to be installed and executed on a device. This model introduces noises to Android app analyses, e.g., detection of repackaged applications, malware classification, as not only the core developer code but also the other assistant code will be visited. Such assistant code is often contributed by common libraries that are used pervasively by all apps. Despite much effort has been put in our community to investigate Android libraries, the momentum of Android research has not yet produced a complete and reliable set of common libraries for supporting thorough analyses of Android apps. In this work, we hence leverage a dataset of about 1.5 million apps from Google Play to identify potential common libraries, including advertisement libraries, and their abstract representations. With several steps of refinements, we finally collect 1113 libraries supporting common functions and 240 libraries for advertisement. For each library, we also collected its various abstract representations that could be leveraged to find new usages, including obfuscated cases. Based on these datasets, we further empirically revisit three popular Android app analyses, namely (1) repackaged app detection, (2) machine learning-based malware detection, and (3) static code analysis, aiming at measuring the impact of common libraries on their analysing performance. Our experimental results demonstrate that common library can indeed impact the performance of Android app analysis approaches. Indeed, common libraries can introduce both false positive and false negative results to repackaged app detection approaches. The existence of common libraries in Android apps may also impact the performance of machine learning-based classifications as well as that of static code analysers. All in all, the aforementioned results suggest that it is essential to harvest a reliable list of common libraries and also important to pay special attention to them when conducting Android-related investigations.

AB - The packaging model of Android apps requires the entire code to be shipped into a single APK file in order to be installed and executed on a device. This model introduces noises to Android app analyses, e.g., detection of repackaged applications, malware classification, as not only the core developer code but also the other assistant code will be visited. Such assistant code is often contributed by common libraries that are used pervasively by all apps. Despite much effort has been put in our community to investigate Android libraries, the momentum of Android research has not yet produced a complete and reliable set of common libraries for supporting thorough analyses of Android apps. In this work, we hence leverage a dataset of about 1.5 million apps from Google Play to identify potential common libraries, including advertisement libraries, and their abstract representations. With several steps of refinements, we finally collect 1113 libraries supporting common functions and 240 libraries for advertisement. For each library, we also collected its various abstract representations that could be leveraged to find new usages, including obfuscated cases. Based on these datasets, we further empirically revisit three popular Android app analyses, namely (1) repackaged app detection, (2) machine learning-based malware detection, and (3) static code analysis, aiming at measuring the impact of common libraries on their analysing performance. Our experimental results demonstrate that common library can indeed impact the performance of Android app analysis approaches. Indeed, common libraries can introduce both false positive and false negative results to repackaged app detection approaches. The existence of common libraries in Android apps may also impact the performance of machine learning-based classifications as well as that of static code analysers. All in all, the aforementioned results suggest that it is essential to harvest a reliable list of common libraries and also important to pay special attention to them when conducting Android-related investigations.

UR - http://www.scopus.com/inward/record.url?scp=85064915523&partnerID=8YFLogxK

U2 - 10.1016/j.jss.2019.04.065

DO - 10.1016/j.jss.2019.04.065

M3 - Article

VL - 154

SP - 157

EP - 175

JO - Journal of Systems and Software

JF - Journal of Systems and Software

SN - 0164-1212

ER -