Combining static analysis with probabilistic models to enable market-scale android inter-component analysis

Damien Octeau, Somesh Jha, Matthew Dering, Patrick McDaniel, Alexandre Bartel, Li Li, Jacques Klein, Yves Le Traon

Research output: Chapter in Book/Report/Conference proceedingConference PaperResearchpeer-review

24 Citations (Scopus)

Abstract

Static analysis has been successfully used in many areas, from verifying mission-critical software to malware detection. Unfortunately, static analysis often produces false positives, which require significant manual effort to resolve. In this paper, we show how to overlay a probabilistic model, trained using domain knowledge, on top of static analysis results, in order to triage static analysis results. We apply this idea to analyzing mobile applications. Android application components can communicate with each other, both within single applications and between different applications. Unfortunately, techniques to statically infer Inter-Component Communication (ICC) yield many potential inter-component and interapplication links, most of which are false positives. At large scales, scrutinizing all potential links is simply not feasible. We therefore overlay a probabilistic model of ICC on top of static analysis results. Since computing the inter-component links is a prerequisite to inter-component analysis, we introduce a formalism for inferring ICC links based on set constraints.We design an efficient algorithm for performing link resolution. We compute all potential links in a corpus of 11, 267 applications in 30 minutes and triage them using our probabilistic approach. We find that over 95.1% of all 636 million potential links are associated with probability values below 0.01 and are thus likely unfeasible links. Thus, it is possible to consider only a small subset of all links without significant loss of information. This work is the first significant step in making static inter-application analysis more tractable, even at large scales.

Original languageEnglish
Title of host publicationPOPL'16 - Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages
Subtitle of host publicationJanuary 20–22, 2016 St. Petersburg, FL, USA
EditorsRastislav Bodik, Rupak Majumdar
Place of PublicationNew York NY USA
PublisherAssociation for Computing Machinery (ACM)
Pages469-484
Number of pages16
ISBN (Electronic)9781450335492
DOIs
Publication statusPublished - 2016
Externally publishedYes
EventAnnual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages 2016 - St. Petersburg, United States of America
Duration: 20 Jan 201622 Jan 2016
Conference number: 43rd
https://popl16.sigplan.org/

Conference

ConferenceAnnual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages 2016
Abbreviated titlePOPL 2016
CountryUnited States of America
CitySt. Petersburg
Period20/01/1622/01/16
Internet address

Keywords

  • Android ICC
  • Inter-component communication
  • Probabilistic program analysis
  • Static analysis

Cite this

Octeau, D., Jha, S., Dering, M., McDaniel, P., Bartel, A., Li, L., ... Traon, Y. L. (2016). Combining static analysis with probabilistic models to enable market-scale android inter-component analysis. In R. Bodik, & R. Majumdar (Eds.), POPL'16 - Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages: January 20–22, 2016 St. Petersburg, FL, USA (pp. 469-484). New York NY USA: Association for Computing Machinery (ACM). https://doi.org/10.1145/2837614.2837661
Octeau, Damien ; Jha, Somesh ; Dering, Matthew ; McDaniel, Patrick ; Bartel, Alexandre ; Li, Li ; Klein, Jacques ; Traon, Yves Le. / Combining static analysis with probabilistic models to enable market-scale android inter-component analysis. POPL'16 - Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages: January 20–22, 2016 St. Petersburg, FL, USA. editor / Rastislav Bodik ; Rupak Majumdar. New York NY USA : Association for Computing Machinery (ACM), 2016. pp. 469-484
@inproceedings{fa6c4a9933de4eb79d3735496f758bf2,
title = "Combining static analysis with probabilistic models to enable market-scale android inter-component analysis",
abstract = "Static analysis has been successfully used in many areas, from verifying mission-critical software to malware detection. Unfortunately, static analysis often produces false positives, which require significant manual effort to resolve. In this paper, we show how to overlay a probabilistic model, trained using domain knowledge, on top of static analysis results, in order to triage static analysis results. We apply this idea to analyzing mobile applications. Android application components can communicate with each other, both within single applications and between different applications. Unfortunately, techniques to statically infer Inter-Component Communication (ICC) yield many potential inter-component and interapplication links, most of which are false positives. At large scales, scrutinizing all potential links is simply not feasible. We therefore overlay a probabilistic model of ICC on top of static analysis results. Since computing the inter-component links is a prerequisite to inter-component analysis, we introduce a formalism for inferring ICC links based on set constraints.We design an efficient algorithm for performing link resolution. We compute all potential links in a corpus of 11, 267 applications in 30 minutes and triage them using our probabilistic approach. We find that over 95.1{\%} of all 636 million potential links are associated with probability values below 0.01 and are thus likely unfeasible links. Thus, it is possible to consider only a small subset of all links without significant loss of information. This work is the first significant step in making static inter-application analysis more tractable, even at large scales.",
keywords = "Android ICC, Inter-component communication, Probabilistic program analysis, Static analysis",
author = "Damien Octeau and Somesh Jha and Matthew Dering and Patrick McDaniel and Alexandre Bartel and Li Li and Jacques Klein and Traon, {Yves Le}",
year = "2016",
doi = "10.1145/2837614.2837661",
language = "English",
pages = "469--484",
editor = "Rastislav Bodik and Rupak Majumdar",
booktitle = "POPL'16 - Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages",
publisher = "Association for Computing Machinery (ACM)",
address = "United States of America",

}

Octeau, D, Jha, S, Dering, M, McDaniel, P, Bartel, A, Li, L, Klein, J & Traon, YL 2016, Combining static analysis with probabilistic models to enable market-scale android inter-component analysis. in R Bodik & R Majumdar (eds), POPL'16 - Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages: January 20–22, 2016 St. Petersburg, FL, USA. Association for Computing Machinery (ACM), New York NY USA, pp. 469-484, Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages 2016, St. Petersburg, United States of America, 20/01/16. https://doi.org/10.1145/2837614.2837661

Combining static analysis with probabilistic models to enable market-scale android inter-component analysis. / Octeau, Damien; Jha, Somesh; Dering, Matthew; McDaniel, Patrick; Bartel, Alexandre; Li, Li; Klein, Jacques; Traon, Yves Le.

POPL'16 - Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages: January 20–22, 2016 St. Petersburg, FL, USA. ed. / Rastislav Bodik; Rupak Majumdar. New York NY USA : Association for Computing Machinery (ACM), 2016. p. 469-484.

Research output: Chapter in Book/Report/Conference proceedingConference PaperResearchpeer-review

TY - GEN

T1 - Combining static analysis with probabilistic models to enable market-scale android inter-component analysis

AU - Octeau, Damien

AU - Jha, Somesh

AU - Dering, Matthew

AU - McDaniel, Patrick

AU - Bartel, Alexandre

AU - Li, Li

AU - Klein, Jacques

AU - Traon, Yves Le

PY - 2016

Y1 - 2016

N2 - Static analysis has been successfully used in many areas, from verifying mission-critical software to malware detection. Unfortunately, static analysis often produces false positives, which require significant manual effort to resolve. In this paper, we show how to overlay a probabilistic model, trained using domain knowledge, on top of static analysis results, in order to triage static analysis results. We apply this idea to analyzing mobile applications. Android application components can communicate with each other, both within single applications and between different applications. Unfortunately, techniques to statically infer Inter-Component Communication (ICC) yield many potential inter-component and interapplication links, most of which are false positives. At large scales, scrutinizing all potential links is simply not feasible. We therefore overlay a probabilistic model of ICC on top of static analysis results. Since computing the inter-component links is a prerequisite to inter-component analysis, we introduce a formalism for inferring ICC links based on set constraints.We design an efficient algorithm for performing link resolution. We compute all potential links in a corpus of 11, 267 applications in 30 minutes and triage them using our probabilistic approach. We find that over 95.1% of all 636 million potential links are associated with probability values below 0.01 and are thus likely unfeasible links. Thus, it is possible to consider only a small subset of all links without significant loss of information. This work is the first significant step in making static inter-application analysis more tractable, even at large scales.

AB - Static analysis has been successfully used in many areas, from verifying mission-critical software to malware detection. Unfortunately, static analysis often produces false positives, which require significant manual effort to resolve. In this paper, we show how to overlay a probabilistic model, trained using domain knowledge, on top of static analysis results, in order to triage static analysis results. We apply this idea to analyzing mobile applications. Android application components can communicate with each other, both within single applications and between different applications. Unfortunately, techniques to statically infer Inter-Component Communication (ICC) yield many potential inter-component and interapplication links, most of which are false positives. At large scales, scrutinizing all potential links is simply not feasible. We therefore overlay a probabilistic model of ICC on top of static analysis results. Since computing the inter-component links is a prerequisite to inter-component analysis, we introduce a formalism for inferring ICC links based on set constraints.We design an efficient algorithm for performing link resolution. We compute all potential links in a corpus of 11, 267 applications in 30 minutes and triage them using our probabilistic approach. We find that over 95.1% of all 636 million potential links are associated with probability values below 0.01 and are thus likely unfeasible links. Thus, it is possible to consider only a small subset of all links without significant loss of information. This work is the first significant step in making static inter-application analysis more tractable, even at large scales.

KW - Android ICC

KW - Inter-component communication

KW - Probabilistic program analysis

KW - Static analysis

UR - http://www.scopus.com/inward/record.url?scp=84962580740&partnerID=8YFLogxK

U2 - 10.1145/2837614.2837661

DO - 10.1145/2837614.2837661

M3 - Conference Paper

SP - 469

EP - 484

BT - POPL'16 - Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages

A2 - Bodik, Rastislav

A2 - Majumdar, Rupak

PB - Association for Computing Machinery (ACM)

CY - New York NY USA

ER -

Octeau D, Jha S, Dering M, McDaniel P, Bartel A, Li L et al. Combining static analysis with probabilistic models to enable market-scale android inter-component analysis. In Bodik R, Majumdar R, editors, POPL'16 - Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages: January 20–22, 2016 St. Petersburg, FL, USA. New York NY USA: Association for Computing Machinery (ACM). 2016. p. 469-484 https://doi.org/10.1145/2837614.2837661