STADS

Software testing as species discovery

Research output: Contribution to journalArticleResearchpeer-review

2 Citations (Scopus)

Abstract

A fundamental challenge of software testing is the statistically well-grounded extrapolation from program behaviors observed during testing. For instance, a security researcher who has run the fuzzer for a week has currently no means (1) to estimate the total number of feasible program branches, given that only a fraction has been covered so far; (2) to estimate the additional time required to cover 10% more branches (or to estimate the coverage achieved in one more day, respectively); or (3) to assess the residual risk that a vulnerability exists when no vulnerability has been discovered. Failing to discover a vulnerability does not mean that none exists-even if the fuzzer was run for a week (or a year). Hence, testing provides no formal correctness guarantees. In this article, I establish an unexpected connection with the otherwise unrelated scientific field of ecology and introduce a statistical framework that models Software Testing and Analysis as Discovery of Species (STADS). For instance, in order to study the species diversity of arthropods in a tropical rain forest, ecologists would first sample a large number of individuals from that forest, determine their species, and extrapolate from the properties observed in the sample to properties of the whole forest. The estimations (1) of the total number of species, (2) of the additional sampling effort required to discover 10% more species, or (3) of the probability to discover a new species are classical problems in ecology. The STADS framework draws from over three decades of research in ecological biostatistics to address the fundamental extrapolation challenge for automated test generation. Our preliminary empirical study demonstrates a good estimator performance even for a fuzzer with adaptive sampling bias-AFL, a state-of-the-art vulnerability detection tool. The STADS framework provides statistical correctness guarantees with quantifiable accuracy.

Original languageEnglish
Article number7
Number of pages52
JournalACM Transactions on Software Engineering and Methodology
Volume27
Issue number2
DOIs
Publication statusPublished - 1 Jul 2018

Keywords

  • Code coverage
  • Discovery probability
  • Extrapolation
  • Fuzzing
  • Measure of confidence
  • Measure of progress
  • Reliability
  • Security
  • Species coverage
  • Statistical guarantees
  • Stopping rule

Cite this

@article{70b96bb3f7804f4d9222dcfe29da5626,
title = "STADS: Software testing as species discovery",
abstract = "A fundamental challenge of software testing is the statistically well-grounded extrapolation from program behaviors observed during testing. For instance, a security researcher who has run the fuzzer for a week has currently no means (1) to estimate the total number of feasible program branches, given that only a fraction has been covered so far; (2) to estimate the additional time required to cover 10{\%} more branches (or to estimate the coverage achieved in one more day, respectively); or (3) to assess the residual risk that a vulnerability exists when no vulnerability has been discovered. Failing to discover a vulnerability does not mean that none exists-even if the fuzzer was run for a week (or a year). Hence, testing provides no formal correctness guarantees. In this article, I establish an unexpected connection with the otherwise unrelated scientific field of ecology and introduce a statistical framework that models Software Testing and Analysis as Discovery of Species (STADS). For instance, in order to study the species diversity of arthropods in a tropical rain forest, ecologists would first sample a large number of individuals from that forest, determine their species, and extrapolate from the properties observed in the sample to properties of the whole forest. The estimations (1) of the total number of species, (2) of the additional sampling effort required to discover 10{\%} more species, or (3) of the probability to discover a new species are classical problems in ecology. The STADS framework draws from over three decades of research in ecological biostatistics to address the fundamental extrapolation challenge for automated test generation. Our preliminary empirical study demonstrates a good estimator performance even for a fuzzer with adaptive sampling bias-AFL, a state-of-the-art vulnerability detection tool. The STADS framework provides statistical correctness guarantees with quantifiable accuracy.",
keywords = "Code coverage, Discovery probability, Extrapolation, Fuzzing, Measure of confidence, Measure of progress, Reliability, Security, Species coverage, Statistical guarantees, Stopping rule",
author = "Marcel B{\"o}hme",
year = "2018",
month = "7",
day = "1",
doi = "10.1145/3210309",
language = "English",
volume = "27",
journal = "ACM Transactions on Software Engineering and Methodology",
issn = "1049-331X",
publisher = "Association for Computing Machinery (ACM)",
number = "2",

}

STADS : Software testing as species discovery. / Böhme, Marcel.

In: ACM Transactions on Software Engineering and Methodology, Vol. 27, No. 2, 7, 01.07.2018.

Research output: Contribution to journalArticleResearchpeer-review

TY - JOUR

T1 - STADS

T2 - Software testing as species discovery

AU - Böhme, Marcel

PY - 2018/7/1

Y1 - 2018/7/1

N2 - A fundamental challenge of software testing is the statistically well-grounded extrapolation from program behaviors observed during testing. For instance, a security researcher who has run the fuzzer for a week has currently no means (1) to estimate the total number of feasible program branches, given that only a fraction has been covered so far; (2) to estimate the additional time required to cover 10% more branches (or to estimate the coverage achieved in one more day, respectively); or (3) to assess the residual risk that a vulnerability exists when no vulnerability has been discovered. Failing to discover a vulnerability does not mean that none exists-even if the fuzzer was run for a week (or a year). Hence, testing provides no formal correctness guarantees. In this article, I establish an unexpected connection with the otherwise unrelated scientific field of ecology and introduce a statistical framework that models Software Testing and Analysis as Discovery of Species (STADS). For instance, in order to study the species diversity of arthropods in a tropical rain forest, ecologists would first sample a large number of individuals from that forest, determine their species, and extrapolate from the properties observed in the sample to properties of the whole forest. The estimations (1) of the total number of species, (2) of the additional sampling effort required to discover 10% more species, or (3) of the probability to discover a new species are classical problems in ecology. The STADS framework draws from over three decades of research in ecological biostatistics to address the fundamental extrapolation challenge for automated test generation. Our preliminary empirical study demonstrates a good estimator performance even for a fuzzer with adaptive sampling bias-AFL, a state-of-the-art vulnerability detection tool. The STADS framework provides statistical correctness guarantees with quantifiable accuracy.

AB - A fundamental challenge of software testing is the statistically well-grounded extrapolation from program behaviors observed during testing. For instance, a security researcher who has run the fuzzer for a week has currently no means (1) to estimate the total number of feasible program branches, given that only a fraction has been covered so far; (2) to estimate the additional time required to cover 10% more branches (or to estimate the coverage achieved in one more day, respectively); or (3) to assess the residual risk that a vulnerability exists when no vulnerability has been discovered. Failing to discover a vulnerability does not mean that none exists-even if the fuzzer was run for a week (or a year). Hence, testing provides no formal correctness guarantees. In this article, I establish an unexpected connection with the otherwise unrelated scientific field of ecology and introduce a statistical framework that models Software Testing and Analysis as Discovery of Species (STADS). For instance, in order to study the species diversity of arthropods in a tropical rain forest, ecologists would first sample a large number of individuals from that forest, determine their species, and extrapolate from the properties observed in the sample to properties of the whole forest. The estimations (1) of the total number of species, (2) of the additional sampling effort required to discover 10% more species, or (3) of the probability to discover a new species are classical problems in ecology. The STADS framework draws from over three decades of research in ecological biostatistics to address the fundamental extrapolation challenge for automated test generation. Our preliminary empirical study demonstrates a good estimator performance even for a fuzzer with adaptive sampling bias-AFL, a state-of-the-art vulnerability detection tool. The STADS framework provides statistical correctness guarantees with quantifiable accuracy.

KW - Code coverage

KW - Discovery probability

KW - Extrapolation

KW - Fuzzing

KW - Measure of confidence

KW - Measure of progress

KW - Reliability

KW - Security

KW - Species coverage

KW - Statistical guarantees

KW - Stopping rule

UR - http://www.scopus.com/inward/record.url?scp=85053872377&partnerID=8YFLogxK

U2 - 10.1145/3210309

DO - 10.1145/3210309

M3 - Article

VL - 27

JO - ACM Transactions on Software Engineering and Methodology

JF - ACM Transactions on Software Engineering and Methodology

SN - 1049-331X

IS - 2

M1 - 7

ER -