Boosting fuzzer efficiency: an information theoretic perspective

Marcel Böhme, Valentin J.M. Manès, Sang Kil Cha

Research output: Chapter in Book/Report/Conference proceeding > Conference Paper > Research > peer-review

2 Citations (Scopus)

Abstract

In this paper, we take the fundamental perspective of fuzzing as a learning process. Suppose before fuzzing, we know nothing about the behaviors of a program P: What does it do? Executing the first test input, we learn how P behaves for this input. Executing the next input, we either observe the same or discover a new behavior. As such, each execution reveals "some amount" of information about P's behaviors. A classic measure of information is Shannon's entropy. Measuring entropy allows us to quantify how much is learned from each generated test input about the behaviors of the program. Within a probabilistic model of fuzzing, we show how entropy also measures fuzzer efficiency. Specifically, it measures the general rate at which the fuzzer discovers new behaviors. Intuitively, efficient fuzzers maximize information. From this information theoretic perspective, we develop Entropic, an entropy-based power schedule for greybox fuzzing which assigns more energy to seeds that maximize information. We implemented Entropic into the popular greybox fuzzer LibFuzzer. Our experiments with more than 250 open-source programs (60 million LoC) demonstrate a substantially improved efficiency and confirm our hypothesis that an efficient fuzzer maximizes information. Entropic has been independently evaluated and invited for integration into main-line LibFuzzer. Entropic now runs on more than 25,000 machines fuzzing hundreds of security-critical software systems simultaneously and continuously.
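The idea of weighting seeds by the information they reveal can be illustrated with a minimal sketch. It assumes that each execution is summarized by a discrete behavior label and that a seed's share of energy is simply proportional to the Shannon entropy of the behaviors observed when fuzzing it. The function names `shannon_entropy` and `assign_energy`, the label format, and the proportional weighting rule are all hypothetical choices made for illustration; this is not the Entropic estimator or the LibFuzzer implementation described in the paper.

```python
import math
from collections import Counter


def shannon_entropy(counts):
    """Shannon entropy (in nats) of the empirical distribution given by counts."""
    counts = list(counts)
    total = sum(counts)
    if total == 0:
        return 0.0
    # H = -sum(p_i * log(p_i)) over behaviors that were observed at least once.
    return abs(-sum((c / total) * math.log(c / total) for c in counts if c > 0))


def assign_energy(seed_observations):
    """Return a normalized weight per seed, proportional to its entropy.

    seed_observations maps a seed name to the list of behavior labels seen
    when fuzzing that seed (a hypothetical input format for this sketch).
    """
    entropies = {
        seed: shannon_entropy(Counter(labels).values())
        for seed, labels in seed_observations.items()
    }
    total = sum(entropies.values())
    if total == 0:
        # No seed has revealed more than one behavior yet: fall back to uniform.
        n = len(entropies)
        return {seed: 1.0 / n for seed in entropies}
    return {seed: h / total for seed, h in entropies.items()}


# Illustrative data: seed_b reveals a more varied set of behaviors than seed_a.
observations = {
    "seed_a": ["b1", "b1", "b1", "b2"],
    "seed_b": ["b1", "b2", "b3", "b4"],
}
energy = assign_energy(observations)
print(energy)
```

In this toy setting, `seed_b` receives the larger share because its observed behaviors are spread evenly over four labels, while `seed_a` mostly repeats one behavior. A practical schedule would also need to handle behaviors not yet observed and avoid starving low-entropy seeds, which this sketch does not attempt.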

Original language: English
Title of host publication: Proceedings of the 28th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering
Editors: Prem Devanbu, Myra Cohen, Thomas Zimmermann
Place of Publication: New York NY USA
Publisher: Association for Computing Machinery (ACM)
Pages: 678-689
Number of pages: 12
ISBN (Electronic): 9781450370431
DOIs
Publication status: Published - 2020
Event: Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering 2020 - Virtual, United States of America
Duration: 8 Nov 2020 to 13 Nov 2020
Conference number: 28th
https://dl.acm.org/doi/proceedings/10.1145/3368089 (Proceedings)
https://2020.esec-fse.org (Website)

Conference

Conference: Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering 2020
Abbreviated title: ESEC/FSE 2020
Country: United States of America
City: Virtual
Period: 8/11/20 to 13/11/20
Internet address

Keywords

  • Efficiency
  • Entropy
  • Fuzzing
  • Information theory
  • Software testing
