
Revisiting the Evaluation of Deep Learning-Based Compiler Testing

  • Yongqiang Tian
  • Zhenyang Xu
  • Yiwen Dong
  • Chengnian Sun
  • Shing-Chi Cheung

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research › peer-review

Abstract

A high-quality program generator is essential to effective automated compiler testing. Engineering such a program generator is difficult, time-consuming, and specific to the language under test, requiring tremendous effort from human experts with language-specific domain knowledge. To avoid repeatedly writing program generators for different languages, researchers recently proposed a language-agnostic approach that uses deep learning techniques to automatically learn a program generator (referred to as DLG) from existing programs. Evaluations show that DLGs outperform Language-Specific Program Generators (LSGs) in testing compilers. However, we argue that it is unfair to use LSGs as baselines to evaluate DLGs. LSGs aim to validate compiler optimizations by generating only compilable, well-defined test programs; this restriction inevitably reduces the diversity of the language features used in the generated programs. In contrast, DLGs do not aim to validate the correctness of compiler optimizations, and their generated programs are not guaranteed to be well-defined or even compilable. Therefore, it is not surprising that DLG-generated programs use a more diverse set of language features than LSG-generated ones. This study revisits the evaluation of DLGs and proposes a new, fair, simple yet strong baseline named Kitten for evaluating DLGs. Given a dataset of human-written programs, Kitten does not use deep learning to learn a program generator; instead, it directly derives new programs by mutating the programs in the dataset. Extensive experiments totaling more than 1,500 CPU-hours demonstrate that the state-of-the-art DLGs fail to compete against such a simple baseline: 3 vs. 1,750 hang bugs, and 1 vs. 34 distinct compiler crashes. We believe that DLGs still have considerable room for improvement.
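The abstract describes Kitten only at a high level: rather than learning a generator, it derives new programs by mutating human-written programs from a dataset. As a rough illustration of what a mutation-based generator of this general kind can look like, here is a minimal sketch. It is not Kitten's implementation: the regex tokenizer, the three token-level operators (delete, duplicate, replace with a token borrowed from another program), and the names `tokenize`, `mutate`, and `generate` are hypothetical choices made for illustration only.

```python
import random
import re
from typing import List

# Hypothetical tokenizer: identifiers/keywords, integer literals, and any other
# single non-whitespace character. A real tool would use the target language's grammar.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(program: str) -> List[str]:
    """Split a program into a flat list of tokens."""
    return TOKEN_RE.findall(program)


def mutate(tokens: List[str], donor: List[str], rng: random.Random) -> List[str]:
    """Apply one randomly chosen token-level mutation (illustrative operators only)."""
    out = list(tokens)
    if not out:
        return out
    i = rng.randrange(len(out))
    op = rng.choice(["delete", "duplicate", "replace"])
    if op == "delete" and len(out) > 1:
        del out[i]                      # drop one token
    elif op == "duplicate":
        out.insert(i, out[i])           # repeat one token in place
    elif op == "replace" and donor:
        out[i] = rng.choice(donor)      # borrow a token from another dataset program
    return out


def generate(dataset: List[str], n: int, seed: int = 0) -> List[str]:
    """Derive n new programs by mutating randomly chosen programs from the dataset."""
    rng = random.Random(seed)           # seeded for reproducible test campaigns
    corpus = [tokenize(p) for p in dataset]
    results = []
    for _ in range(n):
        base = rng.choice(corpus)
        donor = rng.choice(corpus)
        results.append(" ".join(mutate(base, donor, rng)))
    return results


if __name__ == "__main__":
    seeds = ["int main() { return 0; }", "int f(int x) { return x + 1; }"]
    for prog in generate(seeds, n=3, seed=42):
        print(prog)
```

Because these operators work on raw tokens, the sketch makes no attempt to keep mutants compilable or well-defined; a practical tool would presumably use grammar-aware mutations for the language under test.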

Original language: English
Title of host publication: Proceedings of the 32nd International Joint Conference on Artificial Intelligence, IJCAI 2023
Editors: Edith Elkind
Place of Publication: Marina del Rey CA USA
Publisher: Association for the Advancement of Artificial Intelligence (AAAI)
Pages: 4873-4882
Number of pages: 10
ISBN (Electronic): 9781956792034
Publication status: Published - 2023
Externally published: Yes
Event: International Joint Conference on Artificial Intelligence 2023 - Macao, China
Duration: 19 Aug 2023 → 25 Aug 2023
Conference number: 32nd
https://www.ijcai.org/proceedings/2023/ (Proceedings)
https://ijcai-23.org/ (Website)

Publication series

Name: IJCAI International Joint Conference on Artificial Intelligence
Publisher: Association for the Advancement of Artificial Intelligence (AAAI)
Volume: 2023-August
ISSN (Print): 1045-0823

Conference

Conference: International Joint Conference on Artificial Intelligence 2023
Abbreviated title: IJCAI 2023
Country/Territory: China
City: Macao
Period: 19/08/23 → 25/08/23

Keywords

  • Multidisciplinary Topics and Applications
  • MDA
  • Software engineering
