DistXplore: Distribution-guided testing for evaluating and enhancing deep learning systems

Longtian Wang, Xiaofei Xie, Xiaoning Du, Meng Tian, Qing Guo, Zheng Yang, Chao Shen

Research output: Chapter in Book/Report/Conference proceedingConference PaperResearchpeer-review

2 Citations (Scopus)

Abstract

Deep learning (DL) models are trained on sampled data, where the distribution of training data differs from that of real-world data (i.e., the distribution shift), which reduces the model's robustness. Various testing techniques have been proposed, including distribution-unaware and distribution-aware methods. However, distribution-unaware testing lacks effectiveness by not explicitly considering the distribution of test cases and may generate redundant errors (within same distribution). Distribution-aware testing techniques primarily focus on generating test cases that follow the training distribution, missing out-of-distribution data that may also be valid and should be considered in the testing process. In this paper, we propose a novel distribution-guided approach for generating valid test cases with diverse distributions, which can better evaluate the model's robustness (i.e., generating hard-to-detect errors) and enhance the model's robustness (i.e., enriching training data). Unlike existing testing techniques that optimize individual test cases, DistXplore optimizes test suites that represent specific distributions. To evaluate and enhance the model's robustness, we design two metrics: distribution difference, which maximizes the similarity in distribution between two different classes of data to generate hard-to-detect errors, and distribution diversity, which increase the distribution diversity of generated test cases for enhancing the model's robustness. To evaluate the effectiveness of DistXplore in model evaluation and enhancement, we compare DistXplore with 14 state-of-the-art baselines on 10 models across 4 datasets. The evaluation results show that DisXplore not only detects a larger number of errors (e.g., 2×+ on average). Furthermore, DistXplore achieves a higher improvement in empirical robustness (e.g., 5.2% more accuracy improvement than the baselines on average).

Original languageEnglish
Title of host publicationESEC/FSE 2023 - Proceedings of the 31st ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering
EditorsSatish Chandra, Kelly Blincoe, Paolo Tonella
Place of PublicationNew York NY USA
PublisherAssociation for Computing Machinery (ACM)
Pages68-80
Number of pages13
ISBN (Electronic)9798400703270
DOIs
Publication statusPublished - 30 Nov 2023
EventJoint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering 2023 - San Francisco, United States of America
Duration: 3 Dec 20239 Dec 2023
Conference number: 31st
https://dl.acm.org/doi/proceedings/10.1145/3611643 (Proceedings)
https://conf.researchr.org/home/fse-2023 (Website)

Conference

ConferenceJoint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering 2023
Abbreviated titleESEC/FSE 2023
Country/TerritoryUnited States of America
CitySan Francisco
Period3/12/239/12/23
Internet address

Keywords

  • Deep learning
  • distribution diversity
  • model enhancement
  • software testing

Cite this