
Cross-project build co-change prediction

  • Xin Xia
  • David Lo
  • Shane McIntosh
  • Emad Shihab
  • Ahmed E. Hassan

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research › peer-review

Abstract

Build systems orchestrate how human-readable source code is translated into executable programs. In a software project, source code changes can induce changes in the build system (a.k.a. build co-changes). Because build systems are complex, it is difficult for developers to identify when build co-changes are necessary. Prediction of build co-changes works well if there is a sufficient amount of training data to build a model. In practice, however, only a limited number of changes is available for new projects. Using training data from other projects to predict the build co-changes in a new project can help improve the performance of build co-change prediction. We refer to this problem as cross-project build co-change prediction.

In this paper, we propose CroBuild, a novel cross-project build co-change prediction approach that iteratively learns new classifiers. CroBuild constructs an ensemble of classifiers by iteratively building classifiers and assigning them weights according to their prediction error rates. Given that only a small proportion of code changes are build co-changing, we also propose an imbalance-aware approach that learns a threshold boundary between the code changes that are build co-changing and those that are not, in order to construct classifiers in each iteration. To examine the benefits of CroBuild, we perform experiments on 4 large datasets including Mozilla, Eclipse-core, Lucene, and Jazz, comprising a total of 50,884 changes. On average, across the 4 datasets, CroBuild achieves an F1-score of up to 0.408. We also compare CroBuild with other approaches such as a basic model, AdaBoost proposed by Freund et al., and TrAdaBoost proposed by Dai et al. On average, across the 4 datasets, the CroBuild approach yields an improvement in F1-scores of 41.54%, 36.63%, and 36.97% over the basic model, AdaBoost, and TrAdaBoost, respectively.
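The two ideas named in the abstract, weighting ensemble members by their error rate and learning a decision threshold for imbalanced data, can be illustrated with a minimal sketch. This is not the CroBuild algorithm: the abstract does not give the weight formula or the threshold search, so the AdaBoost-style weight and the F1-maximizing cutoff below are assumptions, and all function names are hypothetical.

```python
import math


def f1_score(y_true, y_pred):
    """F1 for the positive class (1 = build co-changing)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def pick_threshold(scores, y_true):
    """Assumed imbalance-aware step: try each observed score as a cutoff
    and keep the one with the highest F1 on the training data."""
    best_t, best_f1 = 0.5, -1.0
    for t in sorted(set(scores)):
        preds = [1 if s >= t else 0 for s in scores]
        f1 = f1_score(y_true, preds)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


def classifier_weight(error_rate, eps=1e-9):
    """Assumed AdaBoost-style weight: a lower error rate gives a larger weight."""
    e = min(max(error_rate, eps), 1 - eps)
    return 0.5 * math.log((1 - e) / e)


def ensemble_predict(member_preds, weights):
    """Weighted vote over the binary predictions of the ensemble members."""
    total = sum(weights)
    vote = sum(w for p, w in zip(member_preds, weights) if p == 1)
    return 1 if vote > total / 2 else 0


# Toy data (made up for illustration): only 2 of 6 changes are co-changing.
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.60]
labels = [0, 0, 0, 1, 0, 1]

default_f1 = f1_score(labels, [1 if s >= 0.5 else 0 for s in scores])
threshold, tuned_f1 = pick_threshold(scores, labels)
print(f"default cutoff 0.5: F1={default_f1:.3f}")
print(f"learned cutoff {threshold}: F1={tuned_f1:.3f}")
```

On this toy data the fixed 0.5 cutoff misses one of the two positive changes, while the learned cutoff recovers both at the cost of one false positive. That is the kind of trade-off a learned threshold is meant to handle when positives are rare.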

Original language: English
Title of host publication: 2015 IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER) - Proceedings
Subtitle of host publication: March 2-6, 2015 Montréal, Canada
Editors: Yann-Gaël Guéhéneuc, Bram Adams, Alexander Serebrenik
Place of Publication: Piscataway NJ USA
Publisher: IEEE, Institute of Electrical and Electronics Engineers
Pages: 311-320
Number of pages: 10
ISBN (Electronic): 9781479984695
Publication status: Published - 2015
Externally published: Yes
Event: IEEE International Conference on Software Analysis, Evolution, and Reengineering 2015 - Montreal, Canada
Duration: 2 Mar 2015 to 6 Mar 2015
Conference number: 22nd
http://www.saner.polymtl.ca/doku.php?id=en:start
https://ieeexplore.ieee.org/xpl/conhome/7066219/proceeding (Proceedings)

Conference

Conference: IEEE International Conference on Software Analysis, Evolution, and Reengineering 2015
Abbreviated title: SANER 2015
Country/Territory: Canada
City: Montreal
Period: 2/03/15 to 6/03/15

Keywords

  • Build Co-change Prediction
  • Cross-project
  • Imbalance Data
  • Transfer Learning
