Improving defect prediction with deep forest

Tianchi Zhou, Xiaobing Sun, Xin Xia, Bin Li, Xiang Chen

Research output: Contribution to journal › Article › Research › peer-review

82 Citations (Scopus)


Context: Software defect prediction is important for ensuring software quality. Many supervised learning techniques have been applied to identify defective instances (e.g., methods, classes, and modules). Objective: However, the performance of these supervised learning techniques is still far from satisfactory, so more advanced techniques are needed to improve the performance of defect prediction models. Method: We propose a new deep forest model for building defect prediction models (DPDF). This model can identify more important defect features by using a new cascade strategy, which organizes random forest classifiers into a layer-by-layer structure. This design takes full advantage of both ensemble learning and deep learning. Results: We evaluate our approach on 25 open source projects from four public datasets (i.e., NASA, PROMISE, AEEEM and Relink). Experimental results show that our approach increases the AUC value by 5% compared with the best traditional machine learning algorithms. Conclusion: The deep strategy in DPDF is effective for software defect prediction.
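The cascade strategy mentioned in the abstract follows the general deep forest (gcForest) idea: each level of forests emits class-probability vectors, which are appended to the original features and fed to the next level. The sketch below is a minimal illustration of that layer-by-layer structure under stated assumptions, not the authors' DPDF implementation. The class name `CascadeForest`, the two forests per level, the three-level cap, the AUC-based stopping rule and the synthetic data are all hypothetical choices made for illustration; scikit-learn's `ExtraTreesClassifier` stands in for gcForest's completely-random forests.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict, train_test_split


class CascadeForest:
    """Hypothetical gcForest-style cascade for binary labels (0 = clean, 1 = defective).

    Each level trains a few forests; their class-probability vectors are
    appended to the original features and passed to the next level.
    """

    def __init__(self, max_levels=3, n_estimators=50, cv=3, random_state=0):
        self.max_levels = max_levels
        self.n_estimators = n_estimators
        self.cv = cv
        self.random_state = random_state
        self.levels = []

    def _make_forests(self, level):
        # Assumption: one random forest plus one extra-trees forest per level
        # (the latter standing in for gcForest's completely-random forest).
        seed = self.random_state + level
        return [
            RandomForestClassifier(n_estimators=self.n_estimators, random_state=seed),
            ExtraTreesClassifier(n_estimators=self.n_estimators, random_state=seed),
        ]

    def fit(self, X, y):
        self.levels = []
        features, best_auc = X, -np.inf
        for level in range(self.max_levels):
            forests = self._make_forests(level)
            outputs = []
            for forest in forests:
                # Out-of-fold probabilities keep the next level from seeing
                # predictions made on a forest's own training rows.
                outputs.append(cross_val_predict(
                    forest, features, y, cv=self.cv, method="predict_proba"))
                forest.fit(features, y)
            level_auc = roc_auc_score(y, np.mean(outputs, axis=0)[:, 1])
            if level_auc <= best_auc:
                break  # assumed stopping rule: the new level did not improve AUC
            best_auc = level_auc
            self.levels.append(forests)
            # Augmented input for the next level: original features + class vectors.
            features = np.hstack([X] + outputs)
        return self

    def predict_proba(self, X):
        features = X
        for forests in self.levels:
            outputs = [forest.predict_proba(features) for forest in forests]
            features = np.hstack([X] + outputs)
        # Final prediction: average of the last kept level's class vectors.
        return np.mean(outputs, axis=0)


# Synthetic, imbalanced stand-in for a defect dataset (about 20% defective).
X, y = make_classification(n_samples=600, n_features=20, n_informative=5,
                           weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

model = CascadeForest().fit(X_train, y_train)
proba = model.predict_proba(X_test)
auc = roc_auc_score(y_test, proba[:, 1])
print(f"levels kept: {len(model.levels)}, test AUC: {auc:.3f}")
```

Passing out-of-fold rather than in-sample probabilities to the next level is the main design choice in this sketch: forests nearly memorize their own training rows, so in-sample class vectors would look overconfident and later levels would learn little from them.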

Original language: English
Pages (from-to): 204-216
Number of pages: 13
Journal: Information and Software Technology
Publication status: Published, Oct 2019


  • Cascade strategy
  • Deep forest
  • Empirical evaluation
  • Software defect prediction
