Nearest neighbor sampling for cross company defect predictors (Abstract only)

Burak Turhan, Ayşe Bener, Tim Menzies

Research output: Chapter in Book/Report/Conference proceeding; Conference Paper; Research; peer-review

8 Citations (Scopus)


Several studies in defect prediction focus on building models from locally available data (i.e., within-company predictors). To employ these models, a company needs a data repository in which project metrics and defect information from past projects are stored. However, few companies follow this practice. In recent work, we showed that cross-company data can be used to build predictors, at the cost of increased false alarm rates. We therefore argued that the practical application of cross-company predictors is limited to mission-critical projects and that companies should strive for local data. In this paper, we show that nearest neighbor (NN) sampling of cross-company data removes the increased false alarm rates. We conclude that cross-company defect predictors can be practical tools with NN sampling, yet local predictors are still the best, and companies should keep striving for local data.
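The abstract does not spell out the sampling procedure, so the following is only a minimal sketch of one plausible reading of NN sampling: for each local module, keep its k closest cross-company modules and train on the union. The function name `nn_sample`, the Euclidean distance, the default `k=10`, and the toy data are all illustrative assumptions, not details taken from the paper.

```python
import math

def nn_sample(cross_rows, local_rows, k=10):
    """Select cross-company rows that are among the k nearest
    neighbors (Euclidean distance, an assumption) of any local row.

    cross_rows: list of (feature_tuple, defect_label) pairs.
    local_rows: list of feature tuples; labels are not needed.
    """
    selected = set()
    for query in local_rows:
        # Rank cross-company rows by distance to this local row.
        ranked = sorted(
            range(len(cross_rows)),
            key=lambda i: math.dist(cross_rows[i][0], query),
        )
        selected.update(ranked[:k])
    # Each selected row is returned once, in its original order.
    return [cross_rows[i] for i in sorted(selected)]

# Hypothetical toy data: two metrics per module, label 1 = defective.
cross = [
    ((1.0, 1.0), 0),
    ((1.2, 0.9), 1),
    ((9.0, 9.0), 0),
    ((8.8, 9.1), 1),
    ((50.0, 50.0), 1),  # far from every local module
]
local = [(1.1, 1.0), (9.0, 9.2)]

sample = nn_sample(cross, local, k=2)
```

With this toy data, the distant cross-company module is left out of the training sample, which is the intuition behind filtering: a predictor trained on `sample` only sees cross-company modules that resemble the local ones.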

Original language: English
Title of host publication: 2008 International Symposium on Software Testing and Analysis - Proceedings of the 2008 Workshop on Defects in Large Software Systems 2008, DEFECTS'08
Number of pages: 1
Publication status: Published - 15 Dec 2008
Externally published: Yes
Event: 2008 Workshop on Defects in Large Software Systems 2008, DEFECTS'08 - Seattle, WA, United States of America
Duration: 20 Jul 2008 → 20 Jul 2008


Conference: 2008 Workshop on Defects in Large Software Systems 2008, DEFECTS'08
Country: United States of America
City: Seattle, WA
