A new perspective on data homogeneity in software cost estimation: A study in the embedded systems domain

Ayşe Bakir, Burak Turhan, Ayşe B. Bener

Research output: Contribution to journal, Article, Research, peer-review

24 Citations (Scopus)


Cost estimation and effort allocation are the key challenges for successful project planning and management in software development. Therefore, both industry and the research community have been working on various models and techniques to accurately predict the cost of projects. Recently, researchers have started debating whether the prediction performance depends on the structure of data rather than the models used. In this article, we focus on a new aspect of data homogeneity, "cross- versus within-application domain", and investigate what kind of training data should be used for software cost estimation in the embedded systems domain. In addition, we try to find out the effect of training dataset size on the prediction performance. Based on our empirical results, we conclude that it is better to use cross-domain data for embedded software cost estimation and the optimum training data size depends on the method used.

Original language: English
Pages (from-to): 57-80
Number of pages: 24
Journal: Software Quality Journal
Issue number: 1
Publication status: Published - 1 Jan 2009
Externally published: Yes


  • Application domain
  • Cost estimation
  • Data homogeneity
  • Embedded software
  • Machine learning
