Use and misuse of the term experiment in Mining Software Repositories Research

Claudia Ayala, Burak Turhan, Xavier Franch, Natalia Juristo

Research output: Contribution to journal › Article › Research › peer-review

6 Citations (Scopus)


The significant momentum and importance of Mining Software Repositories (MSR) in Software Engineering (SE) have fostered new opportunities and challenges for extensive empirical research. However, MSR researchers seem to struggle to place the empirical methods they use within the existing empirical SE body of knowledge. This is especially the case for MSR experiments. To provide evidence on the special characteristics of MSR experiments and how they differ from the experiments traditionally acknowledged in SE, we elicited the hallmarks that differentiate an experiment from other types of empirical studies and characterized the hallmarks and types of experiments in MSR. We analyzed MSR literature obtained from a small-scale systematic mapping study to assess the use of the term experiment in MSR. We found that 19% of the papers claiming to be experiments are in fact not experiments at all but observational studies, so they use the term in a misleading way. Of the remaining 81% of the papers, only one reports a genuine controlled experiment, while the others are experiments with limited control. MSR researchers tend to overlook such limitations, compromising the interpretation of the results of their studies. We provide recommendations and insights to support the improvement of MSR experiments.

Original language: English
Pages (from-to): 4229-4248
Number of pages: 19
Journal: IEEE Transactions on Software Engineering
Issue number: 11
Publication status: Published - 1 Nov 2022


  • controlled experiment
  • Data mining
  • Empirical Software Engineering
  • mining software repositories
  • research methodology
  • Resource management
  • Software
  • Software engineering
  • Systematics
  • Terminology
  • Tools
