Abstract
In practice, some bugs have more impact than others and thus deserve more immediate attention. Because of tight schedules and limited human resources, developers may not have enough time to inspect all bugs, so they often concentrate on the bugs that are highly impactful. In the literature, the term high impact bugs refers to bugs that appear at unexpected times or in unexpected locations and bring more unexpected effects, or that break pre-existing functionality and destroy the user experience. Unfortunately, identifying high impact bugs among the thousands of bug reports in a bug tracking system is not an easy feat. An automated technique that identifies high impact bug reports can therefore help developers become aware of them early, rectify them quickly, and minimize the damage they cause. Because only a small proportion of bugs are high impact bugs, identifying high impact bug reports is a difficult task. In this paper, we propose an approach to identify high impact bug reports by leveraging imbalanced learning strategies. We investigate the effectiveness of various imbalanced learning strategies built upon a number of well-known classification algorithms. In particular, we choose four widely used strategies for dealing with imbalanced data and use naive Bayes multinomial as the classification algorithm to conduct experiments on four datasets from four different open source projects. We perform an empirical study on a specific type of high impact bug, i.e., surprise bugs, which were first studied by Shihab et al. The results show that under-sampling is the best imbalanced learning strategy with naive Bayes multinomial for high impact bug identification.
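As a rough illustration of the combination the abstract reports as most effective (random under-sampling followed by a multinomial naive Bayes text classifier), the following is a minimal sketch in plain Python. It is not the authors' implementation: the toy bug reports, the labels `high` and `low`, the whitespace tokenizer, the Laplace smoothing value, and the names `under_sample` and `MultinomialNB` are all assumptions made for the example.

```python
import math
import random
from collections import Counter, defaultdict


def under_sample(docs, labels, seed=0):
    """Randomly drop examples until every class is as small as the rarest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for doc, label in zip(docs, labels):
        by_class[label].append(doc)
    n_min = min(len(items) for items in by_class.values())
    out_docs, out_labels = [], []
    for label, items in by_class.items():
        for doc in rng.sample(items, n_min):
            out_docs.append(doc)
            out_labels.append(label)
    return out_docs, out_labels


class MultinomialNB:
    """Tiny multinomial naive Bayes over whitespace tokens (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant (assumed value)

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, doc):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n / total)  # log prior
            for word in doc.lower().split():
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical, imbalanced toy data: 2 high impact reports versus 6 others.
reports = [
    "crash on startup after update",
    "data loss when saving file",
    "typo in settings dialog",
    "button color slightly off",
    "tooltip text is misaligned",
    "menu label uses wrong font",
    "icon looks blurry in toolbar",
    "dialog title has extra space",
]
labels = ["high", "high"] + ["low"] * 6

balanced_docs, balanced_labels = under_sample(reports, labels)
model = MultinomialNB().fit(balanced_docs, balanced_labels)
prediction = model.predict("crash and data loss on startup")
```

Under-sampling is applied only to the training data here; in an actual evaluation the test set would normally keep its original class distribution so that the reported scores reflect the real imbalance.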
Original language | English |
---|---|
Title of host publication | Proceedings - 2016 IEEE 40th Annual Computer Software and Applications Conference, COMPSAC 2016 |
Editors | Sorel Reisman, Sheikh Iqbal Ahamed, Ling Liu, Dejan Milojicic, William Claycomb, Mihhail Matskin, Hiroyuki Sato, Zhiyong Zhang |
Place of Publication | Piscataway NJ USA |
Publisher | IEEE, Institute of Electrical and Electronics Engineers |
Pages | 227-232 |
Number of pages | 6 |
ISBN (Electronic) | 9781467388450 |
ISBN (Print) | 9781467388467 |
Publication status | Published - 2016 |
Externally published | Yes |
Event | International Computer Software and Applications Conference 2016, Atlanta, United States of America. Duration: 10 Jun 2016 → 14 Jun 2016. Conference number: 40th. https://www.computer.org/web/compsac2016, https://ieeexplore.ieee.org/xpl/conhome/7551592/proceeding (Proceedings) |
Conference
Conference | International Computer Software and Applications Conference 2016 |
---|---|
Abbreviated title | COMPSAC 2016 |
Country/Territory | United States of America |
City | Atlanta |
Period | 10/06/16 → 14/06/16 |
Keywords
- High Impact Bug
- Imbalanced Data
- Text Classification