Abstract
Naive Bayes (NB) is a classical machine learning algorithm for which discretization is commonly used to transform quantitative attributes into qualitative attributes. Among the many discretization methods, Non-Disjoint Discretization (NDD) offers a novel perspective by forming overlapping intervals and always locating a value toward the middle of an interval. However, existing approaches to NDD fail to adequately consider the effect of multiple occurrences of a single value, which is common in practice. By necessity, all occurrences of a single value fall within the same interval. As a result, it is often not possible to discretize an attribute into intervals containing equal numbers of training instances. Current methods address this issue in an ad hoc manner, reducing the specificity of the resulting atomic intervals. In this study, we propose a non-disjoint discretization method for NB, called Rigorous Non-Disjoint Discretization (RNDD), that handles multiple occurrences of a single value in a systematic manner. Our extensive experimental results suggest that RNDD significantly outperforms NDD along with all other existing state-of-the-art competitors.
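The tie problem described in the abstract can be illustrated with a minimal sketch. This is not NDD or RNDD, whose details are not given here; it is a plain equal-frequency binning helper, and the name `equal_frequency_bins` and the sample data are assumptions made up for illustration. It only shows why keeping all occurrences of a value in one interval can prevent equal-sized intervals.

```python
from collections import Counter


def equal_frequency_bins(values, n_bins):
    """Hypothetical helper: greedy equal-frequency binning that never splits ties.

    All occurrences of a value are placed in the same bin, so bin sizes
    can deviate from the ideal len(values) / n_bins.
    """
    counts = sorted(Counter(values).items())  # (value, occurrences), ascending
    target = len(values) / n_bins             # ideal number of instances per bin
    bins, current, placed = [], [], 0
    for value, count in counts:
        current.extend([value] * count)       # ties always stay together
        placed += count
        # Close the bin once the running total reaches the next ideal boundary,
        # leaving the final bin open to collect whatever remains.
        if len(bins) < n_bins - 1 and placed >= target * (len(bins) + 1):
            bins.append(current)
            current = []
    if current:
        bins.append(current)
    return bins


# Distinct values split evenly; a heavily repeated value does not.
distinct = equal_frequency_bins([1, 2, 3, 4, 5, 6, 7, 8, 9], 3)
repeated = equal_frequency_bins([1, 2, 2, 2, 2, 2, 3, 4, 5], 3)
print([len(b) for b in distinct])  # [3, 3, 3]
print([len(b) for b in repeated])  # [6, 1, 2]
```

In the second call, the five occurrences of the value 2 must share one interval, so the three intervals cannot each hold three instances. The abstract indicates that RNDD addresses this situation systematically; how it does so is not reproduced in this sketch.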
| Original language | English |
|---|---|
| Article number | 109554 |
| Number of pages | 12 |
| Journal | Pattern Recognition |
| Volume | 140 |
| Publication status | Published - Aug 2023 |
Keywords
- Discretization
- Naive Bayes
- Proportional weighting
- Singleton interval