Abstract
This paper investigates the effect of class distribution on the predictive performance of classification models using cost-sensitive learning, rather than the sampling approach employed in a previous, similar study. Predictive performance is measured using the cost space representation, which is a dual to the ROC representation. This study shows that, contrary to the finding of the previous study, class distributions lying between the natural distribution and the balanced distribution can also produce the best models. In addition, we find that the best models are larger than those trained using the natural distribution. We also show two different ways to achieve the same effect as the corrected probability estimates proposed by the previous study.
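The cost space representation can be illustrated with a short sketch. It assumes the cost-curve formulation of Drummond and Holte, which may not be exactly the variant used in the paper. Under that formulation, each ROC point (FPR, TPR) maps to a straight line giving normalized expected cost as a function of the probability cost PC(+). PC(+) folds the class prior and the two misclassification costs into a single x-axis value. The lower envelope of these lines plays the role that the ROC convex hull plays in ROC space. All function names below are hypothetical and chosen for illustration only.

```python
from typing import List, Sequence, Tuple

# A classifier summarized by its ROC point: (false positive rate, true positive rate).
RocPoint = Tuple[float, float]


def probability_cost(p_pos: float, cost_fn: float, cost_fp: float) -> float:
    """PC(+) = p(+)C(-|+) / (p(+)C(-|+) + p(-)C(+|-)).

    cost_fn is the cost of misclassifying a positive (false negative),
    cost_fp the cost of misclassifying a negative (false positive).
    """
    weighted_pos = p_pos * cost_fn
    return weighted_pos / (weighted_pos + (1.0 - p_pos) * cost_fp)


def normalized_expected_cost(point: RocPoint, pc_pos: float) -> float:
    """A ROC point becomes the line NE[C] = (1 - TPR) * PC(+) + FPR * (1 - PC(+))."""
    fpr, tpr = point
    return (1.0 - tpr) * pc_pos + fpr * (1.0 - pc_pos)


def lower_envelope(points: Sequence[RocPoint], grid: Sequence[float],
                   include_trivial: bool = True) -> List[float]:
    """Lowest normalized expected cost achievable at each PC(+) in grid.

    The trivial classifiers 'always negative' (0, 0) and 'always positive'
    (1, 1) are added by default, as is usual when drawing cost curves.
    """
    candidates = list(points)
    if include_trivial:
        candidates += [(0.0, 0.0), (1.0, 1.0)]
    return [min(normalized_expected_cost(p, x) for p in candidates) for x in grid]


if __name__ == "__main__":
    model_a, model_b = (0.1, 0.7), (0.3, 0.9)  # illustrative ROC points, not results from the paper
    for x in (0.2, 0.5, 0.8):
        print(x, normalized_expected_cost(model_a, x), normalized_expected_cost(model_b, x))
```

Reading the output, the two illustrative models cost the same at PC(+) = 0.5. Model A is cheaper when positives are rare or cheap to miss (PC(+) below 0.5), and model B is cheaper above 0.5. This kind of operating-range comparison is what the cost space representation makes explicit, and a single ROC plot shows it less directly.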
| Original language | English |
|---|---|
| Title of host publication | Discovery Science |
| Subtitle of host publication | 5th International Conference, DS 2002, Lubeck, Germany, November 24-26, 2002, Proceedings |
| Place of Publication | Berlin, Germany |
| Publisher | Springer |
| Pages | 98-112 |
| Number of pages | 15 |
| ISBN (Print) | 3540001883 |
| Publication status | Published - 2002 |
| Event | International Conference on Discovery Science 2002 (5th conference), Lubeck, Germany; duration: 24 Nov 2002 → 26 Nov 2002 |
Publication series
| Name | Lecture Notes in Computer Science |
|---|---|
| Publisher | Springer |
| Volume | 2534 |
| ISSN (Print) | 0302-9743 |
Conference
| Conference | International Conference on Discovery Science 2002 |
|---|---|
| Abbreviated title | DS 2002 |
| Country/Territory | Germany |
| City | Lubeck |
| Period | 24/11/02 → 26/11/02 |