Abstract
Random forest based learning-to-rank (LtR) algorithms exhibit performance competitive with other state-of-the-art algorithms. Traditionally, each tree of the forest is learnt from a bootstrapped copy (sampled with replacement) of the training set, in which approximately 63% of the examples are unique (the chance that a given example appears in a bootstrap sample of size n is 1 - (1 - 1/n)^n ≈ 1 - 1/e ≈ 0.632), although some studies show that sampling without replacement also works well. The purpose of using a bootstrapped copy instead of the original training set is to reduce correlation among individual trees, thereby making the predictions of the ensemble more accurate. In this study, we investigate whether we can decrease the correlation among the trees even further without compromising accuracy. Among several potential options, we focus on the sub-sample used for learning each individual tree: we examine the performance of a random forest based LtR algorithm as we reduce the size of these sub-samples. Experiments on the LETOR data sets reveal that a substantial reduction in training time can be achieved using only a small amount of training data. Moreover, accuracy is likely to increase while maintaining the same level of performance stability as the baseline. Thus, in addition to the existing benefit of being completely parallelizable, this study empirically identifies yet another ingredient of random forest based LtR algorithms that makes them one of the top contenders for large-scale LtR.
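The paper itself carries the algorithmic details; purely as an illustration of the idea in the abstract, here is a minimal Python sketch of a pointwise LtR forest in which each tree is fit on a small sub-sample drawn without replacement. The function names, the `sub_sample_frac` parameter, and the use of scikit-learn regression trees are assumptions for this sketch, not the authors' implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_subsampled_forest(X, y, n_trees=100, sub_sample_frac=0.1, seed=0):
    """Fit an ensemble of randomized regression trees, each on a small
    random sub-sample of the training data (illustrative sketch only)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    # Shrinking k is the knob the paper studies: smaller sub-samples mean
    # less correlated trees and less work per tree.
    k = max(1, int(sub_sample_frac * n))
    trees = []
    for _ in range(n_trees):
        # Sample without replacement; the abstract notes this also works well.
        idx = rng.choice(n, size=k, replace=False)
        # Random feature subsets at each split, as in a standard random forest.
        tree = DecisionTreeRegressor(max_features="sqrt")
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees

def score(trees, X):
    """Average the trees' relevance scores; ranking sorts documents by this."""
    return np.mean([t.predict(X) for t in trees], axis=0)
```

Since each tree sees only k ≪ n examples and the trees are mutually independent, training cost falls roughly linearly in the sub-sample fraction and the loop parallelizes trivially, which is the scalability argument made in the abstract.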
Original language | English |
---|---|
Title of host publication | Data Mining and Analytics 2014 |
Subtitle of host publication | Proceedings of the Twelfth Australasian Data Mining Conference (AusDM’14), Brisbane, Australia, 27-28 November 2014 |
Editors | Xue Li, Lin Liu, Kok-Leong Ong, Yanchang Zhao |
Place of Publication | New South Wales, Australia |
Publisher | Australian Computer Society Inc |
Pages | 91-99 |
Number of pages | 9 |
ISBN (Electronic) | 9781921770173 |
Publication status | Published - 2014 |
Event | Australasian Data Mining Conference 2014 (12th), Queensland University of Technology, Brisbane, Australia, 27 Nov 2014 → 28 Nov 2014. Proceedings: https://dblp.org/db/conf/ausdm/ausdm2014.html |
Publication series
Name | Conferences in Research and Practice in Information Technology (CRPIT) |
---|---|
Publisher | Australian Computer Society Inc. |
Volume | 158 |
ISSN (Print) | 1445-1336 |
Conference
Conference | Australasian Data Mining Conference 2014 |
---|---|
Abbreviated title | AusDM 2014 |
Country/Territory | Australia |
City | Brisbane |
Period | 27/11/14 → 28/11/14 |
Internet address | https://dblp.org/db/conf/ausdm/ausdm2014.html |
Keywords
- Bootstrapping
- Correlation
- Learning-to-rank
- Random forest
- Scalability
- Sub-sampling