TY - JOUR
T1 - An eager splitting strategy for online decision trees in ensembles
AU - Manapragada, Chaitanya
AU - Gomes, Heitor M.
AU - Salehi, Mahsa
AU - Bifet, Albert
AU - Webb, Geoffrey I.
N1 - Publisher Copyright:
© 2022, The Author(s), under exclusive licence to Springer Science+Business Media LLC, part of Springer Nature.
PY - 2022/3
Y1 - 2022/3
N2 - Decision tree ensembles are widely used in practice. In this work, we study in ensemble settings the effectiveness of replacing the split strategy for the state-of-the-art online tree learner, Hoeffding Tree, with a rigorous but more eager splitting strategy that we had previously published as Hoeffding AnyTime Tree. Hoeffding AnyTime Tree (HATT), uses the Hoeffding Test to determine whether the current best candidate split is superior to the current split, with the possibility of revision, while Hoeffding Tree aims to determine whether the top candidate is better than the second best and if a test is selected, fixes it for all posterity. HATT converges to the ideal batch tree while Hoeffding Tree does not. We find that HATT is an efficacious base learner for online bagging and online boosting ensembles. On UCI and synthetic streams, HATT as a base learner outperforms HT at a 0.05 significance level for the majority of tested ensembles on what we believe is the largest and most comprehensive set of testbenches in the online learning literature. Our results indicate that HATT is a superior alternative to Hoeffding Tree in a large number of ensemble settings.
AB - Decision tree ensembles are widely used in practice. In this work, we study in ensemble settings the effectiveness of replacing the split strategy for the state-of-the-art online tree learner, Hoeffding Tree, with a rigorous but more eager splitting strategy that we had previously published as Hoeffding AnyTime Tree. Hoeffding AnyTime Tree (HATT), uses the Hoeffding Test to determine whether the current best candidate split is superior to the current split, with the possibility of revision, while Hoeffding Tree aims to determine whether the top candidate is better than the second best and if a test is selected, fixes it for all posterity. HATT converges to the ideal batch tree while Hoeffding Tree does not. We find that HATT is an efficacious base learner for online bagging and online boosting ensembles. On UCI and synthetic streams, HATT as a base learner outperforms HT at a 0.05 significance level for the majority of tested ensembles on what we believe is the largest and most comprehensive set of testbenches in the online learning literature. Our results indicate that HATT is a superior alternative to Hoeffding Tree in a large number of ensemble settings.
KW - Concept drift
KW - Explainability
KW - Hoeffding Tree
UR - http://www.scopus.com/inward/record.url?scp=85122320481&partnerID=8YFLogxK
U2 - 10.1007/s10618-021-00816-x
DO - 10.1007/s10618-021-00816-x
M3 - Article
AN - SCOPUS:85122320481
SN - 1384-5810
VL - 36
SP - 566
EP - 619
JO - Data Mining and Knowledge Discovery
JF - Data Mining and Knowledge Discovery
ER -