Abstract
Many real-world data streams are non-stationary. Subject to concept drift, their distributions change over time. To retain accuracy in the face of such drift, online decision tree learners must discard parts of the tree that are no longer accurate and replace them with new subtrees that reflect the new distribution. The longstanding state-of-the-art online decision tree learner for non-stationary streams is Hoeffding Adaptive Tree (HAT), which adds a drift detection and response mechanism to the classic Very Fast Decision Tree (VFDT) online decision tree learner. However, for stationary distributions, VFDT has been superseded by Extremely Fast Decision Tree (EFDT), which uses a statistically more efficient learning mechanism than VFDT. This learning mechanism must be coupled with a revision mechanism that compensates for cases in which the learning mechanism is too eager. This work develops a strategy that combines the best of both state-of-the-art approaches, exploiting both the statistically efficient learning mechanism of EFDT and the highly effective drift detection and response mechanism of HAT. Doing so requires decoupling EFDT's splitting and revision mechanisms, because the latter incorrectly triggers HAT's drift detection mechanism. The resulting learner, Extremely Fast Hoeffding Adaptive Tree, responds to drift more rapidly and effectively than either HAT or EFDT, and attains a statistically significant advantage in accuracy even on stationary streams.
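The abstract contrasts the VFDT and EFDT learning mechanisms without stating them. The following is a minimal Python sketch, assuming the split tests as described in the original VFDT and EFDT publications rather than this paper's code. Both rely on the Hoeffding bound ε = sqrt(R² ln(1/δ) / (2n)); VFDT splits a leaf only when the best attribute beats the runner-up by more than ε, whereas EFDT splits as soon as the best attribute beats not splitting at all, and later revises an internal node's split if another attribute overtakes it. The function names, the `gains` dictionary, and the numbers in the example are hypothetical; merits are assumed to be information gain (range log2 of the number of classes), and VFDT's tie-breaking threshold, EFDT's subtree-collapse case, and HAT's drift detector are not shown.

```python
import math
from typing import Dict


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """With probability 1 - delta, the mean of n observations of a variable
    with range `value_range` lies within this epsilon of its true mean."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def vfdt_should_split(gains: Dict[str, float], eps: float) -> bool:
    """VFDT-style test (hypothetical helper): split only when the best
    attribute beats the second-best attribute by more than epsilon."""
    ranked = sorted(gains.values(), reverse=True)
    second = ranked[1] if len(ranked) > 1 else 0.0
    return ranked[0] - second > eps


def efdt_should_split(gains: Dict[str, float], eps: float) -> bool:
    """EFDT-style test (hypothetical helper): split once the best attribute
    beats the null split (keeping the leaf, merit 0) by more than epsilon."""
    null_split_merit = 0.0
    return max(gains.values()) - null_split_merit > eps


def efdt_revise(gains: Dict[str, float], current: str, eps: float) -> str:
    """EFDT-style revision at an internal node split on `current`, using
    statistics gathered at that node: re-split on a different attribute only
    if it beats the current one by more than epsilon; otherwise keep it."""
    best = max(gains, key=gains.get)
    if best != current and gains[best] - gains[current] > eps:
        return best
    return current


if __name__ == "__main__":
    # Two classes, so information gain has range log2(2) = 1.
    eps = hoeffding_bound(value_range=math.log2(2), delta=1e-7, n=200)
    gains = {"a": 0.35, "b": 0.30}
    print(round(eps, 3))                  # 0.201
    print(vfdt_should_split(gains, eps))  # False: 0.35 - 0.30 < eps
    print(efdt_should_split(gains, eps))  # True: 0.35 - 0.0 > eps
    print(efdt_revise({"a": 0.05, "b": 0.40}, current="a", eps=eps))  # b
```

With the same 200 examples, the EFDT-style test already splits while the VFDT-style test waits, because two attributes with similar merit never separate by ε. This eagerness is why EFDT needs the revision step, which the abstract notes can be mistaken for drift by HAT's detector.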
Original language | English |
---|---|
Title of host publication | Proceedings - 22nd IEEE International Conference on Data Mining, ICDM 2022 |
Editors | Xingquan Zhu, Sanjay Ranka, My T. Thai, Takashi Washio, Xindong Wu |
Place of Publication | Piscataway, NJ, USA |
Publisher | IEEE, Institute of Electrical and Electronics Engineers |
Pages | 319-328 |
Number of pages | 10 |
ISBN (Electronic) | 9781665450997 |
ISBN (Print) | 9781665451000 |
DOIs | |
Publication status | Published - 2022 |
Event | IEEE International Conference on Data Mining 2022, Orlando, United States of America. Duration: 28 Nov 2022 → 1 Dec 2022. Conference number: 22nd. Proceedings: https://ieeexplore.ieee.org/xpl/conhome/10027565/proceeding; Website: https://icdm22.cse.usf.edu/ |
Publication series
Name | Proceedings - IEEE International Conference on Data Mining, ICDM |
---|---|
Publisher | IEEE, Institute of Electrical and Electronics Engineers |
Volume | 2022-November |
ISSN (Print) | 1550-4786 |
ISSN (Electronic) | 2374-8486 |
Conference
Conference | IEEE International Conference on Data Mining 2022 |
---|---|
Abbreviated title | ICDM 2022 |
Country/Territory | United States of America |
City | Orlando |
Period | 28/11/22 → 1/12/22 |
Internet address | |
Keywords
- concept drift
- data mining
- decision trees
- online learning