Hierarchical Gradient Smoothing for probability estimation trees

He Zhang, François Petitjean, Wray Buntine

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research › peer-review

Abstract

Decision trees are still widely used in online, non-stationary, and embedded settings, as well as where interpretability matters. For applications such as ranking and cost-sensitive classification, probability estimation trees (PETs) are used; these are built with smoothing or calibration techniques. Older smoothing techniques use only the counts local to a leaf node, whereas some more recent techniques take the broader context of a node into account when estimating probabilities. We apply a recent advanced smoothing method, the Hierarchical Dirichlet Process (HDP), to PETs, and then propose a novel hierarchical smoothing approach, Hierarchical Gradient Smoothing (HGS), as an alternative. HGS smooths each leaf node towards all of its ancestors, instead of recursively smoothing towards the parent as HDP does. HGS is also faster: it efficiently optimizes a Leave-One-Out Cross-Validation (LOOCV) loss by gradient descent instead of the sampling used in HDP. An extensive set of experiments on 143 datasets shows that our HGS estimates are not only more accurate but are also computed in a fraction of the time HDP requires. Moreover, HGS makes a single tree almost as good as a Random Forest with 10 trees. For applications that require interpretability and efficiency, a single decision tree with HGS is therefore preferable.
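
The abstract sketches the core idea: a leaf's class-probability estimate is smoothed using the class counts of all its ancestors, with smoothing weights that would be tuned by gradient descent on a LOOCV loss. Below is a minimal illustrative sketch in Python of that kind of ancestor-weighted smoothing; the function name, the fixed per-level weights `alphas`, and the toy counts are assumptions for illustration, not the paper's actual formulation or implementation.

```python
# Minimal sketch (not the authors' implementation): smooth a leaf's class
# distribution using the class counts of all of its ancestors. The weights
# `alphas` are given here as constants; in the paper they would instead be
# learned by gradient descent on a leave-one-out cross-validation loss.
import numpy as np

def hgs_estimate(path_counts, alphas):
    """Return a smoothed class-probability estimate for a leaf.

    path_counts: list of class-count vectors, ordered leaf -> root.
    alphas: one non-negative smoothing weight per ancestor level.
    """
    leaf = np.asarray(path_counts[0], dtype=float)
    num = leaf.copy()          # start from the raw leaf counts
    den = leaf.sum()
    for counts, a in zip(path_counts[1:], alphas):
        counts = np.asarray(counts, dtype=float)
        num += a * counts / counts.sum()   # add the ancestor's distribution, weighted
        den += a
    return num / den

# Toy example: leaf counts [3, 0], parent [10, 5], root [50, 50].
print(hgs_estimate([[3, 0], [10, 5], [50, 50]], alphas=[2.0, 1.0]))
```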

Original language: English
Title of host publication: Advances in Knowledge Discovery and Data Mining
Subtitle of host publication: 24th Pacific-Asia Conference, PAKDD 2020, Singapore, May 11–14, 2020, Proceedings, Part I
Editors: Hady W. Lauw, Raymond Chi-Wing Wong, Alexandros Ntoulas
Place of Publication: Cham, Switzerland
Publisher: Springer
Pages: 222-234
Number of pages: 13
ISBN (Electronic): 9783030474263
ISBN (Print): 9783030474256
DOIs
Publication status: Published - 2020
Event: Pacific-Asia Conference on Knowledge Discovery and Data Mining 2020 - Singapore, Singapore
Duration: 11 May 2020 – 14 May 2020
Conference number: 24th
https://pakdd2020.org (Website)
https://link.springer.com/book/10.1007/978-3-030-47426-3 (Conference Papers)

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer
Volume: 12084
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: Pacific-Asia Conference on Knowledge Discovery and Data Mining 2020
Abbreviated title: PAKDD 2020
Country: Singapore
City: Singapore
Period: 11/05/20 – 14/05/20
Internet address

Keywords

  • Class probability estimation
  • Hierarchical Dirichlet Process
  • Hierarchical probability smoothing
  • Probability estimation trees
