Efficient and effective accelerated hierarchical higher-order logistic regression for large data quantities

Nayyar A. Zaidi, Francois Petitjean, Geoffrey I. Webb

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research › peer-review

3 Citations (Scopus)

Abstract

Machine learning researchers are facing a data deluge: quantities of training data have been increasing at a rapid rate. However, most machine learning algorithms were proposed in the context of learning from relatively small quantities of data. We argue that a big data classifier should have superior feature engineering capability and minimal tuning parameters, and should be able to learn decision boundaries in fewer passes through the data. In this paper, we propose a (computationally) efficient yet (classification-wise) effective family of learning algorithms that fulfils these properties. The proposed family of learning algorithms is based on the recently proposed accelerated higher-order logistic regression algorithm, ALRn. The contributions of this work are three-fold. First, we add out-of-core learning functionality to ALRn, resulting in a limited-pass learning algorithm. Second, we build in superior feature engineering capabilities. Third, we propose a far more memory-efficient implementation. We demonstrate the competitiveness of our proposed algorithm by comparing its performance not only with state-of-the-art out-of-core classifiers such as Selective KDB but also with state-of-the-art in-core classifiers such as Random Forest.

Original language: English
Title of host publication: 2018 SIAM International Conference on Data Mining, SDM 2018
Subtitle of host publication: San Diego Marriott Mission Valley San Diego, California USA May 3-5, 2018
Editors: Martin Ester, Dino Pedreschi
Place of Publication: Philadelphia PA USA
Publisher: Society for Industrial & Applied Mathematics (SIAM)
Pages: 459-467
Number of pages: 9
ISBN (Electronic): 9781611975321
Publication status: Published - 2018
Event: SIAM International Conference on Data Mining 2018 - San Diego Marriott Mission Valley, San Diego, United States of America
Duration: 3 May 2018 to 5 May 2018
https://epubs.siam.org/doi/10.1137/1.9781611975321.fm

Conference

Conference: SIAM International Conference on Data Mining 2018
Abbreviated title: SDM 18
Country/Territory: United States of America
City: San Diego
Period: 3/05/18 to 5/05/18

Keywords

  • Adaptive step-size
  • Feature engineering
  • Higher-order logistic regression
  • SGD
  • Tuple/feature selection
