Efficient and effective accelerated hierarchical higher-order logistic regression for large data quantities

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research › peer-review

Abstract

Machine learning researchers are facing a data deluge: quantities of training data have been increasing at a rapid rate. However, most machine learning algorithms were proposed in the context of learning from relatively small quantities of data. We argue that a big-data classifier should have superior feature engineering capability, minimal tuning parameters, and the ability to learn decision boundaries in fewer passes through the data. In this paper, we propose a (computationally) efficient yet (classification-wise) effective family of learning algorithms that fulfils these properties. The proposed family of learning algorithms is based on the recently proposed accelerated higher-order logistic regression algorithm ALRn. The contributions of this work are three-fold. First, we add out-of-core learning functionality to ALRn, resulting in a limited-pass learning algorithm. Second, we build superior feature engineering capabilities, and third, we propose a far more memory-efficient implementation. We demonstrate the competitiveness of the proposed algorithm by comparing its performance not only with a state-of-the-art out-of-core classifier, Selective KDB, but also with a state-of-the-art in-core learner, Random Forest.
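The abstract describes higher-order logistic regression trained by SGD in a limited number of passes over data too large to hold in memory. The following is a minimal illustrative sketch of that general idea only, not the paper's ALRn implementation: it expands discretised attributes into all attribute-value tuples up to order n, and updates a sparse weight dictionary in a single streaming pass (function names such as `higher_order_features` and `sgd_single_pass` are invented for this example).

```python
import itertools
import math
import random

def higher_order_features(x, n):
    """Expand a discrete feature vector into all attribute-value tuples
    up to order n (one simple reading of 'higher-order' logistic regression).
    Each feature is keyed by the attribute indices plus their values."""
    feats = [("bias",)]
    for order in range(1, n + 1):
        for idxs in itertools.combinations(range(len(x)), order):
            feats.append((idxs, tuple(x[i] for i in idxs)))
    return feats

def sgd_single_pass(stream, n=2, lr=0.1):
    """One pass of SGD over a data stream, out-of-core style: each example
    is seen once and never stored, so memory is bounded by the model size."""
    w = {}
    for x, y in stream:
        feats = higher_order_features(x, n)
        z = sum(w.get(f, 0.0) for f in feats)
        p = 1.0 / (1.0 + math.exp(-z))   # sigmoid
        g = p - y                        # gradient of log-loss w.r.t. z
        for f in feats:
            w[f] = w.get(f, 0.0) - lr * g
    return w

def predict(w, x, n=2):
    """Probability of class 1 under the learned sparse weight dictionary."""
    z = sum(w.get(f, 0.0) for f in higher_order_features(x, n))
    return 1.0 / (1.0 + math.exp(-z))
```

With n=2 the order-2 tuples act as per-combination indicators, so even XOR-labelled data (which order-1 logistic regression cannot separate) becomes learnable in a single streaming pass. The paper's actual contribution lies in doing this with acceleration, adaptive step sizes, and tuple/feature selection, none of which this toy sketch attempts.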

Original language: English
Title of host publication: 2018 SIAM International Conference on Data Mining, SDM 2018
Subtitle of host publication: San Diego Marriott Mission Valley, San Diego, California, USA, May 3-5, 2018
Editors: Martin Ester, Dino Pedreschi
Place of Publication: Philadelphia, PA, USA
Publisher: Society for Industrial & Applied Mathematics (SIAM)
Pages: 459-467
Number of pages: 9
ISBN (Electronic): 9781611975321
DOI: 10.1137/1.9781611975321.52
Publication status: Published - 2018
Event: SIAM International Conference on Data Mining 2018 - San Diego Marriott Mission Valley, San Diego, United States of America
Duration: 3 May 2018 - 5 May 2018
https://epubs.siam.org/doi/10.1137/1.9781611975321.fm

Conference

Conference: SIAM International Conference on Data Mining 2018
Abbreviated title: SDM 18
Country: United States of America
City: San Diego
Period: 3/05/18 - 5/05/18
Internet address: https://epubs.siam.org/doi/10.1137/1.9781611975321.fm

Keywords

  • Adaptive step-size
  • Feature engineering
  • Higher-order logistic regression
  • SGD
  • Tuple/feature selection

Cite this

Zaidi, N. A., Petitjean, F., & Webb, G. I. (2018). Efficient and effective accelerated hierarchical higher-order logistic regression for large data quantities. In M. Ester, & D. Pedreschi (Eds.), 2018 SIAM International Conference on Data Mining, SDM 2018: San Diego Marriott Mission Valley San Diego, California USA May 3-5, 2018 (pp. 459-467). Philadelphia PA USA: Society for Industrial & Applied Mathematics (SIAM). https://doi.org/10.1137/1.9781611975321.52
@inproceedings{696d908e072845368bc65107485e0c6a,
title = "Efficient and effective accelerated hierarchical higher-order logistic regression for large data quantities",
abstract = "Machine learning researchers are facing a data deluge: quantities of training data have been increasing at a rapid rate. However, most machine learning algorithms were proposed in the context of learning from relatively small quantities of data. We argue that a big-data classifier should have superior feature engineering capability, minimal tuning parameters, and the ability to learn decision boundaries in fewer passes through the data. In this paper, we propose a (computationally) efficient yet (classification-wise) effective family of learning algorithms that fulfils these properties. The proposed family of learning algorithms is based on the recently proposed accelerated higher-order logistic regression algorithm ALRn. The contributions of this work are three-fold. First, we add out-of-core learning functionality to ALRn, resulting in a limited-pass learning algorithm. Second, we build superior feature engineering capabilities, and third, we propose a far more memory-efficient implementation. We demonstrate the competitiveness of the proposed algorithm by comparing its performance not only with a state-of-the-art out-of-core classifier, Selective KDB, but also with a state-of-the-art in-core learner, Random Forest.",
keywords = "Adaptive step-size, Feature engineering, Higher-order logistic regression, SGD, Tuple/feature selection",
author = "Zaidi, {Nayyar A.} and Petitjean, {Francois} and Webb, {Geoffrey I.}",
year = "2018",
doi = "10.1137/1.9781611975321.52",
language = "English",
pages = "459--467",
editor = "Ester, {Martin} and Pedreschi, {Dino}",
booktitle = "2018 SIAM International Conference on Data Mining, SDM 2018",
publisher = "Society for Industrial & Applied Mathematics (SIAM)",
}
