Scalable learning of Bayesian network classifiers

Ana M. Martínez, Geoffrey I. Webb, Shenglei Chen, Nayyar A. Zaidi

    Research output: Contribution to journal › Article › Research › peer-review

    49 Citations (Scopus)

    Abstract

    Ever-increasing data quantity makes ever more urgent the need for highly scalable learners that have good classification performance. Therefore, an out-of-core learner with excellent time and space complexity, along with high expressivity (that is, the capacity to learn very complex multivariate probability distributions), is extremely desirable. This paper presents such a learner. We propose an extension to the k-dependence Bayesian classifier (KDB) that discriminatively selects a sub-model of a full KDB classifier. It requires only one additional pass through the training data, making it a three-pass learner. Our extensive experimental evaluation on 16 large data sets reveals that this out-of-core algorithm achieves competitive classification performance, and substantially better training and classification times, than state-of-the-art in-core learners such as random forest and linear and non-linear logistic regression.
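
    As a rough illustration of the three passes described in the abstract, the sketch below implements a heavily simplified variant in Python: pass 1 learns a KDB structure restricted to k = 1 parents from (conditional) mutual information, pass 2 collects the count tables for the full model, and pass 3 selects a nested sub-model (a prefix of the attribute ranking). All function names are illustrative, and the selection criterion (plain training error) and smoothing are simplifications made for brevity; this is not the authors' implementation, which works out of core and selects sub-models discriminatively.

import numpy as np
from collections import defaultdict


def mutual_info(a, b):
    """Empirical mutual information I(A; B) for discrete integer arrays."""
    n = len(a)
    joint, pa, pb = defaultdict(int), defaultdict(int), defaultdict(int)
    for x, y in zip(a, b):
        joint[(x, y)] += 1
        pa[x] += 1
        pb[y] += 1
    return sum((c / n) * np.log(c * n / (pa[x] * pb[y]))
               for (x, y), c in joint.items())


def cond_mutual_info(a, b, c):
    """Empirical conditional mutual information I(A; B | C)."""
    n = len(a)
    return sum((np.sum(c == v) / n) * mutual_info(a[c == v], b[c == v])
               for v in np.unique(c))


def learn_structure(X, y):
    """Pass 1: rank attributes by I(X_i; Y) and give each one parent (k = 1),
    the higher-ranked attribute with the largest I(X_i; X_j | Y)."""
    order = sorted(range(X.shape[1]),
                   key=lambda i: mutual_info(X[:, i], y), reverse=True)
    parents = {order[0]: None}
    for pos, i in enumerate(order[1:], start=1):
        parents[i] = max(order[:pos],
                         key=lambda j: cond_mutual_info(X[:, i], X[:, j], y))
    return order, parents


def collect_counts(X, y, order, parents):
    """Pass 2: joint counts for the full (k = 1) KDB model."""
    class_counts = defaultdict(int)
    tables = {i: defaultdict(int) for i in order}          # (class, parent value, value)
    parent_counts = {i: defaultdict(int) for i in order}   # (class, parent value)
    for row, c in zip(X, y):
        class_counts[c] += 1
        for i in order:
            pv = row[parents[i]] if parents[i] is not None else -1
            tables[i][(c, pv, row[i])] += 1
            parent_counts[i][(c, pv)] += 1
    return class_counts, tables, parent_counts


def predict(row, m, order, parents, counts, arities):
    """Posterior argmax using only the first m attributes of the ranking."""
    class_counts, tables, parent_counts = counts
    n, classes = sum(class_counts.values()), sorted(class_counts)
    best_c, best_lp = None, -np.inf
    for c in classes:
        lp = np.log((class_counts[c] + 1) / (n + len(classes)))
        for i in order[:m]:
            pv = row[parents[i]] if parents[i] is not None else -1
            lp += np.log((tables[i][(c, pv, row[i])] + 1)
                         / (parent_counts[i][(c, pv)] + arities[i]))  # Laplace smoothing
        if lp > best_lp:
            best_c, best_lp = c, lp
    return best_c


def select_submodel(X, y, order, parents, counts, arities):
    """Pass 3: score every nested sub-model (attribute prefix) and keep the
    best one. Plain training error is used here only to keep the sketch short;
    the paper's selection criterion is discriminative."""
    errors = [np.mean([predict(row, m, order, parents, counts, arities) != c
                       for row, c in zip(X, y)])
              for m in range(X.shape[1] + 1)]
    return int(np.argmin(errors))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = rng.integers(0, 2, 500)
    X = np.column_stack([y ^ rng.integers(0, 2, 500),   # noisy copy of the class
                         rng.integers(0, 3, 500),       # pure noise
                         y])                            # exact copy of the class
    arities = {i: int(X[:, i].max()) + 1 for i in range(X.shape[1])}
    order, parents = learn_structure(X, y)                       # pass 1
    counts = collect_counts(X, y, order, parents)                # pass 2
    m = select_submodel(X, y, order, parents, counts, arities)   # pass 3
    print("attribute order:", order, "selected prefix length:", m)

    Running the script prints the learned attribute ordering and the prefix length chosen in the third pass, which should keep the informative attributes and drop the noise attribute on this toy data.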

    Original language: English
    Pages (from-to): 1-35
    Number of pages: 35
    Journal: Journal of Machine Learning Research
    Volume: 17
    Issue number: 44
    Publication status: Published - 2016

    Keywords

    • Scalable Bayesian classification
    • Feature selection
    • Out-of-core learning
    • Big data
