Scalable learning of Bayesian network classifiers

Ana M. Martínez, Geoffrey I. Webb, Shenglei Chen, Nayyar A. Zaidi

    Research output: Contribution to journal › Article › peer-review

    16 Citations (Scopus)

    Abstract

    Ever-increasing data quantities make the need for highly scalable learners with good classification performance ever more urgent. An out-of-core learner with excellent time and space complexity, along with high expressivity (that is, the capacity to learn very complex multivariate probability distributions), is therefore extremely desirable. This paper presents such a learner. We propose an extension to the k-dependence Bayesian classifier (KDB) that discriminatively selects a sub-model of a full KDB classifier. It requires only one additional pass through the training data, making it a three-pass learner. Our extensive experimental evaluation on 16 large data sets reveals that this out-of-core algorithm achieves competitive classification performance, and substantially better training and classification time, than state-of-the-art in-core learners such as random forest and linear and non-linear logistic regression.
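    To make the three-pass scheme concrete, the sketch below is a minimal Python reconstruction of the idea as the abstract describes it; it is not the authors' implementation, and every function and variable name is invented for illustration. Pass 1 orders attributes by mutual information with the class and gives each at most k parents chosen by conditional mutual information; pass 2 collects the counts for the full KDB model; pass 3, the additional pass, scores nested sub-models and keeps the best (plain training 0/1 error stands in here for the paper's discriminative selection statistic).

    import numpy as np
    from collections import Counter

    def _mi(a, b):
        # Empirical mutual information I(A; B) between two discrete columns.
        n = len(a)
        pab, pa, pb = Counter(zip(a, b)), Counter(a), Counter(b)
        return sum(c / n * np.log(c * n / (pa[x] * pb[y]))
                   for (x, y), c in pab.items())

    def _cmi(a, b, c):
        # Empirical conditional mutual information I(A; B | C).
        groups = {}
        for t, cv in enumerate(c):
            groups.setdefault(cv, []).append(t)
        return sum(len(rows) / len(c) * _mi(a[rows], b[rows])
                   for rows in groups.values())

    def kdb_structure(X, y, k):
        # Pass 1: rank attributes by MI with the class; each attribute then
        # takes up to k parents among higher-ranked attributes, chosen by CMI.
        order = sorted(range(X.shape[1]),
                       key=lambda i: _mi(X[:, i], y), reverse=True)
        parents = {}
        for pos, i in enumerate(order):
            parents[i] = sorted(order[:pos],
                                key=lambda j: _cmi(X[:, i], X[:, j], y),
                                reverse=True)[:k]
        return order, parents

    def kdb_counts(X, y, parents):
        # Pass 2: sufficient statistics for P(y) and each P(x_i | y, parents_i).
        cls = Counter(y)
        joint = {i: Counter() for i in parents}  # keyed by (y, parent vals, x_i)
        marg = {i: Counter() for i in parents}   # keyed by (y, parent vals)
        for row, yv in zip(X, y):
            for i, ps in parents.items():
                key = (yv, tuple(row[ps]))
                joint[i][key + (row[i],)] += 1
                marg[i][key] += 1
        return cls, joint, marg

    def _log_joint(row, yv, attrs, parents, cls, joint, marg, card, n):
        # log P(y) + sum_i log P(x_i | y, parents_i), with Laplace smoothing.
        lp = np.log((cls[yv] + 1) / (n + len(cls)))
        for i in attrs:
            key = (yv, tuple(row[parents[i]]))
            lp += np.log((joint[i][key + (row[i],)] + 1)
                         / (marg[i][key] + card[i]))
        return lp

    def select_submodel(X, y, order, parents, cls, joint, marg):
        # Pass 3 (the extra pass): score nested attribute-prefix sub-models
        # and keep the best. Training error is a simplification of the
        # paper's discriminative selection.
        n = len(y)
        card = {i: len(set(X[:, i])) for i in range(X.shape[1])}
        classes = list(cls)
        best_m, best_err = 0, np.mean(y != max(cls, key=cls.get))
        for m in range(1, len(order) + 1):
            attrs, errs = order[:m], 0
            for row, yv in zip(X, y):
                scores = [_log_joint(row, c, attrs, parents, cls, joint,
                                     marg, card, n) for c in classes]
                errs += classes[int(np.argmax(scores))] != yv
            if errs / n < best_err:
                best_m, best_err = m, errs / n
        return order[:best_m]

    # Toy usage on synthetic discrete data (shapes and seed are arbitrary).
    rng = np.random.default_rng(0)
    X = rng.integers(0, 3, size=(500, 6))
    y = (X[:, 0] + X[:, 1] > 2).astype(int)
    order, parents = kdb_structure(X, y, k=2)
    cls, joint, marg = kdb_counts(X, y, parents)
    print("selected sub-model:", select_submodel(X, y, order, parents, cls, joint, marg))

    Note that a faithful out-of-core version would replace the in-memory arrays with three sequential scans of the data file, since each pass above touches every record exactly once.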

    Original language: English
    Number of pages: 35
    Journal: Journal of Machine Learning Research
    Volume: 17
    Issue number: 44
    Publication status: Published - 2016

    Keywords

    • Scalable Bayesian classification
    • Feature selection
    • Out-of-core learning
    • Big data

    Cite this

    @article{21f1ff933af54606918eccdf56b6aafb,
      title     = "Scalable learning of Bayesian network classifiers",
      author    = "Mart{\'i}nez, {Ana M.} and Webb, {Geoffrey I.} and Shenglei Chen and Zaidi, {Nayyar A.}",
      journal   = "Journal of Machine Learning Research",
      volume    = "17",
      number    = "44",
      year      = "2016",
      issn      = "1532-4435",
      publisher = "Journal of Machine Learning Research (JMLR)",
      language  = "English",
      keywords  = "Scalable Bayesian classification, Feature selection, Out-of-core learning, Big data",
      url       = "http://www.scopus.com/inward/record.url?scp=84979917083&partnerID=8YFLogxK",
    }
