Sample-based attribute selective AnDE for large data

Shenglei Chen, Ana M. Martinez, Geoffrey I. Webb, Limin Wang

    Research output: Contribution to journal › Article › Research › peer-review

    6 Citations (Scopus)

    Abstract

    More and more applications have come with large data sets in the past decade. However, existing algorithms are not guaranteed to scale well on large data. Averaged n-Dependence Estimators (AnDE) allows for flexible learning from out-of-core data, by varying the value of n (number of super parents). Hence, AnDE is especially appropriate for large data learning. In this paper, we propose a sample-based attribute selection technique for AnDE. It needs one more pass through the training data, in which a multitude of approximate AnDE models are built and efficiently assessed by leave-one-out cross validation. The use of a sample reduces the training time. Experiments on 15 large data sets demonstrate that the proposed technique significantly reduces AnDE's error at the cost of a modest increase in training time. This efficient and scalable out-of-core approach delivers superior or comparable performance to typical in-core Bayesian network classifiers.
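The extra pass described in the abstract can be illustrated with a minimal sketch. The abstract does not give the procedure itself, so everything here is an assumption made for illustration: the sketch uses naive Bayes (AnDE with n = 0) rather than a higher-order model, ranks attributes by mutual information with the class, scores the nested "top-k attributes" models by 0-1 loss, and uses Laplace smoothing. The function names `train_counts`, `mutual_information`, and `select_attributes` are hypothetical. Leave-one-out is done cheaply by subtracting the held-out instance's own contribution from the count tables instead of retraining.

```python
import math
from collections import Counter


def train_counts(X, y):
    """First pass: collect class counts and (attribute, value, class) counts."""
    class_counts = Counter(y)
    joint = Counter()
    for row, c in zip(X, y):
        for a, v in enumerate(row):
            joint[(a, v, c)] += 1
    return class_counts, joint


def mutual_information(a, X, y, class_counts, joint):
    """Mutual information between attribute a and the class (natural log)."""
    n = len(y)
    value_counts = Counter(row[a] for row in X)
    mi = 0.0
    for (attr, v, c), n_vc in joint.items():
        if attr == a:
            mi += n_vc / n * math.log(n_vc * n / (value_counts[v] * class_counts[c]))
    return mi


def select_attributes(X, y, sample_idx, n_values):
    """Second pass over a sample: leave-one-out errors of nested models.

    errors[k] is the number of sampled instances misclassified when only
    the k highest-ranked attributes are used. n_values[a] is the number of
    distinct values of attribute a (used for Laplace smoothing).
    """
    class_counts, joint = train_counts(X, y)
    n, n_attrs = len(y), len(X[0])
    classes = sorted(class_counts)
    order = sorted(
        range(n_attrs),
        key=lambda a: -mutual_information(a, X, y, class_counts, joint),
    )
    errors = [0] * (n_attrs + 1)
    for i in sample_idx:
        row, true_c = X[i], y[i]
        # Leave-one-out: remove instance i from the counts of its own class.
        held_out = {c: (1 if c == true_c else 0) for c in classes}
        scores = {
            c: math.log((class_counts[c] - held_out[c] + 1) / (n - 1 + len(classes)))
            for c in classes
        }
        for k in range(n_attrs + 1):
            if k > 0:
                # Extend the previous nested model by one attribute.
                a = order[k - 1]
                for c in classes:
                    num = joint[(a, row[a], c)] - held_out[c] + 1
                    den = class_counts[c] - held_out[c] + n_values[a]
                    scores[c] += math.log(num / den)
            predicted = max(classes, key=lambda c: scores[c])
            if predicted != true_c:
                errors[k] += 1
    best_k = min(range(n_attrs + 1), key=lambda k: errors[k])
    return order[:best_k], errors
```

Because the models are nested, each sampled instance is scored for all values of k in a single sweep over the ranked attributes, which is what keeps the additional pass inexpensive in this sketch. Passing a subset of row indices as `sample_idx` mirrors the abstract's use of a sample to reduce training time.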

    Original language: English
    Article number: 7565579
    Pages (from-to): 172-185
    Number of pages: 14
    Journal: IEEE Transactions on Knowledge and Data Engineering
    Volume: 29
    Issue number: 1
    Publication status: Published - 1 Jan 2017

    Keywords

    • Bayesian network classifiers
    • Large data
    • Classification learning
    • Attribute selection
    • Averaged n-dependence estimators (AnDE)
    • Leave-one-out cross validation
