Selective AnDE for large data learning: A low-bias memory constrained approach

Shenglei Chen, Ana M. Martínez, Geoffrey I. Webb, Limin Wang

    Research output: Contribution to journal › Article › Research › peer-review

    9 Citations (Scopus)

    Abstract

    Learning from data that are too big to fit into memory poses great challenges to currently available learning approaches. Averaged n-Dependence Estimators (AnDE) allow flexible learning from out-of-core data by varying the value of n (the number of super parents). Hence, AnDE is especially appropriate for learning from large quantities of data. The memory requirement of AnDE, however, increases combinatorially with the number of attributes and the parameter n. In large data learning, the number of attributes is often large, and a high n is also desirable to achieve low-bias classification. To achieve the lower bias of AnDE with higher n at a lower memory cost, we propose a memory constrained selective AnDE algorithm that makes two passes through the training examples. The first pass performs attribute selection on super parents according to available memory, whereas the second learns an AnDE model with parents drawn only from the selected attributes. Extensive experiments show that the new selective AnDE has considerably lower bias and prediction error relative to An′DE, where n′ = n − 1, while maintaining the same space complexity and similar time complexity. The proposed algorithm works well on categorical data; numerical data sets need to be discretized first.
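
    The memory argument and the first selection pass can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not state the ranking criterion or the storage layout, so the sketch assumes that attributes are ranked by mutual information with the class, that the parent set is grown greedily until a cell budget is reached, and that one joint-count table is stored per (n + 1)-attribute set containing at least n selected parents. The names ande_count_entries, mutual_information and select_parents are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log, prod


def ande_count_entries(cards, n_classes, n, parents=None):
    """Estimate the number of joint-count cells an AnDE model stores.

    cards[i] is the number of distinct values of attribute i. A table over
    n + 1 attributes is needed when at least n of them are allowed parents
    (assumed layout: one table per attribute set, shared across roles).
    """
    attrs = range(len(cards))
    parent_set = set(attrs) if parents is None else set(parents)
    total = 0
    for subset in combinations(attrs, n + 1):
        if sum(a in parent_set for a in subset) >= n:
            total += n_classes * prod(cards[a] for a in subset)
    return total


def mutual_information(rows, labels, attr):
    """Empirical mutual information (in nats) between one attribute and the class."""
    size = len(rows)
    joint = Counter((row[attr], y) for row, y in zip(rows, labels))
    p_attr = Counter(row[attr] for row in rows)
    p_class = Counter(labels)
    return sum(
        count / size * log(count * size / (p_attr[x] * p_class[y]))
        for (x, y), count in joint.items()
    )


def select_parents(rows, labels, cards, n_classes, n, budget):
    """First pass (assumed criterion): rank attributes by mutual information,
    then add parents greedily while the estimated cell count fits the budget."""
    ranked = sorted(
        range(len(cards)),
        key=lambda a: mutual_information(rows, labels, a),
        reverse=True,
    )
    chosen = []
    for attr in ranked:
        if ande_count_entries(cards, n_classes, n, chosen + [attr]) > budget:
            break
        chosen.append(attr)
    return chosen
```

    Under these assumptions, a second pass over the training examples would accumulate counts only for the tables that involve the selected parents. Enumerating every attribute subset is itself combinatorial, so the estimator is practical only for illustration on small attribute counts.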

    Original language: English
    Pages (from-to): 475-503
    Number of pages: 29
    Journal: Knowledge and Information Systems
    Volume: 50
    Issue number: 2
    Publication status: Published - Feb 2017

    Keywords

    • Attribute selection
    • Bayesian classification
    • AnDE
    • Large data
