Selective AnDE for large data learning

A low-bias memory constrained approach

Shenglei Chen, Ana M. Martínez, Geoffrey I. Webb, Limin Wang

    Research output: Contribution to journal › Article › Research › peer-review

    Abstract

    Learning from data that are too big to fit into memory poses great challenges to currently available learning approaches. Averaged n-Dependence Estimators (AnDE) allows flexible learning from out-of-core data by varying the value of n (the number of super parents). Hence, AnDE is especially appropriate for learning from large quantities of data. AnDE's memory requirements, however, increase combinatorially with the number of attributes and the parameter n. In large data learning, the number of attributes is often large, and a high n is needed to achieve low-bias classification. In order to achieve the lower bias of AnDE with higher n but with less memory, we propose a memory-constrained selective AnDE algorithm that makes two learning passes through the training examples. The first pass performs attribute selection on the super parents according to the available memory, whereas the second learns an AnDE model whose parents are drawn only from the selected attributes. Extensive experiments show that the new selective AnDE has considerably lower bias and prediction error relative to An′DE, where n′ = n - 1, while maintaining the same space complexity and similar time complexity. The proposed algorithm works well on categorical data; numerical data sets need to be discretized first.
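    The combinatorial memory growth that motivates the selection step can be sketched with a rough back-of-envelope count. The formula and the `attributes_within_budget` helper below are illustrative approximations, not the paper's exact accounting; uniform attribute arity `v` and class count `c` are assumed for simplicity:

    ```python
    from math import comb

    def ande_table_entries(a: int, n: int, v: int, c: int) -> int:
        """Rough entry count for AnDE's parameter tables: for each of the
        C(a, n) super-parent sets, a joint count over the class, the n
        parents' values, and each remaining child attribute's values."""
        return c * comb(a, n) * v ** n * (a - n) * v

    def attributes_within_budget(a: int, n: int, v: int, c: int,
                                 budget: int) -> int:
        """Largest number of selected attributes k <= a whose AnDE tables
        fit within `budget` entries -- the kind of memory-driven selection
        the first learning pass performs (hypothetical helper)."""
        for k in range(a, n, -1):
            if ande_table_entries(k, n, v, c) <= budget:
                return k
        return 0

    # With 50 attributes of 10 values each and 2 classes, moving from
    # n = 1 to n = 2 multiplies the table size by roughly 240x:
    print(ande_table_entries(50, 1, 10, 2))  # 490_000
    print(ande_table_entries(50, 2, 10, 2))  # 117_600_000
    ```

    Under this count, the first pass would retain only as many super-parent candidates as the memory budget allows; for example, `attributes_within_budget(50, 2, 10, 2, 10**6)` keeps 11 of the 50 attributes.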

    Original language: English
    Pages (from-to): 475-503
    Number of pages: 29
    Journal: Knowledge and Information Systems
    Volume: 50
    Issue number: 2
    DOI: 10.1007/s10115-016-0937-9
    Publication status: Published - Feb 2017

    Keywords

    • Attribute selection
    • Bayesian classification
    • AnDE
    • Large data
