Mining significant association rules from uncertain data

Anshu Zhang, Wenzhong Shi, Geoffrey I. Webb

    Research output: Contribution to journal › Article › Research › peer-review

    7 Citations (Scopus)

    Abstract

    In association rule mining, the trade-off between avoiding harmful spurious rules and preserving authentic ones is a persistent barrier to obtaining reliable and useful results. The statistically sound technique for evaluating the statistical significance of association rules is superior at preventing spurious rules, yet it can also cause severe loss of true rules in the presence of data error. This study presents an improved method for statistically testing association rules on uncertain, erroneous data. An original mathematical model was established to describe how data error propagates through the computational procedures of the statistical test. Based on this error model, a scheme combining analytic and simulation processes was designed to correct the statistical test for distortions caused by data error. Experiments on both synthetic and real-world data show that the method significantly recovers the true rules that the original statistically sound method loses to data error (that is, it reduces type-2 error). Meanwhile, the new method maintains effective control over the familywise error rate, which is the distinctive advantage of the original statistically sound technique. Furthermore, the method is robust to inaccurate information about data error probabilities and to situations that violate the commonly accepted assumption that the error probabilities of different data items are independent. The method is particularly effective for rules that are the most practically meaningful yet sensitive to data error. The method shows promise for enhancing the value of association rule mining results and helping users make correct decisions.
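The problem the abstract describes can be illustrated with a minimal sketch. It is not the authors' error model or correction scheme. It assumes a one-sided Fisher exact test for a single rule X -> Y, a Bonferroni-style threshold as one common way to control the familywise error rate, and independent item-flipping noise. All function names, the noise model, and the simulation parameters are hypothetical and chosen only for illustration.

```python
import math
import random


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher exact p-value for a 2x2 table.

    a = #(X and Y), b = #(X, not Y), c = #(not X, Y), d = #(neither).
    Returns P(at least `a` co-occurrences) under independence of X and Y.
    """
    n = a + b + c + d
    row = a + b  # transactions containing X
    col = a + c  # transactions containing Y
    denom = math.comb(n, row)
    p = 0.0
    for k in range(a, min(row, col) + 1):
        # Hypergeometric probability of exactly k co-occurrences.
        p += math.comb(col, k) * math.comb(n - col, row - k) / denom
    return p


def bonferroni_threshold(alpha, num_rules):
    """Per-rule significance level keeping the familywise error rate <= alpha."""
    return alpha / num_rules


def contingency(rows):
    """Count the 2x2 table (a, b, c, d) from (has_x, has_y) pairs."""
    a = b = c = d = 0
    for x, y in rows:
        if x and y:
            a += 1
        elif x:
            b += 1
        elif y:
            c += 1
        else:
            d += 1
    return a, b, c, d


def flip_items(rows, error_prob, rng):
    """Assumed noise model: each item value flips independently."""
    return [
        (x != (rng.random() < error_prob), y != (rng.random() < error_prob))
        for x, y in rows
    ]


def demo(seed=0):
    rng = random.Random(seed)
    clean = []
    for _ in range(2000):
        x = rng.random() < 0.3
        y = rng.random() < (0.5 if x else 0.3)  # Y is more likely given X
        clean.append((x, y))
    noisy = flip_items(clean, error_prob=0.15, rng=rng)
    threshold = bonferroni_threshold(0.05, num_rules=10_000)
    for label, rows in (("clean", clean), ("noisy", noisy)):
        p = fisher_exact_greater(*contingency(rows))
        print(f"{label}: p = {p:.3g}, significant = {p <= threshold}")


if __name__ == "__main__":
    demo()
```

Comparing the two printed p-values shows how item errors can weaken the evidence for a genuine rule under a strict familywise threshold, which is the loss of true rules that the paper's corrected test aims to recover.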

    Original language: English
    Pages (from-to): 928-963
    Number of pages: 36
    Journal: Data Mining and Knowledge Discovery
    Volume: 30
    Issue number: 4
    DOI: 10.1007/s10618-015-0446-6
    Publication status: Published - 1 Jul 2016

    Keywords

    • Pattern discovery
    • Association rules
    • Statistical evaluation
    • Uncertain data

    Cite this

    Zhang, Anshu ; Shi, Wenzhong ; Webb, Geoffrey I. / Mining significant association rules from uncertain data. In: Data Mining and Knowledge Discovery. 2016 ; Vol. 30, No. 4. pp. 928-963.
    @article{dea11cf9cdf14483a8ba3c43881157e7,
    title = "Mining significant association rules from uncertain data",
    keywords = "Pattern discovery, Association rules, Statistical evaluation, Uncertain data",
    author = "Anshu Zhang and Wenzhong Shi and Webb, {Geoffrey I.}",
    year = "2016",
    month = "7",
    day = "1",
    doi = "10.1007/s10618-015-0446-6",
    language = "English",
    volume = "30",
    pages = "928--963",
    journal = "Data Mining and Knowledge Discovery",
    issn = "1384-5810",
    publisher = "Springer",
    number = "4"
    }

    TY - JOUR

    T1 - Mining significant association rules from uncertain data

    AU - Zhang, Anshu

    AU - Shi, Wenzhong

    AU - Webb, Geoffrey I.

    PY - 2016/7/1

    Y1 - 2016/7/1

    KW - Pattern discovery

    KW - Association rules

    KW - Statistical evaluation

    KW - Uncertain data

    UR - http://www.scopus.com/inward/record.url?scp=84954314823&partnerID=8YFLogxK

    U2 - 10.1007/s10618-015-0446-6

    DO - 10.1007/s10618-015-0446-6

    M3 - Article

    AN - SCOPUS:84954314823

    VL - 30

    SP - 928

    EP - 963

    JO - Data Mining and Knowledge Discovery

    JF - Data Mining and Knowledge Discovery

    SN - 1384-5810

    IS - 4

    ER -