Mining significant crisp-fuzzy spatial association rules

Wenzhong Shi, Anshu Zhang, Geoffrey I. Webb

    Research output: Contribution to journal > Article > Research > peer-review

    2 Citations (Scopus)

    Abstract

    Spatial association rule mining (SARM) is an important data mining task for understanding implicit and sophisticated interactions in spatial data. The usefulness of SARM results, represented as sets of rules, depends on their reliability in three respects: the abundance of rules, control over the risk of spurious rules, and the accuracy of rule interestingness measure (RIM) values. This study presents crisp-fuzzy SARM, a novel SARM method that can enhance the reliability of the resulting rules. The method first prunes dubious rules using statistically sound tests and crisp supports for the patterns involved, and then evaluates the RIMs of the accepted rules using fuzzy supports. For the RIM evaluation stage, the study also proposes a Gaussian-curve-based fuzzy data discretization model for SARM with an improved design for spatial semantics. The proposed techniques were evaluated on both synthetic and real-world data. The synthetic data were generated with predesigned rules and RIM values, so the reliability of SARM results could be evaluated confidently and quantitatively. The proposed techniques proved highly effective in enhancing the reliability of SARM results in all three aspects. The number of resulting rules increased by 50% or more compared with conventional fuzzy SARM. The statistically sound tests kept the risk of spurious rules minimal: the probability that the entire result contained any spurious rule was below 1%. The RIM values also avoided the large positive errors committed by crisp SARM, which typically exceeded 50% for representative RIMs. The real-world case study on New York City points of interest reconfirms the improved reliability of crisp-fuzzy SARM results and demonstrates that such improvement is critical for practical spatial data analytics and decision support.
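
    The two-stage idea in the abstract (prune with crisp supports and a significance test, then score the surviving rules with fuzzy supports) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not name the test or give the discretization model, so the sketch assumes a one-sided Fisher exact test with a Bonferroni correction for the "statistically sound tests", a Gaussian membership curve centred at distance 0 for the fuzzy "near" concept, and the minimum t-norm for combining memberships. All function and parameter names (`gaussian_membership`, `crisp_fuzzy_rule`, `threshold`, `sigma`) are hypothetical.

```python
import math


def gaussian_membership(x, center, sigma):
    """Degree in (0, 1] to which x belongs to a fuzzy set centred at `center`."""
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))


def support(membership_columns):
    """Mean over records of the minimum membership across items.

    With 0/1 memberships this is the ordinary crisp support; with graded
    memberships it is a fuzzy support (minimum t-norm, an assumption here).
    """
    n = len(membership_columns[0])
    total = sum(min(col[i] for col in membership_columns) for i in range(n))
    return total / n


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null of independence.
    """
    n = a + b + c + d
    row1 = a + b
    col1 = a + c
    denom = math.comb(n, col1)
    upper = min(row1, col1)
    tail = sum(
        math.comb(row1, k) * math.comb(n - row1, col1 - k)
        for k in range(a, upper + 1)
    )
    return tail / denom


def crisp_fuzzy_rule(xs, ys, threshold, sigma, alpha, n_candidates):
    """Evaluate the rule "near X -> near Y" in two stages.

    Stage 1 (crisp): a record is "near" when its distance is below
    `threshold`; the rule is rejected unless the Fisher p-value passes a
    Bonferroni-corrected level alpha / n_candidates.
    Stage 2 (fuzzy): interestingness measures are computed from Gaussian
    memberships instead of the 0/1 indicators.
    """
    near_x = [x < threshold for x in xs]
    near_y = [y < threshold for y in ys]
    a = sum(1 for p, q in zip(near_x, near_y) if p and q)
    b = sum(1 for p, q in zip(near_x, near_y) if p and not q)
    c = sum(1 for p, q in zip(near_x, near_y) if not p and q)
    d = sum(1 for p, q in zip(near_x, near_y) if not p and not q)

    p_value = fisher_exact_greater(a, b, c, d)
    if p_value > alpha / n_candidates:
        return None  # pruned as potentially spurious

    fuzzy_x = [gaussian_membership(x, 0.0, sigma) for x in xs]
    fuzzy_y = [gaussian_membership(y, 0.0, sigma) for y in ys]
    sup_x = support([fuzzy_x])
    sup_y = support([fuzzy_y])
    sup_xy = support([fuzzy_x, fuzzy_y])
    return {
        "p_value": p_value,
        "support": sup_xy,
        "confidence": sup_xy / sup_x,
        "lift": sup_xy / (sup_x * sup_y),
    }
```

    Calling `crisp_fuzzy_rule(xs, ys, threshold=500, sigma=300, alpha=0.05, n_candidates=1)` on two lists of distances returns `None` when the crisp test rejects the rule, and otherwise a dictionary of fuzzy support, confidence, and lift. Keeping the test on crisp counts means the contingency table holds integers, as Fisher's test requires, while the graded memberships are used only for scoring.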

    Original language: English
    Pages (from-to): 1247-1270
    Number of pages: 24
    Journal: International Journal of Geographical Information Science
    Volume: 32
    Issue number: 6
    DOI: 10.1080/13658816.2018.1434525
    Publication status: Published - 2018

    Keywords

    • fuzzy sets and logic
    • quality issues
    • Spatial association rules
    • spatial data mining
    • statistical evaluation

    Cite this

    @article{6acaa9bbf5cd43d996c6e694662f2994,
    title = "Mining significant crisp-fuzzy spatial association rules",
    abstract = "Spatial association rule mining (SARM) is an important data mining task for understanding implicit and sophisticated interactions in spatial data. The usefulness of SARM results, represented as sets of rules, depends on their reliability: the abundance of rules, control over the risk of spurious rules, and accuracy of rule interestingness measure (RIM) values. This study presents crisp-fuzzy SARM, a novel SARM method that can enhance the reliability of resultant rules. The method firstly prunes dubious rules using statistically sound tests and crisp supports for the patterns involved, and then evaluates RIMs of accepted rules using fuzzy supports. For the RIM evaluation stage, the study also proposes a Gaussian-curve-based fuzzy data discretization model for SARM with improved design for spatial semantics. The proposed techniques were evaluated by both synthetic and real-world data. The synthetic data was generated with predesigned rules and RIM values, thus the reliability of SARM results could be confidently and quantitatively evaluated. The proposed techniques showed high efficacy in enhancing the reliability of SARM results in all three aspects. The abundance of resultant rules was improved by 50{\%} or more compared with using conventional fuzzy SARM. Minimal risk of spurious rules was guaranteed by statistically sound tests. The probability that the entire result contained any spurious rules was below 1{\%}. The RIM values also avoided large positive errors committed by crisp SARM, which typically exceeded 50{\%} for representative RIMs. The real-world case study on New York City points of interest reconfirms the improved reliability of crisp-fuzzy SARM results, and demonstrates that such improvement is critical for practical spatial data analytics and decision support.",
    keywords = "fuzzy sets and logic, quality issues, Spatial association rules, spatial data mining, statistical evaluation",
    author = "Wenzhong Shi and Anshu Zhang and Webb, {Geoffrey I.}",
    year = "2018",
    doi = "10.1080/13658816.2018.1434525",
    language = "English",
    volume = "32",
    pages = "1247--1270",
    journal = "International Journal of Geographical Information Science",
    issn = "1365-8816",
    publisher = "Taylor \& Francis",
    number = "6",
    }

    Mining significant crisp-fuzzy spatial association rules. / Shi, Wenzhong; Zhang, Anshu; Webb, Geoffrey I.

    In: International Journal of Geographical Information Science, Vol. 32, No. 6, 2018, p. 1247-1270.

    TY - JOUR

    T1 - Mining significant crisp-fuzzy spatial association rules

    AU - Shi, Wenzhong

    AU - Zhang, Anshu

    AU - Webb, Geoffrey I.

    PY - 2018

    Y1 - 2018

    N2 - Spatial association rule mining (SARM) is an important data mining task for understanding implicit and sophisticated interactions in spatial data. The usefulness of SARM results, represented as sets of rules, depends on their reliability: the abundance of rules, control over the risk of spurious rules, and accuracy of rule interestingness measure (RIM) values. This study presents crisp-fuzzy SARM, a novel SARM method that can enhance the reliability of resultant rules. The method firstly prunes dubious rules using statistically sound tests and crisp supports for the patterns involved, and then evaluates RIMs of accepted rules using fuzzy supports. For the RIM evaluation stage, the study also proposes a Gaussian-curve-based fuzzy data discretization model for SARM with improved design for spatial semantics. The proposed techniques were evaluated by both synthetic and real-world data. The synthetic data was generated with predesigned rules and RIM values, thus the reliability of SARM results could be confidently and quantitatively evaluated. The proposed techniques showed high efficacy in enhancing the reliability of SARM results in all three aspects. The abundance of resultant rules was improved by 50% or more compared with using conventional fuzzy SARM. Minimal risk of spurious rules was guaranteed by statistically sound tests. The probability that the entire result contained any spurious rules was below 1%. The RIM values also avoided large positive errors committed by crisp SARM, which typically exceeded 50% for representative RIMs. The real-world case study on New York City points of interest reconfirms the improved reliability of crisp-fuzzy SARM results, and demonstrates that such improvement is critical for practical spatial data analytics and decision support.

    KW - fuzzy sets and logic

    KW - quality issues

    KW - Spatial association rules

    KW - spatial data mining

    KW - statistical evaluation

    UR - http://www.scopus.com/inward/record.url?scp=85041804621&partnerID=8YFLogxK

    U2 - 10.1080/13658816.2018.1434525

    DO - 10.1080/13658816.2018.1434525

    M3 - Article

    AN - SCOPUS:85041804621

    VL - 32

    SP - 1247

    EP - 1270

    JO - International Journal of Geographical Information Science

    JF - International Journal of Geographical Information Science

    SN - 1365-8816

    IS - 6

    ER -