Skopus: Mining top-k sequential patterns under leverage

François Petitjean, Tao Li, Nikolaj Tatti, Geoffrey I. Webb

    Research output: Contribution to journalArticleResearchpeer-review

    15 Citations (Scopus)

    Abstract

    This paper presents a framework for exact discovery of the top-k sequential patterns under Leverage. It combines (1) a novel definition of the expected support for a sequential pattern—a concept on which most interestingness measures directly rely—with (2) Skopus: a new branch-and-bound algorithm for the exact discovery of top-k sequential patterns under a given measure of interest. Our interestingness measure employs the partition approach. A pattern is interesting to the extent that it is more frequent than can be explained by assuming independence between any of the pairs of patterns from which it can be composed. The larger the support compared to the expectation under independence, the more interesting is the pattern. We build on these two elements to exactly extract the k sequential patterns with highest leverage, consistent with our definition of expected support. We conduct experiments on both synthetic data with known patterns and real-world datasets; both experiments confirm the consistency and relevance of our approach with regard to the state of the art.
    Original languageEnglish
    Pages (from-to)1086-1111
    Number of pages26
    JournalData Mining and Knowledge Discovery
    Volume30
    Issue number5
    DOIs
    Publication statusPublished - 1 Sep 2016

    Keywords

    • Data mining
    • Pattern mining
    • Sequential data
    • Exact discovery
    • Interestingness measures

    Cite this

    @article{9f15c09c167d4720b591dfd91ca6fb49,
    title = "Skopus: Mining top-k sequential patterns under leverage",
    abstract = "This paper presents a framework for exact discovery of the top-k sequential patterns under Leverage. It combines (1) a novel definition of the expected support for a sequential pattern—a concept on which most interestingness measures directly rely—with (2) Skopus: a new branch-and-bound algorithm for the exact discovery of top-k sequential patterns under a given measure of interest. Our interestingness measure employs the partition approach. A pattern is interesting to the extent that it is more frequent than can be explained by assuming independence between any of the pairs of patterns from which it can be composed. The larger the support compared to the expectation under independence, the more interesting is the pattern. We build on these two elements to exactly extract the k sequential patterns with highest leverage, consistent with our definition of expected support. We conduct experiments on both synthetic data with known patterns and real-world datasets; both experiments confirm the consistency and relevance of our approach with regard to the state of the art.",
    keywords = "Data mining, Pattern mining, Sequential data, Exact discovery, Interestingness measures",
    author = "Fran{\cc}ois Petitjean and Tao Li and Nikolaj Tatti and Webb, {Geoffrey I.}",
    year = "2016",
    month = "9",
    day = "1",
    doi = "10.1007/s10618-016-0467-9",
    language = "English",
    volume = "30",
    pages = "1086--1111",
    journal = "Data Mining and Knowledge Discovery",
    issn = "1384-5810",
    publisher = "Springer",
    number = "5",

    }

    Skopus : Mining top-k sequential patterns under leverage. / Petitjean, François; Li, Tao; Tatti, Nikolaj; Webb, Geoffrey I.

    In: Data Mining and Knowledge Discovery, Vol. 30, No. 5, 01.09.2016, p. 1086-1111.

    Research output: Contribution to journalArticleResearchpeer-review

    TY - JOUR

    T1 - Skopus

    T2 - Mining top-k sequential patterns under leverage

    AU - Petitjean, François

    AU - Li, Tao

    AU - Tatti, Nikolaj

    AU - Webb, Geoffrey I.

    PY - 2016/9/1

    Y1 - 2016/9/1

    N2 - This paper presents a framework for exact discovery of the top-k sequential patterns under Leverage. It combines (1) a novel definition of the expected support for a sequential pattern—a concept on which most interestingness measures directly rely—with (2) Skopus: a new branch-and-bound algorithm for the exact discovery of top-k sequential patterns under a given measure of interest. Our interestingness measure employs the partition approach. A pattern is interesting to the extent that it is more frequent than can be explained by assuming independence between any of the pairs of patterns from which it can be composed. The larger the support compared to the expectation under independence, the more interesting is the pattern. We build on these two elements to exactly extract the k sequential patterns with highest leverage, consistent with our definition of expected support. We conduct experiments on both synthetic data with known patterns and real-world datasets; both experiments confirm the consistency and relevance of our approach with regard to the state of the art.

    AB - This paper presents a framework for exact discovery of the top-k sequential patterns under Leverage. It combines (1) a novel definition of the expected support for a sequential pattern—a concept on which most interestingness measures directly rely—with (2) Skopus: a new branch-and-bound algorithm for the exact discovery of top-k sequential patterns under a given measure of interest. Our interestingness measure employs the partition approach. A pattern is interesting to the extent that it is more frequent than can be explained by assuming independence between any of the pairs of patterns from which it can be composed. The larger the support compared to the expectation under independence, the more interesting is the pattern. We build on these two elements to exactly extract the k sequential patterns with highest leverage, consistent with our definition of expected support. We conduct experiments on both synthetic data with known patterns and real-world datasets; both experiments confirm the consistency and relevance of our approach with regard to the state of the art.

    KW - Data mining

    KW - Pattern mining

    KW - Sequential data

    KW - Exact discovery

    KW - Interestingness measures

    UR - http://www.scopus.com/inward/record.url?scp=84974806516&partnerID=8YFLogxK

    U2 - 10.1007/s10618-016-0467-9

    DO - 10.1007/s10618-016-0467-9

    M3 - Article

    AN - SCOPUS:84974806516

    VL - 30

    SP - 1086

    EP - 1111

    JO - Data Mining and Knowledge Discovery

    JF - Data Mining and Knowledge Discovery

    SN - 1384-5810

    IS - 5

    ER -