Analyzing concept drift and shift from sample data

Geoffrey I. Webb, Loong Kuan Lee, Bart Goethals, François Petitjean

    Research output: Contribution to journal › Article › Research › peer-review

    Abstract

    Concept drift and shift are major issues that greatly affect the accuracy and reliability of many real-world applications of machine learning. We propose a new data mining task, concept drift mapping—the description and analysis of instances of concept drift or shift. We argue that concept drift mapping is an essential prerequisite for tackling concept drift and shift. We propose tools for this purpose, arguing for the importance of quantitative descriptions of drift and shift in marginal distributions. We present quantitative concept drift mapping techniques, along with methods for visualizing their results. We illustrate their effectiveness for real-world applications across energy-pricing, vegetation monitoring and airline scheduling.
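The abstract calls for quantitative descriptions of drift in marginal distributions but does not name a measure. As a minimal sketch, assuming total variation distance between the empirical marginals of two time windows is an acceptable stand-in, per-attribute drift could be computed as below. The function names and the toy records are hypothetical and are not taken from the paper.

```python
from collections import Counter


def marginal_distribution(values):
    """Empirical probability of each distinct value in a sample."""
    counts = Counter(values)
    total = len(values)
    return {v: c / total for v, c in counts.items()}


def total_variation_distance(p, q):
    """Half the L1 distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in support)


def marginal_drift(before, after):
    """Drift of each attribute's marginal between two samples (lists of dicts)."""
    attributes = before[0].keys()
    return {
        a: total_variation_distance(
            marginal_distribution([row[a] for row in before]),
            marginal_distribution([row[a] for row in after]),
        )
        for a in attributes
    }


# Hypothetical toy windows: "day" is stable, "demand" changes between windows.
before = [
    {"day": "weekday", "demand": "high"},
    {"day": "weekday", "demand": "low"},
    {"day": "weekend", "demand": "low"},
    {"day": "weekend", "demand": "low"},
]
after = [
    {"day": "weekday", "demand": "high"},
    {"day": "weekday", "demand": "high"},
    {"day": "weekend", "demand": "high"},
    {"day": "weekend", "demand": "low"},
]

drift = marginal_drift(before, after)
for attribute, magnitude in drift.items():
    print(f"{attribute}: {magnitude:.2f}")
```

Ranking attributes by this magnitude gives a simple per-attribute summary of where the distribution moved. It handles categorical attributes only; numeric attributes would need to be discretised first.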

    Original language: English
    Pages (from-to): 1179-1199
    Number of pages: 21
    Journal: Data Mining and Knowledge Discovery
    Volume: 32
    Issue number: 5
    DOI: 10.1007/s10618-018-0554-1
    Publication status: Published - Sep 2018

    Keywords

    • Concept drift
    • Concept shift
    • Mapping
    • Non-stationary distribution
    • Visualisation

    Cite this

    Webb, Geoffrey I. ; Lee, Loong Kuan ; Goethals, Bart ; Petitjean, François. / Analyzing concept drift and shift from sample data. In: Data Mining and Knowledge Discovery. 2018 ; Vol. 32, No. 5. pp. 1179-1199.
    @article{71cf3128b04f40d5bcdee2e60eb3eefa,
    title = "Analyzing concept drift and shift from sample data",
    abstract = "Concept drift and shift are major issues that greatly affect the accuracy and reliability of many real-world applications of machine learning. We propose a new data mining task, concept drift mapping—the description and analysis of instances of concept drift or shift. We argue that concept drift mapping is an essential prerequisite for tackling concept drift and shift. We propose tools for this purpose, arguing for the importance of quantitative descriptions of drift and shift in marginal distributions. We present quantitative concept drift mapping techniques, along with methods for visualizing their results. We illustrate their effectiveness for real-world applications across energy-pricing, vegetation monitoring and airline scheduling.",
    keywords = "Concept drift, Concept shift, Mapping, Non-stationary distribution, Visualisation",
    author = "Webb, {Geoffrey I.} and Lee, {Loong Kuan} and Bart Goethals and Fran{\c{c}}ois Petitjean",
    year = "2018",
    month = "9",
    doi = "10.1007/s10618-018-0554-1",
    language = "English",
    volume = "32",
    pages = "1179--1199",
    journal = "Data Mining and Knowledge Discovery",
    issn = "1384-5810",
    publisher = "Springer",
    number = "5",
    }

    Analyzing concept drift and shift from sample data. / Webb, Geoffrey I.; Lee, Loong Kuan; Goethals, Bart; Petitjean, François.

    In: Data Mining and Knowledge Discovery, Vol. 32, No. 5, 09.2018, p. 1179-1199.

    TY - JOUR
    T1 - Analyzing concept drift and shift from sample data
    AU - Webb, Geoffrey I.
    AU - Lee, Loong Kuan
    AU - Goethals, Bart
    AU - Petitjean, François
    PY - 2018/9
    Y1 - 2018/9
    AB - Concept drift and shift are major issues that greatly affect the accuracy and reliability of many real-world applications of machine learning. We propose a new data mining task, concept drift mapping—the description and analysis of instances of concept drift or shift. We argue that concept drift mapping is an essential prerequisite for tackling concept drift and shift. We propose tools for this purpose, arguing for the importance of quantitative descriptions of drift and shift in marginal distributions. We present quantitative concept drift mapping techniques, along with methods for visualizing their results. We illustrate their effectiveness for real-world applications across energy-pricing, vegetation monitoring and airline scheduling.

    KW - Concept drift
    KW - Concept shift
    KW - Mapping
    KW - Non-stationary distribution
    KW - Visualisation
    UR - http://www.scopus.com/inward/record.url?scp=85043484517&partnerID=8YFLogxK
    U2 - 10.1007/s10618-018-0554-1
    DO - 10.1007/s10618-018-0554-1
    M3 - Article
    VL - 32
    SP - 1179
    EP - 1199
    JO - Data Mining and Knowledge Discovery
    JF - Data Mining and Knowledge Discovery
    SN - 1384-5810
    IS - 5
    ER -