Characterizing concept drift

Geoff I. Webb, Roy Hyde, Hong Cao, Hai-Long Nguyen, Francois Petitjean

    Research output: Contribution to journal › Article › Research › peer-review

    68 Citations (Scopus)

    Abstract

    Most machine learning models are static, but the world is dynamic, and increasing online deployment of learned models gives increasing urgency to the development of efficient and effective mechanisms to address learning in the context of non-stationary distributions, or, as it is commonly called, concept drift. However, the key issue of characterizing the different types of drift that can occur has not previously been subjected to rigorous definition and analysis. In particular, while some qualitative drift categorizations have been proposed, few have been formally defined, and the quantitative descriptions required for precise and objective understanding of learner performance have not existed. We present the first comprehensive framework for quantitative analysis of drift. This supports the development of the first comprehensive set of formal definitions of types of concept drift. The formal definitions clarify ambiguities and identify gaps in previous definitions, giving rise to a new comprehensive taxonomy of concept drift types and a solid foundation for research into mechanisms to detect and address concept drift.

    Original language: English
    Pages (from-to): 964-994
    Number of pages: 31
    Journal: Data Mining and Knowledge Discovery
    Volume: 30
    Issue number: 4
    DOI: 10.1007/s10618-015-0448-4
    Publication status: Published - 1 Jul 2016

    Keywords

    • Concept drift
    • Learning from non-stationary distributions
    • Stream learning
    • Stream mining

    Cite this

    Webb, Geoff I. ; Hyde, Roy ; Cao, Hong ; Nguyen, Hai-Long ; Petitjean, Francois. / Characterizing concept drift. In: Data Mining and Knowledge Discovery. 2016 ; Vol. 30, No. 4. pp. 964-994.
    @article{245a708f74d34896ae5da8d86e8f3df1,
    title = "Characterizing concept drift",
    abstract = "Most machine learning models are static, but the world is dynamic, and increasing online deployment of learned models gives increasing urgency to the development of efficient and effective mechanisms to address learning in the context of non-stationary distributions, or as it is commonly called concept drift. However, the key issue of characterizing the different types of drift that can occur has not previously been subjected to rigorous definition and analysis. In particular, while some qualitative drift categorizations have been proposed, few have been formally defined, and the quantitative descriptions required for precise and objective understanding of learner performance have not existed. We present the first comprehensive framework for quantitative analysis of drift. This supports the development of the first comprehensive set of formal definitions of types of concept drift. The formal definitions clarify ambiguities and identify gaps in previous definitions, giving rise to a new comprehensive taxonomy of concept drift types and a solid foundation for research into mechanisms to detect and address concept drift.",
    keywords = "Concept drift, Learning from non-stationary distributions, Stream learning, Stream mining",
    author = "Webb, {Geoff I.} and Roy Hyde and Hong Cao and Hai-Long Nguyen and Francois Petitjean",
    year = "2016",
    month = "7",
    day = "1",
    doi = "10.1007/s10618-015-0448-4",
    language = "English",
    volume = "30",
    pages = "964--994",
    journal = "Data Mining and Knowledge Discovery",
    issn = "1384-5810",
    publisher = "Springer",
    number = "4"
    }

    TY - JOUR
    T1 - Characterizing concept drift
    AU - Webb, Geoff I.
    AU - Hyde, Roy
    AU - Cao, Hong
    AU - Nguyen, Hai-Long
    AU - Petitjean, Francois
    PY - 2016/7/1
    Y1 - 2016/7/1
    AB - Most machine learning models are static, but the world is dynamic, and increasing online deployment of learned models gives increasing urgency to the development of efficient and effective mechanisms to address learning in the context of non-stationary distributions, or as it is commonly called concept drift. However, the key issue of characterizing the different types of drift that can occur has not previously been subjected to rigorous definition and analysis. In particular, while some qualitative drift categorizations have been proposed, few have been formally defined, and the quantitative descriptions required for precise and objective understanding of learner performance have not existed. We present the first comprehensive framework for quantitative analysis of drift. This supports the development of the first comprehensive set of formal definitions of types of concept drift. The formal definitions clarify ambiguities and identify gaps in previous definitions, giving rise to a new comprehensive taxonomy of concept drift types and a solid foundation for research into mechanisms to detect and address concept drift.
    KW - Concept drift
    KW - Learning from non-stationary distributions
    KW - Stream learning
    KW - Stream mining
    UR - http://www.scopus.com/inward/record.url?scp=84963757730&partnerID=8YFLogxK
    U2 - 10.1007/s10618-015-0448-4
    DO - 10.1007/s10618-015-0448-4
    M3 - Article
    AN - SCOPUS:84963757730
    VL - 30
    SP - 964
    EP - 994
    JO - Data Mining and Knowledge Discovery
    JF - Data Mining and Knowledge Discovery
    SN - 1384-5810
    IS - 4
    ER -