Survey of distance measures for quantifying concept drift and shift in numeric data

Igor Goldenberg, Geoffrey I. Webb

Research output: Contribution to journal; Article; Research; peer-reviewed

Abstract

Deployed machine learning systems are necessarily learned from historical data and are often applied to current data. When the world changes, the learned models can lose fidelity. Such changes to the statistical properties of data over time are known as concept drift. Similarly, models are often learned in one context, but need to be applied in another. This is called concept shift. Quantifying the magnitude of drift or shift, especially in the context of covariate drift or shift, or unsupervised learning, requires use of measures of distance between distributions. In this paper, we survey such distance measures with respect to their suitability for estimating drift and shift magnitude between samples of numeric data.
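
As a rough illustration of what a distance between two samples of numeric data can look like, the sketch below estimates the Hellinger distance (one of the measures listed in the keywords) between two one-dimensional samples by binning both on a shared histogram grid. The function name `hellinger_distance`, the `bins` parameter, and the histogram-based density estimate are assumptions made for this example. They are not taken from the paper, which may estimate the measures differently.

```python
import numpy as np


def hellinger_distance(sample_a, sample_b, bins=20):
    """Estimate the Hellinger distance between two 1-D numeric samples.

    Hypothetical helper for illustration: densities are approximated by
    histograms over a shared set of equal-width bins.
    """
    a = np.asarray(sample_a, dtype=float)
    b = np.asarray(sample_b, dtype=float)

    lo = min(a.min(), b.min())
    hi = max(a.max(), b.max())
    if lo == hi:
        # Every value in both samples is identical, so the two
        # empirical distributions coincide.
        return 0.0

    # Shared bin edges so both histograms describe the same partition.
    edges = np.linspace(lo, hi, bins + 1)
    p = np.histogram(a, bins=edges)[0] / a.size
    q = np.histogram(b, bins=edges)[0] / b.size

    # Bhattacharyya coefficient: overlap between the binned distributions.
    bc = np.sum(np.sqrt(p * q))
    # H(P, Q) = sqrt(1 - BC), which lies in [0, 1]; clamp rounding error.
    return float(np.sqrt(max(0.0, 1.0 - bc)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    historical = rng.normal(loc=0.0, scale=1.0, size=5000)
    current = rng.normal(loc=0.5, scale=1.0, size=5000)  # simulated covariate change
    print(hellinger_distance(historical, current))
```

A value near 0 suggests the two samples occupy the same bins in similar proportions, and a value near 1 suggests they barely overlap. The estimate depends on the bin count and the sample sizes, so it should be read as indicative rather than exact.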

Original language: English
Pages (from-to): 591-615
Number of pages: 25
Journal: Knowledge and Information Systems
Volume: 60
Issue number: 2
DOI: 10.1007/s10115-018-1257-z
Publication status: Published - Sep 2018

Keywords

  • Hellinger distance
  • Hotelling distance
  • Kullback–Leibler divergence
  • Mahalanobis distance
  • Multivariate concept drift

Cite this

@article{1e3e1f25f4a94f56ba7b3a4abb250755,
title = "Survey of distance measures for quantifying concept drift and shift in numeric data",
abstract = "Deployed machine learning systems are necessarily learned from historical data and are often applied to current data. When the world changes, the learned models can lose fidelity. Such changes to the statistical properties of data over time are known as concept drift. Similarly, models are often learned in one context, but need to be applied in another. This is called concept shift. Quantifying the magnitude of drift or shift, especially in the context of covariate drift or shift, or unsupervised learning, requires use of measures of distance between distributions. In this paper, we survey such distance measures with respect to their suitability for estimating drift and shift magnitude between samples of numeric data.",
keywords = "Hellinger distance, Hotelling distance, Kullback–Leibler divergence, Mahalanobis distance, Multivariate concept drift",
author = "Igor Goldenberg and Webb, {Geoffrey I.}",
year = "2018",
month = "9",
doi = "10.1007/s10115-018-1257-z",
language = "English",
volume = "60",
pages = "591--615",
journal = "Knowledge and Information Systems",
issn = "0219-1377",
publisher = "Springer-Verlag London Ltd.",
number = "2",

}

Survey of distance measures for quantifying concept drift and shift in numeric data. / Goldenberg, Igor; Webb, Geoffrey I.

In: Knowledge and Information Systems, Vol. 60, No. 2, 09.2018, p. 591-615.

TY - JOUR
T1 - Survey of distance measures for quantifying concept drift and shift in numeric data
AU - Goldenberg, Igor
AU - Webb, Geoffrey I.
PY - 2018/9
Y1 - 2018/9
N2 - Deployed machine learning systems are necessarily learned from historical data and are often applied to current data. When the world changes, the learned models can lose fidelity. Such changes to the statistical properties of data over time are known as concept drift. Similarly, models are often learned in one context, but need to be applied in another. This is called concept shift. Quantifying the magnitude of drift or shift, especially in the context of covariate drift or shift, or unsupervised learning, requires use of measures of distance between distributions. In this paper, we survey such distance measures with respect to their suitability for estimating drift and shift magnitude between samples of numeric data.
AB - Deployed machine learning systems are necessarily learned from historical data and are often applied to current data. When the world changes, the learned models can lose fidelity. Such changes to the statistical properties of data over time are known as concept drift. Similarly, models are often learned in one context, but need to be applied in another. This is called concept shift. Quantifying the magnitude of drift or shift, especially in the context of covariate drift or shift, or unsupervised learning, requires use of measures of distance between distributions. In this paper, we survey such distance measures with respect to their suitability for estimating drift and shift magnitude between samples of numeric data.
KW - Hellinger distance
KW - Hotelling distance
KW - Kullback–Leibler divergence
KW - Mahalanobis distance
KW - Multivariate concept drift
UR - http://www.scopus.com/inward/record.url?scp=85053592512&partnerID=8YFLogxK
U2 - 10.1007/s10115-018-1257-z
DO - 10.1007/s10115-018-1257-z
M3 - Article
VL - 60
SP - 591
EP - 615
JO - Knowledge and Information Systems
JF - Knowledge and Information Systems
SN - 0219-1377
IS - 2
ER -