PCA-based drift and shift quantification framework for multidimensional data

Igor Goldenberg, Geoffrey I. Webb

Research output: Contribution to journal › Article › Research › peer-review

Abstract

Concept drift is a serious problem confronting machine learning systems in a dynamic and ever-changing world. To manage concept drift, it may be useful first to quantify it by measuring the distance between the distributions that generate data before and after a drift. There is a paucity of methods for doing so in the case of multidimensional numeric data. This paper provides an in-depth analysis of the PCA-based change detection approach, identifies shortcomings of existing methods, and shows how this approach can be used to measure a drift, not merely detect it.
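
As a rough illustration of what "measuring the distance between distributions" can look like, the sketch below computes the Hellinger distance between two histograms. It is not the paper's algorithm. It assumes the data have already been projected onto a single principal component, and the helper names `histogram` and `hellinger`, the bin count, and the sample values are hypothetical choices made for this example only.

```python
import math


def histogram(values, lo, hi, bins):
    """Normalized equal-width histogram of values over [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        # Clamp the index so out-of-range values and v == hi land in an edge bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[max(idx, 0)] += 1
    total = len(values)
    return [c / total for c in counts]


def hellinger(p, q):
    """Hellinger distance between two discrete distributions, in [0, 1]."""
    s = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(s / 2)


# Hypothetical one-dimensional projections before and after a suspected drift.
before = [0.1, 0.4, 0.5, 0.9, 1.2, 1.4]
after = [1.1, 1.6, 2.2, 2.5, 2.9, 3.3]

p = histogram(before, lo=0.0, hi=4.0, bins=4)
q = histogram(after, lo=0.0, hi=4.0, bins=4)
print(hellinger(p, q))
```

A distance of 0 means the two histograms are identical and a distance of 1 means they share no occupied bins, so the value can serve as a bounded score of how far the data have moved along that component.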

Original language: English
Pages (from-to): 2835-2854
Number of pages: 20
Journal: Knowledge and Information Systems
Volume: 62
Publication status: Published - 6 Feb 2020

Keywords

  • Drift detection
  • Hellinger distance
  • Principal component analysis
