On normalization and algorithm selection for unsupervised outlier detection

Sevvandi Kandanaarachchi, Mario A. Muñoz, Rob J. Hyndman, Kate Smith-Miles

Research output: Contribution to journal › Article › Research › peer-review

5 Citations (Scopus)

Abstract

This paper demonstrates that the performance of various outlier detection methods is sensitive to both the characteristics of the dataset and the data normalization scheme employed. To understand these dependencies, we formally prove that normalization affects the nearest neighbor structure and the density of the dataset, and hence which observations could be considered outliers. We then perform an instance space analysis of combinations of normalization and detection methods. This analysis enables the visualization of the strengths and weaknesses of these combinations and gives insight into which combination might obtain the best performance for a given dataset.
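
The claim that normalization changes the nearest neighbor structure can be illustrated with a small numerical sketch. The example below is not taken from the paper: the five-point dataset, the two normalization schemes (min-max and median-IQR), and the helper names `min_max`, `median_iqr`, and `nearest_neighbor` are all assumptions chosen for illustration. It shows only that the same observation can have a different Euclidean nearest neighbor depending on how the columns are rescaled.

```python
import numpy as np


def min_max(X):
    """Rescale each column to [0, 1] using its minimum and maximum."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / (hi - lo)


def median_iqr(X):
    """Center each column on its median and scale by its interquartile range."""
    q25, q50, q75 = np.percentile(X, [25, 50, 75], axis=0)
    return (X - q50) / (q75 - q25)


def nearest_neighbor(X):
    """Return, for each row, the index of its Euclidean nearest neighbor."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    return dist.argmin(axis=1)


# Hypothetical toy data: the last row is extreme in the first column,
# which compresses that column under min-max scaling but not under
# median-IQR scaling.
X = np.array(
    [[0.0, 0.0],
     [1.0, 3.0],
     [4.0, 1.0],
     [2.0, 2.0],
     [100.0, 4.0]]
)

nn_minmax = nearest_neighbor(min_max(X))
nn_iqr = nearest_neighbor(median_iqr(X))

print("nearest neighbor of row 0 under min-max:   ", nn_minmax[0])
print("nearest neighbor of row 0 under median-IQR:", nn_iqr[0])
```

Under these assumptions, row 0 is closest to row 2 after min-max scaling and to row 3 after median-IQR scaling, so any detector built on nearest neighbor distances would see a different neighborhood for the same observation.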

Original language: English
Pages (from-to): 309-354
Number of pages: 46
Journal: Data Mining and Knowledge Discovery
Volume: 34
Publication status: Published - 2020

Keywords

  • Algorithm selection problem for outlier detection
  • Effect of normalization on outlier detection
  • Instance space analysis
  • Instance space analysis for outlier detection
  • Unsupervised outlier detection
