On normalization and algorithm selection for unsupervised outlier detection

Sevvandi Kandanaarachchi, Mario A. Muñoz, Rob J. Hyndman, Kate Smith-Miles

Research output: Contribution to journal › Article › Research › peer-review

Abstract

This paper demonstrates that the performance of various outlier detection methods is sensitive to both the characteristics of the dataset and the data normalization scheme employed. To understand these dependencies, we formally prove that normalization affects the nearest neighbor structure and density of the dataset, and hence which observations could be considered outliers. Then, we perform an instance space analysis of combinations of normalization and detection methods. Such analysis enables the visualization of the strengths and weaknesses of these combinations. Moreover, we gain insights into which method combination might obtain the best performance for a given dataset.
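The abstract's central claim, that normalization can change the nearest neighbor structure of a dataset, can be illustrated with a small example. The sketch below is not the authors' code. The seven-point dataset is invented for illustration, and the two schemes shown (min-max scaling and z-score standardization) are common choices assumed here. Both are per-feature affine maps, but one divides each feature by its range and the other by its standard deviation, so the relative weight of the features in Euclidean distance can change.

```python
import math
import statistics

def min_max(data):
    """Rescale each column to [0, 1]: (x - min) / (max - min)."""
    cols = list(zip(*data))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [tuple((v - l) / (h - l) for v, l, h in zip(row, lo, hi)) for row in data]

def z_score(data):
    """Standardize each column: (x - mean) / population standard deviation."""
    cols = list(zip(*data))
    mu = [statistics.fmean(c) for c in cols]
    sd = [statistics.pstdev(c) for c in cols]
    return [tuple((v - m) / s for v, m, s in zip(row, mu, sd)) for row in data]

def nearest_neighbor(data, i):
    """Index of the point closest (Euclidean) to data[i], excluding i itself."""
    others = (j for j in range(len(data)) if j != i)
    return min(others, key=lambda j: math.dist(data[i], data[j]))

# Hypothetical toy data (x, y). Point 1 differs from point 0 only in x,
# point 2 only in y. Feature y has a single extreme value (12), so its
# range is large relative to its standard deviation.
points = [(0, 0), (1, 0), (0, 1), (10, 0), (10, 1), (10, 0), (10, 12)]

print(nearest_neighbor(min_max(points), 0))  # prints 2
print(nearest_neighbor(z_score(points), 0))  # prints 1
```

Under min-max scaling, y is divided by its range (12) and x by its range (10), so point 2 is closer to point 0. Under z-score standardization, y is divided by a smaller standard deviation than x (about 4.11 versus 4.79), so point 1 is closer. Any detector built on nearest neighbor distances or local density would inherit this difference. Constant columns (zero range or standard deviation) would cause a division by zero and are not handled in this sketch.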

Original language: English
Number of pages: 46
Journal: Data Mining and Knowledge Discovery
DOIs: 10.1007/s10618-019-00661-z
Publication status: Accepted/In press - 2019

Keywords

  • Algorithm selection problem for outlier detection
  • Effect of normalization on outlier detection
  • Instance space analysis
  • Instance space analysis for outlier detection
  • Unsupervised outlier detection

Cite this

@article{60e961b8f8834e0cbf3a55d5ea7e30f7,
title = "On normalization and algorithm selection for unsupervised outlier detection",
abstract = "This paper demonstrates that the performance of various outlier detection methods is sensitive to both the characteristics of the dataset, and the data normalization scheme employed. To understand these dependencies, we formally prove that normalization affects the nearest neighbor structure, and density of the dataset; hence, affecting which observations could be considered outliers. Then, we perform an instance space analysis of combinations of normalization and detection methods. Such analysis enables the visualization of the strengths and weaknesses of these combinations. Moreover, we gain insights into which method combination might obtain the best performance for a given dataset.",
keywords = "Algorithm selection problem for outlier detection, Effect of normalization on outlier detection, Instance space analysis, Instance space analysis for outlier detection, Unsupervised outlier detection",
author = "Kandanaarachchi, Sevvandi and Mu{\~n}oz, Mario A. and Hyndman, Rob J. and Smith-Miles, Kate",
year = "2019",
doi = "10.1007/s10618-019-00661-z",
language = "English",
journal = "Data Mining and Knowledge Discovery",
issn = "1384-5810",
publisher = "Springer",

}

On normalization and algorithm selection for unsupervised outlier detection. / Kandanaarachchi, Sevvandi; Muñoz, Mario A.; Hyndman, Rob J.; Smith-Miles, Kate.

In: Data Mining and Knowledge Discovery, 2019.


TY  - JOUR
T1  - On normalization and algorithm selection for unsupervised outlier detection
AU  - Kandanaarachchi, Sevvandi
AU  - Muñoz, Mario A.
AU  - Hyndman, Rob J.
AU  - Smith-Miles, Kate
PY  - 2019
Y1  - 2019
N2  - This paper demonstrates that the performance of various outlier detection methods is sensitive to both the characteristics of the dataset, and the data normalization scheme employed. To understand these dependencies, we formally prove that normalization affects the nearest neighbor structure, and density of the dataset; hence, affecting which observations could be considered outliers. Then, we perform an instance space analysis of combinations of normalization and detection methods. Such analysis enables the visualization of the strengths and weaknesses of these combinations. Moreover, we gain insights into which method combination might obtain the best performance for a given dataset.
AB  - This paper demonstrates that the performance of various outlier detection methods is sensitive to both the characteristics of the dataset, and the data normalization scheme employed. To understand these dependencies, we formally prove that normalization affects the nearest neighbor structure, and density of the dataset; hence, affecting which observations could be considered outliers. Then, we perform an instance space analysis of combinations of normalization and detection methods. Such analysis enables the visualization of the strengths and weaknesses of these combinations. Moreover, we gain insights into which method combination might obtain the best performance for a given dataset.
KW  - Algorithm selection problem for outlier detection
KW  - Effect of normalization on outlier detection
KW  - Instance space analysis
KW  - Instance space analysis for outlier detection
KW  - Unsupervised outlier detection
UR  - http://www.scopus.com/inward/record.url?scp=85075334402&partnerID=8YFLogxK
U2  - 10.1007/s10618-019-00661-z
DO  - 10.1007/s10618-019-00661-z
M3  - Article
JO  - Data Mining and Knowledge Discovery
JF  - Data Mining and Knowledge Discovery
SN  - 1384-5810
ER  - 