Outlier detection on mixed-type data: an energy-based approach

Kien Do, Truyen Tran, Dinh Phung, Svetha Venkatesh

Research output: Chapter in Book/Report/Conference proceeding > Conference Paper > Research

5 Citations (Scopus)

Abstract

Outlier detection amounts to finding data points that differ significantly from the norm. Classic outlier detection methods are largely designed for a single data type, either continuous or discrete. However, real-world data is increasingly heterogeneous: a data point can have both discrete and continuous attributes. Handling mixed-type data in a disciplined way remains a great challenge. In this paper, we propose a new unsupervised outlier detection method for mixed-type data based on the Mixed-variate Restricted Boltzmann Machine (Mv.RBM). The Mv.RBM is a principled probabilistic method that models data density. We propose to use the free energy derived from the Mv.RBM as an outlier score, detecting outliers as those data points lying in low-density regions. The method is fast to learn and compute, and is scalable to massive datasets. At the same time, the outlier score is identical to the negative log-density of the data up to an additive constant. We evaluate the proposed method on synthetic and real-world datasets and demonstrate that (a) proper handling of mixed types is necessary in outlier detection, and (b) the free energy of the Mv.RBM is a powerful and efficient outlier scoring method that is highly competitive with state-of-the-art methods.
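
The free-energy score described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a simplified RBM whose visible layer has only binary units and unit-variance Gaussian units (the Mv.RBM in the paper covers more data types), and all names and parameter values (free_energy, a_bin, a_cont, W_bin, W_cont, b) are hypothetical, untrained toy choices. Under these assumptions the hidden units can be summed out analytically, so the score F(v) satisfies -log p(v) = F(v) + log Z, where the constant log Z never needs to be computed for ranking.

```python
import math


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def free_energy(v_bin, v_cont, a_bin, a_cont, W_bin, W_cont, b):
    """Free energy of an RBM with binary and unit-variance Gaussian visibles.

    F(v) = sum_i (v_cont[i] - a_cont[i])^2 / 2
           - sum_j a_bin[j] * v_bin[j]
           - sum_k softplus(b[k] + sum_i W_cont[i][k] * v_cont[i]
                                 + sum_j W_bin[j][k] * v_bin[j])
    """
    # Contribution of the visible units alone.
    visible = sum((x - a) ** 2 / 2.0 for x, a in zip(v_cont, a_cont))
    visible -= sum(a * x for x, a in zip(v_bin, a_bin))
    # Each binary hidden unit is summed out in closed form.
    hidden = 0.0
    for k, b_k in enumerate(b):
        act = b_k
        act += sum(W_cont[i][k] * x for i, x in enumerate(v_cont))
        act += sum(W_bin[j][k] * x for j, x in enumerate(v_bin))
        hidden += softplus(act)
    return visible - hidden


# Hypothetical toy parameters: 1 binary visible, 1 continuous visible, 2 hidden.
# In practice these would be learned from data; here they are fixed by hand.
a_bin = [0.2]
a_cont = [0.0]
W_bin = [[0.5, -0.3]]
W_cont = [[0.4, 0.1]]
b = [0.0, 0.1]

# Each point is (binary attributes, continuous attributes).
points = [([1], [0.1]), ([0], [-0.2]), ([1], [6.0])]
scores = [free_energy(vb, vc, a_bin, a_cont, W_bin, W_cont, b)
          for vb, vc in points]

# Higher free energy means lower model density, i.e. more outlying.
for score, point in sorted(zip(scores, points), reverse=True):
    print(point, round(score, 3))
```

In this toy setting the third point, whose continuous attribute lies far from the others, receives the highest free energy and is ranked first. A threshold on the score (for example, a high percentile of training scores) would be one way to turn the ranking into a decision, though that choice is an assumption here rather than something stated in the abstract.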

Original language: English
Title of host publication: Advanced Data Mining and Applications
Subtitle of host publication: 12th International Conference, ADMA 2016 Gold Coast, QLD, Australia, December 12–15, 2016 Proceedings
Editors: Jianxin Li, Xue Li, Shuliang Wang, Jinyan Li, Quan Z. Sheng
Place of Publication: Cham Switzerland
Publisher: Springer
Pages: 111-125
Number of pages: 15
ISBN (Electronic): 9783319495866
ISBN (Print): 9783319495859
DOIs
Publication status: Published - 2016
Externally published: Yes
Event: International Conference on Advanced Data Mining and Applications 2016 - Gold Coast, Australia
Duration: 12 Dec 2016 – 15 Dec 2016
Conference number: 12th
https://cs.adelaide.edu.au/~adma2016/

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer
Volume: 10086
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: International Conference on Advanced Data Mining and Applications 2016
Abbreviated title: ADMA 2016
Country: Australia
City: Gold Coast
Period: 12/12/16 – 15/12/16
Internet address
