Online clustering for evolving data streams with online anomaly detection

Milad Chenaghlou, Masud Moshtaghi, Christopher Leckie, Mahsa Salehi

Research output: Chapter in Book/Report/Conference proceeding > Conference Paper > Research

28 Citations (Scopus)

Abstract

Clustering data streams is an emerging challenge with a wide range of applications in areas including Wireless Sensor Networks, the Internet of Things, finance and social media. In an evolving data stream, a clustering algorithm is desired to both (a) assign observations to clusters and (b) identify anomalies in real-time. Current state-of-the-art algorithms in the literature do not address feature (b) as they only consider the spatial proximity of data, which results in (1) poor clustering and (2) poor demonstration of the temporal evolution of data in noisy environments. In this paper, we propose an online clustering algorithm that considers the temporal proximity of observations as well as their spatial proximity to identify anomalies in real-time. It identifies the evolution of clusters in noisy streams, incrementally updates the model and calculates the minimum window length over the evolving data stream without jeopardizing performance. To the best of our knowledge, this is the first online clustering algorithm that identifies anomalies in real-time and discovers the temporal evolution of clusters. Our contributions are supported by synthetic as well as real-world data experiments.

Original language: English
Title of host publication: Advances in Knowledge Discovery and Data Mining
Subtitle of host publication: 22nd Pacific-Asia Conference, PAKDD 2018, Melbourne, VIC, Australia, June 3-6, 2018 - Proceedings, Part II
Editors: Dinh Phung, Geoffrey I. Webb, Bao Ho, Mohadeseh Ganji, Lida Rashidi
Place of Publication: Cham, Switzerland
Publisher: Springer
Pages: 508-521
Number of pages: 14
ISBN (Electronic): 9783319930374
ISBN (Print): 9783319930367
Publication status: Published - 2018
Event: Pacific-Asia Conference on Knowledge Discovery and Data Mining 2018 - Grand Hyatt, Melbourne, Australia
Duration: 3 Jun 2018 to 6 Jun 2018
Conference number: 22nd
http://pakdd2018.medmeeting.org/Content/92892
https://link.springer.com/book/10.1007/978-3-319-93034-3 (Proceedings)

Publication series

Name: Lecture Notes in Artificial Intelligence
Publisher: Springer
Volume: 10938
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: Pacific-Asia Conference on Knowledge Discovery and Data Mining 2018
Abbreviated title: PAKDD 2018
Country/Territory: Australia
City: Melbourne
Period: 3/06/18 to 6/06/18