Online clustering for evolving data streams with online anomaly detection

Milad Chenaghlou, Masud Moshtaghi, Christopher Leckie, Mahsa Salehi

    Research output: Chapter in Book/Report/Conference proceeding > Conference Paper > Research

    Abstract

    Clustering data streams is an emerging challenge with a wide range of applications in areas including Wireless Sensor Networks, the Internet of Things, finance and social media. In an evolving data stream, a clustering algorithm is desired to both (a) assign observations to clusters and (b) identify anomalies in real-time. Current state-of-the-art algorithms in the literature do not address feature (b) as they only consider the spatial proximity of data, which results in (1) poor clustering and (2) poor demonstration of the temporal evolution of data in noisy environments. In this paper, we propose an online clustering algorithm that considers the temporal proximity of observations as well as their spatial proximity to identify anomalies in real-time. It identifies the evolution of clusters in noisy streams, incrementally updates the model and calculates the minimum window length over the evolving data stream without jeopardizing performance. To the best of our knowledge, this is the first online clustering algorithm that identifies anomalies in real-time and discovers the temporal evolution of clusters. Our contributions are supported by synthetic as well as real-world data experiments.

    Original language: English
    Title of host publication: Advances in Knowledge Discovery and Data Mining
    Subtitle of host publication: 22nd Pacific-Asia Conference, PAKDD 2018, Melbourne, VIC, Australia, June 3-6, 2018 - Proceedings, Part II
    Editors: Dinh Phung, Geoffrey I. Webb, Bao Ho, Mohadeseh Ganji, Lida Rashidi
    Place of Publication: Cham Switzerland
    Publisher: Springer
    Pages: 508-521
    Number of pages: 14
    ISBN (Electronic): 9783319930374
    ISBN (Print): 9783319930367
    DOI: https://doi.org/10.1007/978-3-319-93037-4_40
    Publication status: Published - 2018
    Event: Pacific-Asia Conference on Knowledge Discovery and Data Mining 2018 - Grand Hyatt, Melbourne, Australia
    Duration: 3 Jun 2018 to 6 Jun 2018
    Conference number: 22nd
    http://pakdd2018.medmeeting.org/Content/92892

    Publication series

    Name: Lecture Notes in Artificial Intelligence
    Publisher: Springer
    Volume: 10938
    ISSN (Print): 0302-9743
    ISSN (Electronic): 1611-3349

    Conference

    Conference: Pacific-Asia Conference on Knowledge Discovery and Data Mining 2018
    Abbreviated title: PAKDD 2018
    Country: Australia
    City: Melbourne
    Period: 3/06/18 to 6/06/18

    Cite this

    Chenaghlou, M., Moshtaghi, M., Leckie, C., & Salehi, M. (2018). Online clustering for evolving data streams with online anomaly detection. In D. Phung, G. I. Webb, B. Ho, M. Ganji, & L. Rashidi (Eds.), Advances in Knowledge Discovery and Data Mining: 22nd Pacific-Asia Conference, PAKDD 2018, Melbourne, VIC, Australia, June 3-6, 2018 - Proceedings, Part II (pp. 508-521). (Lecture Notes in Artificial Intelligence; Vol. 10938). Cham Switzerland: Springer. https://doi.org/10.1007/978-3-319-93037-4_40
    Chenaghlou, Milad ; Moshtaghi, Masud ; Leckie, Christopher ; Salehi, Mahsa. / Online clustering for evolving data streams with online anomaly detection. Advances in Knowledge Discovery and Data Mining: 22nd Pacific-Asia Conference, PAKDD 2018, Melbourne, VIC, Australia, June 3-6, 2018 - Proceedings, Part II. editor / Dinh Phung ; Geoffrey I. Webb ; Bao Ho ; Mohadeseh Ganji ; Lida Rashidi. Cham Switzerland : Springer, 2018. pp. 508-521 (Lecture Notes in Artificial Intelligence).
    @inproceedings{fd8c61e2dd244a7281612bdbdf97fd77,
    title = "Online clustering for evolving data streams with online anomaly detection",
    abstract = "Clustering data streams is an emerging challenge with a wide range of applications in areas including Wireless Sensor Networks, the Internet of Things, finance and social media. In an evolving data stream, a clustering algorithm is desired to both (a) assign observations to clusters and (b) identify anomalies in real-time. Current state-of-the-art algorithms in the literature do not address feature (b) as they only consider the spatial proximity of data, which results in (1) poor clustering and (2) poor demonstration of the temporal evolution of data in noisy environments. In this paper, we propose an online clustering algorithm that considers the temporal proximity of observations as well as their spatial proximity to identify anomalies in real-time. It identifies the evolution of clusters in noisy streams, incrementally updates the model and calculates the minimum window length over the evolving data stream without jeopardizing performance. To the best of our knowledge, this is the first online clustering algorithm that identifies anomalies in real-time and discovers the temporal evolution of clusters. Our contributions are supported by synthetic as well as real-world data experiments.",
    author = "Milad Chenaghlou and Masud Moshtaghi and Christopher Leckie and Mahsa Salehi",
    year = "2018",
    doi = "10.1007/978-3-319-93037-4_40",
    language = "English",
    isbn = "9783319930367",
    series = "Lecture Notes in Artificial Intelligence",
    publisher = "Springer",
    pages = "508--521",
    editor = "Dinh Phung and Webb, {Geoffrey I.} and Bao Ho and Mohadeseh Ganji and Lida Rashidi",
    booktitle = "Advances in Knowledge Discovery and Data Mining",

    }

    Chenaghlou, M, Moshtaghi, M, Leckie, C & Salehi, M 2018, Online clustering for evolving data streams with online anomaly detection. in D Phung, GI Webb, B Ho, M Ganji & L Rashidi (eds), Advances in Knowledge Discovery and Data Mining: 22nd Pacific-Asia Conference, PAKDD 2018, Melbourne, VIC, Australia, June 3-6, 2018 - Proceedings, Part II. Lecture Notes in Artificial Intelligence, vol. 10938, Springer, Cham Switzerland, pp. 508-521, Pacific-Asia Conference on Knowledge Discovery and Data Mining 2018, Melbourne, Australia, 3/06/18. https://doi.org/10.1007/978-3-319-93037-4_40

    Online clustering for evolving data streams with online anomaly detection. / Chenaghlou, Milad; Moshtaghi, Masud; Leckie, Christopher; Salehi, Mahsa.

    Advances in Knowledge Discovery and Data Mining: 22nd Pacific-Asia Conference, PAKDD 2018, Melbourne, VIC, Australia, June 3-6, 2018 - Proceedings, Part II. ed. / Dinh Phung; Geoffrey I. Webb; Bao Ho; Mohadeseh Ganji; Lida Rashidi. Cham Switzerland : Springer, 2018. p. 508-521 (Lecture Notes in Artificial Intelligence; Vol. 10938).

    TY  - GEN
    T1  - Online clustering for evolving data streams with online anomaly detection
    AU  - Chenaghlou, Milad
    AU  - Moshtaghi, Masud
    AU  - Leckie, Christopher
    AU  - Salehi, Mahsa
    PY  - 2018
    Y1  - 2018
    N2  - Clustering data streams is an emerging challenge with a wide range of applications in areas including Wireless Sensor Networks, the Internet of Things, finance and social media. In an evolving data stream, a clustering algorithm is desired to both (a) assign observations to clusters and (b) identify anomalies in real-time. Current state-of-the-art algorithms in the literature do not address feature (b) as they only consider the spatial proximity of data, which results in (1) poor clustering and (2) poor demonstration of the temporal evolution of data in noisy environments. In this paper, we propose an online clustering algorithm that considers the temporal proximity of observations as well as their spatial proximity to identify anomalies in real-time. It identifies the evolution of clusters in noisy streams, incrementally updates the model and calculates the minimum window length over the evolving data stream without jeopardizing performance. To the best of our knowledge, this is the first online clustering algorithm that identifies anomalies in real-time and discovers the temporal evolution of clusters. Our contributions are supported by synthetic as well as real-world data experiments.
    UR  - http://www.scopus.com/inward/record.url?scp=85049369886&partnerID=8YFLogxK
    U2  - 10.1007/978-3-319-93037-4_40
    DO  - 10.1007/978-3-319-93037-4_40
    M3  - Conference Paper
    SN  - 9783319930367
    T3  - Lecture Notes in Artificial Intelligence
    SP  - 508
    EP  - 521
    BT  - Advances in Knowledge Discovery and Data Mining
    A2  - Phung, Dinh
    A2  - Webb, Geoffrey I.
    A2  - Ho, Bao
    A2  - Ganji, Mohadeseh
    A2  - Rashidi, Lida
    PB  - Springer
    CY  - Cham Switzerland
    ER  -

    Chenaghlou M, Moshtaghi M, Leckie C, Salehi M. Online clustering for evolving data streams with online anomaly detection. In Phung D, Webb GI, Ho B, Ganji M, Rashidi L, editors, Advances in Knowledge Discovery and Data Mining: 22nd Pacific-Asia Conference, PAKDD 2018, Melbourne, VIC, Australia, June 3-6, 2018 - Proceedings, Part II. Cham Switzerland: Springer. 2018. p. 508-521. (Lecture Notes in Artificial Intelligence). https://doi.org/10.1007/978-3-319-93037-4_40