A survey on anomaly detection in evolving data [with application to forest fire risk prediction]

Mahsa Salehi, Lida Rashidi

Research output: Contribution to journalArticleResearch

Abstract

Traditionally most of the anomaly detection algorithms have been designed for 'static' datasets, in which all the observations are available at one time. In non-stationary environments on the other hand, the same algorithms cannot be applied as the underlying data distributions change constantly and the same models are not valid. Hence, we need to devise adaptive models that take into account the dynamically changing characteristics of environments and detect anomalies in 'evolving' data. Over the last two decades, many algorithms have been proposed to detect anomalies in evolving data. Some of them consider scenarios where a sequence of objects (called data streams) with one or multiple features evolves over time. Whereas the others concentrate on more complex scenarios, where streaming objects with one or multiple features have causal/non-causal relationships with each other. The latter can be represented as evolving graphs. In this paper, we categorize existing strategies for detecting anomalies in both scenarios including the state-of-the-art techniques. Since label information is mostly unavailable in real-world applications when data evolves, we review the unsupervised approaches in this paper. We then present an interesting application example, i.e., forest re risk prediction, and conclude the paper with future research directions in this eld for researchers and industry.
Original languageEnglish
Pages (from-to)13-23
Number of pages11
JournalSIGKDD Explorations
Volume20
Issue number1
DOIs
Publication statusPublished - Jun 2018

Cite this

@article{06d86abcd8d54f0ea647780105353168,
title = "A survey on anomaly detection in evolving data [with application to forest fire risk prediction]",
abstract = "Traditionally most of the anomaly detection algorithms have been designed for 'static' datasets, in which all the observations are available at one time. In non-stationary environments on the other hand, the same algorithms cannot be applied as the underlying data distributions change constantly and the same models are not valid. Hence, we need to devise adaptive models that take into account the dynamically changing characteristics of environments and detect anomalies in 'evolving' data. Over the last two decades, many algorithms have been proposed to detect anomalies in evolving data. Some of them consider scenarios where a sequence of objects (called data streams) with one or multiple features evolves over time. Whereas the others concentrate on more complex scenarios, where streaming objects with one or multiple features have causal/non-causal relationships with each other. The latter can be represented as evolving graphs. In this paper, we categorize existing strategies for detecting anomalies in both scenarios including the state-of-the-art techniques. Since label information is mostly unavailable in real-world applications when data evolves, we review the unsupervised approaches in this paper. We then present an interesting application example, i.e., forest re risk prediction, and conclude the paper with future research directions in this eld for researchers and industry.",
author = "Mahsa Salehi and Lida Rashidi",
year = "2018",
month = "6",
doi = "10.1145/3229329.3229332",
language = "English",
volume = "20",
pages = "13--23",
journal = "SIGKDD Explorations",
issn = "1931-0145",
publisher = "Association for Computing Machinery (ACM)",
number = "1",

}

A survey on anomaly detection in evolving data [with application to forest fire risk prediction]. / Salehi, Mahsa; Rashidi, Lida.

In: SIGKDD Explorations, Vol. 20, No. 1, 06.2018, p. 13-23.

Research output: Contribution to journalArticleResearch

TY - JOUR

T1 - A survey on anomaly detection in evolving data [with application to forest fire risk prediction]

AU - Salehi, Mahsa

AU - Rashidi, Lida

PY - 2018/6

Y1 - 2018/6

N2 - Traditionally most of the anomaly detection algorithms have been designed for 'static' datasets, in which all the observations are available at one time. In non-stationary environments on the other hand, the same algorithms cannot be applied as the underlying data distributions change constantly and the same models are not valid. Hence, we need to devise adaptive models that take into account the dynamically changing characteristics of environments and detect anomalies in 'evolving' data. Over the last two decades, many algorithms have been proposed to detect anomalies in evolving data. Some of them consider scenarios where a sequence of objects (called data streams) with one or multiple features evolves over time. Whereas the others concentrate on more complex scenarios, where streaming objects with one or multiple features have causal/non-causal relationships with each other. The latter can be represented as evolving graphs. In this paper, we categorize existing strategies for detecting anomalies in both scenarios including the state-of-the-art techniques. Since label information is mostly unavailable in real-world applications when data evolves, we review the unsupervised approaches in this paper. We then present an interesting application example, i.e., forest re risk prediction, and conclude the paper with future research directions in this eld for researchers and industry.

AB - Traditionally most of the anomaly detection algorithms have been designed for 'static' datasets, in which all the observations are available at one time. In non-stationary environments on the other hand, the same algorithms cannot be applied as the underlying data distributions change constantly and the same models are not valid. Hence, we need to devise adaptive models that take into account the dynamically changing characteristics of environments and detect anomalies in 'evolving' data. Over the last two decades, many algorithms have been proposed to detect anomalies in evolving data. Some of them consider scenarios where a sequence of objects (called data streams) with one or multiple features evolves over time. Whereas the others concentrate on more complex scenarios, where streaming objects with one or multiple features have causal/non-causal relationships with each other. The latter can be represented as evolving graphs. In this paper, we categorize existing strategies for detecting anomalies in both scenarios including the state-of-the-art techniques. Since label information is mostly unavailable in real-world applications when data evolves, we review the unsupervised approaches in this paper. We then present an interesting application example, i.e., forest re risk prediction, and conclude the paper with future research directions in this eld for researchers and industry.

U2 - 10.1145/3229329.3229332

DO - 10.1145/3229329.3229332

M3 - Article

VL - 20

SP - 13

EP - 23

JO - SIGKDD Explorations

JF - SIGKDD Explorations

SN - 1931-0145

IS - 1

ER -