Abstract
The data deluge poses a major challenge for data mining applications, in which rare topics of interest are often buried under a flood of major headlines. We identify and formulate a novel problem: cross-channel anomaly detection from multiple data channels. Cross-channel anomalies are those shared among the individual channel anomalies, and they are often a portent of significant events. Central to this new problem is the development of a theoretical foundation and methodology. Using the spectral approach, we propose a two-stage detection method: anomaly detection at the single-channel level, followed by the detection of cross-channel anomalies from the amalgamation of single-channel anomalies. We also extend the proposed detection method to an online setting, in which it automatically adapts to changes in the data over time at low computational complexity using incremental algorithms. Our mathematical analysis establishes theoretical results on the reduction of an impurity index, which indicate that the method is likely to reduce the false alarm rate. We demonstrate our method in two applications: document understanding with multiple text corpora and detection of repeated anomalies in large-scale video surveillance. The experimental results consistently demonstrate the superior performance of our method compared with related state-of-the-art methods, including the one-class SVM and principal component pursuit. In addition, our framework can be deployed in a decentralized manner, lending itself to large-scale data stream analysis.
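The two-stage idea can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes each channel is a matrix with one row per time step, scores each row by its energy outside the top-k principal subspace (a common form of residual subspace analysis), and replaces the paper's second stage with a plain co-occurrence vote across channels. The function names and the parameters `k`, `quantile`, and `min_channels` are hypothetical, and the online, incremental extension is not covered.

```python
import numpy as np

def residual_scores(X, k):
    """Squared norm of each row of X outside its top-k principal subspace."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions of the centered data.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:k].T
    residual = Xc - Xc @ V @ V.T
    return np.sum(residual ** 2, axis=1)

def single_channel_anomalies(X, k, quantile=0.95):
    """Stage 1 (assumed form): flag rows whose residual energy is unusually high."""
    scores = residual_scores(X, k)
    return scores > np.quantile(scores, quantile)

def cross_channel_anomalies(channels, k, quantile=0.95, min_channels=2):
    """Stage 2 (stand-in): keep time steps flagged in at least min_channels channels."""
    flags = np.stack([single_channel_anomalies(X, k, quantile) for X in channels])
    return flags.sum(axis=0) >= min_channels
```

Calling `cross_channel_anomalies([X1, X2, X3], k=2)` on three time-aligned channel matrices returns one boolean per time step. The vote is the simplest way to combine single-channel detections; it only hints at why requiring agreement across channels can suppress alarms that appear in a single channel.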
Original language | English |
---|---|
Pages (from-to) | 33-59 |
Number of pages | 27 |
Journal | Knowledge and Information Systems |
Volume | 35 |
Issue number | 1 |
Publication status | Published - Apr 2013 |
Externally published | Yes |
Keywords
- Anomaly detection
- Collaborative subspace learning
- Data mining
- Multiple channels
- Residual subspace analysis
- Text data analysis
- Topic modeling
- Video surveillance