Abstract
Current research on data stream classification mainly focuses on supervised learning, in which a fully labeled data stream is needed for training. However, fully labeled data streams are expensive to obtain, which makes the supervised learning approach difficult to apply to real-life applications. In this paper, we model applications such as credit fraud detection and intrusion detection as a one-class data stream classification problem. The cost of fully labeling the data stream is reduced because users only need to provide some positive samples, together with the unlabeled samples, to the learner. Based on VFDT and POSC4.5, we propose the OcVFDT (One-class Very Fast Decision Tree) algorithm. Experimental studies on both synthetic and real-life datasets show that OcVFDT has excellent classification performance. Even when 80% of the samples in the data stream are unlabeled, the classification performance of OcVFDT is still very close to that of VFDT, which is trained on a fully labeled stream.
Original language | English |
---|---|
Title of host publication | Proceedings of the 3rd International Workshop on Knowledge Discovery from Sensor Data, SensorKDD'09 in Conjunction with the 15th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD-09 |
Publisher | Association for Computing Machinery (ACM) |
Pages | 79-86 |
Number of pages | 8 |
ISBN (Print) | 9781605586687 |
Publication status | Published - 2009 |
Externally published | Yes |
Event | 3rd International Workshop on Knowledge Discovery from Sensor Data - Paris, France; Duration: 28 Jun 2009 → 28 Jun 2009; Conference number: 3 |
Conference
Conference | 3rd International Workshop on Knowledge Discovery from Sensor Data |
---|---|
Country/Territory | France |
City | Paris |
Period | 28/06/09 → 28/06/09 |
Other | Held in Conjunction with the 15th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD-09 |