OcVFDT: One-class very fast decision tree for one-class classification of data streams

Chen Li, Yang Zhang, Xue Li

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research › peer-review

48 Citations (Scopus)

Abstract

Current research on data stream classification mainly focuses on supervised learning, in which a fully labeled data stream is needed for training. However, fully labeled data streams are expensive to obtain, which makes the supervised learning approach difficult to apply to real-life applications. In this paper, we model applications such as credit fraud detection and intrusion detection as a one-class data stream classification problem. The cost of fully labeling the data stream is reduced, as users only need to provide some positive samples together with the unlabeled samples to the learner. Based on VFDT and POSC4.5, we propose our OcVFDT (One-class Very Fast Decision Tree) algorithm. An experimental study on both synthetic and real-life datasets shows that OcVFDT has excellent classification performance. Even when 80% of the samples in the data stream are unlabeled, the classification performance of OcVFDT is still very close to that of VFDT, which is trained on a fully labeled stream.
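
For context, VFDT, the algorithm OcVFDT builds on, is generally described as deciding when to split a leaf by comparing the gap between the two best split candidates against the Hoeffding bound. The sketch below illustrates only that generic test; it is not the OcVFDT procedure from the paper, and the function names, parameter names, and example numbers are assumptions chosen for illustration.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Hoeffding bound: epsilon = sqrt(R^2 * ln(1/delta) / (2n)).

    value_range (R) is the range of the split heuristic, delta is the
    allowed probability of error, n is the number of samples seen at
    the leaf. All three names are illustrative, not from the paper.
    """
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float,
                 value_range: float, delta: float, n: int) -> bool:
    """Split when the best attribute leads the runner-up by more than epsilon."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


if __name__ == "__main__":
    # Hypothetical numbers: gain gap of 0.15 after 1000 samples at a leaf.
    eps = hoeffding_bound(value_range=1.0, delta=1e-7, n=1000)
    print(f"epsilon = {eps:.4f}")
    print(should_split(0.30, 0.15, value_range=1.0, delta=1e-7, n=1000))
```

The bound shrinks as more samples arrive at a leaf, so a split that cannot be justified early may become justified later in the stream.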

Original language: English
Title of host publication: Proceedings of the 3rd International Workshop on Knowledge Discovery from Sensor Data, SensorKDD'09 in Conjunction with the 15th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD-09
Publisher: Association for Computing Machinery (ACM)
Pages: 79-86
Number of pages: 8
ISBN (Print): 9781605586687
Publication status: Published - 2009
Externally published: Yes
Event: 3rd International Workshop on Knowledge Discovery from Sensor Data - Paris, France
Duration: 28 Jun 2009 to 28 Jun 2009
Conference number: 3

Conference

Conference: 3rd International Workshop on Knowledge Discovery from Sensor Data
Country/Territory: France
City: Paris
Period: 28/06/09 to 28/06/09
Other: Held in Conjunction with the 15th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD-09
