Dynamic classifier ensemble for positive unlabeled text stream classification

Shirui Pan, Yang Zhang, Xue Li

Research output: Contribution to journal › Article › Research › peer-review

16 Citations (Scopus)

Abstract

Most studies on streaming data classification assume that the data can be fully labeled. However, in real-life applications, it is impractical and time-consuming to manually label the entire stream for training. Commonly, only a small set of positive data and a large amount of unlabeled data are available in data stream environments. In this case, applying traditional streaming algorithms to positive unlabeled streams with only straightforward adaptation may not work well or may lead to poor performance. In this paper, we propose a Dynamic Classifier Ensemble method for Positive and Unlabeled text stream (DCEPU) classification scenarios. We address the problem of classifying positive and unlabeled text streams under various types of concept drift by constructing an appropriate validation set and designing a novel dynamic weighting scheme for the classification phase. Experimental results on the benchmark dataset RCV1-v2 demonstrate that the proposed method, DCEPU, outperforms the existing LELC (Li et al. 2009b), DVS (with necessary adaptation) (Tsymbal et al. in Inf Fusion 9(1):56-68, 2008), and the stacking-style ensemble-based algorithm (Zhang et al. 2008b).
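
The abstract does not give the details of the DCEPU weighting scheme, so the following is only a generic sketch of what dynamic weighting over a validation set can look like, not the authors' algorithm. It assumes each base classifier is a plain function returning +1 or -1, documents are sparse term-count dictionaries, and a classifier's weight for a test document is its accuracy on the k most similar validation documents. All names (`cosine`, `dynamic_weights`, `ensemble_predict`, the toy classifiers and data) are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse term-count dicts."""
    dot = sum(v * b.get(term, 0) for term, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def dynamic_weights(classifiers, validation, x, k=3):
    """Weight each classifier by its accuracy on the k validation
    documents most similar to x (an assumed rule, for illustration)."""
    neighbours = sorted(validation, key=lambda d: cosine(d[0], x), reverse=True)[:k]
    if not neighbours:
        # No validation data: fall back to uniform weights.
        return [1.0] * len(classifiers)
    return [
        sum(1 for doc, label in neighbours if clf(doc) == label) / len(neighbours)
        for clf in classifiers
    ]

def ensemble_predict(classifiers, validation, x, k=3):
    """Weighted vote over labels +1 (positive) and -1 (negative)."""
    weights = dynamic_weights(classifiers, validation, x, k)
    score = sum(w * clf(x) for w, clf in zip(weights, classifiers))
    return 1 if score > 0 else -1

# Toy base classifiers: simple keyword rules standing in for trained models.
clf_a = lambda doc: 1 if doc.get("stream", 0) > 0 else -1
clf_b = lambda doc: 1 if doc.get("drift", 0) > 0 else -1

# Toy validation set of (document, label) pairs.
validation = [
    ({"stream": 2, "data": 1}, 1),
    ({"drift": 1, "sports": 2}, -1),
    ({"stream": 1, "drift": 1}, 1),
]

x = {"stream": 1, "data": 1}
weights = dynamic_weights([clf_a, clf_b], validation, x, k=2)
prediction = ensemble_predict([clf_a, clf_b], validation, x, k=2)
```

Because the weights are recomputed per test document from its nearest validation neighbours, a classifier trained on an outdated concept loses influence on documents where it no longer agrees with the validation labels. In a positive unlabeled setting the validation labels themselves would have to be estimated, which is the part this sketch leaves out.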

Original language: English
Pages (from-to): 267-287
Number of pages: 21
Journal: Knowledge and Information Systems
Volume: 33
Issue number: 2
Publication status: Published - 1 Nov 2012

Keywords

  • Classifier ensemble
  • Concept drift
  • Positive unlabeled learning
  • Text streams
