PAnDE: Averaged n-dependence estimators for positive unlabeled learning

Fuyi Li, Jiangning Song, Chen Li, Tatsuya Akutsu, Yang Zhang

Research output: Contribution to journal › Article › Research › Peer-review

7 Citations (Scopus)

Abstract

Traditional data mining algorithms usually require fully labeled data, which is often difficult to obtain in practice. In recent years, positive unlabeled (PU) learning has emerged as a useful way to address this problem: it relaxes the requirement for fully labeled data and allows algorithms to learn from positive and unlabeled examples only. Existing PU learning algorithms based on Bayesian classifiers, including PNB and PAODE, have been applied successfully to a number of classification problems. However, their empirical performance is limited by the attribute independence assumptions they make. To handle PU learning tasks that involve higher-order attribute dependencies, we propose a novel PU learning algorithm, termed PAnDE, which extends the AnDE (Averaged n-Dependence Estimators) algorithm under the 'selected completely at random' (SCAR) assumption. We benchmarked PAnDE against PNB (based on Naive Bayes) and PAODE (based on Averaged One-Dependence Estimators) on 20 UCI datasets and three real-world human protein glycosylation datasets. PAnDE outperformed both PNB and PAODE in these tests, highlighting its predictive power and its potential for a range of real-world applications.
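To make the AnDE idea that PAnDE builds on more concrete, the sketch below averages, over every subset S of n "parent" attributes, the estimate P(y, x_S) · ∏_{i∉S} P(x_i | y, x_S); with n = 0 this reduces to naive Bayes and with n = 1 to AODE. It is a minimal illustration for fully labeled categorical data only. Laplace smoothing, the `min_freq` threshold, the fallback rule, and the class and method names are assumptions chosen for the example. It does not reproduce the positive-unlabeled, SCAR-based adaptation described in the paper.

```python
from collections import Counter
from itertools import combinations
from math import prod


class AnDE:
    """Minimal averaged n-dependence estimator for categorical data (illustrative sketch)."""

    def __init__(self, n=1, min_freq=1, alpha=1.0):
        self.n = n                # parent attributes per subset S (0 <= n <= number of attributes)
        self.min_freq = min_freq  # hypothetical threshold: skip parent values seen fewer times
        self.alpha = alpha        # Laplace smoothing constant (assumed; must be > 0)

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.n_attrs = len(X[0])
        self.values = [sorted({row[i] for row in X}) for i in range(self.n_attrs)]
        self.N = len(X)
        self.subsets = list(combinations(range(self.n_attrs), self.n))
        self.parent_counts = Counter()  # (S, x_S) -> count
        self.joint_counts = Counter()   # (S, x_S, y) -> count
        self.child_counts = Counter()   # (S, x_S, y, i, x_i) -> count
        for row, label in zip(X, y):
            for S in self.subsets:
                xs = tuple(row[j] for j in S)
                self.parent_counts[(S, xs)] += 1
                self.joint_counts[(S, xs, label)] += 1
                for i in range(self.n_attrs):
                    if i not in S:
                        self.child_counts[(S, xs, label, i, row[i])] += 1
        return self

    def _subset_estimate(self, S, row, c):
        """Smoothed P(y=c, x_S) * prod over i not in S of P(x_i | y=c, x_S)."""
        xs = tuple(row[j] for j in S)
        n_joint = self.joint_counts[(S, xs, c)]
        n_xs_values = prod(len(self.values[j]) for j in S)  # equals 1 when S is empty
        p = (n_joint + self.alpha) / (
            self.N + self.alpha * len(self.classes) * n_xs_values)
        for i in range(self.n_attrs):
            if i in S:
                continue
            n_child = self.child_counts[(S, xs, c, i, row[i])]
            p *= (n_child + self.alpha) / (n_joint + self.alpha * len(self.values[i]))
        return p

    def predict_proba(self, row):
        eligible = [S for S in self.subsets
                    if self.parent_counts[(S, tuple(row[j] for j in S))] >= self.min_freq]
        if not eligible:
            eligible = self.subsets  # simplification: fall back to all parent subsets
        # Average the joint estimates over parent subsets, then normalise over classes.
        scores = {c: sum(self._subset_estimate(S, row, c) for S in eligible) / len(eligible)
                  for c in self.classes}
        z = sum(scores.values())
        return {c: s / z for c, s in scores.items()}

    def predict(self, row):
        proba = self.predict_proba(row)
        return max(proba, key=proba.get)
```

For example, `AnDE(n=2).fit(X, y)` conditions each remaining attribute on pairs of parents (A2DE). Raising n lets the model capture higher-order dependencies, which is the motivation the abstract gives for PAnDE, at the cost of C(k, n) parent subsets for k attributes.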

Original language: English
Pages (from-to): 1287-1297
Number of pages: 11
Journal: ICIC Express Letters, Part B: Applications
Volume: 8
Issue number: 9
Publication status: Published - Sept 2017

Keywords

  • Averaged n-dependence estimators
  • Bayesian classification
  • PAODE
  • PNB
  • Positive unlabeled learning
