Abstract
Over the past decade, more and more applications have involved very large data sets. However, existing algorithms are not guaranteed to scale well to such data. Averaged n-Dependence Estimators (AnDE) allows flexible learning from out-of-core data by varying the value of n (the number of super-parents), which makes AnDE especially appropriate for large data learning. In this paper, we propose a sample-based attribute selection technique for AnDE. It requires only one additional pass through the training data, during which a multitude of approximate AnDE models are built and efficiently assessed by leave-one-out cross validation. The use of a sample reduces the training time. Experiments on 15 large data sets demonstrate that the proposed technique significantly reduces AnDE's error at the cost of a modest increase in training time. This efficient and scalable out-of-core approach delivers superior or comparable performance to typical in-core Bayesian network classifiers.
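As background to the abstract's reference to varying n, a standard statement of the AnDE joint estimate (following the general AnDE formulation of Webb et al.; the notation below is illustrative rather than quoted from this article, and omits the minimum-frequency condition on super-parent combinations) is:

```latex
% AnDE averages over every set s of n "super-parent" attributes.
% a  = number of attributes, x_s = the values x takes on the attributes in s.
\hat{P}_{\mathrm{AnDE}}(y, \mathbf{x}) \;\propto\;
  \sum_{s \in \binom{\{1,\dots,a\}}{n}}
    \hat{P}(y, x_s)\,
    \prod_{i=1}^{a} \hat{P}(x_i \mid y, x_s)
```

Setting n = 0 recovers naive Bayes and n = 1 recovers AODE; larger n lowers bias at the cost of higher variance and memory, which is the trade-off that motivates attribute selection on large data.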
| Original language | English |
| --- | --- |
| Article number | 7565579 |
| Pages (from-to) | 172-185 |
| Number of pages | 14 |
| Journal | IEEE Transactions on Knowledge and Data Engineering |
| Volume | 29 |
| Issue number | 1 |
| DOIs | |
| Publication status | Published - 1 Jan 2017 |
Keywords
- Bayesian network classifiers
- Large data
- Classification learning
- Attribute selection
- Averaged n-dependence estimators (AnDE)
- Leave-one-out cross validation
Projects
- Combining generative and discriminative strategies to facilitate efficient and effective learning from big data
  Webb, G. (Primary Chief Investigator (PCI))
  Australian Research Council (ARC), Monash University
  2/01/14 → 31/12/16
  Project: Research
Equipment
- Monash eResearch
  Powell, D. (Manager)
  Office of the Vice-Provost (Research and Research Infrastructure)
  Facility/equipment: Facility