Abstract
This paper reviews whether standard machine learning algorithms, which were mainly developed in the context of small data sets, are appropriate for application to large data sets. Sampling and parallelisation have proved useful means of reducing computation time when learning from large data sets. However, such methods assume that algorithms designed for use with what are now considered small data sets are also fundamentally suitable for large data sets. It is plausible that optimal learning from large data sets requires a different type of algorithm from optimal learning from small data sets. This paper investigates one respect in which data set size may affect the requirements of a learning algorithm: the bias plus variance decomposition of classification error. Experiments show that learning from large data sets may be more effective when using an algorithm that places greater emphasis on bias management rather than variance management.
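To make the bias and variance terms concrete, the following is a minimal sketch of one common way to estimate them for classification error: the Kohavi and Wolpert (1996) decomposition of 0-1 loss. This choice is an assumption for illustration; the abstract does not say which decomposition the paper uses. The sketch takes the predictions of models trained on several different training samples, assumes a noise-free target (so the intrinsic-noise term is zero), and uses a hypothetical function name, `kw_decomposition`.

```python
from collections import Counter
from typing import Hashable, Sequence


def kw_decomposition(predictions: Sequence[Sequence[Hashable]],
                     truth: Sequence[Hashable]) -> dict:
    """Kohavi-Wolpert bias/variance decomposition of 0-1 loss (sketch).

    predictions[t][i] is the label that the model trained in trial t
    assigns to test example i; truth[i] is that example's true label.
    Assumes a noise-free target, so bias2 + variance equals the
    expected error averaged over the test set.
    """
    n_trials = len(predictions)
    n_test = len(truth)
    bias2 = variance = error = 0.0
    for i in range(n_test):
        # Estimate P(Yhat = y | x_i) from the predictions across trials.
        counts = Counter(predictions[t][i] for t in range(n_trials))
        q = {y: c / n_trials for y, c in counts.items()}
        labels = set(q) | {truth[i]}
        # bias^2: squared distance between the (degenerate) true-label
        # distribution and the distribution of predictions.
        bias2 += 0.5 * sum(
            ((1.0 if y == truth[i] else 0.0) - q.get(y, 0.0)) ** 2
            for y in labels
        )
        # variance: disagreement among the predictions themselves,
        # regardless of the true label.
        variance += 0.5 * (1.0 - sum(p * p for p in q.values()))
        # expected 0-1 loss: chance that a random trial's prediction is wrong.
        error += 1.0 - q.get(truth[i], 0.0)
    return {"bias2": bias2 / n_test,
            "variance": variance / n_test,
            "error": error / n_test}


# Four training trials, two test examples with true labels "a" and "b".
preds = [["a", "a"], ["a", "b"], ["b", "b"], ["a", "b"]]
print(kw_decomposition(preds, ["a", "b"]))
# {'bias2': 0.0625, 'variance': 0.1875, 'error': 0.25}
```

Repeating such an estimate at increasing training-set sizes is one way to observe the trend the abstract describes: if the variance term shrinks as the data grow, the bias term accounts for a growing share of the error.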
Original language | English |
---|---|
Title of host publication | Principles of Data Mining and Knowledge Discovery |
Subtitle of host publication | 6th European Conference, PKDD 2002, Helsinki, Finland, August 19-23, 2002, Proceedings |
Editors | Tapio Elomaa, Heikki Mannila, Hannu Toivonen |
Place of Publication | Berlin, Germany |
Publisher | Springer |
Pages | 62-73 |
Number of pages | 12 |
ISBN (Print) | 3540440372 |
Publication status | Published - 2002 |
Externally published | Yes |
Event | European Conference on Principles and Practice of Knowledge Discovery in Databases 2002 (6th), Helsinki, Finland, 19 Aug 2002 → 23 Aug 2002; proceedings: https://link.springer.com/book/10.1007/3-540-45681-3 |
Publication series
Name | Lecture Notes in Artificial Intelligence |
---|---|
Publisher | Springer |
Volume | 2431 |
ISSN (Print) | 0302-9743 |
Conference
Conference | European Conference on Principles and Practice of Knowledge Discovery in Databases 2002 |
---|---|
Abbreviated title | PKDD 2002 |
Country/Territory | Finland |
City | Helsinki |
Period | 19/08/02 → 23/08/02 |