The need for low bias algorithms in classification learning from large data sets

Damien Brain, Geoffrey Ian Webb

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research › peer-review

44 Citations (Scopus)


This paper reviews the appropriateness for application to large data sets of standard machine learning algorithms, which were mainly developed in the context of small data sets. Sampling and parallelisation have proved useful means for reducing computation time when learning from large data sets. However, such methods assume that algorithms that were designed for use with what are now considered small data sets are also fundamentally suitable for large data sets. It is plausible that optimal learning from large data sets requires a different type of algorithm from optimal learning from small data sets. This paper investigates one respect in which data set size may affect the requirements of a learning algorithm: the bias plus variance decomposition of classification error. Experiments show that learning from large data sets may be more effective when using an algorithm that places greater emphasis on bias management, rather than variance management.
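As a rough illustration of what a bias plus variance decomposition of classification error measures, the sketch below estimates both terms from the predictions of several models, each assumed to have been trained on a different training sample. The abstract does not say which decomposition the paper uses; the Kohavi and Wolpert formulation for zero-one loss is assumed here, with noise-free labels. The function name `kohavi_wolpert` and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter


def kohavi_wolpert(predictions, truths):
    """Return (bias_squared, variance) averaged over the test points.

    predictions[m][i] is the label that model m assigns to test point i;
    truths[i] is the true label of test point i.
    Assumption: labels are noise-free, so the irreducible error term is zero.
    """
    n_models = len(predictions)
    n_points = len(truths)
    bias_sq = 0.0
    variance = 0.0
    for i, truth in enumerate(truths):
        # Empirical distribution of predicted labels for this test point.
        counts = Counter(model_preds[i] for model_preds in predictions)
        probs = {label: c / n_models for label, c in counts.items()}
        labels = set(probs) | {truth}
        # Squared bias: distance between the true label's indicator
        # distribution and the distribution of predictions.
        bias_sq += 0.5 * sum(
            ((label == truth) - probs.get(label, 0.0)) ** 2 for label in labels
        )
        # Variance: how much the predictions disagree with one another.
        variance += 0.5 * (1.0 - sum(p * p for p in probs.values()))
    return bias_sq / n_points, variance / n_points


# Toy data (hypothetical): three models, two test points.
predictions = [[0, 1], [0, 0], [0, 1]]
truths = [0, 1]

bias_sq, variance = kohavi_wolpert(predictions, truths)

# Mean zero-one error over all (model, test point) pairs.
error = sum(
    preds[i] != t for preds in predictions for i, t in enumerate(truths)
) / (len(predictions) * len(truths))
```

Under these assumptions the two terms sum to the mean zero-one error, so a drop in one term at a fixed error rate implies a rise in the other. An algorithm that "manages bias" is one that reduces the first term, possibly at the cost of the second.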
Original language: English
Title of host publication: Principles of Data Mining and Knowledge Discovery
Subtitle of host publication: 6th European Conference, PKDD 2002 Helsinki, Finland, August 19-23, 2002 Proceedings
Editors: Tapio Elomaa, Heikki Mannila, Hannu Toivonen
Place of Publication: Berlin, Germany
Number of pages: 12
ISBN (Print): 3540440372
Publication status: Published - 2002
Externally published: Yes
Event: European Conference on Principles and Practice of Knowledge Discovery in Databases 2002 - Helsinki, Finland
Duration: 19 Aug 2002 to 23 Aug 2002
Conference number: 6th (Proceedings)

Publication series

Name: Lecture Notes in Artificial Intelligence
ISSN (Print): 0302-9743


Conference: European Conference on Principles and Practice of Knowledge Discovery in Databases 2002
Abbreviated title: PKDD 2002