Instance spaces for machine learning classification

Mario A. Muñoz, Laura Villanova, Davaatseren Baatar, Kate Smith-Miles

Research output: Contribution to journal › Article › Research › peer-review

Abstract

This paper tackles the issue of objective performance evaluation of machine learning classifiers, and the impact of the choice of test instances. Given that statistical properties or features of a dataset affect the difficulty of an instance for particular classification algorithms, we examine the diversity and quality of the UCI repository of test instances used by most machine learning researchers. We show how an instance space can be visualized, with each classification dataset represented as a point in the space. The instance space is constructed to reveal pockets of hard and easy instances, and enables the strengths and weaknesses of individual classifiers to be identified. Finally, we propose a methodology to generate new test instances with the aim of enriching the diversity of the instance space, enabling potentially greater insights than can be afforded by the current UCI repository.
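The abstract describes representing each classification dataset as a point in a 2-D instance space derived from its statistical features. The sketch below illustrates that idea only in spirit: the feature vectors are synthetic random stand-ins, and plain PCA is used as a stand-in projection, since the abstract does not specify the paper's actual meta-features or projection method.

```python
import numpy as np

# Hypothetical meta-feature vectors for a handful of datasets
# (rows = datasets; columns = stand-ins for statistical features
# such as class count, attribute count, class entropy, ...).
rng = np.random.default_rng(0)
features = rng.normal(size=(10, 5))

# Project into a 2-D "instance space" via PCA: center the feature
# matrix, then keep the two leading principal axes from the SVD.
centered = features - features.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
coords = centered @ vt[:2].T  # each dataset is now one (x, y) point

print(coords.shape)  # → (10, 2)
```

Plotting such coordinates, colored by a classifier's measured performance on each dataset, is what lets pockets of hard and easy instances (and algorithm "footprints") become visible.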

Original language: English
Pages (from-to): 109-147
Number of pages: 39
Journal: Machine Learning
Volume: 107
Issue number: 1
DOI: 10.1007/s10994-017-5629-5
Publication status: Published - 1 Jan 2018

Keywords

  • Algorithm footprints
  • Classification
  • Instance difficulty
  • Instance space
  • Meta-learning
  • Performance evaluation
  • Test data
  • Test instance generation

Cite this

Muñoz, Mario A.; Villanova, Laura; Baatar, Davaatseren; Smith-Miles, Kate. / Instance spaces for machine learning classification. In: Machine Learning. 2018; Vol. 107, No. 1. pp. 109-147.

@article{f10bbfd4eeac465895a5c8bc0a8f8f9b,
  title     = "Instance spaces for machine learning classification",
  author    = "Mu{\~n}oz, {Mario A.} and Laura Villanova and Davaatseren Baatar and Kate Smith-Miles",
  keywords  = "Algorithm footprints, Classification, Instance difficulty, Instance space, Meta-learning, Performance evaluation, Test data, Test instance generation",
  year      = "2018",
  month     = jan,
  day       = "1",
  doi       = "10.1007/s10994-017-5629-5",
  language  = "English",
  volume    = "107",
  number    = "1",
  pages     = "109--147",
  journal   = "Machine Learning",
  issn      = "0885-6125",
  publisher = "Springer",
}

Muñoz, MA, Villanova, L, Baatar, D & Smith-Miles, K 2018, 'Instance spaces for machine learning classification', Machine Learning, vol. 107, no. 1, pp. 109-147. https://doi.org/10.1007/s10994-017-5629-5

Instance spaces for machine learning classification. / Muñoz, Mario A.; Villanova, Laura; Baatar, Davaatseren; Smith-Miles, Kate. In: Machine Learning, Vol. 107, No. 1, 01.01.2018, pp. 109-147.

TY  - JOUR
T1  - Instance spaces for machine learning classification
AU  - Muñoz, Mario A.
AU  - Villanova, Laura
AU  - Baatar, Davaatseren
AU  - Smith-Miles, Kate
PY  - 2018/1/1
Y1  - 2018/1/1
KW  - Algorithm footprints
KW  - Classification
KW  - Instance difficulty
KW  - Instance space
KW  - Meta-learning
KW  - Performance evaluation
KW  - Test data
KW  - Test instance generation
UR  - http://www.scopus.com/inward/record.url?scp=85026873562&partnerID=8YFLogxK
U2  - 10.1007/s10994-017-5629-5
DO  - 10.1007/s10994-017-5629-5
M3  - Article
VL  - 107
SP  - 109
EP  - 147
JO  - Machine Learning
JF  - Machine Learning
SN  - 0885-6125
IS  - 1
ER  -