Generating custom classification datasets by targeting the instance space

Mario A. Muñoz, Kate Smith-Miles

Research output: Chapter in Book/Report/Conference proceeding > Conference Paper > Research > peer-review

2 Citations (Scopus)

Abstract

While machine learning has evolved at a fast pace in the last decades, the testing procedure for new methods may not be keeping pace. It often relies on well-studied collections of classification datasets such as the UCI repository. However, a meta-analysis through features has shown that most datasets from UCI are not sufficiently challenging to expose unique weaknesses of algorithms. In this paper we present a method to generate datasets with continuous, binary and categorical attributes, through the fitting of a Gaussian Mixture Model and a set of generalized Bernoulli distributions. By targeting empty areas of the instance space, this method has the potential to generate datasets with more diverse feature values.
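
The generative model named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the parameters are fitted or how class labels are assigned, so the function name `sample_mixed_dataset`, its parameters, and the choice to treat each mixture component as a class are assumptions made only for illustration. Continuous attributes are drawn from a Gaussian mixture, and each binary or categorical attribute is drawn from a generalized Bernoulli (categorical) distribution whose probabilities depend on the component.

```python
import numpy as np


def sample_mixed_dataset(n, weights, means, covs, cat_probs, seed=0):
    """Draw n instances with continuous and categorical attributes.

    weights   : (K,) mixture weights that sum to 1
    means     : (K, d) component means for the continuous attributes
    covs      : (K, d, d) component covariance matrices
    cat_probs : list with one (K, m_j) array per categorical attribute;
                row k holds the category probabilities under component k
    Assumption (not from the paper): the component index is the class label.
    """
    rng = np.random.default_rng(seed)
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    covs = np.asarray(covs, dtype=float)
    n_components, n_cont = means.shape

    # Choose a mixture component for every instance.
    comp = rng.choice(n_components, size=n, p=weights)

    x_cont = np.empty((n, n_cont))
    x_cat = np.empty((n, len(cat_probs)), dtype=int)
    for k in range(n_components):
        idx = np.flatnonzero(comp == k)
        if idx.size == 0:
            continue
        # Continuous attributes: Gaussian draw for component k.
        x_cont[idx] = rng.multivariate_normal(means[k], covs[k], size=idx.size)
        # Binary/categorical attributes: one generalized Bernoulli draw each.
        for j, probs in enumerate(cat_probs):
            p = np.asarray(probs, dtype=float)[k]
            x_cat[idx, j] = rng.choice(p.size, size=idx.size, p=p)
    return x_cont, x_cat, comp


# Hypothetical parameters: two classes, two continuous attributes,
# one binary attribute and one three-category attribute.
X_cont, X_cat, y = sample_mixed_dataset(
    n=200,
    weights=[0.5, 0.5],
    means=[[0.0, 0.0], [3.0, 3.0]],
    covs=[np.eye(2), np.eye(2)],
    cat_probs=[
        [[0.9, 0.1], [0.2, 0.8]],
        [[0.6, 0.3, 0.1], [0.1, 0.3, 0.6]],
    ],
)
```

In a setting like the one the abstract describes, parameters such as `means`, `covs` and `cat_probs` would be the quantities adjusted so that the generated dataset lands in a chosen region of the instance space; that search step is outside this sketch.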

Original language: English
Title of host publication: Proceedings of the Genetic and Evolutionary Computation Conference Companion
Subtitle of host publication: 2017 Genetic and Evolutionary Computation Conference Companion, GECCO 2017; Berlin; Germany; 15 July 2017 through 19 July 2017; Code 128763
Place of Publication: New York NY USA
Publisher: Association for Computing Machinery (ACM)
Pages: 1582-1588
Number of pages: 7
ISBN (Electronic): 9781450349390
DOIs: https://doi.org/10.1145/3067695.3082532
Publication status: Published - 15 Jul 2017
Event: The Genetic and Evolutionary Computation Conference 2017 - Berlin, Germany
Duration: 15 Jul 2017 to 19 Jul 2017
http://gecco-2017.sigevo.org/index.html/HomePage.html

Conference

Conference: The Genetic and Evolutionary Computation Conference 2017
Abbreviated title: GECCO 2017
Country: Germany
City: Berlin
Period: 15/07/17 to 19/07/17
Other: A Recombination of the 26th International Conference on Genetic Algorithms (ICGA) and the 22nd Annual Genetic Programming Conference (GP).

The Genetic and Evolutionary Computation Conference (GECCO) has presented the latest high-quality results in genetic and evolutionary computation since 1999. Topics include: genetic algorithms, genetic programming, ant colony optimization and swarm intelligence, complex systems (artificial life/robotics/evolvable hardware/generative and developmental systems/artificial immune systems), digital entertainment technologies and arts, evolutionary combinatorial optimization and metaheuristics, evolutionary machine learning, evolutionary multiobjective optimization, evolutionary numerical optimization, real world applications, search-based software engineering, theory and more.

Cite this

Muñoz, M. A., & Smith-Miles, K. (2017). Generating custom classification datasets by targeting the instance space. In Proceedings of the Genetic and Evolutionary Computation Conference Companion: 2017 Genetic and Evolutionary Computation Conference Companion, GECCO 2017; Berlin; Germany; 15 July 2017 through 19 July 2017; Code 128763 (pp. 1582-1588). New York NY USA: Association for Computing Machinery (ACM). https://doi.org/10.1145/3067695.3082532
Muñoz, Mario A. ; Smith-Miles, Kate. / Generating custom classification datasets by targeting the instance space. Proceedings of the Genetic and Evolutionary Computation Conference Companion: 2017 Genetic and Evolutionary Computation Conference Companion, GECCO 2017; Berlin; Germany; 15 July 2017 through 19 July 2017; Code 128763. New York NY USA : Association for Computing Machinery (ACM), 2017. pp. 1582-1588
@inproceedings{aa945fbf44cd44698ca56513a0cc3221,
title = "Generating custom classification datasets by targeting the instance space",
abstract = "While machine learning has evolved at a fast pace in the last decades, the testing procedure for new methods may not be keeping pace. It often relies on well-studied collections of classification datasets such as the UCI repository. However, a meta-analysis through features has shown that most datasets from UCI are not sufficiently challenging to expose unique weaknesses of algorithms. In this paper we present a method to generate datasets with continuous, binary and categorical attributes, through the fitting of a Gaussian Mixture Model and a set of generalized Bernoulli distributions. By targeting empty areas of the instance space, this method has the potential to generate datasets with more diverse feature values.",
author = "Mu{\~n}oz, {Mario A.} and Kate Smith-Miles",
year = "2017",
month = "7",
day = "15",
doi = "10.1145/3067695.3082532",
language = "English",
pages = "1582--1588",
booktitle = "Proceedings of the Genetic and Evolutionary Computation Conference Companion",
publisher = "Association for Computing Machinery (ACM)",
address = "United States of America",

}

Muñoz, MA & Smith-Miles, K 2017, Generating custom classification datasets by targeting the instance space. in Proceedings of the Genetic and Evolutionary Computation Conference Companion: 2017 Genetic and Evolutionary Computation Conference Companion, GECCO 2017; Berlin; Germany; 15 July 2017 through 19 July 2017; Code 128763. Association for Computing Machinery (ACM), New York NY USA, pp. 1582-1588, The Genetic and Evolutionary Computation Conference 2017, Berlin, Germany, 15/07/17. https://doi.org/10.1145/3067695.3082532

Generating custom classification datasets by targeting the instance space. / Muñoz, Mario A.; Smith-Miles, Kate.

Proceedings of the Genetic and Evolutionary Computation Conference Companion: 2017 Genetic and Evolutionary Computation Conference Companion, GECCO 2017; Berlin; Germany; 15 July 2017 through 19 July 2017; Code 128763. New York NY USA : Association for Computing Machinery (ACM), 2017. p. 1582-1588.

TY - GEN

T1 - Generating custom classification datasets by targeting the instance space

AU - Muñoz, Mario A.

AU - Smith-Miles, Kate

PY - 2017/7/15

Y1 - 2017/7/15

N2 - While machine learning has evolved at a fast pace in the last decades, the testing procedure for new methods may not be keeping pace. It often relies on well-studied collections of classification datasets such as the UCI repository. However, a meta-analysis through features has shown that most datasets from UCI are not sufficiently challenging to expose unique weaknesses of algorithms. In this paper we present a method to generate datasets with continuous, binary and categorical attributes, through the fitting of a Gaussian Mixture Model and a set of generalized Bernoulli distributions. By targeting empty areas of the instance space, this method has the potential to generate datasets with more diverse feature values.

AB - While machine learning has evolved at a fast pace in the last decades, the testing procedure for new methods may not be keeping pace. It often relies on well-studied collections of classification datasets such as the UCI repository. However, a meta-analysis through features has shown that most datasets from UCI are not sufficiently challenging to expose unique weaknesses of algorithms. In this paper we present a method to generate datasets with continuous, binary and categorical attributes, through the fitting of a Gaussian Mixture Model and a set of generalized Bernoulli distributions. By targeting empty areas of the instance space, this method has the potential to generate datasets with more diverse feature values.

UR - http://www.scopus.com/inward/record.url?scp=85026874144&partnerID=8YFLogxK

U2 - 10.1145/3067695.3082532

DO - 10.1145/3067695.3082532

M3 - Conference Paper

SP - 1582

EP - 1588

BT - Proceedings of the Genetic and Evolutionary Computation Conference Companion

PB - Association for Computing Machinery (ACM)

CY - New York NY USA

ER -

Muñoz MA, Smith-Miles K. Generating custom classification datasets by targeting the instance space. In Proceedings of the Genetic and Evolutionary Computation Conference Companion: 2017 Genetic and Evolutionary Computation Conference Companion, GECCO 2017; Berlin; Germany; 15 July 2017 through 19 July 2017; Code 128763. New York NY USA: Association for Computing Machinery (ACM). 2017. p. 1582-1588 https://doi.org/10.1145/3067695.3082532