Generating custom classification datasets by targeting the instance space

Mario A. Muñoz, Kate Smith-Miles

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research › peer-review

6 Citations (Scopus)


While machine learning has evolved at a fast pace in the last decades, the testing procedure of new methods may not be keeping pace. It often relies on well-studied collections of classification datasets such as the UCI repository. However, a meta-analysis through features has shown that most datasets from UCI are not sufficiently challenging to expose unique weaknesses of algorithms. In this paper we present a method to generate datasets with continuous, binary and categorical attributes, through the fitting of a Gaussian Mixture Model and a set of generalized Bernoulli distributions. By targeting empty areas of the instance space, this method has the potential to generate datasets with more diverse feature values.
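The sampling step implied by the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it only draws rows from an already-specified model and does not perform the fitting or the targeting of the instance space. The function name `sample_dataset`, the dictionary layout, the diagonal-covariance Gaussians, and the idea of attaching one class label to each mixture component are all assumptions made for illustration.

```python
import random


def sample_dataset(n, components, categorical, seed=0):
    """Draw n rows from a hypothetical generative model.

    Continuous attributes come from a Gaussian mixture (diagonal covariance,
    an assumption); binary and categorical attributes come from generalized
    Bernoulli (categorical) distributions, one probability vector each.
    """
    rng = random.Random(seed)
    weights = [c["weight"] for c in components]
    rows = []
    for _ in range(n):
        # Choose a mixture component according to the mixing weights.
        comp = rng.choices(components, weights=weights, k=1)[0]
        # Sample each continuous attribute independently within the component.
        continuous = [rng.gauss(m, s) for m, s in zip(comp["means"], comp["stds"])]
        # Sample each discrete attribute from its own probability vector.
        discrete = [rng.choices(range(len(p)), weights=p, k=1)[0] for p in categorical]
        # Assumption: the class label is tied to the chosen component.
        rows.append(continuous + discrete + [comp["label"]])
    return rows


# Hypothetical parameters: two components, two continuous attributes.
components = [
    {"weight": 0.6, "means": [0.0, 0.0], "stds": [1.0, 1.0], "label": 0},
    {"weight": 0.4, "means": [3.0, 3.0], "stds": [0.5, 0.5], "label": 1},
]
# One binary attribute and one three-level categorical attribute.
categorical = [[0.5, 0.5], [0.2, 0.3, 0.5]]

rows = sample_dataset(100, components, categorical)
```

Each row holds the two continuous values, the two discrete values, and the class label, in that order. In a setting like the one the abstract describes, the mixture and Bernoulli parameters would be adjusted so that the generated dataset lands in a chosen region of the instance space; here they are simply fixed by hand.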

Original language: English
Title of host publication: Proceedings of the Genetic and Evolutionary Computation Conference Companion
Subtitle of host publication: 2017 Genetic and Evolutionary Computation Conference Companion, GECCO 2017; Berlin; Germany; 15 July 2017 through 19 July 2017; Code 128763
Place of Publication: New York NY USA
Publisher: Association for Computing Machinery (ACM)
Number of pages: 7
ISBN (Electronic): 9781450349390
Publication status: Published - 15 Jul 2017
Event: The Genetic and Evolutionary Computation Conference 2017 - Berlin, Germany
Duration: 15 Jul 2017 to 19 Jul 2017
Conference number: 19th (Proceedings)


Conference: The Genetic and Evolutionary Computation Conference 2017
Abbreviated title: GECCO 2017
Other: A Recombination of the 26th International Conference on Genetic Algorithms (ICGA) and the 22nd Annual Genetic Programming Conference (GP).

The Genetic and Evolutionary Computation Conference (GECCO) has presented the latest high-quality results in genetic and evolutionary computation since 1999. Topics include: genetic algorithms, genetic programming, ant colony optimization and swarm intelligence, complex systems (artificial life/robotics/evolvable hardware/generative and developmental systems/artificial immune systems), digital entertainment technologies and arts, evolutionary combinatorial optimization and metaheuristics, evolutionary machine learning, evolutionary multiobjective optimization, evolutionary numerical optimization, real world applications, search-based software engineering, theory and more.