Non-redundant subgroup discovery using a closure system

Mario Boley, Henrik Grosskreutz

Research output: Chapter in Book/Report/Conference proceedingConference PaperResearchpeer-review

22 Citations (Scopus)

Abstract

Subgroup discovery is a local pattern discovery task, in which descriptions of subpopulations of a database are evaluated against some quality function. As standard quality functions are functions of the described subpopulation, we propose to search for equivalence classes of descriptions with respect to their extension in the database rather than individual descriptions. These equivalence classes have unique maximal representatives forming a closure system. We show that minimum cardinality representatives of each equivalence class can be found during the enumeration process of that closure system without additional cost, while finding a minimum representative of a single equivalence class is NP-hard. With several real-world datasets we demonstrate that search space and output are significantly reduced by considering equivalence classes instead of individual descriptions and that the minimum representatives constitute a family of subgroup descriptions that is of same or better expressive power than those generated by traditional methods.

Original languageEnglish
Title of host publicationMachine Learning and Knowledge Discovery in Databases
Subtitle of host publicationEuropean Conference, ECML PKDD 2009 Bled, Slovenia, September 7-11, 2009 Proceedings, Part I
EditorsWray Buntine, Marko Grobelnik, Dunja Mladeni´, John Shawe-Taylor
Place of PublicationBerlin Germany
PublisherSpringer
Pages179-194
Number of pages16
ISBN (Print)3642041795, 9783642041792
DOIs
Publication statusPublished - 2009
Externally publishedYes
EventEuropean Conference on Machine Learning European Conference on Principles and Practice of Knowledge Discovery in Databases 2009 - Bled, Slovenia
Duration: 7 Sep 200911 Sep 2009
http://www.ecmlpkdd2009.net/

Publication series

NameLecture Notes in Computer Science
PublisherSpringer
Volume5781
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Conference

ConferenceEuropean Conference on Machine Learning European Conference on Principles and Practice of Knowledge Discovery in Databases 2009
Abbreviated titleECML PKDD 2009
CountrySlovenia
CityBled
Period7/09/0911/09/09
Internet address

Cite this

Boley, M., & Grosskreutz, H. (2009). Non-redundant subgroup discovery using a closure system. In W. Buntine, M. Grobelnik, D. Mladeni´, & J. Shawe-Taylor (Eds.), Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2009 Bled, Slovenia, September 7-11, 2009 Proceedings, Part I (pp. 179-194). (Lecture Notes in Computer Science; Vol. 5781 ). Springer. https://doi.org/10.1007/978-3-642-04180-8_29