Direct pattern sampling with respect to pattern frequency

Mario Boley, Claudio Lucchese, Daniel Paurat, Thomas Gärtner

Research output: Chapter in Book/Report/Conference proceeding; Conference Paper; Research; peer-review

Abstract

We present an exact and highly scalable sampling algorithm that can be used as an alternative to exhaustive local pattern discovery methods. It samples patterns according to their frequency of occurrence and can substantially improve the efficiency and controllability of the pattern discovery process. While previous sampling approaches mainly rely on the Markov chain Monte Carlo method, our procedure is direct, i.e., it draws each pattern without simulating a stochastic process. The advantages of this direct method are an almost optimal time complexity per pattern and an exactly controlled distribution of the produced patterns. In addition, we present experimental results which demonstrate that these procedures can improve the accuracy of pattern-based models similarly to frequent sets and often also lead to substantial gains in scalability. An extended version of this paper shows how the algorithm presented here can be modified to sample according to other frequency-related distributions, namely area, squared frequency, and a class discriminativity measure.
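The abstract does not spell out the sampling procedure itself. Purely as an illustration, the following Python sketch shows one well-known two-step direct scheme that produces exactly the frequency distribution: pick a transaction t with weight 2^|t|, then return a uniformly random subset of t. An itemset F is then drawn with probability supp(F) / Z, where Z is the sum of 2^|t| over all transactions. This is an assumption-based sketch, not a claim about the paper's exact implementation; the function names `sample_by_frequency` and `exact_distribution` and the toy data are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations

# Illustrative sketch (assumption): a two-step direct sampler for
# frequency-proportional itemsets, not the paper's own implementation.


def sample_by_frequency(transactions, n, seed=None):
    """Draw n itemsets, each with probability proportional to its frequency.

    Step 1: pick a transaction t with probability 2^|t| / Z, Z = sum of 2^|t|.
    Step 2: return a uniformly random subset of t.
    Then P(F) = sum over t containing F of (2^|t| / Z) * (1 / 2^|t|) = supp(F) / Z.
    The empty itemset is part of the sample space.
    """
    rng = random.Random(seed)
    weights = [2 ** len(t) for t in transactions]
    # random.choices builds the cumulative weights once for all n draws.
    chosen = rng.choices(transactions, weights=weights, k=n)
    samples = []
    for t in chosen:
        # Keep each item independently with probability 1/2: a uniform subset of t.
        # Items are sorted so that a fixed seed gives reproducible results.
        samples.append(frozenset(x for x in sorted(t) if rng.random() < 0.5))
    return samples


def exact_distribution(transactions):
    """Enumerate each itemset's target probability supp(F) / Z (small inputs only)."""
    z = sum(2 ** len(t) for t in transactions)
    support = Counter()
    for t in transactions:
        items = sorted(t)
        for k in range(len(items) + 1):
            for combo in combinations(items, k):
                support[frozenset(combo)] += 1
    return {itemset: s / z for itemset, s in support.items()}


if __name__ == "__main__":
    data = [frozenset("abc"), frozenset("ab"), frozenset("bd")]
    exact = exact_distribution(data)
    n = 100_000
    counts = Counter(sample_by_frequency(data, n, seed=0))
    for itemset in sorted(exact, key=lambda f: (len(f), sorted(f))):
        # target probability vs. empirical frequency
        print(sorted(itemset), round(exact[itemset], 4), round(counts[itemset] / n, 4))
```

Running the script prints each itemset's target probability next to its empirical frequency. After the one-off weight computation, each draw costs a binary search over the cumulative weights plus time linear in the size of the chosen transaction, with no Markov chain that has to be run to convergence. For very long transactions, 2^|t| exceeds the floating-point range used inside `random.choices`, so a practical version would need to handle the weights differently (for example in log space).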

Original language: English
Title of host publication: LWA 2011
Subtitle of host publication: Technical Report of the Symposium "Lernen, Wissen, Adaptivität - Learning, Knowledge, and Adaptivity 2011" of the GI Special Interest Groups KDML, IR and WM 2011
Place of publication: Germany
Publisher: Fakultät für Informatik
Pages: 114-121
Number of pages: 8
Publication status: Published - 2011
Externally published: Yes
Event: Symposium on Learning, Knowledge, and Adaptivity 2011, LWA 2011 - Magdeburg, Germany
Duration: 28 Sep 2011 to 30 Sep 2011

Conference

Conference: Symposium on Learning, Knowledge, and Adaptivity 2011, LWA 2011
Country/Territory: Germany
City: Magdeburg
Period: 28/09/11 to 30/09/11
