Linear space direct pattern sampling using coupling from the past

Mario Boley, Sandy Moens, Thomas Gärtner

Research output: Chapter in Book/Report/Conference proceeding > Conference Paper > Research > peer-review

22 Citations (Scopus)

Abstract

This paper shows how coupling from the past (CFTP) can be used to avoid time and memory bottlenecks in direct local pattern sampling procedures. Such procedures draw controlled amounts of suitably biased samples directly from the pattern space of a given dataset in polynomial time. Previous direct pattern sampling methods can produce patterns in rapid succession after an initial preprocessing phase. This preprocessing phase, however, turns out to be prohibitive in terms of time and memory for many datasets. We show how CFTP can be used to avoid any super-linear preprocessing and memory requirements. This makes it possible to simulate more complex distributions that were previously intractable. We show for a large number of public real-world datasets that these new algorithms are fast to execute and that their pattern collections outperform previous approaches in both unsupervised and supervised contexts.
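For readers unfamiliar with CFTP, the following is a minimal sketch of the generic Propp-Wilson scheme on a toy chain. It is not the pattern-space chain from the paper: the names `cftp` and `walk_update`, the five-state reflecting random walk, and its uniform stationary distribution are all assumptions chosen for illustration. The sketch shows the two ingredients that make CFTP exact: every state is run forward from a time in the past with shared randomness, and when the horizon is pushed further back the randomness already drawn for later steps is reused.

```python
import random


def walk_update(x, u, top=4):
    """Hypothetical toy chain: lazy reflecting walk on {0, ..., top}.

    Its transition matrix is symmetric, so the stationary law is uniform.
    """
    return max(x - 1, 0) if u < 0.5 else min(x + 1, top)


def cftp(states, update, rng):
    """Return one exact stationary draw via coupling from the past."""
    seeds = []    # seeds[k] drives the step from time -(k+1) to time -k
    horizon = 1   # how far into the past the simulation starts
    while True:
        # Extend randomness into the past; earlier draws are kept as-is.
        while len(seeds) < horizon:
            seeds.append(rng.random())
        # Start every state at time -horizon and run forward to time 0.
        current = set(states)
        for k in range(horizon - 1, -1, -1):
            current = {update(x, seeds[k]) for x in current}
        if len(current) == 1:
            # All trajectories have coalesced: the value at time 0 is exact.
            return next(iter(current))
        horizon *= 2  # not coalesced yet, so start further back


if __name__ == "__main__":
    rng = random.Random(0)
    print([cftp(range(5), walk_update, rng) for _ in range(10)])
```

Tracking every state, as done here, only works for tiny state spaces. For a large space such as a pattern space, practical CFTP variants rely on additional structure (for example, monotonicity) so that only a few trajectories need to be followed.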

Original language: English
Title of host publication: KDD'12 - Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
Place of Publication: Piscataway NJ USA
Publisher: Association for Computing Machinery (ACM)
Pages: 69-77
Number of pages: 9
ISBN (Print): 9781450314626
DOIs
Publication status: Published - 2012
Externally published: Yes
Event: ACM International Conference on Knowledge Discovery and Data Mining 2012 - Beijing, China
Duration: 12 Aug 2012 to 16 Aug 2012
Conference number: 18th
https://dl.acm.org/doi/proceedings/10.1145/2339530

Conference

Conference: ACM International Conference on Knowledge Discovery and Data Mining 2012
Abbreviated title: KDD 2012
Country/Territory: China
City: Beijing
Period: 12/08/12 to 16/08/12
Internet address

Keywords

  • cftp
  • frequent sets
  • local patterns
  • sampling
