Providing concise database covers instantly by recursive tile sampling

Sandy Moens, Mario Boley, Bart Goethals

Research output: Chapter in Book/Report/Conference proceedingConference PaperResearch

2 Citations (Scopus)

Abstract

Known pattern discovery algorithms for finding tilings (covers of 0/1-databases consisting of 1-rectangles) cannot be integrated in instant and interactive KD tools, because they do not satisfy at least one of two key requirements: a) to provide results within a short response time of only a few seconds and b) to return a concise set of patterns with only a few elements that nevertheless covers a large fraction of the input database. In this paper we present a novel randomized algorithm that works well under these requirements. It is based on the recursive application of a simple tile sample procedure that can be implemented efficiently using rejection sampling. While, as we analyse, the theoretical solution distribution can be weak in the worst case, the approach performs very well in practice and outperforms previous sampling as well as deterministic algorithms.

Original languageEnglish
Title of host publicationDiscovery Science
Subtitle of host publication17th International Conference, DS 2014 Bled, Slovenia, October 8-10, 2014 Proceedings
EditorsSašo Džeroski, Panče Panov, Dragi Kocev, Ljupčo Todorovski
Place of PublicationCham Switzerland
PublisherSpringer
Pages216-227
Number of pages12
ISBN (Electronic)9783319118123
ISBN (Print)9783319118116
DOIs
Publication statusPublished - 2014
Externally publishedYes
EventInternational Conference on Discovery Science 2014 - Bled, Slovenia
Duration: 8 Oct 201410 Oct 2014
Conference number: 17th
http://ds2014.ijs.si/

Publication series

NameLecture Notes in Computer Science
PublisherSpringer
Volume8777
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Conference

ConferenceInternational Conference on Discovery Science 2014
Abbreviated titleDS 2014
Country/TerritorySlovenia
CityBled
Period8/10/1410/10/14
Internet address

Keywords

  • Instant Pattern Mining
  • Sampling Closed Itemsets
  • Tiling Databases

Cite this