In-stream frequent itemset mining with output proportional memory footprint

Daniel Trabold, Mario Boley, Michael Mock, Tamas Horváth

Research output: Chapter in Book/Report/Conference proceedingConference PaperResearch


We propose an online partial counting algorithm based on statistical inference that approximates itemset frequencies from data streams. The space complexity of our algorithm is proportional to the number of frequent itemsets in the stream at any time. Furthermore, the longer an itemset is frequent the closer is the approximation to its frequency, implying that the results become more precise as the stream evolves. We empirically compare our approach in terms of correctness and memory footprint to CARMA and Lossy Counting. Though our algorithm outperforms only CARMA in correctness, it requires much less space than both of these algorithms providing an alternative to Lossy Counting when the memory available is limited.

Original languageEnglish
Title of host publicationProceedings of the LWA 2015 Workshops: KDML, FGWM, IR, FGDB
Subtitle of host publicationTrier, Germany, October 7-9, 2015.
EditorsRalph Bergmann, Sebastian Görg, Gilbert Müller
PublisherRheinisch-Westfaelische Technische Hochschule Aachen
Number of pages12
ISBN (Electronic)007414588
Publication statusPublished - 2015
Externally publishedYes
EventWorkshop on Knowledge Discovery, Data Mining and Machine Learning 2015
- Trier, Germany
Duration: 7 Oct 20159 Oct 2016

Publication series

NameCEUR Workshop Proceedings
PublisherRheinisch-Westfaelische Technische Hochschule Aachen * Lehrstuhl Informatik V
ISSN (Print)1613-0073


ConferenceWorkshop on Knowledge Discovery, Data Mining and Machine Learning 2015
Abbreviated titleKDML
Internet address

Cite this