Optimized cardinality-based generalized itemset mining using transaction ID and numeric encoding

Bac H. Le, Phuc Luong

Research output: Contribution to journal › Article › Research › peer-review

Abstract

In recent years, generalization-based data mining techniques have attracted growing interest among data scientists. Generalized itemset mining is an exploration technique that focuses on extracting high-level abstractions and correlations from a database. However, domain experts must always deal with the problem of managing and interpreting the large number of patterns extracted from a massive database of transactions. In generalized pattern mining, taxonomies that contain abstraction information for each dataset are defined, so the number of frequent patterns can grow enormously. Exploiting the extracted knowledge therefore becomes a difficult and costly process. In this article, we introduce an approach that uses cardinality-based constraints with transaction IDs and numeric encoding to mine generalized patterns. We use transaction IDs to support the computation of each frequent itemset, and we encode taxonomies into a numeric type using two simple rules. We also combine cardinality constraints with closed or maximal patterns. Experiments show that our optimizations significantly improve the performance of the original method, and that the comprehensive information carried by closed and maximal patterns is worth considering in generalized frequent pattern mining.
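As a rough illustration of how transaction IDs can support itemset computation over a taxonomy, the following minimal sketch builds a vertical (tidset) representation in which a generalized item inherits the transaction IDs of all its descendants, and the support of an itemset is the size of the intersection of its members' tidsets. This is a generic vertical-mining sketch, not the algorithm from the article: the toy data, the taxonomy, and the function names `build_tidsets` and `support` are all hypothetical, and neither the article's two encoding rules nor its cardinality constraints are reproduced here.

```python
# Hypothetical toy data: transaction ID -> set of leaf items.
transactions = {
    1: {"apple", "milk"},
    2: {"pear", "milk"},
    3: {"apple", "cheese"},
    4: {"pear"},
}

# Hypothetical taxonomy: item -> parent (more general) item.
taxonomy = {
    "apple": "fruit",
    "pear": "fruit",
    "milk": "dairy",
    "cheese": "dairy",
}


def build_tidsets(transactions, taxonomy):
    """Map every leaf and generalized item to the set of transaction IDs covering it."""
    tidsets = {}
    for tid, items in transactions.items():
        for item in items:
            tidsets.setdefault(item, set()).add(tid)
            # Walk up the taxonomy so each ancestor also records this transaction.
            node = item
            while node in taxonomy:
                node = taxonomy[node]
                tidsets.setdefault(node, set()).add(tid)
    return tidsets


def support(itemset, tidsets):
    """Absolute support: number of transactions containing every item of the itemset."""
    sets = [tidsets.get(item, set()) for item in itemset]
    if not sets:
        return 0
    return len(set.intersection(*sets))


tidsets = build_tidsets(transactions, taxonomy)
print(support({"fruit", "dairy"}, tidsets))  # generalized itemset
print(support({"apple", "dairy"}, tidsets))  # mixed-level itemset
```

Because support is obtained by intersecting tidsets, the database does not need to be rescanned for each candidate itemset, which is the usual motivation for a vertical layout.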

Original language: English
Pages (from-to): 2067-2080
Number of pages: 14
Journal: Applied Intelligence
Volume: 48
Issue number: 8
Publication status: Published - Aug 2018
Externally published: Yes

Keywords

  • Cardinality constraints
  • Closed itemset
  • Generalized itemset
  • Maximal itemset
  • Optimization
