Mining top-k minimal redundancy frequent patterns over uncertain databases

Haishuai Wang, Peng Zhang, Jia Wu, Shirui Pan

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research › peer-review

Abstract

Frequent pattern mining from uncertain data has attracted close attention because most real-life databases contain uncertain data. Several approaches have been proposed for mining high-significance frequent itemsets over uncertain data; however, previous algorithms yield many redundant frequent itemsets and require users to set an appropriate threshold, which is difficult. In this paper, we formally define the problem of top-k minimal redundancy probabilistic frequent pattern mining, which aims to identify the top-k patterns with both high significance and low redundancy from uncertain data. We first design an uncertain pattern correlation measure based on the Pearson correlation coefficient, which takes pattern uncertainty into account. We then present a new algorithm, UTFP, to mine top-k minimal redundancy frequent patterns of length no less than a minimum length mind without setting a threshold. We further propose a set of strategies to prune and reduce the search space. Experimental results demonstrate that the proposed algorithm achieves good performance in finding top-k frequent patterns with low redundancy on probabilistic data. Our method represents the first research endeavor on top-k correlated pattern mining for probabilistic data.
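
To make the ideas in the abstract concrete, the following is a minimal sketch, not the UTFP algorithm itself, whose details are not given on this page. It assumes an uncertain database is a list of transactions mapping each item to an independent existence probability, measures significance by expected support, and measures redundancy by the absolute Pearson correlation between the per-transaction occurrence probabilities of two patterns. The function names, the brute-force enumeration, and the greedy score `significance * (1 - redundancy)` are all illustrative assumptions.

```python
from itertools import combinations
from math import prod, sqrt


def pattern_probs(db, pattern):
    """Per-transaction probability that `pattern` occurs (items assumed independent)."""
    return [prod(t.get(item, 0.0) for item in pattern) for t in db]


def expected_support(db, pattern):
    """Expected number of transactions that contain `pattern`."""
    return sum(pattern_probs(db, pattern))


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either vector is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def top_k_low_redundancy(db, k, min_len=1):
    """Greedily pick up to k patterns of length >= min_len (hypothetical scoring).

    No support threshold is needed: patterns are ranked, not filtered,
    apart from dropping patterns whose expected support is zero.
    """
    items = sorted({item for t in db for item in t})
    vecs = {}
    for size in range(min_len, len(items) + 1):
        for cand in combinations(items, size):  # brute force, small inputs only
            probs = pattern_probs(db, cand)
            if sum(probs) > 0:
                vecs[cand] = probs
    sig = {c: sum(v) / len(db) for c, v in vecs.items()}  # normalized to [0, 1]

    selected, candidates = [], list(vecs)
    while candidates and len(selected) < k:
        def score(c):
            # Redundancy: strongest correlation with any already selected pattern.
            red = max((abs(pearson(vecs[c], vecs[s])) for s in selected), default=0.0)
            return sig[c] * (1.0 - red)

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


if __name__ == "__main__":
    db = [
        {"a": 0.9, "b": 0.8},
        {"a": 0.8, "b": 0.9},
        {"c": 0.7},
        {"a": 0.9, "c": 0.6},
    ]
    print(top_k_low_redundancy(db, k=2, min_len=2))
```

Enumerating every itemset is exponential in the number of items, which is why the abstract's pruning strategies matter in practice; this sketch omits them entirely.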

Original language: English
Title of host publication: Neural Information Processing
Subtitle of host publication: 22nd International Conference, ICONIP 2015 Istanbul, Turkey, November 9–12, 2015 Proceedings, Part IV
Editors: Sabri Arik, Tingwen Huang, Weng Kin Lai, Qingshan Liu
Place of Publication: Cham Switzerland
Publisher: Springer
Pages: 111-119
Number of pages: 9
ISBN (Electronic): 9783319265612
ISBN (Print): 9783319265605
DOIs: https://doi.org/10.1007/978-3-319-26561-2_14
Publication status: Published - 2015
Externally published: Yes
Event: International Conference on Neural Information Processing 2015 - Istanbul, Turkey
Duration: 9 Nov 2015 to 12 Nov 2015
Conference number: 22nd
https://web.archive.org/web/20151210114427/http://www.iconip2015.org/
https://link.springer.com/chapter/10.1007%2F978-3-319-26535-3_5

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer
Volume: 9492
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: International Conference on Neural Information Processing 2015
Abbreviated title: ICONIP 2015
Country: Turkey
City: Istanbul
Period: 9/11/15 to 12/11/15
Keywords

  • Frequent patterns
  • Redundancy
  • Top-k
  • Uncertain

Cite this

Wang, H., Zhang, P., Wu, J., & Pan, S. (2015). Mining top-k minimal redundancy frequent patterns over uncertain databases. In S. Arik, T. Huang, W. K. Lai, & Q. Liu (Eds.), Neural Information Processing: 22nd International Conference, ICONIP 2015 Istanbul, Turkey, November 9–12, 2015 Proceedings, Part IV (pp. 111-119). (Lecture Notes in Computer Science; Vol. 9492). Cham Switzerland: Springer. https://doi.org/10.1007/978-3-319-26561-2_14
Wang, Haishuai ; Zhang, Peng ; Wu, Jia ; Pan, Shirui. / Mining top-k minimal redundancy frequent patterns over uncertain databases. Neural Information Processing: 22nd International Conference, ICONIP 2015 Istanbul, Turkey, November 9–12, 2015 Proceedings, Part IV. editor / Sabri Arik ; Tingwen Huang ; Weng Kin Lai ; Qingshan Liu. Cham Switzerland : Springer, 2015. pp. 111-119 (Lecture Notes in Computer Science).
@inproceedings{4c9fd5c1d8864c6cae7d819792215bc1,
title = "Mining top-k minimal redundancy frequent patterns over uncertain databases",
abstract = "Frequent pattern mining from uncertain data has attracted close attention because most real-life databases contain uncertain data. Several approaches have been proposed for mining high-significance frequent itemsets over uncertain data; however, previous algorithms yield many redundant frequent itemsets and require users to set an appropriate threshold, which is difficult. In this paper, we formally define the problem of top-k minimal redundancy probabilistic frequent pattern mining, which aims to identify the top-k patterns with both high significance and low redundancy from uncertain data. We first design an uncertain pattern correlation measure based on the Pearson correlation coefficient, which takes pattern uncertainty into account. We then present a new algorithm, UTFP, to mine top-k minimal redundancy frequent patterns of length no less than a minimum length mind without setting a threshold. We further propose a set of strategies to prune and reduce the search space. Experimental results demonstrate that the proposed algorithm achieves good performance in finding top-k frequent patterns with low redundancy on probabilistic data. Our method represents the first research endeavor on top-k correlated pattern mining for probabilistic data.",
keywords = "Frequent patterns, Redundancy, Top-k, Uncertain",
author = "Haishuai Wang and Peng Zhang and Jia Wu and Shirui Pan",
year = "2015",
doi = "10.1007/978-3-319-26561-2_14",
language = "English",
isbn = "9783319265605",
series = "Lecture Notes in Computer Science",
publisher = "Springer",
pages = "111--119",
editor = "Sabri Arik and Tingwen Huang and Lai, {Weng Kin} and Qingshan Liu",
booktitle = "Neural Information Processing",

}

Wang, H, Zhang, P, Wu, J & Pan, S 2015, Mining top-k minimal redundancy frequent patterns over uncertain databases. in S Arik, T Huang, WK Lai & Q Liu (eds), Neural Information Processing: 22nd International Conference, ICONIP 2015 Istanbul, Turkey, November 9–12, 2015 Proceedings, Part IV. Lecture Notes in Computer Science, vol. 9492, Springer, Cham Switzerland, pp. 111-119, International Conference on Neural Information Processing 2015, Istanbul, Turkey, 9/11/15. https://doi.org/10.1007/978-3-319-26561-2_14

Mining top-k minimal redundancy frequent patterns over uncertain databases. / Wang, Haishuai; Zhang, Peng; Wu, Jia; Pan, Shirui.

Neural Information Processing: 22nd International Conference, ICONIP 2015 Istanbul, Turkey, November 9–12, 2015 Proceedings, Part IV. ed. / Sabri Arik; Tingwen Huang; Weng Kin Lai; Qingshan Liu. Cham Switzerland : Springer, 2015. p. 111-119 (Lecture Notes in Computer Science; Vol. 9492).

TY - GEN

T1 - Mining top-k minimal redundancy frequent patterns over uncertain databases

AU - Wang, Haishuai

AU - Zhang, Peng

AU - Wu, Jia

AU - Pan, Shirui

PY - 2015

Y1 - 2015

N2 - Frequent pattern mining from uncertain data has attracted close attention because most real-life databases contain uncertain data. Several approaches have been proposed for mining high-significance frequent itemsets over uncertain data; however, previous algorithms yield many redundant frequent itemsets and require users to set an appropriate threshold, which is difficult. In this paper, we formally define the problem of top-k minimal redundancy probabilistic frequent pattern mining, which aims to identify the top-k patterns with both high significance and low redundancy from uncertain data. We first design an uncertain pattern correlation measure based on the Pearson correlation coefficient, which takes pattern uncertainty into account. We then present a new algorithm, UTFP, to mine top-k minimal redundancy frequent patterns of length no less than a minimum length mind without setting a threshold. We further propose a set of strategies to prune and reduce the search space. Experimental results demonstrate that the proposed algorithm achieves good performance in finding top-k frequent patterns with low redundancy on probabilistic data. Our method represents the first research endeavor on top-k correlated pattern mining for probabilistic data.

KW - Frequent patterns

KW - Redundancy

KW - Top-k

KW - Uncertain

UR - http://www.scopus.com/inward/record.url?scp=84951872417&partnerID=8YFLogxK

U2 - 10.1007/978-3-319-26561-2_14

DO - 10.1007/978-3-319-26561-2_14

M3 - Conference Paper

SN - 9783319265605

T3 - Lecture Notes in Computer Science

SP - 111

EP - 119

BT - Neural Information Processing

A2 - Arik, Sabri

A2 - Huang, Tingwen

A2 - Lai, Weng Kin

A2 - Liu, Qingshan

PB - Springer

CY - Cham Switzerland

ER -

Wang H, Zhang P, Wu J, Pan S. Mining top-k minimal redundancy frequent patterns over uncertain databases. In Arik S, Huang T, Lai WK, Liu Q, editors, Neural Information Processing: 22nd International Conference, ICONIP 2015 Istanbul, Turkey, November 9–12, 2015 Proceedings, Part IV. Cham Switzerland: Springer. 2015. p. 111-119. (Lecture Notes in Computer Science). https://doi.org/10.1007/978-3-319-26561-2_14