Discovering reliable approximate functional dependencies

Panagiotis Mandros, Mario Boley, Jilles Vreeken

Research output: Chapter in Book/Report/Conference proceedingConference PaperResearchpeer-review

15 Citations (Scopus)

Abstract

Given a database and a target attribute of interest, how can we tell whether there exists a functional, or approximately functional dependence of the target on any set of other attributes in the data? How can we reliably, without bias to sample size or dimensionality, measure the strength of such a dependence? And, how can we effi-ciently discover the optimal or «-approximate top-k dependencies» These are exactly the questions we answer in this paper. As we want to be agnostic on the form of the dependence, we adopt an information-theoretic approach, and construct a reliable, bias correcting score that can be efficiently computed. Moreover, we give an effective optimistic estimator of this score, by which for the first time we can mine the approximate functional dependencies from data with guarantees of optimality. Empirical evaluation shows that the derived score achieves a good bias for variance trade-off, can be used within an efficient discovery algorithm, and indeed discovers meaningful dependencies. Most important, it remains reliable in the face of data sparsity.

Original languageEnglish
Title of host publicationKDD'17 - Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Subtitle of host publicationAugust 13-17, 2017 Halifax, NS, Canada
EditorsRómer Rosales
Place of PublicationNew York NY USA
PublisherAssociation for Computing Machinery (ACM)
Pages355-363
Number of pages9
ISBN (Electronic)9781450348874
DOIs
Publication statusPublished - 2017
Externally publishedYes
EventACM International Conference on Knowledge Discovery and Data Mining 2017 - Halifax, Canada
Duration: 13 Aug 201717 Aug 2017
Conference number: 23rd
https://www.kdd.org/kdd2017/
https://dl.acm.org/doi/proceedings/10.1145/3097983

Conference

ConferenceACM International Conference on Knowledge Discovery and Data Mining 2017
Abbreviated titleKDD 2017
CountryCanada
CityHalifax
Period13/08/1717/08/17
Internet address

Keywords

  • Information theory
  • Pattern discovery

Cite this

Mandros, P., Boley, M., & Vreeken, J. (2017). Discovering reliable approximate functional dependencies. In R. Rosales (Ed.), KDD'17 - Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining: August 13-17, 2017 Halifax, NS, Canada (pp. 355-363). Association for Computing Machinery (ACM). https://doi.org/10.1145/3097983.3098062