Discovering functional dependencies from mixed-type data

Panagiotis Mandros, David Kaltenpoth, Mario Boley, Jilles Vreeken

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research › peer-review

Abstract

Given complex data collections, practitioners can perform non-parametric functional dependency discovery (FDD) to uncover previously unknown relationships between variables. However, known FDD methods are applicable only to nominal data, and in practice non-nominal variables are discretized, e.g., in a pre-processing step. This is problematic because, as soon as a mix of discrete and continuous variables is involved, the interaction of discretization with the various dependency measures from the literature is poorly understood. In particular, it is unclear whether a given discretization method even leads to a consistent dependency estimate. In this paper, we analyze these fundamental questions and derive formal criteria as to when a discretization process applied to a mixed set of random variables leads to consistent estimates of mutual information. With these insights, we derive an estimator framework applicable to any task that involves estimating mutual information from multivariate and mixed-type data. Finally, we use this framework to extend a previously proposed FDD approach for reliable dependencies. Experimental evaluation shows that the derived reliable estimator is both computationally and statistically efficient, and leads to effective FDD algorithms for mixed-type data.

Original language: English
Title of host publication: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
Editors: Jiliang Tang, B. Aditya Prakash
Place of Publication: New York, NY, USA
Publisher: Association for Computing Machinery (ACM)
Pages: 1404-1414
Number of pages: 11
ISBN (Electronic): 9781450379984
Publication status: Published - 2020
Event: ACM International Conference on Knowledge Discovery and Data Mining 2020 - Virtual, Online, United States of America
Duration: 23 Aug 2020 to 27 Aug 2020
Conference number: 26th
https://dl.acm.org/doi/proceedings/10.1145/3394486 (Proceedings)
https://www.kdd.org/kdd2020/ (Website)

Conference

Conference: ACM International Conference on Knowledge Discovery and Data Mining 2020
Abbreviated title: KDD 2020
Country/Territory: United States of America
City: Virtual, Online
Period: 23/08/20 to 27/08/20

Keywords

  • functional dependency discovery
  • mixed data
  • mutual information
