TY - JOUR
T1 - Validating functional redundancy with mixed generative adversarial networks
AU - Nguyen, Thanh Tam
AU - Huynh, Thanh Trung
AU - Pham, Minh Tam
AU - Hoang, Thanh Dat
AU - Nguyen, Thanh Thi
AU - Nguyen, Quoc Viet Hung
N1 - Funding Information:
This research is funded by Vietnam National Foundation for Science and Technology Development (NAFOSTED) under grant number 102.01-2019.323.
Publisher Copyright:
© 2023 Elsevier B.V.
PY - 2023/3/15
Y1 - 2023/3/15
AB - Data redundancy has been one of the most important problems in data-intensive applications such as data mining and machine learning. Removing data redundancy brings many benefits in efficient data updating, effective data storage, and error-free query processing. While it has been studied for four decades, existing works on data redundancy mostly focus on syntactic formulations such as normal forms and functional dependencies, which lead to intractable discovery problems. In this work, we propose a new concept, namely functional redundancy, that overcomes the limitations of functional dependencies, especially on continuous data. We design and develop efficient algorithms based on generative adversarial networks to validate any functional redundancy without heavily depending on the number of attributes and the number of tuples like functional dependencies. The core idea is to use the imputation power of generative adversarial networks to model any semantic dependencies between attributes. Extensive experiments on different real-world and synthetic datasets show that our approach outperforms representative baselines, is applicable for first-order and high-order dependencies, and is extensible for different types of data.
KW - Data imputation
KW - Data management
KW - Functional dependency
KW - Functional redundancy
KW - Generative adversarial networks
KW - Mixed data types
UR - http://www.scopus.com/inward/record.url?scp=85147550166&partnerID=8YFLogxK
U2 - 10.1016/j.knosys.2023.110342
DO - 10.1016/j.knosys.2023.110342
M3 - Article
AN - SCOPUS:85147550166
SN - 0950-7051
VL - 264
JO - Knowledge-Based Systems
JF - Knowledge-Based Systems
M1 - 110342
ER -