Missing data imputation using decision trees and fuzzy clustering with iterative learning

Sanaz Nikfalazar, Chung-Hsing Yeh, Susan Bedingfield, Hadi A. Khorshidi

Research output: Contribution to journal > Article > Research > peer-review

64 Citations (Scopus)

Abstract

Various imputation approaches have been proposed to address the issue of missing values in data mining and machine learning applications. To improve the accuracy of missing data imputation, this paper proposes a new method called DIFC, which integrates the merits of decision trees and fuzzy clustering into an iterative learning approach. To compare the performance of the DIFC method against five effective imputation methods, extensive experiments are conducted on six widely used datasets with numerical and categorical missing data, and with various amounts and types of missing values. The experimental results show that the DIFC method outperforms the other methods in terms of imputation accuracy. Further experiments on the effect of missing value types demonstrate the robustness of the DIFC method in dealing with different types of missing values. This paper contributes to missing data imputation research by providing an accurate and robust method.
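
The abstract names the ingredients of DIFC but does not spell out the algorithm, so the following is only a rough illustration of one of those ingredients: iteratively re-estimating missing numerical values from fuzzy cluster memberships. It is not the authors' method. The decision-tree step and categorical attributes are left out entirely, and the function names, parameters, and toy data are hypothetical choices made for this sketch.

```python
import math
import random


def fuzzy_memberships(row, centers, m=2.0):
    """Fuzzy c-means membership of one record in each cluster center."""
    dists = [math.dist(row, c) for c in centers]
    # A record lying exactly on a center belongs fully to that center.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_j) ** power for d_j in dists) for d_i in dists]


def update_centers(rows, memberships, n_clusters, m=2.0):
    """Recompute each center as a membership-weighted mean of all records."""
    n_cols = len(rows[0])
    centers = []
    for k in range(n_clusters):
        weights = [u[k] ** m for u in memberships]
        total = sum(weights)
        centers.append([
            sum(w * r[j] for w, r in zip(weights, rows)) / total
            for j in range(n_cols)
        ])
    return centers


def impute_fuzzy_iterative(data, n_clusters=2, m=2.0, n_rounds=10, seed=0):
    """Fill None entries in a list of numeric rows; returns a new list.

    Hypothetical helper for illustration only (not the DIFC algorithm).
    """
    rng = random.Random(seed)
    n_cols = len(data[0])
    missing = [[v is None for v in row] for row in data]

    # Initial guess: column means over the observed values.
    col_means = []
    for j in range(n_cols):
        observed = [row[j] for row in data if row[j] is not None]
        col_means.append(sum(observed) / len(observed))
    filled = [[col_means[j] if row[j] is None else float(row[j])
               for j in range(n_cols)] for row in data]

    # Initial centers: randomly chosen records.
    centers = [list(r) for r in rng.sample(filled, n_clusters)]

    for _ in range(n_rounds):
        memberships = [fuzzy_memberships(r, centers, m) for r in filled]
        centers = update_centers(filled, memberships, n_clusters, m)
        # Overwrite only the missing cells with a membership-weighted
        # average of the updated cluster centers; observed cells stay fixed.
        for i, row in enumerate(filled):
            for j in range(n_cols):
                if missing[i][j]:
                    row[j] = sum(u * c[j]
                                 for u, c in zip(memberships[i], centers))
    return filled


# Toy usage: two loose groups of records, one value missing (None).
toy = [[1.0, 1.1], [1.2, 0.9], [0.9, None],
       [5.0, 5.2], [5.1, 4.9], [4.8, 5.0]]
completed = impute_fuzzy_iterative(toy, n_clusters=2)
print(completed[2])
```

Each round alternates between updating the fuzzy clustering on the currently completed data and refreshing the imputed cells from that clustering, which is the general sense in which such a scheme is iterative. Because every imputed value is a convex combination of cluster centers, it always stays within the observed range of its column.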

Original language: English
Pages (from-to): 2419-2437
Number of pages: 19
Journal: Knowledge and Information Systems
Volume: 62
Publication status: Published - 2020

Keywords

  • Data mining
  • Decision trees
  • Fuzzy clustering
  • Missing data imputation
