TY - JOUR
T1 - A partition-based feature selection method for mixed data: a filter approach
AU - Dutt, Ashish
AU - Ismail, Maizatul Akmar
N1 - Funding Information:
We would like to thank Dr. Rashmi Gangwar, Consultant at the Water, Sanitation and Hygiene (WASH) program at UNICEF, who helped us acquire the DISE dataset. This work was supported by the University of Malaya research under Grant GPF006D-2019.
Publisher Copyright:
© Faculty of Computer Science and Information Technology.
PY - 2020
Y1 - 2020
N2 - Feature selection is fundamentally an optimization problem: selecting relevant features from several alternatives in clustering problems. Although several algorithms have been suggested, to this day none of them has been established as the best for every problem scenario. Therefore, researchers continue to strive to develop superior algorithms. Although the clustering process is considered a pre-processing task, what it really does is divide the data into groups. In this paper, we present an improved distance function for clustering mixed data. The Gower distance, a similarity measure for mixed data, is adopted and modified to define the similarity between object pairs. A partitional algorithm for mixed data is employed to group similar objects into clusters. The performance of the proposed method has been evaluated on similar mixed datasets and a real educational dataset in terms of the silhouette coefficient. Results reveal the effectiveness of this algorithm in unsupervised discovery problems. The proposed algorithm performed better than other clustering algorithms on various datasets.
AB - Feature selection is fundamentally an optimization problem: selecting relevant features from several alternatives in clustering problems. Although several algorithms have been suggested, to this day none of them has been established as the best for every problem scenario. Therefore, researchers continue to strive to develop superior algorithms. Although the clustering process is considered a pre-processing task, what it really does is divide the data into groups. In this paper, we present an improved distance function for clustering mixed data. The Gower distance, a similarity measure for mixed data, is adopted and modified to define the similarity between object pairs. A partitional algorithm for mixed data is employed to group similar objects into clusters. The performance of the proposed method has been evaluated on similar mixed datasets and a real educational dataset in terms of the silhouette coefficient. Results reveal the effectiveness of this algorithm in unsupervised discovery problems. The proposed algorithm performed better than other clustering algorithms on various datasets.
KW - Clustering
KW - Educational data mining
KW - Mixed data
KW - Unsupervised feature selection
UR - http://www.scopus.com/inward/record.url?scp=85090829719&partnerID=8YFLogxK
U2 - 10.22452/mjcs.vol33no2.5
DO - 10.22452/mjcs.vol33no2.5
M3 - Article
AN - SCOPUS:85090829719
SN - 0127-9084
VL - 33
SP - 152
EP - 169
JO - Malaysian Journal of Computer Science
JF - Malaysian Journal of Computer Science
IS - 2
ER -