TY - JOUR
T1 - Geospatial Features Influencing the Formation of COVID-19 Clusters
AU - Haque, Radiah
AU - Ting, Choo Yee
AU - Ee, Yeo Keat
AU - Ng, Keng Hoong
AU - Shaaban, Mohamed Najib
AU - Pee, Chih Yang
AU - Wong, Lai Kuan
AU - Raja, Dhesi Baha
N1 - Funding Information:
This project is funded by the Ministry of Science, Technology & Innovation Malaysia under the MOSTI Combating COVID-19 Grant (CV1220M1022).
Publisher Copyright:
© 2022, Success Culture Press. All rights reserved.
PY - 2022
Y1 - 2022
N2 - Machine learning methods have been used to combat COVID-19 since the pandemic started in 2020. Most studies have focused on detecting and identifying the characteristics of SARS-CoV-2, especially via image processing. Some studies have applied machine learning to contact tracing to minimise the transmission of COVID-19. Limited work has, however, reported on how geospatial features influence the transmission of COVID-19 and the formation of clusters at the local scale. Therefore, this paper aims to study the importance of geospatial features that influenced COVID-19 cluster formation in Kuala Lumpur, Malaysia, in 2021. Several datasets were used in this work, including the address details of confirmed positive COVID-19 cases and the details of nearby residential areas and Points of Interest (POI) located within the Federal Territory of Kuala Lumpur. The datasets were pre-processed and transformed into an analytical dataset for conducting empirical investigations. Various feature selection methods were applied, including the Boruta algorithm, Chi-square (Chi2) test, Extra Trees Classifier (ETC), Recursive Feature Elimination (RFE), and Deep Learning Autoencoder (DLA). Detailed investigations of the top-n features were performed to elicit a set of optimal features. Subsequently, several machine learning models were trained using the optimal features, including Logistic Regression (LR), Random Forest Classifier (RFC), Naïve Bayes Classifier (NBC), and Extreme Gradient Boosting (XGBoost). Boruta produced the optimal number of features, with n = 96, whereas RFC achieved the best prediction results among the classifiers, with around 95% accuracy. Consequently, the findings in this paper help to recognize the geospatial features that affect the formation of clusters of COVID-19 and other infectious diseases at the local scale.
AB - Machine learning methods have been used to combat COVID-19 since the pandemic started in 2020. Most studies have focused on detecting and identifying the characteristics of SARS-CoV-2, especially via image processing. Some studies have applied machine learning to contact tracing to minimise the transmission of COVID-19. Limited work has, however, reported on how geospatial features influence the transmission of COVID-19 and the formation of clusters at the local scale. Therefore, this paper aims to study the importance of geospatial features that influenced COVID-19 cluster formation in Kuala Lumpur, Malaysia, in 2021. Several datasets were used in this work, including the address details of confirmed positive COVID-19 cases and the details of nearby residential areas and Points of Interest (POI) located within the Federal Territory of Kuala Lumpur. The datasets were pre-processed and transformed into an analytical dataset for conducting empirical investigations. Various feature selection methods were applied, including the Boruta algorithm, Chi-square (Chi2) test, Extra Trees Classifier (ETC), Recursive Feature Elimination (RFE), and Deep Learning Autoencoder (DLA). Detailed investigations of the top-n features were performed to elicit a set of optimal features. Subsequently, several machine learning models were trained using the optimal features, including Logistic Regression (LR), Random Forest Classifier (RFC), Naïve Bayes Classifier (NBC), and Extreme Gradient Boosting (XGBoost). Boruta produced the optimal number of features, with n = 96, whereas RFC achieved the best prediction results among the classifiers, with around 95% accuracy. Consequently, the findings in this paper help to recognize the geospatial features that affect the formation of clusters of COVID-19 and other infectious diseases at the local scale.
KW - COVID-19 clusters
KW - feature importance
KW - geospatial analytics
KW - machine learning
UR - http://www.scopus.com/inward/record.url?scp=85141204869&partnerID=8YFLogxK
U2 - 10.33168/JSMS.2022.0501
DO - 10.33168/JSMS.2022.0501
M3 - Article
AN - SCOPUS:85141204869
SN - 1816-6075
VL - 12
SP - 1
EP - 20
JO - Journal of System and Management Sciences
JF - Journal of System and Management Sciences
IS - 5
ER -