Geospatial Features Influencing the Formation of COVID-19 Clusters

Radiah Haque, Choo Yee Ting, Yeo Keat Ee, Keng Hoong Ng, Mohamed Najib Shaaban, Chih Yang Pee, Lai Kuan Wong, Dhesi Baha Raja

Research output: Contribution to journal › Article › Research › peer-review

4 Citations (Scopus)


Machine learning methods have been used to combat COVID-19 since the pandemic began in 2020. Most studies have focused on detecting and identifying the characteristics of SARS-CoV-2, especially via image processing, and some have applied machine learning to contact tracing to minimise the transmission of COVID-19. Limited work, however, has reported on how geospatial features influence the transmission of COVID-19 and the formation of clusters at the local scale. This paper therefore studies the importance of the geospatial features that contributed to COVID-19 cluster formation in Kuala Lumpur, Malaysia, in 2021. Several datasets were used, including the address details of confirmed positive COVID-19 cases and the details of nearby residential areas and Points of Interest (POI) located within the Federal Territory of Kuala Lumpur. The datasets were pre-processed and transformed into an analytical dataset for empirical investigation. Various feature selection methods were applied: the Boruta algorithm, the Chi-square (Chi2) test, the Extra Trees Classifier (ETC), Recursive Feature Elimination (RFE), and a Deep Learning Autoencoder (DLA). The top-n features were investigated in detail to elicit an optimal feature set. Several machine learning models were then trained on the optimal features: Logistic Regression (LR), Random Forest Classifier (RFC), Naïve Bayes Classifier (NBC), and Extreme Gradient Boosting (XGBoost). Boruta produced the optimal feature set, with n = 96 features, and RFC achieved the best prediction results among the classifiers, with around 95% accuracy. These findings help to identify the geospatial features that affect the formation of clusters of COVID-19 and other infectious diseases at the local scale.
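The select-then-classify workflow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn, uses a synthetic dataset in place of the Kuala Lumpur data, covers only three of the five selection methods (Chi2, ETC, RFE), and the cut-off `TOP_N = 10` and the helper names `top_n_by_score` and `holdout_accuracy` are hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.feature_selection import RFE, SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the analytical dataset (assumption: not the paper's data).
X, y = make_classification(n_samples=400, n_features=30, n_informative=8,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# chi2 requires non-negative inputs, so scale to [0, 1] using training data only.
scaler = MinMaxScaler().fit(X_train)
X_train_s = scaler.transform(X_train)

TOP_N = 10  # hypothetical top-n cut-off, chosen only for this sketch


def top_n_by_score(scores, n):
    """Return the sorted indices of the n highest-scoring features."""
    return np.sort(np.argsort(scores)[::-1][:n])


# Chi-square test: rank features by their chi2 statistic against the label.
chi2_scores = SelectKBest(chi2, k="all").fit(X_train_s, y_train).scores_
chi2_idx = top_n_by_score(chi2_scores, TOP_N)

# Extra Trees Classifier: rank features by impurity-based importance.
etc = ExtraTreesClassifier(n_estimators=100, random_state=0)
etc_idx = top_n_by_score(etc.fit(X_train, y_train).feature_importances_, TOP_N)

# Recursive Feature Elimination wrapped around logistic regression.
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=TOP_N)
rfe_idx = np.flatnonzero(rfe.fit(X_train_s, y_train).support_)


def holdout_accuracy(feature_idx):
    """Train a random forest on the chosen columns and score it on the test split."""
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X_train[:, feature_idx], y_train)
    return clf.score(X_test[:, feature_idx], y_test)


selected = {"chi2": chi2_idx, "etc": etc_idx, "rfe": rfe_idx}
results = {name: holdout_accuracy(idx) for name, idx in selected.items()}

for name, acc in results.items():
    print(f"{name}: {len(selected[name])} features, accuracy = {acc:.3f}")
```

The selectors are fitted on the training split only, so the held-out accuracy is not inflated by information from the test rows. Comparing the accuracy obtained with each selector's feature subset mirrors, in a simplified way, how an optimal feature set might be chosen.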

Original language: English
Pages (from-to): 1-20
Number of pages: 20
Journal: Journal of System and Management Sciences
Issue number: 5
Publication status: Published - 2022
Externally published: Yes


  • COVID-19 clusters
  • feature importance
  • geospatial analytics
  • machine learning
