Clustering of countries for COVID-19 cases based on disease prevalence, health systems and environmental indicators

Syeda Amna Rizvi, Muhammad Umair, Muhammad Aamir Cheema

Research output: Contribution to journal › Article › Research › peer-review

16 Citations (Scopus)


The coronavirus has a high basic reproduction number (R0) and has caused the global COVID-19 pandemic. Government lockdowns are leading to economic fallout in many countries. Policy makers can make better decisions if they are given the indicators associated with the spread of the disease. This study aims to cluster countries using social, economic, health and environmental metrics that affect the spread of the disease, so that policies to control its spread can be implemented and countries with similar factors can take proactive steps against the pandemic. Data are acquired for 79 countries, and 18 feature variables (factors associated with the spread of COVID-19) are selected. Pearson product-moment correlation analysis is performed between each feature variable and, separately, cumulative deaths and cumulative confirmed cases, to gain insight into how these factors relate to the spread of COVID-19. The unsupervised k-means algorithm is applied to a feature set that includes economic and environmental indicators and disease prevalence, along with COVID-19 variables. The model groups the countries into 4 clusters on the basis of all 18 feature variables. We also present an analysis of the correlation between the selected feature variables and COVID-19 confirmed cases and deaths. The prevalence of underlying diseases shows a strong correlation with COVID-19, whereas environmental health indicators are only weakly correlated with it.
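The abstract names two computational steps: Pearson product-moment correlation between each feature and the cumulative case and death counts, and k-means clustering of countries. The sketch below illustrates both in plain Python; it is not the authors' code. Several choices are assumptions not stated in the abstract: features are z-score standardized before clustering, centroids are seeded by deterministic farthest-first traversal (rather than random or k-means++ seeding), and the data are made-up toy values (six hypothetical countries, two features, k = 2) rather than the study's 79 countries, 18 features and 4 clusters.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def standardize(rows):
    """Z-score each column (assumption: so features on different scales weigh equally)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Population standard deviation; fall back to 1.0 for constant columns.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]


def sqdist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda j: sqdist(p, centroids[j]))


def farthest_first(points, k):
    """Deterministic seeding (assumption): start at points[0], then repeatedly
    add the point farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sqdist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means: returns (labels, centroids)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    # Hypothetical toy data, NOT from the study: rows are countries,
    # columns are [disease prevalence (%), health-system index].
    features = [[2.0, 30.0], [2.5, 32.0], [3.0, 31.0],
                [9.0, 80.0], [9.5, 78.0], [10.0, 82.0]]
    deaths = [10, 12, 15, 40, 45, 50]  # hypothetical cumulative deaths

    # Correlate each feature column with the outcome, one feature at a time.
    for name, col in zip(["prevalence", "health_index"], zip(*features)):
        print(name, round(pearson(col, deaths), 3))

    labels, _ = kmeans(standardize(features), k=2)
    print(labels)  # → [0, 0, 0, 1, 1, 1]
```

With real data, the same loop would run over all feature columns against both cumulative deaths and cumulative confirmed cases, and `kmeans` would be called with k = 4; a library implementation such as scikit-learn's `KMeans` would normally replace the hand-written version.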

Original language: English
Article number: 111240
Number of pages: 10
Journal: Chaos, Solitons and Fractals
Publication status: Published, Oct 2021


  • Clustering methods
  • COVID-19
  • COVID-19 confirmed cases
  • COVID-19 death cases
  • Disease prevalence
  • K-Means
  • Pearson correlation
  • Second wave
  • Unsupervised learning
