Predicting CD4 count changes among patients on antiretroviral treatment: Application of data mining techniques

Mihiretu Kebede, Desalegn Tigabu, Berihun Megabiaw Zeleke

Research output: Contribution to journalArticleResearchpeer-review

12 Citations (Scopus)


Background and objectives To monitor the progress of therapy and disease progression, periodic CD4 counts are required throughout the course of HIV/AIDS care and support. The demand for CD4 count measurement is increasing as ART programs expand over the last decade. This study aimed to predict CD4 count changes and to identify the predictors of CD4 count changes among patients on ART. Methods A cross-sectional study was conducted at the University of Gondar Hospital from 3,104 adult patients on ART with CD4 counts measured at least twice (baseline and most recent). Data were retrieved from the HIV care clinic electronic database and patients` charts. Descriptive data were analyzed by SPSS version 20. Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology was followed to undertake the study. WEKA version 3.8 was used to conduct a predictive data mining. Before building the predictive data mining models, information gain values and correlation-based Feature Selection methods were used for attribute selection. Variables were ranked according to their relevance based on their information gain values. J48, Neural Network, and Random Forest algorithms were experimented to assess model accuracies. Result The median duration of ART was 191.5 weeks. The mean CD4 count change was 243 (SD 191.14) cells per microliter. Overall, 2427 (78.2%) patients had their CD4 counts increased by at least 100 cells per microliter, while 4% had a decline from the baseline CD4 value. Baseline variables including age, educational status, CD8 count, ART regimen, and hemoglobin levels predicted CD4 count changes with predictive accuracies of J48, Neural Network, and Random Forest being 87.1%, 83.5%, and 99.8%, respectively. Random Forest algorithm had a superior performance accuracy level than both J48 and Artificial Neural Network. The precision, sensitivity and recall values of Random Forest were also more than 99%. Conclusions Nearly accurate prediction results were obtained using Random Forest algorithm. This algorithm could be used in a low-resource setting to build a web-based prediction model for CD4 count changes.

Original languageEnglish
Pages (from-to)149-157
Number of pages9
JournalComputer Methods and Programs in Biomedicine
Publication statusPublished - 1 Dec 2017


  • Antiretroviral treatment
  • CD4 count change
  • Computational methods
  • J48, Decision tree
  • Neural Network
  • Random Forest

Cite this