Web-Based Risk Prediction Tool for an Individual's Risk of HIV and Sexually Transmitted Infections Using Machine Learning Algorithms: Development and External Validation Study

Xianglong Xu, Zhen Yu, Zongyuan Ge, Eric P.F. Chow, Yining Bao, Jason J. Ong, Wei Li, Jinrong Wu, Christopher K. Fairley, Lei Zhang

Research output: Contribution to journal › Article › Research › peer-review

8 Citations (Scopus)


Background: HIV and sexually transmitted infections (STIs) are major global public health concerns. Over 1 million curable STIs occur every day among people aged 15 to 49 years worldwide. Insufficient testing or screening substantially impedes the elimination of HIV and STI transmission.

Objective: The aim of our study was to develop an HIV and STI risk prediction tool using machine learning algorithms.

Methods: We used clinic consultations that tested for HIV and STIs at the Melbourne Sexual Health Centre between March 2, 2015, and December 31, 2018, as the development data set (training and testing data set). We also used 2 external validation data sets: data from 2019 as external "validation data 1" and data from January 2020 and January 2021 as external "validation data 2." We developed 34 machine learning models to assess the risk of acquiring HIV, syphilis, gonorrhea, and chlamydia. We created an online tool to generate an individual's risk of HIV or an STI.

Results: The important predictors for HIV and STI risk were gender, age, men who reported having sex with men, number of casual sexual partners, and condom use. Our machine learning-based risk prediction tool, named MySTIRisk, performed at an acceptable or excellent level on the testing data sets (area under the curve [AUC] for HIV=0.78; AUC for syphilis=0.84; AUC for gonorrhea=0.78; AUC for chlamydia=0.70) and had stable performance on both the external validation data from 2019 (AUC for HIV=0.79; AUC for syphilis=0.85; AUC for gonorrhea=0.81; AUC for chlamydia=0.69) and the data from 2020-2021 (AUC for HIV=0.71; AUC for syphilis=0.84; AUC for gonorrhea=0.79; AUC for chlamydia=0.69).

Conclusions: Our web-based risk prediction tool could accurately predict the risk of HIV and STIs for clinic attendees using simple self-reported questions. MySTIRisk could serve as an HIV and STI screening tool on clinic websites or digital health platforms to encourage individuals at risk of HIV or an STI to be tested or to start HIV pre-exposure prophylaxis. The public can use this tool to assess their risk and then decide whether to attend a clinic for testing. Clinicians or public health workers can use this tool to identify high-risk individuals for further interventions.
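The results above are reported as AUC values. As a reading aid only, the following minimal Python sketch shows one standard way to compute an AUC: the probability that a randomly chosen positive case receives a higher predicted risk than a randomly chosen negative case, with ties counted as half. The function name `auc` and the toy labels and scores are illustrative assumptions; they are not the study's data, models, or code.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 0/1 outcomes (1 = infection diagnosed, hypothetical).
    scores: predicted risks in the same order as labels.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Made-up toy data, for illustration only (not from the study).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auc(toy_labels, toy_scores)  # 3 of 4 positive/negative pairs are ordered correctly
```

An AUC of 0.5 corresponds to ranking no better than chance and 1.0 to perfect ranking. Applying the same function to a held-out testing set and to later external data sets is one way to check whether performance stays stable over time.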

Original language: English
Article number: e37850
Number of pages: 12
Journal: Journal of Medical Internet Research
Issue number: 8
Publication status: Published - 1 Aug 2022


  • algorithm
  • chlamydia
  • development
  • gonorrhea
  • HIV
  • machine learning
  • model
  • prediction
  • predictive
  • risk
  • risk assessment
  • sexual health
  • sexual transmission
  • sexually transmitted
  • sexually transmitted infections
  • syphilis
  • validation
  • web-based
