A Deep Learning-Based Method for Identification of Bacteriophage-Host Interaction

Menglu Li, Yanan Wang, Fuyi Li, Yun Zhao, Mengya Liu, Sijia Zhang, Yannan Bin, A. Ian Smith, Geoffrey I. Webb, Jian Li, Jiangning Song, Junfeng Xia

Research output: Contribution to journal › Article › Research › peer-review

11 Citations (Scopus)


Multi-drug resistance (MDR) has become one of the greatest threats to human health worldwide, and novel treatments for infections caused by MDR bacteria are urgently needed. Phage therapy is a promising alternative, and its success depends on correctly matching the target pathogenic bacteria with the corresponding therapeutic phage. Deep learning is powerful for mining complex patterns to generate accurate predictions. In this study, we develop PredPHI (Predicting Phage-Host Interactions), a deep learning-based tool capable of predicting the hosts of phages from sequence data. We collect >3000 phage-host pairs along with their protein sequences from the PhagesDB and GenBank databases and extract a set of features. We then select high-quality negative samples using the K-Means clustering method and construct a balanced training set. Finally, we employ a deep convolutional neural network to build the predictive model. The results indicate that PredPHI achieves an area under the receiver operating characteristic curve of 81 percent on the test set, and that the clustering-based method is significantly more robust than random selection of negative samples. These results highlight that PredPHI is a useful and accurate tool for identifying phage-host interactions from sequence data.
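The clustering-based selection of negative samples can be illustrated with a minimal sketch. The abstract does not specify how the clusters are used, so this sketch assumes one plausible reading: candidate (unobserved) phage-host pairs are clustered with K-Means, and negatives are drawn evenly across clusters until they match the number of positives. The function names, the 2-D toy feature vectors, and the deterministic farthest-point initialisation are all hypothetical simplifications, not the PredPHI implementation.

```python
import random
from math import dist


def kmeans(points, k, iters=50):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    # Farthest-point initialisation (a simplification chosen for determinism).
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    labels = None
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def select_negatives(candidates, n_needed, k, seed=0):
    """Pick up to n_needed candidate indices, spread evenly over k clusters."""
    labels = kmeans(candidates, k)
    rng = random.Random(seed)
    clusters = [[i for i, l in enumerate(labels) if l == c] for c in range(k)]
    for members in clusters:
        rng.shuffle(members)
    chosen = []
    # Round-robin over clusters so that no single cluster dominates.
    while len(chosen) < n_needed and any(clusters):
        for members in clusters:
            if members and len(chosen) < n_needed:
                chosen.append(members.pop())
    return sorted(chosen)


# Toy 2-D feature vectors (hypothetical; real features come from protein sequences).
positives = [(0.2, 0.1), (9.8, 10.2), (0.4, 0.9), (10.5, 9.9)]
candidates = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]

neg_idx = select_negatives(candidates, n_needed=len(positives), k=2)
# Balanced training set: label 1 for known pairs, 0 for selected negatives.
training_set = [(x, 1) for x in positives] + [(candidates[i], 0) for i in neg_idx]
```

Sampling across clusters, rather than uniformly at random, is intended to make the negatives cover the candidate feature space more evenly. In this toy case, two negatives come from each of the two clusters.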

Original language: English
Pages (from-to): 1801-1810
Number of pages: 10
Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Issue number: 5
Publication status: Published - Sep 2021


  • bioinformatics
  • deep learning
  • multi-drug resistance
  • pattern recognition
  • phage-host interaction
  • sequence analysis
