Multidrug-resistant Gram-negative bacteria have evolved into a worldwide threat to human health. Over recent decades, polymyxins have re-emerged in clinical practice because of their high activity against these bacteria. Nevertheless, the nephrotoxicity and neurotoxicity of polymyxins seriously hinder their clinical use. Analogue design based on the quantitative structure-activity relationship (QSAR) is an efficient strategy for discovering biologically active compounds with fewer adverse effects. To accelerate the discovery of polymyxin analogues with high antimicrobial activity against Gram-negative bacteria, we developed PmxPred, a machine learning framework based on GCN and CatBoost. RDKit descriptors were used to represent the molecules and their residues, and an ensemble learning model was used to predict antimicrobial activity. The framework was trained and evaluated on multiple Gram-negative bacteria datasets, namely Acinetobacter baumannii, Escherichia coli, Klebsiella pneumoniae, Pseudomonas aeruginosa and a general Gram-negative bacteria dataset, achieving AUROCs of 0.857, 0.880, 0.756, 0.895 and 0.865 on the independent test, respectively. PmxPred outperformed a transfer learning method trained on 10 million molecules. We interpreted our well-trained model by analysing the importance of global and residue features. Overall, PmxPred provides a powerful additional tool for predicting active polymyxin analogues and holds the potential to elucidate the mechanisms underlying the antimicrobial activity of polymyxins. The source code is publicly available on GitHub (https://github.com/yanwu20/PmxPred).
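
The AUROC values reported above measure how well predicted scores rank active analogues above inactive ones. As a minimal sketch of that metric only, and not the PmxPred evaluation code, the following pure-Python function computes AUROC through the rank-sum (Mann-Whitney U) identity. The function name `auroc`, the 1/0 label encoding and the toy scores are assumptions made for illustration.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: 1 for active, 0 for inactive (assumed encoding, for illustration).
    scores: predicted score, higher meaning more likely active.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one active and one inactive sample")

    # Sort sample indices by score, then assign 1-based ranks,
    # giving tied scores the average of the ranks they span.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # U statistic of the positive class, normalised by the number of
    # positive/negative pairs, equals the AUROC.
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = pos_rank_sum - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


# Toy data, not taken from the paper.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auroc = auroc(toy_labels, toy_scores)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation of actives from inactives, which is the scale on which the per-species results are reported.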

Original language: English
Article number: 107681
Number of pages: 10
Journal: Computers in Biology and Medicine
Publication status: Published - 2024


  • Bioinformatics
  • Deep learning
  • Feature engineering
  • Machine learning
  • Polymyxin analogues
  • Predictors
