Pippin: A random forest-based method for identifying presynaptic and postsynaptic neurotoxins

Pengyu Li, He Zhang, Xuyang Zhao, Cangzhi Jia, Fuyi Li, Jiangning Song

Research output: Contribution to journal › Article › Research › peer-review

Abstract

Presynaptic and postsynaptic neurotoxins are two types of neurotoxins from venomous animals and are functionally important molecules in the neurosciences; however, their experimental characterization is difficult, time-consuming, and costly. Bioinformatics tools that can identify presynaptic and postsynaptic neurotoxins would therefore be very useful for understanding their functions and mechanisms. In this study, we propose Pippin, a novel machine learning-based method that allows users to rapidly and accurately identify these two types of neurotoxins. Pippin was developed using the random forest (RF) algorithm and evaluated on an up-to-date dataset. A variety of sequence and motif features were combined, and a two-step feature-selection algorithm was employed to select the optimal feature subset for presynaptic and postsynaptic neurotoxin prediction. Extensive benchmark tests show that Pippin significantly improved predictive performance compared with six other commonly used machine-learning algorithms: the naïve Bayes classifier, the multinomial naïve Bayes classifier (MNBC), AdaBoost, Bagging, K-nearest neighbors, and XGBoost. Additionally, we developed an online webserver for Pippin to facilitate public use. To the best of our knowledge, this is the first webserver for presynaptic and postsynaptic neurotoxin prediction.
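
The abstract does not name the individual sequence features or the two selection steps, so the following is only an illustrative sketch under stated assumptions, not Pippin's implementation. It assumes amino-acid composition as one example of a sequence feature and an F-score ranking as one plausible first selection step. The function names amino_acid_composition, f_score, and rank_features are hypothetical.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order that defines the feature vector.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the frequency of each standard amino acid in a protein sequence.

    Assumption: amino-acid composition stands in for the (unspecified)
    sequence features mentioned in the abstract.
    """
    residues = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    counts = Counter(residues)
    return [counts[aa] / len(residues) for aa in AMINO_ACIDS]


def f_score(pos_values, neg_values):
    """F-score of a single feature for a two-class problem.

    Larger values mean the class means are further apart relative to the
    within-class variance. Each class needs at least two samples.
    """
    all_values = pos_values + neg_values
    mean_all = sum(all_values) / len(all_values)
    mean_pos = sum(pos_values) / len(pos_values)
    mean_neg = sum(neg_values) / len(neg_values)
    var_pos = sum((v - mean_pos) ** 2 for v in pos_values) / (len(pos_values) - 1)
    var_neg = sum((v - mean_neg) ** 2 for v in neg_values) / (len(neg_values) - 1)
    numerator = (mean_pos - mean_all) ** 2 + (mean_neg - mean_all) ** 2
    denominator = var_pos + var_neg
    if denominator == 0:
        # No within-class spread: perfectly separating if the means differ.
        return float("inf") if numerator > 0 else 0.0
    return numerator / denominator


def rank_features(pos_vectors, neg_vectors):
    """Return feature indices sorted from most to least discriminative."""
    n_features = len(pos_vectors[0])
    scores = [
        f_score([v[i] for v in pos_vectors], [v[i] for v in neg_vectors])
        for i in range(n_features)
    ]
    return sorted(range(n_features), key=lambda i: scores[i], reverse=True)
```

In a two-step scheme of this kind, the ranking would typically be followed by a second step that adds features in ranked order and keeps the subset giving the best cross-validated performance of the classifier (here, a random forest). Whether Pippin uses these particular steps is not stated in the abstract.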

Original language: English
Article number: 2050008
Number of pages: 21
Journal: Journal of Bioinformatics and Computational Biology
Volume: 18
Issue number: 2
Publication status: Published - 5 May 2020

Keywords

  • feature selection
  • machine learning
  • random forest
  • sequence analysis
  • toxin prediction
