Comprehensive assessment and performance improvement of effector protein predictors for bacterial secretion systems III, IV and VI

Yi An, Jiawei Wang, Chen Li, Andre Leier, Tatiana Marquez-Lago, Jonathan Wilksch, Yang Zhang, Geoffrey I. Webb, Jiangning Song, Trevor Lithgow

Research output: Contribution to journal › Article › Research › peer-review

32 Citations (Scopus)

Abstract

Bacterial effector proteins secreted by various protein secretion systems play crucial roles in host-pathogen interactions. In this context, computational tools capable of accurately predicting effector proteins of the various types of bacterial secretion systems are highly desirable. Existing computational approaches use different machine learning (ML) techniques and heterogeneous features derived from protein sequences and/or structural information. These predictors differ not only in the ML methods used but also in the curated data sets used, the feature selection and their prediction performance. Here, we provide a comprehensive survey and benchmarking of currently available tools for the prediction of effector proteins of bacterial types III, IV and VI secretion systems (T3SS, T4SS and T6SS, respectively). We review core algorithms, feature selection techniques, tool availability and applicability, and evaluate the prediction performance based on carefully curated independent test data sets. In an effort to improve predictive performance, we constructed three ensemble models based on ML algorithms by integrating the output of all individual predictors reviewed. Our benchmarks demonstrate that these ensemble models outperform all the reviewed tools for the prediction of effector proteins of T3SS and T4SS. The webserver of the proposed ensemble methods for T3SS and T4SS effector protein prediction is freely available at http://tbooster.erc.monash.edu/index.jsp. We anticipate that this survey will serve as a useful guide for interested users and that the new ensemble predictors will stimulate research into host-pathogen relationships and inspire the development of new bioinformatics tools for predicting effector proteins of T3SS, T4SS and T6SS.
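The ensemble idea described in the abstract, combining the outputs of individual predictors with an ML model, can be illustrated with a minimal stacking sketch. This is not the authors' implementation: the abstract does not specify the model details, so the logistic-regression combiner (one of the listed keywords), the hand-written gradient descent, the function names and the toy scores below are all assumptions made for illustration.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_stacker(scores, labels, lr=0.5, epochs=2000):
    """Fit logistic-regression weights on base-predictor scores
    using batch gradient descent on the log-loss."""
    n = len(scores)
    n_features = len(scores[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(scores, labels):
            p = sigmoid(bias + sum(w * v for w, v in zip(weights, x)))
            err = p - y  # derivative of the log-loss w.r.t. the logit
            for j, v in enumerate(x):
                grad_w[j] += err * v
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def predict_proba(weights, bias, x):
    """Ensemble probability that a protein is an effector."""
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))

# Hypothetical toy data: each row holds the scores that three
# (made-up) base predictors assigned to one protein.
train_scores = [
    [0.9, 0.8, 0.7],
    [0.8, 0.6, 0.9],
    [0.7, 0.9, 0.6],
    [0.2, 0.3, 0.1],
    [0.1, 0.4, 0.2],
    [0.3, 0.1, 0.3],
]
train_labels = [1, 1, 1, 0, 0, 0]  # 1 = effector, 0 = non-effector

weights, bias = train_stacker(train_scores, train_labels)
p_effector = predict_proba(weights, bias, [0.85, 0.75, 0.80])
p_non_effector = predict_proba(weights, bias, [0.15, 0.20, 0.25])
```

In this sketch the combiner learns how much to trust each base predictor from labelled examples, rather than averaging their scores with equal weight. A real benchmark would fit the combiner on training data and evaluate it on an independent test set, as the abstract describes.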

Original language: English
Article number: bbw100
Pages (from-to): 148-161
Number of pages: 14
Journal: Briefings in Bioinformatics
Volume: 19
Issue number: 1
Publication status: Published - 1 Jan 2018

Keywords

  • Bacterial secretion system
  • Effector protein
  • Logistic regression
  • Random forest
  • Support vector machine
