Comprehensive assessment and performance improvement of effector protein predictors for bacterial secretion systems III, IV and VI

Yi An, Jiawei Wang, Chen Li, Andre Leier, Tatiana Marquez-Lago, Jonathan Wilksch, Yang Zhang, Geoffrey I. Webb, Jiangning Song, Trevor Lithgow

Research output: Contribution to journal › Article › Research › peer-review

16 Citations (Scopus)

Abstract

Bacterial effector proteins secreted by various protein secretion systems play crucial roles in host-pathogen interactions. In this context, computational tools capable of accurately predicting effector proteins of the various types of bacterial secretion systems are highly desirable. Existing computational approaches use different machine learning (ML) techniques and heterogeneous features derived from protein sequences and/or structural information. These predictors differ not only in the ML methods used but also in the curated data sets used, their feature selection and their prediction performance. Here, we provide a comprehensive survey and benchmarking of currently available tools for the prediction of effector proteins of bacterial types III, IV and VI secretion systems (T3SS, T4SS and T6SS, respectively). We review core algorithms, feature selection techniques, tool availability and applicability and evaluate the prediction performance based on carefully curated independent test data sets. In an effort to improve predictive performance, we constructed three ensemble models based on ML algorithms by integrating the output of all individual predictors reviewed. Our benchmarks demonstrate that these ensemble models outperform all the reviewed tools for the prediction of effector proteins of T3SS and T4SS. The webserver of the proposed ensemble methods for T3SS and T4SS effector protein prediction is freely available at http://tbooster.erc.monash.edu/index.jsp. We anticipate that this survey will serve as a useful guide for interested users and that the new ensemble predictors will stimulate research into host-pathogen relationships and inspire the development of new bioinformatics tools for predicting effector proteins of T3SS, T4SS and T6SS.

Original language: English
Article number: bbw100
Pages (from-to): 148-161
Number of pages: 14
Journal: Briefings in Bioinformatics
Volume: 19
Issue number: 1
DOI: 10.1093/bib/bbw100
Publication status: Published - 1 Jan 2018

Keywords

  • Bacterial secretion system
  • Effector protein
  • Logistic regression
  • Random forest
  • Support vector machine

Cite this

@article{0a8246d17bf14c1b8f10af73729216de,
title = "Comprehensive assessment and performance improvement of effector protein predictors for bacterial secretion systems III, IV and VI",
abstract = "Bacterial effector proteins secreted by various protein secretion systems play crucial roles in host-pathogen interactions. In this context, computational tools capable of accurately predicting effector proteins of the various types of bacterial secretion systems are highly desirable. Existing computational approaches use different machine learning (ML) techniques and heterogeneous features derived from protein sequences and/or structural information. These predictors differ not only in the ML methods used but also in the curated data sets used, their feature selection and their prediction performance. Here, we provide a comprehensive survey and benchmarking of currently available tools for the prediction of effector proteins of bacterial types III, IV and VI secretion systems (T3SS, T4SS and T6SS, respectively). We review core algorithms, feature selection techniques, tool availability and applicability and evaluate the prediction performance based on carefully curated independent test data sets. In an effort to improve predictive performance, we constructed three ensemble models based on ML algorithms by integrating the output of all individual predictors reviewed. Our benchmarks demonstrate that these ensemble models outperform all the reviewed tools for the prediction of effector proteins of T3SS and T4SS. The webserver of the proposed ensemble methods for T3SS and T4SS effector protein prediction is freely available at http://tbooster.erc.monash.edu/index.jsp. We anticipate that this survey will serve as a useful guide for interested users and that the new ensemble predictors will stimulate research into host-pathogen relationships and inspire the development of new bioinformatics tools for predicting effector proteins of T3SS, T4SS and T6SS.",
keywords = "Bacterial secretion system, Effector protein, Logistic regression, Random forest, Support vector machine",
author = "Yi An and Jiawei Wang and Chen Li and Andre Leier and Tatiana Marquez-Lago and Jonathan Wilksch and Yang Zhang and Webb, {Geoffrey I.} and Jiangning Song and Trevor Lithgow",
year = "2018",
month = "1",
day = "1",
doi = "10.1093/bib/bbw100",
language = "English",
volume = "19",
pages = "148--161",
journal = "Briefings in Bioinformatics",
issn = "1467-5463",
publisher = "Oxford Univ Press",
number = "1",

}

Comprehensive assessment and performance improvement of effector protein predictors for bacterial secretion systems III, IV and VI. / An, Yi; Wang, Jiawei; Li, Chen; Leier, Andre; Marquez-Lago, Tatiana; Wilksch, Jonathan; Zhang, Yang; Webb, Geoffrey I.; Song, Jiangning; Lithgow, Trevor.

In: Briefings in Bioinformatics, Vol. 19, No. 1, bbw100, 01.01.2018, p. 148-161.

TY - JOUR

T1 - Comprehensive assessment and performance improvement of effector protein predictors for bacterial secretion systems III, IV and VI

AU - An, Yi

AU - Wang, Jiawei

AU - Li, Chen

AU - Leier, Andre

AU - Marquez-Lago, Tatiana

AU - Wilksch, Jonathan

AU - Zhang, Yang

AU - Webb, Geoffrey I.

AU - Song, Jiangning

AU - Lithgow, Trevor

PY - 2018/1/1

Y1 - 2018/1/1

AB - Bacterial effector proteins secreted by various protein secretion systems play crucial roles in host-pathogen interactions. In this context, computational tools capable of accurately predicting effector proteins of the various types of bacterial secretion systems are highly desirable. Existing computational approaches use different machine learning (ML) techniques and heterogeneous features derived from protein sequences and/or structural information. These predictors differ not only in the ML methods used but also in the curated data sets used, their feature selection and their prediction performance. Here, we provide a comprehensive survey and benchmarking of currently available tools for the prediction of effector proteins of bacterial types III, IV and VI secretion systems (T3SS, T4SS and T6SS, respectively). We review core algorithms, feature selection techniques, tool availability and applicability and evaluate the prediction performance based on carefully curated independent test data sets. In an effort to improve predictive performance, we constructed three ensemble models based on ML algorithms by integrating the output of all individual predictors reviewed. Our benchmarks demonstrate that these ensemble models outperform all the reviewed tools for the prediction of effector proteins of T3SS and T4SS. The webserver of the proposed ensemble methods for T3SS and T4SS effector protein prediction is freely available at http://tbooster.erc.monash.edu/index.jsp. We anticipate that this survey will serve as a useful guide for interested users and that the new ensemble predictors will stimulate research into host-pathogen relationships and inspire the development of new bioinformatics tools for predicting effector proteins of T3SS, T4SS and T6SS.

KW - Bacterial secretion system

KW - Effector protein

KW - Logistic regression

KW - Random forest

KW - Support vector machine

UR - http://www.scopus.com/inward/record.url?scp=85041205555&partnerID=8YFLogxK

U2 - 10.1093/bib/bbw100

DO - 10.1093/bib/bbw100

M3 - Article

VL - 19

SP - 148

EP - 161

JO - Briefings in Bioinformatics

JF - Briefings in Bioinformatics

SN - 1467-5463

IS - 1

M1 - bbw100

ER -