Bastion6: A bioinformatics approach for accurate prediction of type VI secreted effectors

Jiawei Wang, Bingjiao Yang, Andre Leier, Tatiana Marquez-Lago, Morihiro Hayashida, Andrea Rocker, Yanju Zhang, Tatsuya Akutsu, Kuo-Chen Chou, Richard A Strugnell, Jiangning Song, Trevor Lithgow

Research output: Contribution to journal › Article › Research › peer-review

30 Citations (Scopus)

Abstract

Motivation: Many Gram-negative bacteria use type VI secretion systems (T6SS) to export effector proteins into adjacent target cells. These secreted effectors (T6SEs) play vital roles in competitive survival within bacterial populations and in bacterial pathogenesis. Although various computational analyses have been applied to identify effectors secreted by particular bacterial species, no universal method is available to accurately predict T6SS effector proteins from the growing tide of bacterial genome sequence data.

Results: We extracted a wide range of features from T6SE protein sequences and comprehensively analyzed the predictive performance of these features through unsupervised and supervised learning. By integrating these features, we then developed a two-layer SVM-based ensemble model with finely optimized parameters to identify potential T6SEs. We further validated the model on an independent dataset, on which it achieved an accuracy (ACC) of 0.943, an F-value of 0.946, a Matthews correlation coefficient (MCC) of 0.892 and an area under the receiver operating characteristic curve (AUC) of 0.976. To demonstrate its applicability, we used the method to correctly identify two recently validated T6SE proteins, which are challenging prediction targets because they differ substantially from previously known T6SEs in sequence and cellular function. Furthermore, a genome-wide prediction across 12 bacterial species, covering 54 212 protein sequences in total, identified 94 putative T6SE candidates. We envisage that both this information and our publicly accessible web server will facilitate the future discovery of novel T6SEs.
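The abstract summarizes performance with four standard binary-classification metrics. As a rough, non-authoritative illustration of how such figures are usually computed, the Python sketch below derives them from confusion-matrix counts and classifier scores. It assumes that "F-value" means the F1 score and that AUC is the ROC area computed as the probability that a random positive outranks a random negative; the paper's exact evaluation code is not described here. The names `binary_metrics` and `auc` are hypothetical helpers, not part of Bastion6, and the counts in the usage lines are made up rather than taken from the paper.

```python
import math
from typing import Dict, Sequence


def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Standard confusion-matrix metrics (hypothetical helper, not Bastion6 code)."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Assumption: "F-value" is the F1 score, the harmonic mean of precision and recall
    f_value = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    # Matthews correlation coefficient; defined as 0 when any marginal total is 0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "F-value": f_value, "MCC": mcc}


def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """ROC AUC as P(score of a random positive > score of a random negative).

    Ties count as half a win (the Mann-Whitney formulation).
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Hypothetical counts and scores, for illustration only
    print(binary_metrics(tp=45, tn=40, fp=5, fn=10))
    print(auc([0.9, 0.7, 0.6], [0.2, 0.65, 0.1]))
```

The pairwise AUC loop is quadratic in the number of sequences; for genome-scale score lists a rank-based computation (or a library routine) would be the practical choice.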

Original language: English
Pages (from-to): 2546-2555
Number of pages: 10
Journal: Bioinformatics
Volume: 34
Issue number: 15
DOI: 10.1093/bioinformatics/bty155
Publication status: Published (1 Jan 2018)

Cite this

Wang, Jiawei; Yang, Bingjiao; Leier, Andre; Marquez-Lago, Tatiana; Hayashida, Morihiro; Rocker, Andrea; Zhang, Yanju; Akutsu, Tatsuya; Chou, Kuo-Chen; Strugnell, Richard A; Song, Jiangning; Lithgow, Trevor. / Bastion6: A bioinformatics approach for accurate prediction of type VI secreted effectors. In: Bioinformatics. 2018; Vol. 34, No. 15. pp. 2546-2555.
@article{a877631bbbed480d9e67426da4172c7e,
title = "Bastion6: A bioinformatics approach for accurate prediction of type VI secreted effectors",
author = "Jiawei Wang and Bingjiao Yang and Andre Leier and Tatiana Marquez-Lago and Morihiro Hayashida and Andrea Rocker and Yanju Zhang and Tatsuya Akutsu and Kuo-Chen Chou and Strugnell, {Richard A} and Jiangning Song and Trevor Lithgow",
year = "2018",
month = "1",
day = "1",
doi = "10.1093/bioinformatics/bty155",
language = "English",
volume = "34",
pages = "2546--2555",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press, USA",
number = "15",

}

Wang, J, Yang, B, Leier, A, Marquez-Lago, T, Hayashida, M, Rocker, A, Zhang, Y, Akutsu, T, Chou, K-C, Strugnell, RA, Song, J & Lithgow, T 2018, 'Bastion6: A bioinformatics approach for accurate prediction of type VI secreted effectors', Bioinformatics, vol. 34, no. 15, pp. 2546-2555. https://doi.org/10.1093/bioinformatics/bty155

Bastion6: A bioinformatics approach for accurate prediction of type VI secreted effectors. / Wang, Jiawei; Yang, Bingjiao; Leier, Andre; Marquez-Lago, Tatiana; Hayashida, Morihiro; Rocker, Andrea; Zhang, Yanju; Akutsu, Tatsuya; Chou, Kuo-Chen; Strugnell, Richard A; Song, Jiangning; Lithgow, Trevor.

In: Bioinformatics, Vol. 34, No. 15, 01.01.2018, pp. 2546-2555.

TY  - JOUR
T1  - Bastion6: A bioinformatics approach for accurate prediction of type VI secreted effectors
AU  - Wang, Jiawei
AU  - Yang, Bingjiao
AU  - Leier, Andre
AU  - Marquez-Lago, Tatiana
AU  - Hayashida, Morihiro
AU  - Rocker, Andrea
AU  - Zhang, Yanju
AU  - Akutsu, Tatsuya
AU  - Chou, Kuo-Chen
AU  - Strugnell, Richard A
AU  - Song, Jiangning
AU  - Lithgow, Trevor
PY  - 2018/1/1
Y1  - 2018/1/1
UR  - http://www.scopus.com/inward/record.url?scp=85052947243&partnerID=8YFLogxK
U2  - 10.1093/bioinformatics/bty155
DO  - 10.1093/bioinformatics/bty155
M3  - Article
VL  - 34
SP  - 2546
EP  - 2555
JO  - Bioinformatics
JF  - Bioinformatics
SN  - 1367-4803
IS  - 15
ER  -