Bastion3: A two-layer ensemble predictor of type III secreted effectors

Jiawei Wang, Jiahui Li, Bingjiao Yang, Ruopeng Xie, Tatiana T. Marquez-Lago, Andre Leier, Morihiro Hayashida, Tatsuya Akutsu, Yanju Zhang, Kuo-Chen Chou, Joel Selkrig, Tieli Zhou, Jiangning Song, Trevor Lithgow

Research output: Contribution to journal > Article > Research > peer-review

Abstract

Motivation: Type III secreted effectors (T3SEs) can be injected into the host cell cytoplasm via type III secretion systems (T3SSs) to modulate interactions between Gram-negative bacterial pathogens and their hosts. Because of their relevance to pathogen-host interactions, significant computational effort has been devoted to identifying T3SEs, which in turn has stimulated new T3SE discoveries. However, as T3SEs with new characteristics are discovered, existing computational tools reveal important limitations: (i) most of the trained machine learning models are based on the N-terminus (in some cases also incorporating the C-terminus) rather than the proteins' complete sequences, and (ii) the underlying models (trained with classic algorithms) employ only a few features, most of which are extracted from sequence information alone. To achieve better T3SE prediction, we must identify more powerful, informative features and investigate how to integrate them effectively into a comprehensive model.

Results: In this work, we present Bastion3, a two-layer ensemble predictor developed to accurately identify type III secreted effectors from protein sequence data. In contrast with existing methods that employ single models with few features, Bastion3 explores a wide range of features of various types, trains single models based on these features and finally integrates these models through ensemble learning. We trained the models using a new gradient boosting machine, LightGBM, and further boosted their performance through a novel genetic algorithm (GA)-based two-step parameter optimization strategy. Our benchmark test demonstrates that Bastion3 performs substantially better than commonly used methods, with an accuracy (ACC) of 0.959, F-value of 0.958, Matthews correlation coefficient (MCC) of 0.917 and area under the curve (AUC) of 0.956, outperforming all other toolkits by more than 5.6% in ACC, 5.7% in F-value, 12.4% in MCC and 5.8% in AUC. Based on the proposed two-layer ensemble model, we further developed a user-friendly online toolkit that maximizes convenience for experimental scientists performing T3SE prediction. With its improved performance and a design intended to ease future discoveries of novel T3SEs, Bastion3 is poised to become a widely used, state-of-the-art toolkit for T3SE prediction.
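The two-layer idea and the reported evaluation measures can be illustrated with a minimal sketch. This is not the authors' implementation: Bastion3 trains LightGBM models, whereas here fixed, made-up scores stand in for the single models' outputs. The feature-group names, the plain averaging used in both layers, the 0.5 threshold and the function names `layer_one`, `layer_two` and `binary_metrics` are all assumptions made for illustration. Only the ACC, F-value and MCC formulas are the standard definitions.

```python
from math import sqrt
from statistics import mean


def layer_one(group_scores):
    """Layer 1 (assumed): average the single-model scores inside each feature group."""
    return {group: mean(scores) for group, scores in group_scores.items()}


def layer_two(group_means, threshold=0.5):
    """Layer 2 (assumed): average the group-level scores, then threshold to a 0/1 call."""
    final_score = mean(list(group_means.values()))
    return final_score, int(final_score >= threshold)


def binary_metrics(y_true, y_pred):
    """Return ACC, F-value and MCC for non-empty 0/1 label lists (1 = effector)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_value = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": (tp + tn) / len(pairs), "F": f_value, "MCC": mcc}


# Hypothetical per-protein scores from single models, grouped by feature type,
# paired with a made-up true label. None of these numbers come from the paper.
proteins = [
    ({"sequence": [0.9, 0.8], "evolutionary": [0.7], "physicochemical": [0.6, 0.8]}, 1),
    ({"sequence": [0.2, 0.4], "evolutionary": [0.3], "physicochemical": [0.1, 0.3]}, 0),
    ({"sequence": [0.6, 0.4], "evolutionary": [0.3], "physicochemical": [0.5, 0.3]}, 1),
    ({"sequence": [0.1, 0.3], "evolutionary": [0.2], "physicochemical": [0.4, 0.2]}, 0),
]
y_true = [label for _, label in proteins]
y_pred = [layer_two(layer_one(scores))[1] for scores, _ in proteins]
print(y_pred, binary_metrics(y_true, y_pred))
```

Averaging is only one possible integration rule; a real ensemble could equally use voting or a trained meta-model, and the abstract does not say which rule Bastion3 applies. AUC is omitted because it needs ranked scores rather than thresholded labels.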

Original language: English
Article number: bty914
Pages (from-to): 2017-2028
Number of pages: 12
Journal: Bioinformatics
Volume: 35
Issue number: 12
DOI: https://doi.org/10.1093/bioinformatics/bty914
Publication status: Published - 1 Jun 2019

Cite this

Wang, J., Li, J., Yang, B., Xie, R., Marquez-Lago, T. T., Leier, A., ... Lithgow, T. (2019). Bastion3: A two-layer ensemble predictor of type III secreted effectors. Bioinformatics, 35(12), 2017-2028. [bty914]. https://doi.org/10.1093/bioinformatics/bty914
Wang, Jiawei; Li, Jiahui; Yang, Bingjiao; Xie, Ruopeng; Marquez-Lago, Tatiana T.; Leier, Andre; Hayashida, Morihiro; Akutsu, Tatsuya; Zhang, Yanju; Chou, Kuo-Chen; Selkrig, Joel; Zhou, Tieli; Song, Jiangning; Lithgow, Trevor. / Bastion3: A two-layer ensemble predictor of type III secreted effectors. In: Bioinformatics. 2019; Vol. 35, No. 12. pp. 2017-2028.
@article{670dcde5803344fd9d46d5eb6820c59e,
title = "Bastion3: A two-layer ensemble predictor of type III secreted effectors",
author = "Jiawei Wang and Jiahui Li and Bingjiao Yang and Ruopeng Xie and Marquez-Lago, {Tatiana T.} and Andre Leier and Morihiro Hayashida and Tatsuya Akutsu and Yanju Zhang and Kuo-Chen Chou and Joel Selkrig and Tieli Zhou and Jiangning Song and Trevor Lithgow",
year = "2019",
month = "6",
day = "1",
doi = "10.1093/bioinformatics/bty914",
language = "English",
volume = "35",
pages = "2017--2028",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press, USA",
number = "12",

}

Wang, J, Li, J, Yang, B, Xie, R, Marquez-Lago, TT, Leier, A, Hayashida, M, Akutsu, T, Zhang, Y, Chou, K-C, Selkrig, J, Zhou, T, Song, J & Lithgow, T 2019, 'Bastion3: A two-layer ensemble predictor of type III secreted effectors', Bioinformatics, vol. 35, no. 12, bty914, pp. 2017-2028. https://doi.org/10.1093/bioinformatics/bty914

Bastion3: A two-layer ensemble predictor of type III secreted effectors. / Wang, Jiawei; Li, Jiahui; Yang, Bingjiao; Xie, Ruopeng; Marquez-Lago, Tatiana T.; Leier, Andre; Hayashida, Morihiro; Akutsu, Tatsuya; Zhang, Yanju; Chou, Kuo-Chen; Selkrig, Joel; Zhou, Tieli; Song, Jiangning; Lithgow, Trevor. In: Bioinformatics, Vol. 35, No. 12, bty914, 01.06.2019, p. 2017-2028.

TY - JOUR
T1 - Bastion3
T2 - A two-layer ensemble predictor of type III secreted effectors
AU - Wang, Jiawei
AU - Li, Jiahui
AU - Yang, Bingjiao
AU - Xie, Ruopeng
AU - Marquez-Lago, Tatiana T.
AU - Leier, Andre
AU - Hayashida, Morihiro
AU - Akutsu, Tatsuya
AU - Zhang, Yanju
AU - Chou, Kuo-Chen
AU - Selkrig, Joel
AU - Zhou, Tieli
AU - Song, Jiangning
AU - Lithgow, Trevor
PY - 2019/6/1
Y1 - 2019/6/1
UR - http://www.scopus.com/inward/record.url?scp=85068423111&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/bty914
DO - 10.1093/bioinformatics/bty914
M3 - Article
VL - 35
SP - 2017
EP - 2028
JO - Bioinformatics
JF - Bioinformatics
SN - 1367-4803
IS - 12
M1 - bty914
ER -

Wang J, Li J, Yang B, Xie R, Marquez-Lago TT, Leier A et al. Bastion3: A two-layer ensemble predictor of type III secreted effectors. Bioinformatics. 2019 Jun 1;35(12):2017-2028. bty914. https://doi.org/10.1093/bioinformatics/bty914