SIMLIN: a bioinformatics tool for prediction of S-sulphenylation in the human proteome based on multi-stage ensemble-learning models

Xiaochuan Wang, Chen Li, Fuyi Li, Varun S. Sharma, Jiangning Song, Geoffrey I. Webb

Research output: Contribution to journal › Article › Research › peer-review

Abstract

Background
S-sulphenylation is a ubiquitous protein post-translational modification (PTM) in which an S-hydroxyl (−SOH) bond is formed via the reversible oxidation of the sulfhydryl group of cysteine (C). Recent experimental studies have revealed that S-sulphenylation plays critical roles in many biological functions, such as protein regulation and cell signaling. State-of-the-art bioinformatic advances have facilitated high-throughput in silico screening of protein S-sulphenylation sites, thereby significantly reducing the time and labour costs traditionally required for the experimental investigation of S-sulphenylation.
Results
In this study, we propose a novel hybrid computational framework, termed SIMLIN, for accurate prediction of protein S-sulphenylation sites using a multi-stage neural-network-based ensemble-learning model that integrates both protein sequence-derived and protein structural features. Benchmarking experiments against the current state-of-the-art predictors for S-sulphenylation demonstrated that SIMLIN delivered competitive prediction performance. Empirical studies on the independent testing dataset demonstrated that SIMLIN achieved 88.0% prediction accuracy and an AUC score of 0.82, outperforming existing methods.
Conclusions
In summary, SIMLIN predicts human S-sulphenylation sites with high accuracy, thereby facilitating biological hypothesis generation and experimental validation. The web server, datasets, and online instructions are freely available at http://simlin.erc.monash.edu/ for academic purposes.
Original language: English
Article number: 602
Number of pages: 12
Journal: BMC Bioinformatics
DOI: 10.1186/s12859-019-3178-6
Publication status: Accepted/In press - 21 Nov 2019

Keywords

  • protein post-translational modification
  • S-sulphenylation
  • bioinformatics software
  • machine learning
  • ensemble learning

Cite this

@article{d3bdbefa32364044a2e5aa5f6ad9937d,
title = "SIMLIN: a bioinformatics tool for prediction of S-sulphenylation in the human proteome based on multi-stage ensemble-learning models",
abstract = "Background: S-sulphenylation is a ubiquitous protein post-translational modification (PTM) in which an S-hydroxyl (−SOH) bond is formed via the reversible oxidation of the sulfhydryl group of cysteine (C). Recent experimental studies have revealed that S-sulphenylation plays critical roles in many biological functions, such as protein regulation and cell signaling. State-of-the-art bioinformatic advances have facilitated high-throughput in silico screening of protein S-sulphenylation sites, thereby significantly reducing the time and labour costs traditionally required for the experimental investigation of S-sulphenylation. Results: In this study, we propose a novel hybrid computational framework, termed SIMLIN, for accurate prediction of protein S-sulphenylation sites using a multi-stage neural-network-based ensemble-learning model that integrates both protein sequence-derived and protein structural features. Benchmarking experiments against the current state-of-the-art predictors for S-sulphenylation demonstrated that SIMLIN delivered competitive prediction performance. Empirical studies on the independent testing dataset demonstrated that SIMLIN achieved 88.0{\%} prediction accuracy and an AUC score of 0.82, outperforming existing methods. Conclusions: In summary, SIMLIN predicts human S-sulphenylation sites with high accuracy, thereby facilitating biological hypothesis generation and experimental validation. The web server, datasets, and online instructions are freely available at http://simlin.erc.monash.edu/ for academic purposes.",
keywords = "protein post-translational modification, S-sulphenylation, bioinformatics software, machine learning, ensemble learning",
author = "Xiaochuan Wang and Chen Li and Fuyi Li and Sharma, {Varun S.} and Jiangning Song and Webb, {Geoffrey I.}",
year = "2019",
month = "11",
day = "21",
doi = "10.1186/s12859-019-3178-6",
language = "English",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central",

}

SIMLIN: a bioinformatics tool for prediction of S-sulphenylation in the human proteome based on multi-stage ensemble-learning models. / Wang, Xiaochuan; Li, Chen; Li, Fuyi; Sharma, Varun S.; Song, Jiangning; Webb, Geoffrey I.

In: BMC Bioinformatics, 21.11.2019.

TY - JOUR

T1 - SIMLIN

T2 - a bioinformatics tool for prediction of S-sulphenylation in the human proteome based on multi-stage ensemble-learning models

AU - Wang, Xiaochuan

AU - Li, Chen

AU - Li, Fuyi

AU - Sharma, Varun S.

AU - Song, Jiangning

AU - Webb, Geoffrey I.

PY - 2019/11/21

Y1 - 2019/11/21

N2 - Background: S-sulphenylation is a ubiquitous protein post-translational modification (PTM) in which an S-hydroxyl (−SOH) bond is formed via the reversible oxidation of the sulfhydryl group of cysteine (C). Recent experimental studies have revealed that S-sulphenylation plays critical roles in many biological functions, such as protein regulation and cell signaling. State-of-the-art bioinformatic advances have facilitated high-throughput in silico screening of protein S-sulphenylation sites, thereby significantly reducing the time and labour costs traditionally required for the experimental investigation of S-sulphenylation. Results: In this study, we propose a novel hybrid computational framework, termed SIMLIN, for accurate prediction of protein S-sulphenylation sites using a multi-stage neural-network-based ensemble-learning model that integrates both protein sequence-derived and protein structural features. Benchmarking experiments against the current state-of-the-art predictors for S-sulphenylation demonstrated that SIMLIN delivered competitive prediction performance. Empirical studies on the independent testing dataset demonstrated that SIMLIN achieved 88.0% prediction accuracy and an AUC score of 0.82, outperforming existing methods. Conclusions: In summary, SIMLIN predicts human S-sulphenylation sites with high accuracy, thereby facilitating biological hypothesis generation and experimental validation. The web server, datasets, and online instructions are freely available at http://simlin.erc.monash.edu/ for academic purposes.

AB - Background: S-sulphenylation is a ubiquitous protein post-translational modification (PTM) in which an S-hydroxyl (−SOH) bond is formed via the reversible oxidation of the sulfhydryl group of cysteine (C). Recent experimental studies have revealed that S-sulphenylation plays critical roles in many biological functions, such as protein regulation and cell signaling. State-of-the-art bioinformatic advances have facilitated high-throughput in silico screening of protein S-sulphenylation sites, thereby significantly reducing the time and labour costs traditionally required for the experimental investigation of S-sulphenylation. Results: In this study, we propose a novel hybrid computational framework, termed SIMLIN, for accurate prediction of protein S-sulphenylation sites using a multi-stage neural-network-based ensemble-learning model that integrates both protein sequence-derived and protein structural features. Benchmarking experiments against the current state-of-the-art predictors for S-sulphenylation demonstrated that SIMLIN delivered competitive prediction performance. Empirical studies on the independent testing dataset demonstrated that SIMLIN achieved 88.0% prediction accuracy and an AUC score of 0.82, outperforming existing methods. Conclusions: In summary, SIMLIN predicts human S-sulphenylation sites with high accuracy, thereby facilitating biological hypothesis generation and experimental validation. The web server, datasets, and online instructions are freely available at http://simlin.erc.monash.edu/ for academic purposes.

KW - protein post-translational modification

KW - S-sulphenylation

KW - bioinformatics software

KW - machine learning

KW - ensemble learning

U2 - 10.1186/s12859-019-3178-6

DO - 10.1186/s12859-019-3178-6

M3 - Article

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

M1 - 602

ER -