Accurate in silico identification of species-specific acetylation sites by integrating protein sequence-derived and functional features

Yuanyuan Liu, Mingjun Wang, Huilin Wang, Hao Tan, Ziding Zhang, Geoffrey I Webb, Jiangning Song

Research output: Contribution to journal › Article › Research › peer-review

47 Citations (Scopus)

Abstract

Lysine acetylation is a reversible post-translational modification that plays an important role in cytokine signaling, transcriptional regulation, and apoptosis. To fully understand acetylation mechanisms, it is crucial to identify substrates and their specific acetylation sites. Experimental identification is often time-consuming and expensive. Bioinformatics methods offer a cost-effective alternative and can be used in a high-throughput manner to generate relatively precise predictions. Here we develop a method termed SSPKA for species-specific lysine acetylation prediction, using random forest classifiers that combine sequence-derived and functional features with two-step feature selection. Feature importance analysis indicates that functional features, applied to lysine acetylation site prediction for the first time, significantly improve predictive performance. We apply the SSPKA model to screen the entire human proteome and identify many high-confidence putative substrates that were not previously identified. The results, along with the implemented Java tool, serve as useful resources to elucidate the mechanism of lysine acetylation and to facilitate hypothesis-driven experimental design and validation.
Original language: English
Pages (from-to): 1-12
Number of pages: 12
Journal: Scientific Reports
Volume: 4
Issue number: (Art. No.: 5765)
DOIs: 10.1038/srep05765
Publication status: Published - 2014

Cite this

Liu, Yuanyuan ; Wang, Mingjun ; Wang, Huilin ; Tan, Hao ; Zhang, Ziding ; Webb, Geoffrey I ; Song, Jiangning. / Accurate in silico identification of species-specific acetylation sites by integrating protein sequence-derived and functional features. In: Scientific Reports. 2014 ; Vol. 4, No. (Art. No.: 5765). pp. 1 - 12.
@article{31c9857cd00f49289b7304b4a060c8c3,
title = "Accurate in silico identification of species-specific acetylation sites by integrating protein sequence-derived and functional features",
author = "Yuanyuan Liu and Mingjun Wang and Huilin Wang and Hao Tan and Ziding Zhang and Webb, {Geoffrey I} and Jiangning Song",
year = "2014",
doi = "10.1038/srep05765",
language = "English",
volume = "4",
pages = "1--12",
journal = "Scientific Reports",
issn = "2045-2322",
publisher = "Nature Publishing Group",
number = "(Art. No.: 5765)",

}


TY - JOUR

T1 - Accurate in silico identification of species-specific acetylation sites by integrating protein sequence-derived and functional features

AU - Liu, Yuanyuan

AU - Wang, Mingjun

AU - Wang, Huilin

AU - Tan, Hao

AU - Zhang, Ziding

AU - Webb, Geoffrey I

AU - Song, Jiangning

PY - 2014

Y1 - 2014

UR - http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4104576/pdf/srep05765.pdf

U2 - 10.1038/srep05765

DO - 10.1038/srep05765

M3 - Article

VL - 4

SP - 1

EP - 12

JO - Scientific Reports

JF - Scientific Reports

SN - 2045-2322

IS - (Art. No.: 5765)

ER -