Large-scale comparative assessment of computational predictors for lysine post-translational modification sites

Zhen Chen, Xuhan Liu, Fuyi Li, Chen Li, Tatiana Marquez-Lago, Andre Leier, Tatsuya Akutsu, Geoffrey I. Webb, Dakang Xu, Alexander Ian Smith, Lei Li, Kuo-Chen Chou, Jiangning Song

Research output: Contribution to journalArticleResearchpeer-review

Abstract

Lysine post-translational modifications (PTMs) play a crucial role in regulating diverse functions and biological processes of proteins. However, because of the large volumes of sequencing data generated from genome-sequencing projects, systematic identification of different types of lysine PTM substrates and PTM sites in the entire proteome remains a major challenge. In recent years, a number of computational methods for lysine PTM identification have been developed. These methods show high diversity in their core algorithms, features extracted and feature selection techniques and evaluation strategies. There is therefore an urgent need to revisit these methods and summarize their methodologies, to improve and further develop computational techniques to identify and characterize lysine PTMs from the large amounts of sequence data. With this goal in mind, we first provide a comprehensive survey on a large collection of 49 state-of-the-art approaches for lysine PTM prediction. We cover a variety of important aspects that are crucial for the development of successful predictors, including operating algorithms, sequence and structural features, feature selection, model performance evaluation and software utility. We further provide our thoughts on potential strategies to improve the model performance. Second, in order to examine the feasibility of using deep learning for lysine PTM prediction, we propose a novel computational framework, termed MUscADEL (Multiple Scalable Accurate Deep Learner for lysine PTMs), using deep, bidirectional, long short-term memory recurrent neural networks for accurate and systematic mapping of eight major types of lysine PTMs in the human and mouse proteomes. Extensive benchmarking tests show that MUscADEL outperforms current methods for lysine PTM characterization, demonstrating the potential and power of deep learning techniques in protein PTM prediction. The web server of MUscADEL, together with all the data sets assembled in this study, is freely available at http://muscadel.erc.monash.edu.ezproxy.lib.monash.edu.au/. We anticipate this comprehensive review and the application of deep learning will provide practical guide and useful insights into PTM prediction and inspire future bioinformatics studies in the related fields.
Original languageEnglish
Article numberbby089
Number of pages24
JournalBriefings in Bioinformatics
DOIs
Publication statusAccepted/In press - 4 Oct 2018

Keywords

  • lysine post-translational modification
  • prediction model
  • sequence features
  • feature engineering
  • deep learning

Cite this

@article{6ea88203c6d94cd5bad1b4df8accb48a,
title = "Large-scale comparative assessment of computational predictors for lysine post-translational modification sites",
abstract = "Lysine post-translational modifications (PTMs) play a crucial role in regulating diverse functions and biological processes of proteins. However, because of the large volumes of sequencing data generated from genome-sequencing projects, systematic identification of different types of lysine PTM substrates and PTM sites in the entire proteome remains a major challenge. In recent years, a number of computational methods for lysine PTM identification have been developed. These methods show high diversity in their core algorithms, features extracted and feature selection techniques and evaluation strategies. There is therefore an urgent need to revisit these methods and summarize their methodologies, to improve and further develop computational techniques to identify and characterize lysine PTMs from the large amounts of sequence data. With this goal in mind, we first provide a comprehensive survey on a large collection of 49 state-of-the-art approaches for lysine PTM prediction. We cover a variety of important aspects that are crucial for the development of successful predictors, including operating algorithms, sequence and structural features, feature selection, model performance evaluation and software utility. We further provide our thoughts on potential strategies to improve the model performance. Second, in order to examine the feasibility of using deep learning for lysine PTM prediction, we propose a novel computational framework, termed MUscADEL (Multiple Scalable Accurate Deep Learner for lysine PTMs), using deep, bidirectional, long short-term memory recurrent neural networks for accurate and systematic mapping of eight major types of lysine PTMs in the human and mouse proteomes. Extensive benchmarking tests show that MUscADEL outperforms current methods for lysine PTM characterization, demonstrating the potential and power of deep learning techniques in protein PTM prediction. The web server of MUscADEL, together with all the data sets assembled in this study, is freely available at http://muscadel.erc.monash.edu.ezproxy.lib.monash.edu.au/. We anticipate this comprehensive review and the application of deep learning will provide practical guide and useful insights into PTM prediction and inspire future bioinformatics studies in the related fields.",
keywords = "lysine post-translational modification, prediction model, sequence features, feature engineering, deep learning",
author = "Zhen Chen and Xuhan Liu and Fuyi Li and Chen Li and Tatiana Marquez-Lago and Andre Leier and Tatsuya Akutsu and Webb, {Geoffrey I.} and Dakang Xu and Smith, {Alexander Ian} and Lei Li and Kuo-Chen Chou and Jiangning Song",
year = "2018",
month = "10",
day = "4",
doi = "10.1093/bib/bby089",
language = "English",
journal = "Briefings in Bioinformatics",
issn = "1467-5463",
publisher = "Oxford Univ Press",

}

Large-scale comparative assessment of computational predictors for lysine post-translational modification sites. / Chen, Zhen; Liu, Xuhan; Li, Fuyi; Li, Chen; Marquez-Lago, Tatiana; Leier, Andre; Akutsu, Tatsuya; Webb, Geoffrey I.; Xu, Dakang; Smith, Alexander Ian; Li, Lei; Chou, Kuo-Chen; Song, Jiangning.

In: Briefings in Bioinformatics, 04.10.2018.

Research output: Contribution to journalArticleResearchpeer-review

TY - JOUR

T1 - Large-scale comparative assessment of computational predictors for lysine post-translational modification sites

AU - Chen, Zhen

AU - Liu, Xuhan

AU - Li, Fuyi

AU - Li, Chen

AU - Marquez-Lago, Tatiana

AU - Leier, Andre

AU - Akutsu, Tatsuya

AU - Webb, Geoffrey I.

AU - Xu, Dakang

AU - Smith, Alexander Ian

AU - Li, Lei

AU - Chou, Kuo-Chen

AU - Song, Jiangning

PY - 2018/10/4

Y1 - 2018/10/4

N2 - Lysine post-translational modifications (PTMs) play a crucial role in regulating diverse functions and biological processes of proteins. However, because of the large volumes of sequencing data generated from genome-sequencing projects, systematic identification of different types of lysine PTM substrates and PTM sites in the entire proteome remains a major challenge. In recent years, a number of computational methods for lysine PTM identification have been developed. These methods show high diversity in their core algorithms, features extracted and feature selection techniques and evaluation strategies. There is therefore an urgent need to revisit these methods and summarize their methodologies, to improve and further develop computational techniques to identify and characterize lysine PTMs from the large amounts of sequence data. With this goal in mind, we first provide a comprehensive survey on a large collection of 49 state-of-the-art approaches for lysine PTM prediction. We cover a variety of important aspects that are crucial for the development of successful predictors, including operating algorithms, sequence and structural features, feature selection, model performance evaluation and software utility. We further provide our thoughts on potential strategies to improve the model performance. Second, in order to examine the feasibility of using deep learning for lysine PTM prediction, we propose a novel computational framework, termed MUscADEL (Multiple Scalable Accurate Deep Learner for lysine PTMs), using deep, bidirectional, long short-term memory recurrent neural networks for accurate and systematic mapping of eight major types of lysine PTMs in the human and mouse proteomes. Extensive benchmarking tests show that MUscADEL outperforms current methods for lysine PTM characterization, demonstrating the potential and power of deep learning techniques in protein PTM prediction. The web server of MUscADEL, together with all the data sets assembled in this study, is freely available at http://muscadel.erc.monash.edu.ezproxy.lib.monash.edu.au/. We anticipate this comprehensive review and the application of deep learning will provide practical guide and useful insights into PTM prediction and inspire future bioinformatics studies in the related fields.

AB - Lysine post-translational modifications (PTMs) play a crucial role in regulating diverse functions and biological processes of proteins. However, because of the large volumes of sequencing data generated from genome-sequencing projects, systematic identification of different types of lysine PTM substrates and PTM sites in the entire proteome remains a major challenge. In recent years, a number of computational methods for lysine PTM identification have been developed. These methods show high diversity in their core algorithms, features extracted and feature selection techniques and evaluation strategies. There is therefore an urgent need to revisit these methods and summarize their methodologies, to improve and further develop computational techniques to identify and characterize lysine PTMs from the large amounts of sequence data. With this goal in mind, we first provide a comprehensive survey on a large collection of 49 state-of-the-art approaches for lysine PTM prediction. We cover a variety of important aspects that are crucial for the development of successful predictors, including operating algorithms, sequence and structural features, feature selection, model performance evaluation and software utility. We further provide our thoughts on potential strategies to improve the model performance. Second, in order to examine the feasibility of using deep learning for lysine PTM prediction, we propose a novel computational framework, termed MUscADEL (Multiple Scalable Accurate Deep Learner for lysine PTMs), using deep, bidirectional, long short-term memory recurrent neural networks for accurate and systematic mapping of eight major types of lysine PTMs in the human and mouse proteomes. Extensive benchmarking tests show that MUscADEL outperforms current methods for lysine PTM characterization, demonstrating the potential and power of deep learning techniques in protein PTM prediction. The web server of MUscADEL, together with all the data sets assembled in this study, is freely available at http://muscadel.erc.monash.edu.ezproxy.lib.monash.edu.au/. We anticipate this comprehensive review and the application of deep learning will provide practical guide and useful insights into PTM prediction and inspire future bioinformatics studies in the related fields.

KW - lysine post-translational modification

KW - prediction model

KW - sequence features

KW - feature engineering

KW - deep learning

U2 - 10.1093/bib/bby089

DO - 10.1093/bib/bby089

M3 - Article

JO - Briefings in Bioinformatics

JF - Briefings in Bioinformatics

SN - 1467-5463

M1 - bby089

ER -