Inspector: a lysine succinylation predictor based on edited nearest-neighbor undersampling and adaptive synthetic oversampling

Yan Zhu, Cangzhi Jia, Fuyi Li, Jiangning Song

Research output: Contribution to journal › Article › Research › peer-review

53 Citations (Scopus)

Abstract

Lysine succinylation is an important type of protein post-translational modification and plays a key role in regulating protein function and structural changes. However, the mechanism and function of succinylation have not yet been clarified. Identifying lysine succinylation sites is key to better understanding the precise mechanism and functional role of succinylation. Conventional experimental methods for identifying succinylation sites are often expensive, time-consuming, and labor-intensive. Computational approaches that can effectively identify lysine succinylation sites from sequence data are therefore much needed. In this study, we propose Inspector, a novel predictor for lysine succinylation site identification, developed using the random forest algorithm combined with a variety of sequence-based feature-encoding schemes. Edited nearest-neighbor undersampling and adaptive synthetic oversampling were employed to address dataset imbalance, and a two-step feature-selection strategy was applied to optimize the feature set and thereby improve the accuracy of the prediction model. Empirical performance comparisons with existing tools showed that Inspector achieved competitive predictive performance in distinguishing lysine succinylation sites.
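
As a rough illustration of the edited nearest-neighbor undersampling idea named in the abstract, the sketch below removes majority-class samples whose nearest neighbors mostly belong to another class. It is a toy, standard-library-only example and not the authors' implementation: the function name `edited_nearest_neighbor`, the value `k=3`, the Euclidean distance, the restriction to the majority class, and the two-dimensional points are all assumptions made for illustration. The adaptive synthetic oversampling and random forest steps are not shown.

```python
import math
from collections import Counter


def edited_nearest_neighbor(X, y, majority_label, k=3):
    """Drop majority-class samples whose k nearest neighbors mostly disagree.

    Hypothetical helper for illustration; k and the Euclidean metric are
    assumptions, not settings reported in the abstract.
    """
    keep = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi != majority_label:
            keep.append(i)  # minority samples are always kept
            continue
        # Indices of the k closest other samples (Euclidean distance).
        neighbors = sorted(
            (j for j in range(len(X)) if j != i),
            key=lambda j: math.dist(xi, X[j]),
        )[:k]
        votes = Counter(y[j] for j in neighbors)
        # Keep the sample only if its neighborhood agrees with its label.
        if votes.most_common(1)[0][0] == yi:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data: label 0 is the majority class, label 1 the minority class.
# The last point is a majority sample lying inside the minority cluster.
X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
     (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.05, 5.05)]
y = [0, 0, 0, 0, 1, 1, 1, 0]

X_res, y_res = edited_nearest_neighbor(X, y, majority_label=0, k=3)
print(len(X_res), y_res)
```

In this toy case only the majority sample embedded in the minority cluster is removed, which is the cleaning effect this kind of undersampling aims for before any oversampling is applied.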

Original language: English
Article number: 113592
Number of pages: 10
Journal: Analytical Biochemistry
Volume: 593
Publication status: Published - 15 Mar 2020

Keywords

  • Adaptive synthetic oversampling
  • Edited nearest-neighbor undersampling
  • Random forest
