PhosTransfer: a deep transfer learning framework for kinase-specific phosphorylation site prediction in hierarchy

Ying Xu, Campbell Wilson, André Leier, Tatiana T. Marquez-Lago, James Whisstock, Jiangning Song

Research output: Chapter in Book/Report/Conference proceeding | Conference Paper | Research | peer-review

8 Citations (Scopus)

Abstract

Machine learning algorithms have been widely used for predicting kinase-specific phosphorylation sites. However, the scarcity of training data for specific kinases makes it difficult to train effective models for predicting their phosphorylation sites. In this paper, we propose a deep transfer learning framework, PhosTransfer, for improving kinase-specific phosphorylation site prediction. It exploits the hierarchical information encoded in the kinase classification tree (KCT), which has four levels: kinase groups, families, subfamilies and protein kinases (PKs). With PhosTransfer, predictive models associated with tree nodes at higher levels, which are trained with more data, can be transferred and reused as feature extractors for predictive models of tree nodes at lower levels. Our results indicate that models with deep transfer learning outperformed those without transfer learning for 73 out of 79 tested PKs. The positive effect of deep transfer learning is more evident in the prediction of phosphosites for kinase nodes with less training data. These performance improvements are further validated and explained by visualising the vector representations generated from hidden layers pre-trained at different KCT levels.
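
The top-down reuse of pre-trained hidden layers described in the abstract can be sketched roughly as follows. This is an illustrative toy under stated assumptions, not the authors' architecture: the `NodeModel` class, the `train_path` helper, the tanh activations, the hidden size, the synthetic data and the node labels are all hypothetical. Each node on one root-to-leaf path of the KCT keeps the hidden layers of its ancestors frozen as a feature extractor and trains only one new hidden layer plus an output unit.

```python
import numpy as np

rng = np.random.default_rng(0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class NodeModel:
    """Hypothetical model for one KCT node: frozen ancestor layers + one new layer."""

    def __init__(self, in_dim, hidden_dim, inherited=()):
        self.inherited = list(inherited)  # frozen (W, b) pairs from ancestor nodes
        feat_dim = self.inherited[-1][0].shape[1] if self.inherited else in_dim
        self.W = rng.normal(0, 0.1, (feat_dim, hidden_dim))  # new hidden layer
        self.b = np.zeros(hidden_dim)
        self.v = rng.normal(0, 0.1, hidden_dim)              # output weights
        self.c = 0.0

    def features(self, X):
        # Pass the input through the inherited layers, which are never updated.
        for W, b in self.inherited:
            X = np.tanh(X @ W + b)
        return X

    def predict(self, X):
        H = np.tanh(self.features(X) @ self.W + self.b)
        return sigmoid(H @ self.v + self.c)

    def fit(self, X, y, epochs=200, lr=0.5):
        F = self.features(X)
        for _ in range(epochs):
            H = np.tanh(F @ self.W + self.b)
            p = sigmoid(H @ self.v + self.c)
            err = (p - y) / len(y)  # gradient of mean cross-entropy w.r.t. the logit
            dH = np.outer(err, self.v) * (1 - H ** 2)
            self.v -= lr * (H.T @ err)
            self.c -= lr * err.sum()
            self.W -= lr * (F.T @ dH)
            self.b -= lr * dH.sum(axis=0)
        return self

    def hidden_layers(self):
        # Layers handed down to the child node.
        return self.inherited + [(self.W.copy(), self.b.copy())]


def train_path(datasets, in_dim, hidden_dim=8):
    """Train one model per node, ordered from the top level to the bottom level."""
    inherited, models = [], {}
    for name, (X, y) in datasets.items():
        model = NodeModel(in_dim, hidden_dim, inherited).fit(X, y)
        models[name] = model
        inherited = model.hidden_layers()
    return models


def toy_data(n, dim=6):
    # Synthetic stand-in for encoded sequence windows; not real phosphosite data.
    X = rng.normal(size=(n, dim))
    y = (X[:, 0] + X[:, 1] > 0).astype(float)
    return X, y


# Fewer examples at each lower level, mimicking data scarcity for single kinases.
datasets = {
    "group": toy_data(400),
    "family": toy_data(200),
    "subfamily": toy_data(80),
    "kinase": toy_data(20),
}
models = train_path(datasets, in_dim=6)
scores = models["kinase"].predict(datasets["kinase"][0])
```

In this sketch the kinase-level model has only 20 training examples, but its input has already passed through three layers fitted on the larger group, family and subfamily sets.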

Original language: English
Title of host publication: Advances in Knowledge Discovery and Data Mining
Subtitle of host publication: 24th Pacific-Asia Conference, PAKDD 2020, Singapore, May 11–14, 2020, Proceedings, Part II
Editors: Hady W. Lauw, Raymond Chi-Wing Wong, Alexandros Ntoulas, Ee-Peng Lim, See-Kiong Ng, Sinno Jialin Pan
Place of Publication: Cham, Switzerland
Publisher: Springer
Pages: 384-395
Number of pages: 12
ISBN (Electronic): 9783030474362
ISBN (Print): 9783030474355
Publication status: Published - 2020
Event: Pacific-Asia Conference on Knowledge Discovery and Data Mining 2020 - Singapore, Singapore
Duration: 11 May 2020 to 14 May 2020
Conference number: 24th
https://pakdd2020.org (Website)
https://link.springer.com/book/10.1007/978-3-030-47426-3 (Proceedings)

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer
Volume: 12085
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: Pacific-Asia Conference on Knowledge Discovery and Data Mining 2020
Abbreviated title: PAKDD 2020
Country/Territory: Singapore
City: Singapore
Period: 11/05/20 to 14/05/20

Keywords

  • Hierarchical representation
  • Phosphorylation site prediction
  • Transfer learning
