Abstract
Machine learning algorithms have been widely used for predicting kinase-specific phosphorylation sites. However, the scarcity of training data for specific kinases makes it difficult to train effective models for predicting their phosphorylation sites. In this paper, we propose a deep transfer learning framework, PhosTransfer, for improving kinase-specific phosphorylation site prediction. It exploits the hierarchical information encoded in the kinase classification tree (KCT), which comprises four levels: kinase groups, families, subfamilies and protein kinases (PKs). With PhosTransfer, predictive models associated with tree nodes at higher levels, which are trained with more training data, can be transferred and reused as feature extractors for predictive models of tree nodes at lower levels. Our results indicate that models with deep transfer learning outperformed those without transfer learning for 73 out of 79 tested PKs. The positive effect of deep transfer learning is most evident in the prediction of phosphosites for kinase nodes with less training data. These performance improvements are further validated and explained by visualising the vector representations generated from hidden layers pre-trained at different KCT levels.
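The layer-reuse idea described in the abstract can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the class names `DenseLayer` and `NodeModel`, the layer sizes, the tanh activation and the generic node names are all assumptions made for illustration, and training is omitted entirely. The sketch only shows how a model at a lower KCT level might stack one new trainable layer on top of frozen layers inherited from its ancestors.

```python
import math
import random


class DenseLayer:
    """A tiny fully connected layer with tanh activation (hypothetical, untrained)."""

    def __init__(self, n_in, n_out, rng):
        self.weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                        for _ in range(n_out)]
        self.bias = [0.0] * n_out
        self.trainable = True

    def forward(self, x):
        return [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
                for row, b in zip(self.weights, self.bias)]


class NodeModel:
    """Model for one KCT node: frozen ancestor layers plus one new layer."""

    def __init__(self, name, n_in, n_hidden, rng, parent=None):
        self.name = name
        # Reuse every layer of the parent model as a fixed feature extractor.
        # In a real pipeline the parent would be trained before this step.
        self.inherited = (list(parent.inherited) + [parent.own]) if parent else []
        for layer in self.inherited:
            layer.trainable = False  # frozen: not updated for this node
        in_dim = n_hidden if self.inherited else n_in
        self.own = DenseLayer(in_dim, n_hidden, rng)  # the only trainable layer

    def features(self, x):
        # Pass the input through the inherited layers, then the node's own layer.
        for layer in self.inherited:
            x = layer.forward(x)
        return self.own.forward(x)


# Build one root-to-leaf path through the four KCT levels (generic names).
rng = random.Random(0)
chain = []
model = None
for level in ["group", "family", "subfamily", "protein_kinase"]:
    model = NodeModel(level, n_in=8, n_hidden=4, rng=rng, parent=model)
    chain.append(model)

pk_model = chain[-1]
vec = pk_model.features([0.1] * 8)  # hidden representation for a dummy input
```

Here the protein-kinase model carries three frozen layers (from the group, family and subfamily nodes) and one layer of its own, so only that last layer would need to be fitted on the scarce PK-level data.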
Original language | English |
---|---|
Title of host publication | Advances in Knowledge Discovery and Data Mining |
Subtitle of host publication | 24th Pacific-Asia Conference, PAKDD 2020, Singapore, May 11–14, 2020, Proceedings, Part II |
Editors | Hady W. Lauw, Raymond Chi-Wing Wong, Alexandros Ntoulas, Ee-Peng Lim, See-Kiong Ng, Sinno Jialin Pan |
Place of Publication | Cham, Switzerland |
Publisher | Springer |
Pages | 384-395 |
Number of pages | 12 |
ISBN (Electronic) | 9783030474362 |
ISBN (Print) | 9783030474355 |
Publication status | Published - 2020 |
Event | Pacific-Asia Conference on Knowledge Discovery and Data Mining 2020, Singapore, Singapore; Duration: 11 May 2020 → 14 May 2020; Conference number: 24th; https://pakdd2020.org (Website); https://link.springer.com/book/10.1007/978-3-030-47426-3 (Proceedings) |
Publication series
Name | Lecture Notes in Computer Science |
---|---|
Publisher | Springer |
Volume | 12085 |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Conference
Conference | Pacific-Asia Conference on Knowledge Discovery and Data Mining 2020 |
---|---|
Abbreviated title | PAKDD 2020 |
Country/Territory | Singapore |
City | Singapore |
Period | 11/05/20 → 14/05/20 |
Keywords
- Hierarchical representation
- Phosphorylation site prediction
- Transfer learning