SMG: self-supervised masked graph learning for cancer gene identification

Yan Cui, Zhikang Wang, Xiaoyu Wang, Yiwen Zhang, Ying Zhang, Tong Pan, Zhe Zhang, Shanshan Li, Yuming Guo, Tatsuya Akutsu, Jiangning Song

Research output: Contribution to journalArticleResearchpeer-review

9 Citations (Scopus)

Abstract

Cancer genomics is dedicated to elucidating the genes and pathways that contribute to cancer progression and development. Identifying cancer genes (CGs) associated with the initiation and progression of cancer is critical for characterization of molecular-level mechanism in cancer research. In recent years, the growing availability of high-throughput molecular data and advancements in deep learning technologies has enabled the modelling of complex interactions and topological information within genomic data. Nevertheless, because of the limited labelled data, pinpointing CGs from a multitude of potential mutations remains an exceptionally challenging task. To address this, we propose a novel deep learning framework, termed self-supervised masked graph learning (SMG), which comprises SMG reconstruction (pretext task) and task-specific fine-tuning (downstream task). In the pretext task, the nodes of multi-omic featured protein-protein interaction (PPI) networks are randomly substituted with a defined mask token. The PPI networks are then reconstructed using the graph neural network (GNN)-based autoencoder, which explores the node correlations in a self-prediction manner. In the downstream tasks, the pre-trained GNN encoder embeds the input networks into feature graphs, whereas a task-specific layer proceeds with the final prediction. To assess the performance of the proposed SMG method, benchmarking experiments are performed on three node-level tasks (identification of CGs, essential genes and healthy driver genes) and one graph-level task (identification of disease subnetwork) across eight PPI networks. Benchmarking experiments and performance comparison with existing state-of-the-art methods demonstrate the superiority of SMG on multi-omic feature engineering.

Original languageEnglish
Article numberbbad406
Number of pages13
JournalBriefings in Bioinformatics
Volume24
Issue number6
DOIs
Publication statusPublished - Nov 2023

Keywords

  • cancer genes
  • graph learning
  • protein-protein interaction network
  • representation learning
  • self-supervised learning

Cite this