Projects per year
Abstract
Determining the pathogenicity and functional impact (i.e. gain-of-function; GOF or loss-of-function; LOF) of a variant is vital for unraveling the genetic level mechanisms of human diseases. To provide a 'one-stop' framework for the accurate identification of pathogenicity and functional impact of variants, we developed a two-stage deep-learning-based computational solution, termed VPatho, which was trained using a total of 9619 pathogenic GOF/LOF and 138 026 neutral variants curated from various databases. A total number of 138 variant-level, 262 protein-level and 103 genome-level features were extracted for constructing the models of VPatho. The development of VPatho consists of two stages: (i) a random under-sampling multi-scale residual neural network (ResNet) with a newly defined weighted-loss function (RUS-Wg-MSResNet) was proposed to predict variants' pathogenicity on the gnomAD_NV + GOF/LOF dataset; and (ii) an XGBOD model was constructed to predict the functional impact of the given variants. Benchmarking experiments demonstrated that RUS-Wg-MSResNet achieved the highest prediction performance with the weights calculated based on the ratios of neutral versus pathogenic variants. Independent tests showed that both RUS-Wg-MSResNet and XGBOD achieved outstanding performance. Moreover, assessed using variants from the CAGI6 competition, RUS-Wg-MSResNet achieved superior performance compared to state-of-the-art predictors. The fine-trained XGBOD models were further used to blind test the whole LOF data downloaded from gnomAD and accordingly, we identified 31 nonLOF variants that were previously labeled as LOF/uncertain variants. As an implementation of the developed approach, a webserver of VPatho is made publicly available at http://csbio.njust.edu.cn/bioinf/vpatho/ to facilitate community-wide efforts for profiling and prioritizing the query variants with respect to their pathogenicity and functional impact.
Original language | English |
---|---|
Pages (from-to) | 1-16 |
Number of pages | 16 |
Journal | Briefings in Bioinformatics |
Volume | 24 |
Issue number | 1 |
DOIs | |
Publication status | Published - 19 Jan 2023 |
Keywords
- 1D-ResNet
- 2D-ResNet
- gnomAD variants
- pathogenic GOF/LOF
- random under-sampling
- weighted-loss function
Projects
- 2 Finished
-
Stochastic modelling of telomere length regulation in ageing research
Australian Research Council (ARC), Monash University
3/01/12 → 30/10/17
Project: Research
-
Characterisation of plant cysteine proteases with therapeutic potential
Pike, R., Song, J., Whisstock, J. & Mynott, T.
Australian Research Council (ARC), Sarantis Limited
1/07/11 → 30/06/14
Project: Research