TY - JOUR
T1 - Multiethnic polygenic risk scores improve risk prediction in diverse populations
AU - Márquez-Luna, Carla
AU - Loh, Po-Ru
AU - Kooner, Jaspal S.
AU - Saleheen, Danish
AU - Sim, Xueling
AU - Sehmi, Joban
AU - Zhang, Weihua
AU - Frossard, Philippe
AU - Been, Latonya F.
AU - Chia, Kee Seng
AU - Dimas, Antigone S.
AU - Hassanali, Neelam
AU - Jafar, Tazeen
AU - Jowett, Jeremy B.M.
AU - Li, Xinzhong
AU - Radha, Venkatesan
AU - Rees, Simon D.
AU - Takeuchi, Fumihiko
AU - Young, Robin
AU - Aung, Tin
AU - Basit, Abdul
AU - Chidambaram, Manickam
AU - Das, Debashish
AU - Grundberg, Elin
AU - Hedman, Åsa K.
AU - Hydrie, Zafar I.
AU - Islam, Muhammed
AU - Khor, Chiea Chuen
AU - Kowlessur, Sudhir
AU - Kristensen, Malene M.
AU - Liju, Samuel
AU - Lim, Wei Yen
AU - Matthews, David R.
AU - Liu, Jianjun
AU - Morris, Andrew P.
AU - Nica, Alexandra C.
AU - Pinidiyapathirage, Janani M.
AU - Prokopenko, Inga
AU - Rasheed, Asif
AU - Samuel, Maria
AU - Shah, Nabi
AU - Shera, A. Samad
AU - Small, Kerrin S.
AU - Suo, Chen
AU - Wickremasinghe, Ananda R.
AU - Wong, Tien Yin
AU - Yang, Mingyu
AU - Abecasis, Goncalo R.
AU - Barnett, Anthony H.
AU - Caulfield, Mark
AU - Deloukas, Panos
AU - Frayling, Tim
AU - Froguel, Philippe
AU - Kato, Norihiro
AU - Katulanda, Prasad
AU - Kelly, M. Ann
AU - Mohan, Viswanathan
AU - Sanghera, Dharambir K.
AU - Scott, James
AU - Seielstad, Mark
AU - Zimmet, Paul Z.
AU - Elliott, Paul
AU - Teo, Yik Ying
AU - McCarthy, Mark I.
AU - Danesh, John
AU - Tai, E. Shyong
AU - Chambers, John C.
AU - Williams, Amy L.
AU - Jacobs, Suzanne B.R.
AU - Moreno-Macías, Hortensia
AU - Huerta-Chagoya, Alicia
AU - Churchhouse, Claire
AU - García-Ortíz, Humberto
AU - Gómez-Vázquez, María José
AU - Ripke, Stephan
AU - Manning, Alisa K.
AU - Neale, Benjamin
AU - Reich, David
AU - Stram, Daniel O.
AU - Fernández-López, Juan Carlos
AU - Patterson, Nick
AU - Churchhouse, Claire
AU - Gopal, Shuba
AU - Grammatikos, James A.
AU - Smith, Ian C.
AU - Bullock, Kevin H.
AU - Deik, Amy A.
AU - Souza, Amanda L.
AU - Pierce, Kerry A.
AU - Clish, Clary B.
AU - Martínez-Hernández, Angélica
AU - Barajas-Olmos, Francisco
AU - Centeno-Cruz, Federico
AU - Mendoza-Caamal, Elvia
AU - Contreras-Cubas, Cecilia
AU - Revilla-Monsalve, Cristina
AU - Islas-Andrade, Sergio
AU - Córdova, Emilio
AU - Soberón, Xavier
AU - González-Villalpando, María Elena
AU - Henderson, Brian E.
AU - Monroe, Kristine
AU - Wilkens, Lynne
AU - Kolonel, Laurence N.
AU - Le Marchand, Loic
AU - Riba, Laura
AU - Ordóñez-Sánchez, María Luisa
AU - Rodríguez-Guillén, Rosario
AU - Cruz-Bautista, Ivette
AU - Rodríguez-Torres, Maribel
AU - Muñoz-Hernández, Linda Liliana
AU - Gómez, Donají
AU - Alvirde, Ulises
AU - Arellano, Olimpia
AU - Onofrio, Robert C.
AU - Brodeur, Wendy M.
AU - Gage, Diane
AU - Murphy, Jacquelyn
AU - Franklin, Jennifer
AU - Mahan, Scott
AU - Ardlie, Kristin
AU - Crenshaw, Andrew T.
AU - Winckler, Wendy
AU - Cortes, Maria L.
AU - Burtt, Noël P.
AU - Aguilar-Salinas, Carlos A.
AU - González-Villalpando, Clicerio
AU - Florez, Jose C.
AU - Orozco, Lorena
AU - Haiman, Christopher A.
AU - Tusié-Luna, Teresa
AU - Altshuler, David
AU - South Asian Type 2 Diabetes (SAT2D) Consortium
AU - The SIGMA Type 2 Diabetes Consortium
AU - Price, Alkes L.
PY - 2017/12
Y1 - 2017/12
AB - Methods for genetic risk prediction have been widely investigated in recent years. However, most available training data involve European samples, and it is currently unclear how to accurately predict disease risk in other populations. Previous studies have used either training data from European samples with large sample sizes or training data from the target population with small sample sizes, but not both. Here, we introduce a multiethnic polygenic risk score that combines training data from European samples and training data from the target population. We applied this approach to predict type 2 diabetes (T2D) in a Latino cohort using both publicly available European summary statistics with a large sample size (Neff = 40k) and Latino training data with a small sample size (Neff = 8k). We attained a >70% relative improvement in prediction accuracy (from R² = 0.027 to 0.047) compared to methods that use only one source of training data, consistent with large relative improvements in simulations. We observed a systematically lower load of T2D risk alleles in Latino individuals with more European ancestry, which could be explained by polygenic selection in ancestral European and/or Native American populations. Application to predict T2D in a South Asian UK Biobank cohort using European (Neff = 40k) and South Asian (Neff = 16k) training data attained a >70% relative improvement in prediction accuracy, and application to predict height in an African UK Biobank cohort using European (N = 113k) and African (N = 2k) training data attained a 30% relative improvement. Our work reduces the gap in polygenic risk prediction accuracy between European and non-European target populations.
KW - genome-wide association study
KW - height
KW - polygenic prediction
KW - type 2 diabetes
UR - http://www.scopus.com/inward/record.url?scp=85034867548&partnerID=8YFLogxK
U2 - 10.1002/gepi.22083
DO - 10.1002/gepi.22083
M3 - Article
C2 - 29110330
AN - SCOPUS:85034867548
SN - 0741-0395
VL - 41
SP - 811
EP - 823
JO - Genetic Epidemiology
JF - Genetic Epidemiology
IS - 8
ER -