Integrating genomic data from different ancestries reduces bias in predicting disease risk

Researchers have developed a promising new tool that uses genomic data to predict disease risk more accurately across diverse populations.

Polygenic risk scores (PRS) are promising tools for predicting disease risk, but current versions have a built-in bias that can reduce their accuracy in some populations and result in health disparities. Researchers from Massachusetts General Hospital (MGH), the Broad Institute of Massachusetts Institute of Technology (MIT) and Harvard University, all US, and Shanghai Jiao Tong University, China, have designed a new method for generating PRS that integrates genomic data from different ancestries and therefore predicts disease risk more accurately across populations. The new study was recently published in Nature Genetics.

Alterations in a gene’s DNA sequence can produce a genetic variant that increases the risk for disease. Some genetic variants are closely linked to certain diseases, such as the BRCA1 mutation and breast cancer. However, most common human diseases are influenced by hundreds or thousands of genetic variants across the genome. PRS aggregate the effects of genetic variants across the genome and have shown promise for one day being used to predict individual patients’ chances of developing diseases. This would allow clinicians to recommend preventive measures and monitor patients closely for early diagnosis and intervention.
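
To make the idea of "aggregating the effects of genetic variants" concrete, a PRS is commonly computed as a weighted sum: each variant's risk-allele count (0, 1 or 2) is multiplied by an estimated effect size, and the products are added up. The short Python sketch below illustrates only that arithmetic. The function name, effect sizes and genotypes are invented for illustration and are not taken from the study.

```python
def polygenic_risk_score(allele_counts, effect_sizes):
    """Return the weighted sum of risk-allele counts for one person.

    allele_counts: number of risk alleles carried at each variant (0, 1 or 2).
    effect_sizes:  estimated per-allele effect of each variant (hypothetical here).
    """
    if len(allele_counts) != len(effect_sizes):
        raise ValueError("need one effect size per variant")
    return sum(count * effect for count, effect in zip(allele_counts, effect_sizes))


# Hypothetical effect sizes for three variants (illustrative values only).
effect_sizes = [0.5, -0.25, 1.0]

# Hypothetical genotypes for two individuals.
person_a = [2, 0, 1]
person_b = [0, 2, 0]

print(polygenic_risk_score(person_a, effect_sizes))  # 2.0
print(polygenic_risk_score(person_b, effect_sizes))  # -0.5
```

In practice a score sums over hundreds or thousands of variants, and the effect sizes come from large genomic studies, which is where the choice of training population matters.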

A PRS must be trained to predict disease risk using data from studies in which genomic information is collected from large groups of individuals. While many disease-causing variants are shared across ancestries, there are important differences in the genetic basis of a disease between individuals of different ancestries.

“A major problem with existing methods for PRS calculation is that, to date, most of the genomic studies used data collected from individuals of European ancestry,” said Dr Tian Ge, a co-senior author of the study. This creates a Eurocentric bias in existing PRS, producing substantially less-accurate predictions and raising the possibility that they could over- or underestimate disease risk in non-European populations.

Recently, researchers have increased efforts to collect genomic data from underrepresented populations. Leveraging these resources, the team created a new tool called PRS-CSx that can integrate data from multiple populations and account for genetic similarities and differences between them. While there is still significantly more genomic data on individuals of European ancestry, the investigators used computational methods that allowed them to maximise the value of non-European data and improve prediction accuracy in ancestrally diverse individuals.

In the study, the investigators used genomic data from individuals in several different populations to predict a wide range of physical measures (such as height, body mass index, and blood pressure), blood biomarkers (such as glucose and cholesterol), and the risk for schizophrenia. Then they compared the predicted trait or disease risk with actual measures or reported disease status to measure PRS-CSx’s prediction accuracy. The study’s results demonstrated that PRS-CSx is significantly more accurate than existing PRS tools in non-European populations.
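
The article does not say which accuracy metric the study used. As one plausible illustration for a quantitative trait such as height, the sketch below compares predicted and measured values using the squared Pearson correlation, a measure often reported for this kind of comparison. The function name and the numbers are hypothetical.

```python
def squared_correlation(predicted, observed):
    """Squared Pearson correlation between predicted and observed values.

    Values near 1 mean the predictions track the measurements closely;
    values near 0 mean they carry little information about them.
    """
    n = len(predicted)
    if n != len(observed) or n < 2:
        raise ValueError("need two equally long lists with at least 2 values")
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_p == 0 or var_o == 0:
        raise ValueError("correlation is undefined for constant input")
    return (cov * cov) / (var_p * var_o)


# Hypothetical predicted scores and measured trait values for four people.
predicted = [1.0, 2.0, 3.0, 4.0]
observed = [1.0, 3.0, 2.0, 4.0]

print(round(squared_correlation(predicted, observed), 2))  # 0.64
```

Computing such a metric separately within each population is one way to see whether a tool predicts as well in non-European groups as it does in European ones.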

PRS-CSx could also have a role in basic research. It could be used, for example, to explore gene-environment interactions, such as how the effect of genetic risk would depend on the level of environmental risk factors in global populations.

Even with PRS-CSx, the gap in prediction accuracy between European and non-European populations remains considerable. Broadening the sample diversity across global populations is crucial to further improve the prediction accuracy of PRS in diverse populations.

“The expansion of non-European genomic resources, coupled with advanced analytic methods like PRS-CSx, will accelerate the equitable deployment of PRS in clinical settings,” concluded Dr Hailiang Huang, a co-senior author of the paper.