Reference genome comparison finds exome variant discrepancies
Scientists have found differences in 206 genes between the GRCh38 (hg38) and GRCh37 (hg19) human reference genomes.
In a new study, researchers at the Human Genome Sequencing Center at Baylor College of Medicine, US, have identified variant discrepancies between two reference genome sequences and produced guidance for laboratories looking to take advantage of an improved human reference genome.
According to the team, in the two decades since the Human Genome Project mapped the entire human genome, improvements in technology have helped in developing updated reference genomes used for sequencing. However, although the GRCh38 (hg38) human reference genome was released more than seven years ago, the older GRCh37 (hg19) reference is still used by most research and clinical laboratories. The differences were found between these two references.
“There is a big push to update genomic sequencing resources to use the hg38 reference because the belief is that hg38 is a significant improvement over hg19,” said Moez Dawood, co-first author of the study. “We wanted to identify the differences in sequencing readouts between the two references for labs that are still using hg19.”
The researchers analysed exome sequencing samples from more than 1,500 participants in the Baylor-Hopkins Center for Mendelian Genomics programme. They found 206 genes with discordant variants between hg19 and hg38, including eight genes implicated in Mendelian diseases and 53 associated with common disease phenotypes. They also found that 73 percent of the discordant variants were clustered within sections of the genome with known assembly problems, which the researchers called DISCordant Reference Patches (DISCREPs).
“This study is not a theoretical comparison of the two references – we looked at exome data from study participants and examined the impact of using the updated reference on Mendelian genes and pathogenic variants,” said Dr Aniko Sabo, a senior author of the study. “We wanted to provide the list of 206 genes enriched with discordant variants and bring this issue to the attention of the labs working on these genes.”
“For variant interpretation in the 206 genes enriched for discordant variants, reference assembly differences should be accounted for in the analysis, especially when lifting over variant co-ordinates from one reference to the other,” said Dr He Li, co-first author of the study.
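For labs wondering what that advice might look like in practice, one option is to set aside any variant that falls in a listed gene for closer inspection before its co-ordinates are lifted over. The Python sketch below illustrates the idea only; it is not the study's pipeline. The `Variant` record, the `flag_for_review` helper and the gene symbols are hypothetical placeholders, and a real workflow would load the published list of 206 genes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical minimal variant record with a gene annotation."""
    chrom: str
    pos: int
    ref: str
    alt: str
    gene: str


def flag_for_review(variants, discordant_genes):
    """Split variants into (needs_review, routine).

    A variant needs review when its annotated gene is in the
    discordant-gene set supplied by the caller.
    """
    needs_review = [v for v in variants if v.gene in discordant_genes]
    routine = [v for v in variants if v.gene not in discordant_genes]
    return needs_review, routine


# Placeholder symbols only (assumption): NOT taken from the study's gene list.
discordant_genes = {"GENE_A", "GENE_B"}

variants = [
    Variant("chr1", 1000, "A", "G", "GENE_A"),
    Variant("chr2", 2000, "C", "T", "GENE_X"),
]

needs_review, routine = flag_for_review(variants, discordant_genes)
for v in needs_review:
    print(f"Review before liftover: {v.chrom}:{v.pos} {v.ref}>{v.alt} ({v.gene})")
```

Variants in the review group could then be re-examined against both references rather than converted automatically.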
The researchers say that transitioning from using the hg19 reference to the hg38 reference takes significant time and resources. Through this large-scale study of sequencing data, the researchers aim to ease the burden on labs considering the transition. The study quantifies the benefits and drawbacks of the new reference and validates its utility in a lab setting.
“It is one thing to make a better reference. It is quite another to integrate it into useful practice,” said Dr Richard Gibbs, senior author of the genome study. “Some labs have been hesitant to use the new reference, but this study provides reassurance and guidance for those who are considering moving over.”
The findings were published in the American Journal of Human Genetics.