Complete DNA sequence of the X chromosome revealed

Researchers capitalised on novel sequencing technologies to produce the first end-to-end DNA sequence of the X chromosome.


Researchers have produced the first telomere-to-telomere (end-to-end) DNA sequence of a human chromosome. The team generated the sequence of an X chromosome base by base and now intend to produce a complete sequence of each chromosome in the human genome.

“This accomplishment begins a new era in genomics research,” said Dr Eric Green, director of the National Human Genome Research Institute (NHGRI), part of the US National Institutes of Health (NIH). “The ability to generate truly complete sequences of chromosomes and genomes is a technical feat that will help us gain a comprehensive understanding of genome function and inform the use of genomic information in medical care.”

Despite two decades of improvements since its release, the reference sequence for the human genome still contains hundreds of gaps, or missing DNA sequences. These gaps often contain repetitive DNA segments, which researchers say are exceptionally difficult to sequence. They are nonetheless important parts of the puzzle, as they include genes and other functional elements that may be relevant to human health and disease.

The human genome is thought to consist of about 6 billion bases, making it too long for DNA sequencing machines to read all the bases at once. Instead, researchers must divide the genome into smaller sections, analyse each individually and then reassemble the shorter DNA sequences to create the full one.

Dr Adam Phillippy of the NHGRI, senior author of the X chromosome paper, compared this issue to solving a puzzle: “Imagine having to reconstruct a jigsaw puzzle. If you are working with smaller pieces, each contains less context for figuring out where it came from, especially in parts of the puzzle without any unique clues, like a blue sky. The same is true for sequencing the human genome. Until now, the pieces were too small, and there was no way to put the hardest parts of the genome puzzle together.”
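The jigsaw point can be illustrated with a toy calculation. The sketch below is not the team's method or software. It assumes a made-up 58-base "chromosome" with a short repeated stretch in the middle and error-free reads taken from every position. All names (`ambiguous_read_fraction`, `toy_chromosome` and the flank and repeat strings) are hypothetical. It counts how many reads of a given length match more than one place in the sequence and so cannot be placed with confidence.

```python
from collections import Counter


def ambiguous_read_fraction(sequence: str, read_length: int) -> float:
    """Return the fraction of reads that occur at more than one position.

    A "read" here is simply every substring of the given length, which is
    an idealised, error-free stand-in for real sequencing output.
    """
    reads = [
        sequence[start:start + read_length]
        for start in range(len(sequence) - read_length + 1)
    ]
    occurrences = Counter(reads)
    ambiguous = sum(1 for read in reads if occurrences[read] > 1)
    return ambiguous / len(reads)


# Hypothetical toy sequence: unique flanks around six copies of a repeat unit.
left_flank = "ACGTTGCA"
repeat_unit = "GATTACA"
right_flank = "TTGACCGT"
toy_chromosome = left_flank + repeat_unit * 6 + right_flank

# Compare short, medium and long "puzzle pieces".
for read_length in (5, 20, 50):
    fraction = ambiguous_read_fraction(toy_chromosome, read_length)
    print(f"read length {read_length}: {fraction:.0%} of reads are ambiguous")
```

In this toy case, most short reads fall entirely inside the repeat and look identical to one another, while a read longer than the whole repeated stretch always includes some unique flanking sequence and can be placed unambiguously. Real repeats, such as the roughly 3-million-base centromere of the X chromosome, are far longer and real reads contain errors, so this shows only the principle.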

The team decided to sequence the whole of the X chromosome first, as it is linked with a myriad of conditions, including haemophilia, chronic granulomatous disease and Duchenne muscular dystrophy. In this study, published in Nature, the researchers did not sequence the X chromosome from a normal human cell. Instead, they used a special cell type that has two identical X chromosomes, whereas typical female cells have two different X chromosomes. Such a cell provides more DNA for sequencing than a male cell, which has only a single copy of the X chromosome, and it avoids the sequence differences encountered when analysing the two X chromosomes of a typical female cell.

Capitalising on new technologies that are able to sequence long segments of DNA, the scientists were able to leave the DNA molecules largely intact. These large DNA molecules were then analysed by two different techniques, including nanopore sequencing. Each of them generates very long DNA sequences – something previous instruments could not accomplish.

The team also used their newly developed computer programme to assemble the sequences they generated.

Dr Karen Miga of the University of California, Santa Cruz, US, led the effort to close the largest remaining sequence gap on the X chromosome: the roughly 3 million bases of repetitive DNA that make up the centromere. While she said there was no ‘gold standard’ for arranging these highly repetitive DNA sequences accurately, Miga and colleagues performed several validation steps to help confirm the validity of the sequence.

The new human genome sequence, derived from a human cell line called CHM13, closes many gaps in the current reference genome, known as Genome Reference Consortium build 38 (GRCh38).

This first sequence is part of a broader initiative by the Telomere-to-Telomere (T2T) consortium, partially funded by NHGRI, which hopes to generate a complete reference sequence of the human genome in 2020. The consortium is continuing its efforts with other chromosomes; however, it stated that several challenges remain, such as chromosomes 1 and 9, which have repetitive DNA segments much larger than those encountered on the X chromosome.

“We know these previously uncharted sites in our genome are very different among individuals, but it is important to start figuring out how these differences contribute to human biology and disease,” Miga said. Both Phillippy and Miga agree that enhancing sequencing methods will continue to create new opportunities in human genetics and genomics.

The study was published in Nature.