How next-generation sequencing is opening the door for drug discovery
The Wellcome Trust Sanger Institute’s Kim Judge explains how Next Generation Sequencing forms a crucial part of the scientist’s toolkit and makes a valuable contribution to the field of drug discovery…
Next-generation sequencing offers unparalleled genomic resolution, allowing users to discriminate between single bases of the genetic code. Sequence data can be generated at ever-increasing speed and ever-decreasing cost. By no means a saviour able to answer any and all questions, next-generation sequencing nevertheless plays a role in the generation of data to be mined. Today, it forms a crucial part of the scientist’s toolkit and makes a valuable contribution to the field of drug discovery.
Different next-generation sequencing technologies have different strengths and weaknesses. Some next-generation sequencing technologies, such as 454 Life Sciences owned by Roche, are no longer commercially available. Others are still in production, such as Life Technologies’ SOLiD platform and Ion Torrent’s semiconductor sequencing, but are not as widely used as Illumina, the current market leader.
Illumina makes technology that generates large amounts of highly accurate short-read data. One strength of Illumina’s technology is the ability to multiplex, or ‘barcode’, each DNA sample, allowing many samples to be sequenced simultaneously on one machine. This enables large studies involving thousands of genomes – whether human, model organism or pathogen – to be carried out. An example of this is the 100,000 Genomes Project funded by NHS England, which aims to sequence genomes from participants with some common types of cancer and rare diseases. Sequence data, together with information about the patient’s current condition and medical history, can be studied both to aid the patient and to help academia and industry better understand the causes of these conditions, or cancer types, and develop novel drugs to target them.
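To make the idea of barcoding concrete, the following is a minimal sketch of demultiplexing: assigning each read back to its sample using the barcode sequenced alongside it. It is not Illumina's software. The barcode sequences, the sample names and the one-mismatch tolerance are all assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical sample sheet: barcode (index) sequence -> sample name.
SAMPLE_BARCODES = {
    "ACGTAC": "sample_A",
    "TGCATG": "sample_B",
}


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(index_read, barcodes, max_mismatches=1):
    """Return the sample if exactly one barcode lies within max_mismatches, else None."""
    hits = [
        name
        for barcode, name in barcodes.items()
        if len(barcode) == len(index_read)
        and hamming(barcode, index_read) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else None


def demultiplex(reads, barcodes, max_mismatches=1):
    """Group (index_read, sequence) pairs by sample; unmatched reads go to 'undetermined'."""
    bins = defaultdict(list)
    for index_read, sequence in reads:
        sample = assign_sample(index_read, barcodes, max_mismatches)
        bins[sample or "undetermined"].append(sequence)
    return dict(bins)


# Illustrative reads: (barcode read, insert sequence).
reads = [
    ("ACGTAC", "AAAA"),  # exact match to sample_A
    ("ACGTAA", "CCCC"),  # one mismatch, still assigned to sample_A
    ("TGCATG", "GGGG"),  # exact match to sample_B
    ("NNNNNN", "TTTT"),  # no barcode close enough
]
binned = demultiplex(reads, SAMPLE_BARCODES)
```

Allowing a small number of mismatches trades a little specificity for tolerance of sequencing errors in the barcode itself, so the barcodes in a real set need to differ from one another by more than the tolerance.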
A further benefit of Illumina’s technology is the range of platforms it manufactures. With platforms ranging from the MiniSeq, which produces just a few gigabases of data, to the HiSeq X Ten, a suite of 10 sequencers able to sequence 18,000 human genomes per year, Illumina has set out to offer a sequencer to suit every laboratory situation. The different platforms have different strengths and weaknesses. Considering the two above, the HiSeq X Ten is able to produce the cheapest human genomes, breaking the so-called ‘$1,000 genome barrier’ for the first time. However, it is less economical when run substantially below capacity, and is therefore suited to large centres with many thousands of samples to process.
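The point about running below capacity can be illustrated with simple arithmetic. The sketch below spreads a fixed annual cost over however many genomes are actually sequenced. Only the capacity of 18,000 genomes per year comes from the text; the fixed and per-genome costs are hypothetical placeholders in arbitrary units, not real prices.

```python
def cost_per_genome(fixed_annual_cost, variable_cost_per_genome, genomes_per_year):
    """Fixed costs are shared across all genomes run; variable costs are paid per genome."""
    if genomes_per_year <= 0:
        raise ValueError("genomes_per_year must be positive")
    return fixed_annual_cost / genomes_per_year + variable_cost_per_genome


CAPACITY = 18_000          # genomes per year for the full suite (from the text)
PER_INSTRUMENT = CAPACITY // 10  # 1,800 genomes per sequencer per year

FIXED = 1_800_000          # hypothetical fixed annual cost, arbitrary units
VARIABLE = 100             # hypothetical consumables cost per genome, arbitrary units

at_capacity = cost_per_genome(FIXED, VARIABLE, CAPACITY)          # 200.0
at_ten_percent = cost_per_genome(FIXED, VARIABLE, CAPACITY // 10)  # 1100.0
```

With these placeholder figures, running at one tenth of capacity makes each genome more than five times as expensive, which is the reason such a suite suits only very large centres.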
Conversely, the low throughput of the MiniSeq means it is not suited to laboratories wishing to carry out whole human genome sequencing. However, the machine automates many of the steps required to process DNA into a ‘library’ before sequencing, making it suited to a laboratory where scientists do not have extensive experience with next-generation sequencing.
Additional strategies have been developed to complement Illumina’s technology, such as the synthetic long-read technology developed by 10x Genomics. In this technology, single molecules of high molecular weight DNA are ‘captured’ inside an oil droplet, before fragmentation and labelling with a unique ‘identifier’ DNA molecule. The individual fragments are sequenced as short reads on an Illumina platform, before being recombined computationally to reconstitute the large DNA fragment from the short-read data. This enables users to obtain the benefits of Illumina’s highly accurate short reads, yet also place them in context by combining the short reads to create ‘long-read’ information. This has the potential for detecting structural variants within a genome, and also enables phasing of haplotypes.
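The computational recombination step can be sketched as follows: short reads that share a droplet barcode and map close together are treated as coming from the same long DNA molecule. This is a simplified illustration, not the vendor's pipeline. The input format, the function names and the 50 kb gap threshold are assumptions.

```python
from collections import defaultdict


def group_by_barcode(alignments):
    """Collect aligned read spans per (barcode, chromosome).

    alignments: iterable of (barcode, chromosome, start, end) tuples.
    """
    groups = defaultdict(list)
    for barcode, chrom, start, end in alignments:
        groups[(barcode, chrom)].append((start, end))
    return groups


def infer_molecules(alignments, max_gap=50_000):
    """Merge same-barcode reads into inferred long molecules.

    A new molecule is started whenever the gap to the next read exceeds
    max_gap (an assumed threshold). Returns tuples of
    (barcode, chromosome, start, end, read_count).
    """
    molecules = []
    for (barcode, chrom), spans in group_by_barcode(alignments).items():
        spans.sort()
        cur_start, cur_end, count = spans[0][0], spans[0][1], 1
        for start, end in spans[1:]:
            if start - cur_end > max_gap:
                molecules.append((barcode, chrom, cur_start, cur_end, count))
                cur_start, cur_end, count = start, end, 1
            else:
                cur_end = max(cur_end, end)
                count += 1
        molecules.append((barcode, chrom, cur_start, cur_end, count))
    return molecules


# Illustrative alignments: three reads share barcode BC1, one has BC2.
alignments = [
    ("BC1", "chr1", 1_000, 1_150),
    ("BC1", "chr1", 20_000, 20_150),
    ("BC1", "chr1", 500_000, 500_150),  # too far away: a separate molecule
    ("BC2", "chr1", 5_000, 5_150),
]
molecules = infer_molecules(alignments)
```

Because reads from one molecule carry the same barcode, variants seen on those reads can be assigned to the same haplotype, which is what makes phasing possible.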
There are two comparatively new technologies making inroads into the next-generation sequencing market, developed by Pacific Biosciences and Oxford Nanopore. Both technologies generate longer ‘reads’ of DNA than Illumina; where Illumina reads up to 300 bases, Pacific Biosciences’ DNA sequencing technology can read tens of kilobases of sequence in a single read. Although both companies produce instruments with a smaller output of data than Illumina’s highest yielding instruments, Pacific Biosciences expects to increase its throughput over the coming months. Like Pacific Biosciences’ technology, Oxford Nanopore’s MinION sequencer can also produce long reads of DNA, but has the additional benefit of being highly portable: a similar size to a mobile phone, it is run by plugging it into a laptop or desktop computer. Oxford Nanopore is in the process of developing and releasing two more instruments: the mid-sized GridION and the high-throughput PromethION.
A key use of next-generation sequencing for drug discovery is the generation of large datasets, which can be mined for the identification of novel targets. For example, this may include the identification of potential targets for novel antimicrobials when sequencing collections of bacterial isolates. At the Wellcome Trust Sanger Institute, many thousands of bacteria have been sequenced using both Illumina and Pacific Biosciences technology, creating databases that can be used by bioinformaticians to understand genes that are shared between all bacteria of a species (core genes) and to unpick the genes and genotypes linked to antimicrobial resistance phenotypes. A benefit of using Illumina sequencing for this is that it allows the rapid generation of a large number of bacterial genomes from a single sequencing run. Long-read Pacific Biosciences data has been used to create highly contiguous, or ‘complete’, assemblies, often enabling the bacterial chromosome to be assembled into a single contig, whereas Illumina data typically yields a more fragmented, incomplete ‘draft’ assembly.
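As a rough illustration of how such a database might be queried, the sketch below treats each isolate as a set of gene names. Core genes are the intersection across all isolates, and candidate resistance genes are those present in every resistant isolate but absent from every susceptible one. The isolate names, gene lists and phenotype labels are invented for the example, and real analyses would need to handle gene clustering and statistical association rather than exact set membership.

```python
def core_and_accessory(gene_content):
    """Split genes into core (present in every isolate) and accessory (the rest).

    gene_content: dict mapping isolate name -> iterable of gene names.
    """
    gene_sets = [set(genes) for genes in gene_content.values()]
    if not gene_sets:
        return set(), set()
    core = set.intersection(*gene_sets)
    pan = set.union(*gene_sets)
    return core, pan - core


def resistance_candidates(gene_content, resistant_isolates):
    """Genes found in all resistant isolates and in no susceptible isolate."""
    resistant = [set(g) for name, g in gene_content.items() if name in resistant_isolates]
    susceptible = [set(g) for name, g in gene_content.items() if name not in resistant_isolates]
    if not resistant:
        return set()
    in_all_resistant = set.intersection(*resistant)
    in_any_susceptible = set().union(*susceptible)
    return in_all_resistant - in_any_susceptible


# Hypothetical isolates and gene content, for illustration only.
isolates = {
    "isolate_1": {"gyrA", "recA", "blaTEM"},
    "isolate_2": {"gyrA", "recA", "blaTEM", "tetA"},
    "isolate_3": {"gyrA", "recA"},
}
core, accessory = core_and_accessory(isolates)
candidates = resistance_candidates(isolates, {"isolate_1", "isolate_2"})
```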
Additionally, next-generation sequencing is not limited to whole-genome sequencing. Targeted sequencing, from whole exomes through to panels of genes of interest, can be a cost-effective way of generating data. Focusing only on the genes of interest has the potential to miss upstream effects. However, because targeted sequencing generates less data than whole-genome sequencing, it is cheaper, and the data is potentially easier to manage and store.
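The difference in data volume can be estimated with a back-of-the-envelope calculation: the bases required are roughly the size of the target multiplied by the desired depth of coverage. In the sketch below the target sizes are approximate, illustrative assumptions (a human genome of about 3.1 gigabases, a 50 megabase exome design and a 1 megabase gene panel), and the same 30-fold depth is applied to all three purely for comparison.

```python
def bases_required(target_size_bp, depth):
    """Approximate sequenced bases needed to cover a target at a given mean depth."""
    return target_size_bp * depth


# Approximate, illustrative target sizes in base pairs (assumptions).
TARGETS_BP = {
    "whole genome": 3_100_000_000,
    "exome": 50_000_000,
    "gene panel": 1_000_000,
}


def compare_strategies(targets_bp, depth):
    """Return the bases required per strategy at the same mean depth."""
    return {name: bases_required(size, depth) for name, size in targets_bp.items()}


requirements = compare_strategies(TARGETS_BP, depth=30)
# Fraction of the whole-genome data volume needed by each targeted approach.
fractions = {
    name: bases / requirements["whole genome"]
    for name, bases in requirements.items()
}
```

In practice targeted assays are often sequenced more deeply than whole genomes, and some reads fall outside the target, so the real saving is smaller than this simple ratio suggests.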
A further use of next-generation sequencing is the sequencing of RNA, typically through conversion to cDNA, although Oxford Nanopore is in the process of developing direct RNA sequencing. A benefit of long-read sequencing, whether Pacific Biosciences or Oxford Nanopore, is that it can also be used to profile the relative abundance of different RNA transcripts within a cell or tissue.
Next-generation sequencing can also be used to support the later stages of drug discovery, such as clinical trials. The Oxford Nanopore MinION has the advantage that it could be directly taken to the patient, even when the patient is in a remote, resource-limited location. A further potential benefit is the automated sample preparation systems under development by Oxford Nanopore, such as its VolTRAX machine. Subject to regulatory approval for clinical use, this would enable preparation of DNA ‘libraries’ for whole-genome sequencing outside the laboratory environment.
Additionally, the data generated by the MinION is accessible in real time, meaning that within a few minutes of beginning a sequencing run, data is available to be analysed. This has potential advantages for a patient as it facilitates a rapid sample-to-answer solution, speeding up tailored prescribing. It has advantages for the research scientist too, not only in speed but also in enabling the precise amount of data required to be generated. Researchers can monitor data generation and stop a run once sufficient sequence data has been generated to answer the research question. The Illumina MiSeq also has a comparatively short run time, and would likely be suited to clinical situations, given that it has FDA approval for clinical sequencing.
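The idea of stopping a run once enough data has been collected can be sketched as a simple loop over read lengths as they arrive. This is a simulation under stated assumptions, not the sequencer's control software: the stream of read lengths, the genome size and the target depth are all hypothetical inputs.

```python
def reads_until_target(read_lengths, genome_size, target_depth):
    """Consume read lengths in arrival order and stop once the target depth is reached.

    Returns (number_of_reads_used, achieved_depth). The read count is None
    if the stream ends before the target is reached.
    """
    target_bases = genome_size * target_depth
    total_bases = 0
    for count, length in enumerate(read_lengths, start=1):
        total_bases += length
        if total_bases >= target_bases:
            return count, total_bases / genome_size
    return None, total_bases / genome_size


# Hypothetical run: 1,000 reads of 10 kb each against a 100 kb target.
simulated_reads = [10_000] * 1_000
reads_used, depth = reads_until_target(
    simulated_reads, genome_size=100_000, target_depth=30
)
```

In this toy run the loop stops after 300 reads, leaving the remaining capacity unused rather than generating data the research question does not need.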
An attractive aspect of next-generation sequencing is that it lends itself to in silico and in vitro studies, supporting scientists in their aim to replace, reduce and refine the use of animals in experimental procedures. Further, next-generation sequencing has the potential to enable further exploration of people’s genomes. Here, informed consent must be sought prior to collection of tissue for sequencing. This may seem trivial, especially where the tissue required can be collected through a routine procedure, such as a blood sample. However, DNA sequencing can lead to both predictable findings, ie, genes linked to the study in question, and unrelated findings, such as a gene linked to late-onset diseases, or carrier status of disease. It can also have wider implications than for a person’s healthcare alone, with the potential to discern paternity or adoption status. Human DNA sequence, and inferences made from it, must be carefully anonymised, stored under rigorous security, or released back to the donor through a considered process by a trained individual such as a genetic counsellor. A benefit of studies that use targeted sequencing is the reduced risk of unrelated findings as a side effect of the research questions.
An exciting new chapter
Next-generation sequencing looks set to begin an exciting new chapter in the field of drug discovery, but with caveats. The sequencing process itself must be bookended by other processes, thoughtfully planned, to obtain maximum value from the sequence data. Immediately apparent is the need for good quality, robust DNA extraction protocols, tailored to the organism or tissue being studied. A favourite catchphrase within the sequencing community is ‘rubbish in, rubbish out’ (or a variant with more informal language). Essentially, it is not possible to extract reliable results from poor starting DNA.
A second point is the data analysis. Within a wider skills shortage in STEM subjects, the bioinformatics skills shortage is perceived to be particularly acute. Tangentially, it underlines the role we all have to play in reaching out to the next generation of potential scientists, to educate and inspire through schools and outreach work. Finally, a key factor often overlooked is experimental design. While useful studies have arisen from a ‘sequence first, question later’ approach, investing time in considering which samples to sequence, to what depth of coverage, and using what sequencing technology, will likely pay off in the long run. It is also necessary to select enough samples to obtain statistically significant results.
Kim Judge joined Illumina as a Research Associate in sequencing R&D, where she worked on the MiSeq, NextSeq500 and Nextera. She moved to the Department of Medicine at Addenbrooke’s Hospital, Cambridge in 2012, where her PhD focused on using the Oxford Nanopore MinION for microbiological applications including detecting antimicrobial resistance and identifying plasmids. She now works with the MinION, GridION and PromethION in the sequencing R&D team at the Wellcome Trust Sanger Institute.