How next-generation sequencing is opening the door for drug discovery

Judge, Kim

Posted: 13 September 2017 | Kim Judge

The Wellcome Trust Sanger Institute’s Kim Judge explains how Next Generation Sequencing forms a crucial part of the scientist’s toolkit and makes a valuable contribution to the field of drug discovery…

Next-generation sequencing offers unparalleled genomic resolution, allowing users to discriminate between single bases of the genetic code. Sequence data can be generated at ever-increasing speed and ever-decreasing cost. By no means a saviour – able to answer any and all questions – it nevertheless plays a role in the generation of data to be mined. Today, it forms a crucial part of the scientist’s toolkit and makes a valuable contribution to the field of drug discovery.

Different next-generation sequencing technologies have different strengths and weaknesses. Some next-generation sequencing technologies, such as 454 Life Sciences owned by Roche, are no longer commercially available. Others are still in production, such as Life Technologies’ SOLiD platform and Ion Torrent’s semiconductor sequencing, but are not as widely used as Illumina, the current market leader.

Illumina makes technology that generates large amounts of highly accurate short-read data. One strength of Illumina’s technology is the ability to multiplex, or ‘barcode’, each DNA sample, allowing many samples to be sequenced simultaneously on one machine. This enables large studies involving thousands of genomes – whether human, model organism or pathogen – to be carried out. An example of this is the 100,000 Genomes Project funded by NHS England, which aims to sequence genomes from participants with some common types of cancer and rare diseases. Sequence data, together with information about the patient’s current condition and medical history, can be studied both to aid the patient and to help academia and industry better understand the causes of these conditions and develop novel drugs to target them.
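The barcoding idea can be illustrated with a minimal Python sketch. It assumes, purely for simplicity, that each read begins with a fixed-length inline barcode; on real instruments the index is typically read separately, and the function name, barcodes and sample names here are hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample, barcode_length):
    """Assign each read to a sample using its leading barcode.

    Reads whose barcode is not in the lookup go to 'undetermined'.
    The barcode itself is trimmed from the stored sequence.
    """
    bins = defaultdict(list)
    for read in reads:
        barcode = read[:barcode_length]
        insert = read[barcode_length:]
        sample = barcode_to_sample.get(barcode, "undetermined")
        bins[sample].append(insert)
    return dict(bins)


# Hypothetical 4-base barcodes for two pooled samples.
barcodes = {"ACGT": "sample_A", "TGCA": "sample_B"}
reads = ["ACGTGGGATTC", "TGCACCCTAGA", "ACGTTTTGGCA", "NNNNAAAA"]
bins = demultiplex(reads, barcodes, barcode_length=4)
```

Pooled reads are separated back into per-sample bins after the run, which is what allows many samples to share one machine.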

A further benefit of Illumina’s technology is the range of platforms it manufactures. Ranging from the MiniSeq, producing just a few gigabases of data, to the HiSeq X Ten, a suite of 10 sequencers able to produce 18,000 human genomes per year, Illumina has set out to offer a sequencer to suit every laboratory situation. The different platforms have different strengths and weaknesses. Taking these two as examples, the HiSeq X Ten is able to produce the cheapest human genomes, breaking the so-called ‘$1,000 genome barrier’ for the first time. However, it is less economical when run substantially below capacity, and is therefore suited to large centres with many thousands of samples to process.

Conversely, the low throughput of the MiniSeq means it is not suited to laboratories wishing to carry out whole human genome sequencing. However, the machine automates many of the steps required to process DNA into a ‘library’ before sequencing, making it suited to a laboratory where scientists do not have extensive experience with next-generation sequencing.

Additional strategies have been developed to complement Illumina’s technology, such as the synthetic long-read technology developed by 10x Genomics. In this technology, single molecules of high molecular weight DNA are ‘captured’ inside an oil droplet, before fragmentation and labelling with a unique ‘identifier’ DNA molecule. The individual fragments are sequenced as short reads on an Illumina platform, before being recombined computationally to reconstitute the large DNA fragment from the short-read data. This enables users to obtain the benefits of Illumina’s highly accurate short reads, yet also place them in context by combining the short reads to create ‘long-read’ information. This has the potential for detecting structural variants within a genome, and also enables phasing of haplotypes.
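The computational regrouping step can be pictured with a small Python sketch. It is not 10x Genomics' software; it assumes each short read has already been aligned and is represented by a hypothetical (droplet barcode, start position, read length) tuple, and it simply reports the genomic span covered by the reads sharing each barcode.

```python
from collections import defaultdict


def molecule_spans(aligned_reads):
    """Group aligned short reads by droplet barcode and return, for each
    barcode, the (start, end) span covered by its reads."""
    intervals = defaultdict(list)
    for barcode, start, length in aligned_reads:
        intervals[barcode].append((start, start + length))
    return {
        barcode: (min(s for s, _ in ivs), max(e for _, e in ivs))
        for barcode, ivs in intervals.items()
    }


# Hypothetical reads: three from one long molecule, one from another.
reads = [
    ("BC1", 1000, 150),
    ("BC1", 40000, 150),
    ("BC2", 5000, 150),
    ("BC1", 20000, 150),
]
spans = molecule_spans(reads)
```

Reads that share a barcode are inferred to come from the same original long fragment, which is what places the short reads in long-range context.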

New technologies for next-generation sequencing

There are two comparatively new technologies making inroads into the next-generation sequencing market, developed by Pacific Biosciences and Oxford Nanopore. Both technologies generate longer ‘reads’ of DNA than Illumina: whereas Illumina reads are up to 300 bases long, Pacific Biosciences’ DNA sequencing technology can read tens of kilobases of sequence in a single read. Although both companies produce instruments with a smaller output of data than Illumina’s highest-yielding instruments, Pacific Biosciences expects to increase its throughput over the coming months. Like Pacific Biosciences, Oxford Nanopore’s MinION sequencer can also produce long reads of DNA, but has the additional benefit of being highly portable – a similar size to a mobile phone, it is run by plugging it into a laptop or desktop computer. Oxford Nanopore is in the process of developing and releasing two more instruments: the mid-sized GridION and the high-throughput PromethION.

A key use of next-generation sequencing for drug discovery is the generation of large datasets, which can be mined for the identification of novel targets. For example, this may include the identification of potential targets for novel antimicrobials when sequencing collections of bacterial isolates. At the Wellcome Trust Sanger Institute, many thousands of bacteria have been sequenced using both Illumina and Pacific Biosciences technology, creating databases that can be used by bioinformaticians to understand genes that are shared between all bacteria of a species (core genes) and to unpick the genes and genotypes linked to antimicrobial resistance phenotypes. A benefit of using Illumina sequencing for this is that it allows the rapid generation of a large number of bacterial genomes from a single sequencing run. Long-read Pacific Biosciences data has been used to create highly contiguous, or ‘complete’, assemblies, often enabling the assembly of the bacterial chromosome into a single contig, whereas Illumina data typically yields a more fractured, incomplete ‘draft’ assembly.
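The notion of core genes can be sketched in Python as a set intersection. This is an illustration only: it assumes gene content has already been called for each isolate, and the isolate names and the small gene lists are hypothetical inputs rather than real data.

```python
def core_and_accessory(genomes):
    """Split the genes seen across isolates into core genes (present in
    every isolate) and accessory genes (present in only some)."""
    gene_sets = [set(genes) for genes in genomes.values()]
    core = set.intersection(*gene_sets)
    pan = set.union(*gene_sets)
    return core, pan - core


# Hypothetical gene content for three isolates of one species.
isolates = {
    "isolate_1": {"gyrA", "recA", "blaTEM"},
    "isolate_2": {"gyrA", "recA"},
    "isolate_3": {"gyrA", "recA", "tetA"},
}
core, accessory = core_and_accessory(isolates)
```

Genes that fall outside the core set, such as resistance determinants carried by only some isolates, are the natural starting point when linking genotype to a resistance phenotype.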

Additionally, next-generation sequencing is not limited to whole-genome sequencing. Targeted sequencing, from whole exomes through to panels of genes of interest, can be a cost-effective way of generating data. Focusing only on the genes of interest has the potential to miss upstream effects; however, it is cheaper, because less data is required, and the smaller datasets are potentially easier to manage and store than whole genomes.

A further use of next-generation sequencing is the sequencing of RNA, typically through conversion to cDNA, although Oxford Nanopore is in the process of developing direct RNA sequencing. A benefit of long-read sequencing, whether Pacific Biosciences or Oxford Nanopore, is that it can also be used to profile the relative abundance of different RNA transcripts within a cell or tissue.

Clinical uses

Next-generation sequencing can also be used to support the later stages of drug discovery, such as clinical trials. The Oxford Nanopore MinION has the advantage that it could be directly taken to the patient, even when the patient is in a remote, resource-limited location. A further potential benefit is the automated sample preparation systems under development by Oxford Nanopore, such as its VolTRAX machine. Subject to regulatory approval for clinical use, this would enable preparation of DNA ‘libraries’ for whole-genome sequencing outside the laboratory environment.

Additionally, the data generated by the MinION is accessible in real time, meaning that within a few minutes of beginning a sequencing run, data is available to be analysed. This has potential advantages for a patient as it facilitates a rapid sample-to-answer solution, speeding up tailored prescribing. It has advantages for the research scientist too – not only in speed but also in enabling the precise amount of data required to be generated. Researchers can monitor data generation and stop a run once sufficient sequence data has been generated to answer the research question. The Illumina MiSeq also has a comparatively short run time, and would likely be suited to clinical situations, given that it has FDA approval for clinical sequencing.
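Stopping a run once enough data has accumulated can be sketched as a running coverage check. The Python below is a simplified illustration, not any vendor's interface: it assumes read lengths arrive as a stream, that ‘sufficient’ means a target average depth (total bases divided by genome size), and the function name and numbers are hypothetical.

```python
def reads_until_target(read_lengths, genome_size, target_depth):
    """Consume read lengths in arrival order and return
    (number_of_reads, depth) at the first point where the average
    depth reaches the target, or None if the stream ends first."""
    total_bases = 0
    for count, length in enumerate(read_lengths, start=1):
        total_bases += length
        depth = total_bases / genome_size
        if depth >= target_depth:
            return count, depth
    return None


# Toy example: a 1,000-base target and a stream of long reads.
stream = [4000, 6000, 5000, 5000, 8000]
result = reads_until_target(stream, genome_size=1000, target_depth=15)
```

In this toy stream the target is met after the third read, so the remaining reads would never need to be generated.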

An attractive aspect of next-generation sequencing is that it lends itself to in silico and in vitro studies, supporting scientists in the aim to replace, reduce and refine the use of animals in experimental procedures. In addition, next-generation sequencing has the potential to enable further exploration of people’s genomes. Here, informed consent must be sought prior to collection of tissue for sequencing. This may seem trivial, especially where the tissue required can be collected through a routine procedure, such as a blood sample. However, DNA sequencing can lead to both predictable findings, ie, genes linked to the study in question, and unrelated findings, such as a gene linked to late-onset diseases, or carrier status of disease. It can also have wider implications than for a person’s healthcare alone, with the potential to discern paternity or adoption status. Human DNA sequence, and inferences made from it, must be carefully anonymised, stored under rigorous security, or released back to the donor through a considered process by a trained individual such as a genetic counsellor. A benefit of studies that use targeted sequencing is the reduced risk of unrelated findings as a side effect of the research questions.

An exciting new chapter

Next-generation sequencing looks set to begin an exciting new chapter in the field of drug discovery, but with caveats. The sequencing process itself must be bookended by other processes, thoughtfully planned, to obtain maximum value from the sequence data. Immediately apparent is the need for good quality, robust DNA extraction protocols, tailored to the organism or tissue being studied. A favourite catchphrase within the sequencing community is ‘rubbish in, rubbish out’ (or a variant with more informal language). Essentially, it is not possible to extract reliable results from poor starting DNA.

A second point is the data analysis. Within a wider skills shortage in STEM subjects, the bioinformatics skill shortage is perceived to be particularly acute. Tangentially, it underlines the role we all have to play in reaching out to the next generation of potential scientists, to educate and inspire through schools and outreach work. Finally, a key factor often overlooked is experimental design. While useful studies have arisen from a ‘sequence first, question later’ approach, investing time in considering which samples to sequence, to what depth of coverage, and using what sequencing technology, will likely pay off in the long run. Additionally, selecting an adequate number of samples to obtain statistically significant results is necessary.
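The trade-off between depth of coverage and sequencing technology can be made concrete with a back-of-envelope calculation. The Python sketch below uses the standard approximation that average depth equals read count times read length divided by genome size; the genome size, read lengths and depth are illustrative assumptions, and real runs need a margin for uneven coverage and low-quality reads.

```python
import math


def reads_required(genome_size, read_length, depth):
    """Estimate how many reads of a given length are needed to reach
    an average depth over a genome of the given size."""
    return math.ceil(depth * genome_size / read_length)


# Illustrative 5 Mb bacterial genome at 30x average depth.
short_reads = reads_required(genome_size=5_000_000, read_length=150, depth=30)
long_reads = reads_required(genome_size=5_000_000, read_length=10_000, depth=30)
```

Running numbers like these before sequencing helps decide how many samples can share a run and which platform suits the question.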

Author biography

Kim Judge joined Illumina as a Research Associate in sequencing R&D, where she worked on the MiSeq, NextSeq500 and Nextera. She moved to the Department of Medicine at Addenbrooke’s Hospital, Cambridge in 2012, where her PhD focused on using the Oxford Nanopore MinION for microbiological applications including detecting antimicrobial resistance and identifying plasmids. She now works with the MinION, GridION and PromethION in the sequencing R&D team at the Wellcome Trust Sanger Institute.
