Petabytes of data – how informatics is transforming precision medicine

Withers, Nikki

Posted: 26 March 2020 | Nikki Withers (Drug Target Review)

Advances in informatics have given researchers the ability to process petabytes of human genomics data and translate it into biologically relevant information. However, further translating this information into knowledge can prove challenging. Slavé Petrovski, Vice President and Head of Genome Analytics and Bioinformatics for AstraZeneca’s Centre for Genomics Research, spoke to Nikki Withers about how informatics has positively impacted precision medicine and genomics research.

Why is precision medicine so important for drug discovery?

The overarching goal of precision medicine is to transform patients’ lives by personalising their treatment. This can be achieved by identifying the underlying molecular cause or biomarkers of disease in individual patients. By knowing this, we aim to match medicines to those patients who are most likely to benefit from that specific treatment.

If you look at our research pipeline, approximately 90 percent of it follows a precision medicine approach, compared with about 10 percent back in 2009. This relies on a broad range of cutting-edge wet-lab and informatics technologies, tumour tissue diagnostics, molecular tests and point-of-care diagnostics, which make information available to the physician at the point of interaction with the patient.

How has informatics transformed genomics research?

In terms of sequencing technologies, informatics has improved our ability to generate high-quality data from raw samples. Having sophisticated algorithms allows us to turn this raw data into useful information. For example, aligning raw genomics data onto a reference genome allows us to identify which parts of the genome in an individual deviate from the rest of the population. Informatics has also allowed us to perform more sophisticated downstream analyses, such as adopting machine learning and artificial intelligence (AI) to mine these genetic variations in order to gain further biological insight.
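To make the alignment step concrete, here is a simplified sketch of how reads that have already been aligned to a reference can reveal where an individual's sequence deviates from it. It is a toy pileup-based caller written for illustration, not AstraZeneca's pipeline: it assumes gap-free, pre-aligned reads, ignores base and mapping quality, and the thresholds `min_depth` and `min_alt_fraction` are arbitrary illustrative values.

```python
from collections import Counter


def pileup(reference: str, aligned_reads: list[tuple[int, str]]) -> list[Counter]:
    """Count the bases observed at each reference position.

    Each read is a (start, sequence) pair and is assumed to be already aligned
    to the reference at 0-based `start`, with no insertions or deletions.
    """
    counts = [Counter() for _ in reference]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                counts[pos][base] += 1
    return counts


def call_snvs(reference: str, aligned_reads: list[tuple[int, str]],
              min_depth: int = 4, min_alt_fraction: float = 0.3):
    """Return (position, ref_base, alt_base, alt_count, depth) wherever a
    non-reference base is supported by a large enough share of the reads."""
    calls = []
    for pos, counts in enumerate(pileup(reference, aligned_reads)):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to make a call
        ref_base = reference[pos]
        for base, n in counts.items():
            if base != ref_base and n / depth >= min_alt_fraction:
                calls.append((pos, ref_base, base, n, depth))
    return calls


# Toy data: three of five reads carry a T where the reference has an A at position 4.
reference = "ACGTACGTAC"
reads = [(0, "ACGTTC"), (2, "GTTCGT"), (1, "CGTTCG"), (3, "TACGTA"), (0, "ACGTAC")]
print(call_snvs(reference, reads))  # -> [(4, 'A', 'T', 3, 5)]
```

Production tools also model sequencing errors, insertions and deletions, and diploid genotypes; the point here is only the core idea of counting disagreements with the reference at each position.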

Something we have been looking at recently is the use of sophisticated analytical frameworks on top of these data to ask further questions and tease out answers. For example: why does it matter that a genetic variant is present in that individual? Does it cause disease? Does it change the way they respond to treatment? At AstraZeneca, we have a cloud-based informatics pipeline that processes all the genomes from our Genomics Initiative; it is our ambition to analyse up to two million genomes by 2026. The pipeline is optimised to the point where we can now complete the end-to-end analysis of approximately 1,300 sequences in an hour. To put that into context, that is a 10-fold increase in efficiency since 2017, driven by the optimisation of our informatics pipeline in the cloud.
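The throughput figures lend themselves to a quick back-of-the-envelope check. The calculation assumes that each "sequence" is one genome, that the 10-fold gain refers to throughput, and that the rate stays constant; it counts processing time only, not sequencing or data transfer.

```python
GENOMES_TARGET = 2_000_000   # ambition stated in the interview (by 2026)
RATE_2020 = 1_300            # genomes analysed end to end per hour (stated)
SPEEDUP_SINCE_2017 = 10      # stated 10-fold efficiency gain, read as throughput (assumption)


def hours_to_process(n_genomes: int, genomes_per_hour: float) -> float:
    """Wall-clock hours at a constant throughput (assumption)."""
    return n_genomes / genomes_per_hour


rate_2017 = RATE_2020 / SPEEDUP_SINCE_2017  # implied rate of ~130 genomes per hour
for year, rate in [("2017", rate_2017), ("2020", RATE_2020)]:
    hours = hours_to_process(GENOMES_TARGET, rate)
    print(f"{year}: {rate:,.0f} genomes/hour -> {hours:,.0f} hours (~{hours / 24:,.0f} days)")
```

Under those assumptions, two million genomes come to roughly 1,540 hours (about 64 days) of continuous processing at the 2020 rate, against roughly 15,400 hours (about 641 days) at the implied 2017 rate.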

How is informatics aiding advances in precision medicine?

Every one of us has approximately three billion bases in our genome; that is three billion data points to study. Multiply that across two million individuals and you can appreciate how much data that is, and why informatics has become increasingly important. For example, patients in selected clinical trials who have consented to genetic analysis may have their data linked to their clinical outcomes. This allows us to study how variations in their three billion bases correlate with how they respond to or tolerate a treatment, and whether they were the right patient population for that medicine given the underlying cause of disease. By integrating these anonymised genomic and clinical data from the hundreds or thousands of participants in our clinical trial programmes, we aim to identify the genetic profiles that can predict disease progression and response to treatment.
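A rough calculation shows how quickly these numbers reach petabyte scale. It uses the interview's figures of three billion bases per genome and two million individuals, and assumes each base is stored in 2 bits (one of A, C, G or T). That assumption describes a single packed sequence per person, not raw sequencing output.

```python
BASES_PER_GENOME = 3_000_000_000   # "approximately three billion bases" (from the interview)
INDIVIDUALS = 2_000_000            # genomes targeted by 2026 (from the interview)
BITS_PER_BASE = 2                  # A, C, G or T packed into 2 bits (assumption)


def packed_sequence_bytes(bases_per_genome: int, individuals: int,
                          bits_per_base: int = BITS_PER_BASE) -> float:
    """Bytes needed to store one packed sequence per individual."""
    return bases_per_genome * individuals * bits_per_base / 8


total_bases = BASES_PER_GENOME * INDIVIDUALS
print(f"{total_bases:.1e} bases")  # 6.0e+15 bases
print(f"{packed_sequence_bytes(BASES_PER_GENOME, INDIVIDUALS) / 1e15:.1f} PB")  # 1.5 PB
```

Even this minimal representation comes to about 1.5 PB. Raw reads sequenced at the commonly used ~30x coverage and stored with per-base quality scores are many times larger.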

What challenges does informatics present to researchers?

The main challenge is extracting the maximum amount of biological insight from the vast amount of data we are generating; we must work out how to translate petabytes of genomics data into biologically relevant information. Translating that information into knowledge is the next step in the process, and one we are pursuing using AI and machine learning.

Another challenge relates to how we incorporate other information, such as transcriptomics or metabolomics data, into the process. At AstraZeneca, we have seen value in investing in a multi-omics strategy, where we add further layers of data to gain improved insight, at the protein level, into what the outcome of a genetic mutation might be. This is a huge challenge, and an opportunity, that we are actively pursuing.
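One simple way to picture adding a protein layer on top of genomic data is to join the two on sample ID and ask whether carriers of a variant show a shifted protein level. The function, sample IDs and measurements are hypothetical illustrations, not AstraZeneca's multi-omics method; Welch's t statistic is used here only as a familiar measure of the difference between two groups.

```python
from math import sqrt
from statistics import mean, variance


def protein_shift_by_genotype(carriers: set[str],
                              protein_levels: dict[str, float]) -> tuple[float, float]:
    """Compare one protein's level in variant carriers versus non-carriers.

    The genomic layer (`carriers`) and the proteomic layer (`protein_levels`)
    are joined on sample ID. Returns the mean difference and Welch's t
    statistic. Each group needs at least two measured samples.
    """
    carrier_vals = [v for s, v in protein_levels.items() if s in carriers]
    other_vals = [v for s, v in protein_levels.items() if s not in carriers]
    diff = mean(carrier_vals) - mean(other_vals)
    se = sqrt(variance(carrier_vals) / len(carrier_vals)
              + variance(other_vals) / len(other_vals))
    return diff, diff / se


# Hypothetical measurements for six samples, three of which carry the variant.
levels = {"s1": 1.0, "s2": 1.2, "s3": 0.9, "s4": 2.1, "s5": 2.3, "s6": 1.9}
diff, t = protein_shift_by_genotype({"s4", "s5", "s6"}, levels)
print(f"mean difference {diff:.2f}, Welch t {t:.1f}")
```

A large positive statistic suggests carriers have higher levels of the protein; a real analysis would also adjust for covariates and for the many proteins and variants tested.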

What are your thoughts on collaboration in this area of research?

We know the best science doesn’t happen in isolation, which is why we collaborate with world-leading institutions, companies and individuals who share our passion for redefining medical science. For example, we are currently mining the exome sequence data from 300,000 individuals from a large UK Biobank project that AstraZeneca is part of. In collaboration with other pharma partners, we hope to generate both exome and whole-genome sequence data for half a million participants, which will be a remarkable medical research resource. Through this genetic research we hope not only to identify new drug targets, particularly in diseases that to date have unmet clinical needs, but also to support precision medicine programmes. Access to such a large sequenced population, which could reveal why some patients respond to treatments while others do not, helps in designing new trials and identifying new drug targets.
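A widely used way to mine large exome cohorts for candidate drug targets is a gene-based collapsing analysis: each sample is flagged per gene if it carries any qualifying variant, and carrier counts are compared between cases and controls. This sketch shows that idea with a two-sided Fisher's exact test written from the standard library. It illustrates the technique under stated assumptions and is not the consortium's pipeline: the qualifying-variant filter is assumed to have been applied upstream, and `collapsing_test`, the gene names and the cohort are hypothetical.

```python
from collections import defaultdict
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are cases and controls; columns are carriers and non-carriers.
    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one.
    """
    cases, carriers, n = a + b, a + c, a + b + c + d

    def prob(x: int) -> float:  # P(x carriers among cases | fixed margins)
        return comb(carriers, x) * comb(n - carriers, cases - x) / comb(n, cases)

    p_obs = prob(a)
    lo, hi = max(0, cases + carriers - n), min(cases, carriers)
    probs = [prob(x) for x in range(lo, hi + 1)]
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))


def collapsing_test(qualifying_variants, cases: set[str], controls: set[str]) -> dict:
    """Gene-level collapsing test (hypothetical helper).

    `qualifying_variants` is an iterable of (sample_id, gene) pairs for variants
    that already passed a qualifying filter (for example rare and protein-truncating).
    Each sample counts at most once per gene, however many such variants it carries.
    """
    carriers_by_gene = defaultdict(set)
    for sample, gene in qualifying_variants:
        carriers_by_gene[gene].add(sample)
    results = {}
    for gene, carriers in carriers_by_gene.items():
        a, c = len(carriers & cases), len(carriers & controls)
        p = fisher_exact_two_sided(a, len(cases) - a, c, len(controls) - c)
        results[gene] = (a, c, p)
    return results


# Hypothetical toy cohort: 10 cases, 30 controls, two genes.
cases = {f"case{i}" for i in range(10)}
controls = {f"ctrl{i}" for i in range(30)}
variants = ([(f"case{i}", "GENE_A") for i in range(5)]
            + [("case0", "GENE_B")] + [(f"ctrl{i}", "GENE_B") for i in range(3)])
for gene, (n_case, n_ctrl, p) in sorted(collapsing_test(variants, cases, controls).items()):
    print(gene, n_case, n_ctrl, f"p={p:.2g}")
```

GENE_A, carried by half the cases and none of the controls, gets a p-value below 0.001, while GENE_B, carried at the same rate in cases and controls, does not. An exome-wide study runs this test over roughly 20,000 genes, so the p-values need a multiple-testing correction; SciPy's `scipy.stats.fisher_exact` is a ready-made alternative to the hand-written test if adding a dependency is acceptable.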

For this collaboration, it was clear to all the partners in this pre-competitive consortium that the costs prevented any one of us from doing it alone. The obvious conclusion was to work together to generate these data. This paradigm shift from “This is my silo of data” to “Let us build an immense medical research resource that we could all, industry and academia, benefit from” had to happen; otherwise, progress would have been limited.

What developments do you expect to see in the next five years?

It is very exciting to see how recent progress in informatics and technology enables large genomic studies to be conducted at scale. Five years ago, I could not have imagined analysing the exomes of 300,000 individuals in any bioinformatics environment. Those capabilities did not exist, partly because we did not have genomics data at that scale, so there was no need to push the boundaries of technology. As we have seen in other fields, it is often the data that drives the need to build innovative IT architecture.

I am excited to see what happens with quantum computing. I think this is an area to keep a finger on the pulse of, although it is probably a few years away from maturity.

Moving to analytics, we now have access to hundreds of thousands and, before we know it, millions of genomes. I am excited to see what we can extract from these: from studying individual variants with large effects on clinical outcomes, to looking at combinations of variants, to polygenic risk scores, all with the aim of getting the right treatment to the right patient at the right time.
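A polygenic risk score is, at its simplest, a weighted sum: for each scored variant, the number of effect alleles a person carries (0, 1 or 2) multiplied by that variant's effect size. This minimal sketch uses made-up variant IDs and weights; real scores combine thousands to millions of variants with weights estimated in large association studies.

```python
def polygenic_risk_score(dosages: dict[str, int], weights: dict[str, float]) -> float:
    """Weighted sum of effect-allele dosages (0, 1 or 2 copies).

    `weights` maps each scored variant to its per-allele effect size, typically
    estimated in a separate association study. Variants with no genotype in
    `dosages` contribute nothing, which is a simplifying assumption.
    """
    return sum(beta * dosages.get(variant, 0) for variant, beta in weights.items())


# Hypothetical variant IDs and effect sizes, for illustration only.
weights = {"var1": 0.12, "var2": -0.05, "var3": 0.30}
patient = {"var1": 2, "var2": 1, "var3": 0}
print(round(polygenic_risk_score(patient, weights), 3))  # 0.19
```

On its own the score means little; it becomes informative when compared with the distribution of scores in a reference population.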

Slavé Petrovski is the Vice President and Head of Genome Analytics and Bioinformatics for AstraZeneca’s Centre for Genomics Research (CGR). This involves designing and co-ordinating the human genomic studies of the CGR and the company’s broader Genomics Initiative. Slavé has an extensive academic background in human genomics, population genetics, precision medicine and leading large-scale human genomics studies.
