How the AI revolution can accelerate early drug discovery


Posted: 20 September 2023 | Dr Robert Scoffin (Cresset), Matthew Habgood (Cresset)

Rob Scoffin and Matthew Habgood from solutions provider Cresset look to the future of drug discovery and the roles that artificial intelligence and machine learning could play.


“AI will not replace drug discovery scientists, but drug discovery scientists who use AI will replace those who don’t” – comment during EFMC meeting 2018

Progressing a drug molecule from concept to commercialisation typically takes 10-15 years and costs up to $2 billion per launched drug once all failures are factored in.¹ While many of these costs, and most failures, occur within the development and clinical phases, the early discovery phase also carries high costs.

This has led to a demand for the application of novel technologies to speed up and de-risk the discovery pipeline. Of current interest are artificial intelligence (AI) and machine learning (ML) approaches, which offer the ability to expand the chemical ‘search space’ for novel compounds, enable accelerated calculations of complex properties and provide insights into inherently noisy and incomplete information.

“In the future ALL DRUGS will be designed with AI” – AIDD company website

Here, we discuss whether and how AI and ML can accelerate delivery of a final drug candidate, while also examining why the above statement isn’t necessarily true.

AI and ML enhance efficiency in drug discovery

Selecting the best candidate molecule to progress from discovery to new chemical entity (NCE) is complex and time consuming. As a result, drug discovery teams are increasingly looking to improve the quality of their initial hit molecules to get a head start on this process. One driver is that, for many drug targets of current interest, finding novel molecules that bind is significantly harder than it was twenty or thirty years ago. This has led to increased interest in evaluating many more molecules than before, without significantly increasing wet chemistry and biology budgets.

AI and ML methods running on latest-generation hardware can triage molecules at a scale that was unthinkable only a few years ago. Current practice contemplates ‘virtual libraries’ of tens of billions of molecules, built in one or both of two ways. In a ‘bottom up’ scenario, the library consists of molecules defined by a set of synthetic reactions and appropriate starting materials, forming an ultra-large virtual library (ULVL). In a ‘top down’ scenario, a ‘generative AI’ defines the chemical space by assembling molecules from a ‘molecular soup’ of small chemical fragments. Each approach has strengths and weaknesses. Generative AI, for instance, tends to suggest many molecules that are synthetically inaccessible or chemically unstable. Conversely, a ULVL contains predominantly synthetically accessible molecules, but its diversity is biased (and constrained), so it does not explore chemical space as widely. In both cases, there is a large-scale computing challenge inherent in considering billions of options and reducing them to a few tens to hundreds of molecules, which will then be made or acquired and tested against the biological target of interest.
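The arithmetic of a bottom-up ULVL is multiplicative: a single two-component reaction drawing on a hypothetical 100,000 building blocks for each component already defines 100,000 x 100,000 = 10,000,000,000 (ten billion) products. The Python sketch below is a minimal illustration under stated assumptions, not a production tool. It enumerates a toy library lazily from hypothetical reaction templates (`REACTIONS`) and streams it through a placeholder scoring function (`toy_score`), keeping only the top few candidates. All reaction and building-block names are invented; in practice a cheminformatics toolkit would build real molecules, and a docking score or trained ML model would replace the scorer.

```python
import heapq
import itertools
from math import prod
from typing import Callable, Iterator

# Hypothetical reaction templates: each reaction lists the building blocks that
# can fill each of its reagent slots. The names are placeholders, not real chemistry.
Reactions = dict[str, list[list[str]]]
REACTIONS: Reactions = {
    "amide_coupling": [["acid_A", "acid_B", "acid_C"], ["amine_X", "amine_Y"]],
    "suzuki": [["boronic_1", "boronic_2"], ["halide_P", "halide_Q", "halide_R"]],
}

Product = tuple[str, tuple[str, ...]]


def library_size(reactions: Reactions) -> int:
    """Virtual library size: for each reaction, multiply the slot sizes; then sum."""
    return sum(prod(len(slot) for slot in slots) for slots in reactions.values())


def enumerate_products(reactions: Reactions) -> Iterator[Product]:
    """Lazily yield (reaction, building-block combination) pairs one at a time."""
    for name, slots in reactions.items():
        for combo in itertools.product(*slots):
            yield name, combo


def triage(
    reactions: Reactions,
    score: Callable[[str, tuple[str, ...]], float],
    keep: int,
) -> list[Product]:
    """Stream the whole library through `score`, holding only the top `keep` in memory."""
    return heapq.nlargest(keep, enumerate_products(reactions), key=lambda p: score(*p))


# Placeholder score (hypothetical); a docking score or trained ML model would go here.
def toy_score(reaction: str, combo: tuple[str, ...]) -> float:
    return float(sum(len(block) for block in combo))


print(library_size(REACTIONS))  # prints 12 (3*2 + 2*3)
print(triage(REACTIONS, toy_score, keep=3))
```

Because `enumerate_products` is a generator and `heapq.nlargest` retains only `keep` items, memory stays bounded however large the library grows; the cost of scoring every product remains, which is why fast ML scoring matters at this scale.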

“What I don’t understand is why my medicinal chemists don’t make all of the molecules I design” – AI scientist reflecting on why they hadn’t yet developed the ‘perfect’ molecule for their target

Although the filtering process, at least in early drug discovery, can be (or must be) highly automated, in the later stages, nearing candidate selection, the process naturally switches to human decision making, albeit informed by the multi-factorial data generated throughout the project to date. The nature of the AI ‘support’ to the human scientist must therefore change as the discovery process progresses.

How AI/ML technology has accelerated drug discovery

Although AI for drug discovery (AIDD) can be viewed as still being in the initial phases of development and acceptance, compounds discovered using AIDD platforms are already entering clinical trials. The overall impact of AI and ML approaches is potentially profound and can be seen at multiple levels. Aside from suggesting novel molecules at the front end of a project, AI can also accelerate the optimisation of molecules through rapid calculation of complex properties, and through analysis of large-scale, inter-dependent data that enables candidate selection on a more robust and reproducible basis. For example:

Use of AI/ML in predicting results of complex calculations

Many complex calculations are performed throughout drug discovery to determine how a drug candidate behaves and whether its properties align with the desired profile for a drug to treat the target disease. Training AI/ML tools to predict the results of otherwise complex and time-consuming calculations is gaining traction in pharmaceutical R&D. Before use, these approaches require extensive training datasets of matched inputs and outputs for the ML model to make accurate predictions. ML techniques have been applied to:

approximate the quantum mechanics (QM) of compound libraries. A single calculation of the electronic contributions to a molecule’s physical and chemical properties can take several hours using traditional QM codes; a trained AI model can deliver comparable results in milliseconds, allowing this analysis to be accelerated and scaled across whole libraries.
predict binding energies from free energy perturbation (FEP) calculations. FEP calculations are resource-intensive, but AI methods used alongside FEP can screen a library and predict binding affinities accurately, with results comparable to experimental measurements.
protein folding for structural enablement. AI models can generate structures for targets that have not been determined experimentally. The path to using these in drug discovery has not been smooth, but as the sophistication of the tools and understanding of their output grows, AI-generated protein structures look likely to be increasingly adopted.
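The common pattern behind these applications, training a fast model on the outputs of a slow calculation, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the 2D points stand in for molecular descriptors, `expensive_calculation` is a hypothetical toy function standing in for a QM or FEP calculation, and a k-nearest-neighbour regressor is used only because it is short and dependency-free (production tools typically use neural networks or other learned models).

```python
import math
import random

Point = tuple[float, ...]


def expensive_calculation(x: Point) -> float:
    """Hypothetical stand-in for a slow QM or FEP calculation on a molecule's descriptors."""
    return math.sin(x[0]) + 0.5 * x[1] ** 2


class KNNSurrogate:
    """k-nearest-neighbour regressor: averages the targets of the k closest training points."""

    def __init__(self, k: int = 5) -> None:
        self.k = k
        self.X: list[Point] = []
        self.y: list[float] = []

    def fit(self, X: list[Point], y: list[float]) -> "KNNSurrogate":
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, x: Point) -> float:
        order = sorted(range(len(self.X)), key=lambda i: math.dist(x, self.X[i]))
        nearest = order[: self.k]
        return sum(self.y[i] for i in nearest) / len(nearest)


# Training: pay for the expensive calculation once per training example.
rng = random.Random(0)
train_X = [(rng.uniform(-2, 2), rng.uniform(-2, 2)) for _ in range(2000)]
train_y = [expensive_calculation(x) for x in train_X]
model = KNNSurrogate(k=5).fit(train_X, train_y)

# Inference: each new prediction is a cheap lookup, not a new calculation.
query = (0.3, -0.7)
print(model.predict(query), expensive_calculation(query))
```

The surrogate is only trustworthy inside the region its training data covers, which is one reason these approaches need extensive datasets before use.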

AI/ML has slashed screening times for ultra-large libraries

Using ML methods, an AI model can be built to screen molecules against a chosen drug target. Trained to predict properties such as binding affinity, the model can filter billions of potential drug candidates in quick succession, processing ultra-large libraries at a scale that would not be feasible using traditional methods. This cost-effective approach significantly streamlines candidate analysis by removing molecules that lack the desired properties. It allows far more compounds to be considered than would ever be possible in the lab, so scientists can focus on the most promising contenders identified through virtual screening.
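One common way to apply this at scale is surrogate-guided screening with active learning: score a small batch with an expensive method, fit a cheap model, use it to rank the rest of the library, and spend the next round of expensive scoring only on the top-ranked molecules. The Python sketch below illustrates that loop under stated assumptions. `oracle` is a hypothetical toy function standing in for docking or FEP (lower is better), the library is random 2D points standing in for molecular descriptors, and the cheap model is a simple nearest-neighbour average chosen for brevity.

```python
import math
import random

Point = tuple[float, float]


def oracle(x: Point) -> float:
    """Hypothetical expensive scorer (e.g. docking); lower is better. Toy function here."""
    return (x[0] - 1.0) ** 2 + (x[1] + 0.5) ** 2


def surrogate(x: Point, labelled: list[tuple[Point, float]], k: int = 3) -> float:
    """Cheap model: mean oracle score of the k nearest already-scored molecules."""
    nearest = sorted(labelled, key=lambda item: math.dist(x, item[0]))[:k]
    return sum(score for _, score in nearest) / len(nearest)


def active_learning_screen(
    library: list[Point], rounds: int = 4, batch: int = 20, seed: int = 0
) -> list[tuple[Point, float]]:
    rng = random.Random(seed)
    pool = list(library)
    rng.shuffle(pool)
    # Round 0: score a random batch with the expensive oracle.
    labelled = [(x, oracle(x)) for x in pool[:batch]]
    pool = pool[batch:]
    for _ in range(rounds):
        # Rank every unscored molecule with the cheap surrogate...
        pool.sort(key=lambda x: surrogate(x, labelled))
        # ...and spend the oracle budget only on the most promising batch.
        picked, pool = pool[:batch], pool[batch:]
        labelled += [(x, oracle(x)) for x in picked]
    return sorted(labelled, key=lambda item: item[1])


lib_rng = random.Random(1)
library = [(lib_rng.uniform(-3, 3), lib_rng.uniform(-3, 3)) for _ in range(2000)]
hits = active_learning_screen(library)
print(len(hits), hits[0])
```

Here the expensive oracle is called 100 times for a 2,000-member library (5%); the same loop structure is what lets a limited docking or FEP budget be concentrated on the most promising regions of a far larger library.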

Generation of synthetic routes using AI

AI methods are extremely adept at pattern recognition in complex and noisy datasets. An early example of the use of AI in chemistry and drug discovery was the generation and optimisation of synthetic routes to molecules of interest.² The AI is trained on literature examples of synthetic reactions and can then rapidly search for viable synthesis routes to a set of compounds, as well as further optimise routes to give better yields or lower costs. In published studies, when challenged to develop routes to a given set of test compounds, the AI performed at least as well as very experienced synthetic organic chemists, consistently finding better routes (fewer steps, higher yields, lower feedstock costs, etc.) in far less time.³
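At its core, route finding is a search: each molecule can be broken into precursors by one of several candidate reactions, and a route succeeds only when every branch ends in a purchasable starting material. The Python sketch below shows a depth-limited, memoised search for the route with the fewest total reactions. It is a minimal illustration under stated assumptions: `DISCONNECTIONS` and `STOCK` are hypothetical hand-written tables standing in for the reactions a trained model would propose, and all molecule names are placeholders.

```python
from functools import lru_cache
from typing import Optional, Union

# Hypothetical toy data. In a real tool, a model trained on literature reactions
# would propose these disconnections; here they are hard-coded placeholders.
DISCONNECTIONS: dict[str, list[tuple[str, ...]]] = {
    "target": [("int_A", "int_B"), ("int_C",)],
    "int_A": [("sm_1", "sm_2")],
    "int_B": [("int_D",)],
    "int_C": [("sm_3", "sm_4")],
    "int_D": [("sm_5",)],
}
STOCK = {"sm_1", "sm_2", "sm_3", "sm_4", "sm_5"}  # purchasable starting materials

# A route is either a stock molecule, or (molecule, [routes to each precursor]).
Route = Union[str, tuple]


@lru_cache(maxsize=None)
def best_route(mol: str, depth: int = 6) -> Optional[tuple[int, Route]]:
    """Return (number of reactions, route) for the shortest route to `mol`, or None.

    The depth limit stops the search from looping forever on cyclic disconnections.
    """
    if mol in STOCK:
        return 0, mol
    if depth == 0:
        return None
    best: Optional[tuple[int, Route]] = None
    for precursors in DISCONNECTIONS.get(mol, []):
        subroutes = [best_route(p, depth - 1) for p in precursors]
        if any(r is None for r in subroutes):
            continue  # some precursor cannot be made, so this disconnection fails
        steps = 1 + sum(s for s, _ in subroutes)  # this reaction plus all upstream ones
        if best is None or steps < best[0]:
            best = (steps, (mol, [r for _, r in subroutes]))
    return best


print(best_route("target"))  # (2, ('target', [('int_C', ['sm_3', 'sm_4'])]))
```

Replacing the unit cost per reaction with yield- or price-based costs turns the same search into the kind of route optimisation described above; practical tools typically use guided search rather than exhaustive enumeration to cope with far larger trees.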

Overcoming limitations to embrace the future of AI

AI/ML techniques have already made a significant impact on drug discovery, offering advanced solutions to funnel potential candidates through the pipeline at speed. Although AI has the potential to solve other limiting steps in drug discovery, the application of AI/ML systems is restricted by the data available, as significant amounts of information are required to train an AI model. The nature of the drug discovery process requires strict data protection and privacy, which unfortunately limits data sharing.

To really benefit from AI, the pharmaceutical industry must be more open to data sharing. Larger amounts of available data would significantly increase the range of problems to which AI/ML tools can be applied. Complex molecular properties relating to absorption, distribution, metabolism and excretion (ADME) could be better predicted, accurately filtering out poor candidates and streamlining discovery.

Another limitation to date has been the relative scarcity of experts proficient in both drug discovery and AI, who are needed to ensure that the tools are used efficiently to deliver accurate insights and to inform and accelerate drug discovery.

The future of AI in drug discovery

While we don’t subscribe to the “one day all drug discovery will be done solely using AI” philosophy, it is clear that AI can positively impact challenges encountered throughout the drug discovery process. Current limitations to adoption centre on training ML models, with restricted access to data and strict confidentiality in the pharmaceutical industry forming a barrier. Interest in using AI is nonetheless rising, for applications including approximation of quantum mechanics, generation of optimal synthetic routes, prediction of protein structures, accurate prediction of FEP results, and scaling of docking. As more data is generated, ML tools could be better positioned to address these complex calculations.

Integration of AI/ML into drug discovery platforms has already led to considerable improvements, with collective efforts accelerating drug discovery rates, enhancing efficiency and reducing costs. AI/ML systems will continue to evolve and shape the pharmaceutical industry, significantly enhancing scale and accelerating candidate selection.

Author bio:

Matthew Habgood

Principal Computational Chemist, Cresset

Matthew graduated from Imperial College London in 2004 with an MSci in Chemistry and was co-awardee of the Neil Arnott prize for best chemistry graduate at the University of London. He subsequently obtained an MSc in Mathematical Modelling and Scientific Computing from the University of Oxford.

From 2005 to 2008 he carried out a DPhil (PhD) at Oxford on nanomaterials for quantum computation, followed by postdoctoral work on the prediction of crystal structures at University College London.

Matthew subsequently put his computational chemistry skills to use in the defence sector. He then joined the pharmaceutical industry as a senior scientist at Evotec in 2016, working to develop drug candidates for a wide variety of internal and external clients. Following a second stint in defence, he joined Cresset in 2022, working to develop, source and evaluate new computational techniques for Cresset’s software.

Matthew is a chartered chemist. He has published 22 scientific articles and has an h-index of 13.

Dr Robert Scoffin

CEO and Chairman, Cresset

Robert is an expert in the fields of molecular modelling and cheminformatics. His DPhil is in chemistry from the University of Oxford. Rob is passionate about applying computational methods to help meet medical challenges. He believes drug discovery and design can be streamlined and improved through the use of computational methods, resulting in better drugs being brought to market sooner. Rob is a Fellow of the Royal Society of Chemistry.

Rob joined Cresset as CEO in 2010 and now also serves as Chairman of Cresset. During his time leading the team, Cresset has further developed its software and contract research divisions from a very solid customer base into a high-growth, profitable business. Rob also serves as Co-Chairman of Torx Software, a collaboration between Cresset and Elixir Software. Previous roles include CEO of Amedis and VP, Europe at CambridgeSoft.

References

1. Austin D, Hayford T. Research and Development in the Pharmaceutical Industry. Congressional Budget Office; 2021. https://www.cbo.gov/publication/57126

2. Schneider G. Automating drug discovery. Nat Rev Drug Discov. 2018;17:97–113. https://doi.org/10.1038/nrd.2017.232

3. Paul D, Sanap G, Shenoy S, et al. Artificial intelligence in drug discovery and development. Drug Discov Today. 2021;26(1):80–93. https://doi.org/10.1016/j.drudis.2020.10.010
