AI-driven drug discovery: insights from Cresset

Posted: 26 September 2024 | Dr Mutlu Dogruel (Cresset)

In this in-depth Q&A, Mutlu Dogruel, Vice President of AI Solutions at Cresset, shares his insights on chatbots, retrieval augmented generation and AI hallucinations, and on how AI can open up new possibilities for innovation in pharmaceutical research.

Cresset delivers software solutions and contract research expertise that enable companies around the world to accelerate their small molecule discovery processes efficiently and effectively. Its customers include the pharmaceutical, biotechnology and agrochemical industries. Dr Mutlu Dogruel, VP of AI Solutions, has outlined a forward-thinking vision to revolutionise drug discovery processes by integrating cutting-edge AI technologies to enhance productivity, streamline workflows and empower researchers.

What are the potential benefits associated with using AI to generate new drug candidates?

Artificial Intelligence (AI) and Machine Learning (ML) have made significant strides in enriching drug discovery processes. These technologies can analyse vast datasets of chemical entities and biological interactions, offering predictions on the behaviour of new drug candidates. By leveraging historical data and research outcomes, AI-driven models can offer probabilistic insights into the likely efficacy, toxicity, and side-effect profiles of compounds, although these predictions require further validation. While these models present an informed starting point, it’s important to integrate these insights with human expertise to navigate the complexities of drug discovery. This collaborative approach can enhance the early identification of promising candidates, potentially reducing the frequency of expensive failures in later development stages.

A second benefit, and one of the most crucial for AI in drug discovery, is diversity and novelty. AI does not stop at optimising known chemical space; it can also venture into uncharted territory. Generative chemistry methods involving AI can suggest novel molecular structures that may not be immediately apparent to human chemists, although these require experimental validation. This enables us to explore novel drug candidates that could be more effective or have fewer side effects than existing treatments.

Then comes resource optimisation. The benefits here are significant because AI not only speeds up the discovery process but also makes it more resource efficient. This, too, requires human expertise: combining that expertise with an AI system allows the most promising candidates to be prioritised for synthesis and testing, thereby optimising the use of laboratory resources and reducing wastage.

More generally, beyond generative chemistry, predictive modelling and resource optimisation, we also see AI empowering researchers in many different ways, which is another exciting benefit. AI can conduct the data analysis and candidate generation, leaving researchers to focus on the more strategic and creative aspects of drug discovery. This symbiotic relationship between human expertise and AI capabilities leads to a more dynamic and innovative research environment.

A further benefit is ethical and responsible innovation. At Cresset, we strive to prioritise responsible AI by implementing practices aimed at ensuring our models are transparent, fair and ethically sound, so that we can confidently move forward with drug candidates that are not only effective but also designed to the highest ethical standards.

What are the potential ethical concerns associated with AI hallucinations in the drug discovery process?

This is a great and very timely question. While AI is powerful, we should also consider how reliable its outputs are, because we are talking about drug design that could save lives. Therefore, it is important to assess how we can mitigate the potential harms.

AI hallucinations, where models generate results or predictions not grounded in real-world data, can range from minor inaccuracies to completely false outputs. As with underperforming traditional ML models, they can lead to the identification of drug candidates that appear promising in silico (that is, in computer simulations) but fail in real-world testing. This can result in wasted resources, time and effort, not to mention the potential for false hope in the early stages of drug development.

Another concern surrounds ethical issues. There is a significant safety risk if an AI model hallucinates a drug candidate that is predicted to be safe and effective but turns out to be toxic. This highlights the importance of rigorous validation and cross-checking of AI-generated results before progressing to any stage of clinical testing.

Also, AI models are only as good as the data they are trained on. If the training data is biased or incomplete, the AI can generate skewed or misleading results. ML models can be checked to identify and mitigate biases using independent, well-curated test data. Additionally, you can examine every single dimension of your training dataset to determine whether there are any problems that may introduce biases.

This is slightly different when we are talking about a Large Language Model (LLM). Since most LLMs were trained on large amounts of data that may contain biases and conflicting information, we have to apply further steps to minimise the risks associated with such biases. This brings me to the accountability dimension of AI hallucinations. If an AI-generated drug candidate leads to adverse outcomes, who is responsible? Is it the developers who created the application, or the end user? Ensuring clear lines of accountability is therefore essential for maintaining trust in AI-driven processes. At Cresset, we address these concerns through our commitment to responsible AI, which is one of our four AI pillars.

Responsible AI involves the implementation of robust validation protocols to ensure that AI-generated candidates are thoroughly vetted before moving forward. This includes cross-referencing with experimental data and expert review.

We maintain transparency about the capabilities and limitations of our AI models. This is very important, and no one should claim that an AI model will do everything perfectly. Users should be informed about the confidence levels of AI predictions and the underlying data sources.

We are also going to appoint a Responsible AI Officer within the AI division, ensuring that someone within the division but external to the product team checks all processes.

Continuous improvement is another aspect, which is again related to our pillars. AI is a very fast-moving, evolving field. We are committed to ongoing research and development to improve the accuracy and reliability of our models. For example, if there is a new version of GPT-4 that is better at minimising the risk of AI hallucinations, then we will use it.

By addressing these ethical concerns head on, we aim to harness the full potential of AI in drug discovery, while safeguarding against its pitfalls. It is a delicate balance, but one that we are fully committed to achieving.

How does retrieval augmented generation help prevent misleading information caused by AI hallucinations in drug discovery applications?

This is a great question and is one that touches on a very promising approach to mitigating the risk of AI hallucinations in drug discovery.

Retrieval augmented generation (RAG) combines the strengths of two paradigms: retrieval-based models and generation-based models. It augments a generative model, which creates new content based on learned patterns, with a retrieval mechanism that pulls in relevant factual information from a predefined knowledge base. This hybrid approach helps ensure that the generated outputs are both creative and grounded in reality.

RAG helps reduce the risk of misleading information by grounding generated outputs in trusted data. The retrieval component ensures that the information the model generates is corroborated by existing validated sources, significantly reducing the risk of the model hallucinating facts. RAG systems can retrieve contextually relevant information, such as specific biochemical pathways, historical data on similar compounds, or clinical trial results, before generating new hypotheses or drug candidates. RAG models can also cross-reference generated candidates with existing databases to validate their potential efficacy and safety, and you can attach any databases of your own to the system for this purpose.
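As a rough illustration of the retrieve-then-generate pattern described above, the following minimal Python sketch ranks passages from a small knowledge base against a question and builds a prompt that asks a model to answer only from those passages. Everything in it is an assumption made for illustration: the knowledge base entries are placeholders rather than real data, the bag-of-words cosine similarity stands in for whatever retrieval method a production system would use, and the function names (`retrieve`, `build_grounded_prompt`) are hypothetical. The call to an actual LLM is deliberately left out.

```python
import math
import re
from collections import Counter

# Hypothetical knowledge base: placeholder entries, not real experimental data.
KNOWLEDGE_BASE = [
    "Compound A showed hepatotoxicity in an in vitro assay.",
    "Compound B binds kinase X with high affinity in a docking study.",
    "Compound C was well tolerated in a phase I clinical trial.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, documents, top_k=2):
    """Return up to top_k documents with non-zero similarity to the query."""
    q = Counter(tokenize(query))
    scored = [(cosine_similarity(q, Counter(tokenize(d))), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:top_k] if score > 0]


def build_grounded_prompt(query, documents, top_k=2):
    """Build a prompt that restricts the model to the retrieved sources."""
    passages = retrieve(query, documents, top_k)
    if passages:
        context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    else:
        context = "(no supporting passages found)"
    return (
        "Answer using only the numbered sources below and cite them. "
        "If they are insufficient, say so.\n\n"
        f"Sources:\n{context}\n\nQuestion: {query}"
    )


question = "hepatotoxicity assay results for compound A"
prompt = build_grounded_prompt(question, KNOWLEDGE_BASE)
print(prompt)
```

The prompt produced here would then be passed to a generative model. The point of the sketch is only that the model is handed validated passages and asked to stay within them, instead of answering from its training data alone.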

Furthermore, RAG models can enhance transparency and explainability while helping to reduce bias by accessing a diverse range of datasets. These models are less likely to be biased by the limitations of a single dataset, because the idea is that you can create multiple connections in your architecture. By grounding generative outputs in factual, validated data, RAG strikes a delicate balance between innovation and reliability and significantly reduces the risk of hallucinations. At Cresset, we believe in a multifaceted strategy that incorporates a variety of techniques and frameworks to enhance the accuracy and trustworthiness of our AI models.

Some non-RAG approaches to reduce hallucinations include knowledge graphs. Knowledge graphs provide an invaluable structured framework that links entities and concepts in a manner that mirrors real-world relationships. Developed by Microsoft Research, Graph RAG combines the strengths of RAG and knowledge graphs, making the AI model's responses both contextually accurate and semantically rich. By integrating knowledge graphs into the retrieval augmented generation process, we can achieve a hybrid solution that enhances the reliability and depth of our AI outputs. This is particularly valuable in complex fields like drug discovery, where nuance and contextual accuracy are crucial.
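To make the idea of a knowledge graph as a retrieval source more concrete, here is a minimal Python sketch that stores facts as subject-relation-object triples and collects every fact within a few hops of a starting entity. It is not Microsoft's Graph RAG implementation, only a toy illustration of the underlying idea. The entity and relation names are placeholders, and the helper names (`build_graph`, `facts_within`) are hypothetical.

```python
from collections import defaultdict

# Hypothetical triples with placeholder names; not real pharmacology.
TRIPLES = [
    ("compound_A", "inhibits", "protein_X"),
    ("protein_X", "participates_in", "pathway_P"),
    ("compound_B", "inhibits", "protein_Y"),
    ("pathway_P", "implicated_in", "disease_D"),
]


def build_graph(triples):
    """Index triples by subject so outgoing edges can be looked up quickly."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
    return graph


def facts_within(graph, start, max_hops=2):
    """Breadth-first walk collecting the triples reachable from `start`."""
    facts, frontier, seen = [], [start], {start}
    for _ in range(max_hops):
        next_frontier = []
        for node in frontier:
            for relation, obj in graph.get(node, []):
                facts.append((node, relation, obj))
                if obj not in seen:
                    seen.add(obj)
                    next_frontier.append(obj)
        frontier = next_frontier
    return facts


graph = build_graph(TRIPLES)
context = facts_within(graph, "compound_A", max_hops=2)
for subject, relation, obj in context:
    print(f"{subject} --{relation}--> {obj}")
```

In a hybrid setup, the collected triples could be added to the context supplied to a generative model, so that its answer is constrained by explicit, structured relationships as well as by free-text passages.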

Employing multiple LLMs with a majority voting approach is another technique that may be used to enhance the output reliability. It is essential that these LLMs are truly independent, trained on diverse datasets and using different architectures.
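A majority voting step of this kind can be sketched in a few lines of Python. The sketch assumes that each model's answer has already been obtained and reduced to a short label, and that abstaining is the right behaviour when no answer wins a strict majority. The function name `majority_vote` and the example labels are hypothetical.

```python
from collections import Counter


def majority_vote(answers):
    """Return the answer given by a strict majority of models, else None."""
    if not answers:
        return None
    # Normalise whitespace and case so trivially different strings match.
    normalised = [a.strip().lower() for a in answers]
    answer, count = Counter(normalised).most_common(1)[0]
    return answer if count > len(normalised) / 2 else None


# Hypothetical labels standing in for answers from three independent models.
print(majority_vote(["Toxic", "toxic ", "safe"]))
print(majority_vote(["toxic", "safe", "unknown"]))
```

Returning `None` when the models disagree makes the disagreement visible, so that the case can be escalated to expert review instead of being resolved arbitrarily.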

Agentic frameworks are one of my favourite topics. These further improve our handling of AI hallucinations and complex problem solving by breaking down tasks and distributing them among multiple autonomous agents to enhance the accuracy and relevance of the generated response.

In terms of technical approaches, each LLM comes with its own tuneable parameters, such as temperature and Top-P sampling. Lowering the temperature reduces randomness to produce more coherent and focused responses. Top-P sampling can be considered alongside agentic frameworks and perhaps majority voting, enabling LLMs to deliver more meaningful responses. Of course, there are also filters on top of everything.
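For readers unfamiliar with these two parameters, the following Python sketch shows how they are commonly defined: temperature divides the model's raw scores (logits) before they are turned into probabilities, and Top-P (nucleus) sampling keeps only the most probable tokens whose cumulative probability reaches the threshold. It is a simplified illustration using plain lists, not the internals of any particular LLM or vendor API, and the logit values are made up for the example.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities; lower temperature sharpens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p=0.9):
    """Keep the most probable tokens until their mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / mass  # renormalise the surviving tokens
    return filtered


def sample_token(logits, temperature=1.0, top_p=1.0, rng=random):
    """Sample one token index after temperature scaling and Top-P filtering."""
    probs = top_p_filter(softmax_with_temperature(logits, temperature), top_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
warm = softmax_with_temperature(logits, temperature=1.0)
cold = softmax_with_temperature(logits, temperature=0.5)
print(warm, cold, top_p_filter(warm, top_p=0.8))
```

With the lower temperature, more of the probability concentrates on the highest-scoring token, and with a Top-P of 0.8 the least likely token is excluded altogether. Both settings narrow the range of outputs, which is why they are typically lowered when focused, repeatable responses matter more than variety.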

In summary, while RAG is a powerful tool, it’s just one part of a broader strategy to better harness the innovative potential of AI while ensuring the reliability and safety of our drug discovery outcomes.

How can AI software developers balance the innovative potential of AI with the need for reliable and safe drug discovery outcomes?

Balancing the innovative potential of AI with the need for reliable and safe drug discovery outcomes is absolutely possible and is at the very heart of what we do. AI opens up a world of possibilities that were unimaginable just a few years ago. We are currently integrating our products with generative chemistry and AI chatbots. Imagine an AI agent that can suggest new molecular candidates that have a higher likelihood of docking successfully with a target protein.

However, innovation without reliability and safety is like building a castle on sand. This is where our commitment to responsible AI comes into play. As I mentioned before, at Cresset, we ensure that our AI models are transparent, fair and ethically sound. Our users need to understand how decisions are made and have confidence that these decisions are both fair and safe.

Our DevOps AI pillar means that every AI solution we deploy is thoroughly tested for reliability and scalability, aligning our projects closely with the digital transformation objectives set out by our CEO. Balancing innovation with reliability and safety is a guiding principle for us.

In what ways can the integration of generative AI chatbots improve the predictive analytics capabilities of drug discovery tools?

Chatbots can act as intelligent assistants, providing real-time insights and predictions. For instance, our Flare and Torx copilots will soon be able to run existing product functions, automate routine tasks, and even suggest next steps based on ongoing analyses.

Moreover, these chatbots can facilitate dynamic research. If a researcher receives a list of potential candidates but wants to tweak certain parameters, they can simply communicate this to the chatbot. The generative AI can then rerun the predictive models in the background with the new criteria employed. This level of interactivity allows for a more agile research process, and we are actively exploring this capability.

Another significant benefit is the ability to generate hypotheses and explore new avenues of research. Generative AI chatbots can propose novel molecular structures based on existing data, effectively expanding the pool of potential drug candidates.

Furthermore, these chatbots can assist in identifying patterns and correlations that human researchers might overlook. By continuously learning from new data and feedback, they can refine their predictive capabilities in a way that is tailored specifically to each user, ensuring that any insights gained remain unique to that user and are not transferred to other users or used to improve our models.

The integration of generative and orchestration AI chatbots into drug discovery tools enhances predictive analytics by making data more accessible and actionable, enabling dynamic research, proposing novel hypotheses and improving efficiency. Ultimately, this accelerates the drug discovery process and opens up new possibilities for innovation and breakthroughs in pharmaceutical research.

About the author

Dr Mutlu Dogruel, Vice President of AI Solutions at Cresset

Dr Mutlu Dogruel is the VP of AI Solutions at Cresset, where he leads the integration of AI across the company’s drug discovery products and services, focusing on productivity, responsible AI practices, systematic AI deployment, and internal optimisations.

Mutlu has a robust academic background that includes a bachelor’s degree in physics, a master’s degree in bioengineering, and a PhD in bioinformatics from the University of Cambridge. His research focused on applying Machine Learning and statistical methods to computational biology problems, including subcellular localisation prediction and protein motif discovery.

Prior to joining Cresset, Mutlu was the lead Pharma AI Architect at Microsoft UK, where he spent almost eight years designing cloud-based solutions for pharmaceutical and life sciences companies. He also held a genomics consultancy role and worked as an R&D engineer in the speech-to-text domain at Hewlett Packard.

