AI-driven drug discovery: insights from Cresset
Posted: 26 September 2024 | Dr Mutlu Dogruel (Cresset)
In this in-depth Q&A, Mutlu Dogruel, Vice President of AI Solutions at Cresset, shares his insights on chatbots, retrieval augmented generation and AI hallucinations, and on how AI can open up new possibilities for innovation in pharmaceutical research.
Cresset delivers software solutions and contract research expertise that enable companies around the world to accelerate their small molecule discovery processes efficiently and effectively. Customers include the pharmaceutical, biotechnology and agrochemical industries. Dr Mutlu Dogruel, VP of AI Solutions, has outlined a forward-thinking vision to revolutionise drug discovery processes by integrating cutting-edge AI technologies to enhance productivity, streamline workflows, and empower researchers.
What are the potential benefits associated with using AI to generate new drug candidates?
Artificial Intelligence (AI) and Machine Learning (ML) have made significant strides in enriching drug discovery processes. These technologies can analyse vast datasets of chemical entities and biological interactions, offering predictions on the behaviour of new drug candidates. By leveraging historical data and research outcomes, AI-driven models can offer probabilistic insights into the likely efficacy, toxicity, and side-effect profiles of compounds, although these predictions require further validation. While these models present an informed starting point, it’s important to integrate these insights with human expertise to navigate the complexities of drug discovery. This collaborative approach can enhance the early identification of promising candidates, potentially reducing the frequency of expensive failures in later development stages.
A second benefit, and one of the most crucial for AI in drug discovery, is diversity and novelty. AI does not stop at optimising known chemical spaces; it can also venture into uncharted territory. Generative chemistry methods involving AI can suggest novel molecular structures that may not be immediately apparent to human chemists, although these require experimental validation. This enables us to explore novel drug candidates that could be more effective or have fewer side effects than existing treatments and drugs.
Then comes resource optimisation. The benefits here are significant because we are not only speeding up the discovery process but also making it more resource efficient. This also requires human expertise: combining that expertise with an AI system allows the most promising candidates to be prioritised for synthesis and testing, thereby optimising the use of laboratory resources and reducing wastage.
Generally, aside from generative chemistry, predictive modelling and resource optimisation, we also see AI empowering researchers in many different ways, which is another exciting benefit. AI can conduct the data analysis and candidate generation, leaving researchers to focus on more strategic and creative aspects of drug discovery. This symbiotic relationship between human expertise and AI capabilities leads to a more dynamic and innovative research environment.
Furthermore, another benefit is ethical and responsible innovation. At Cresset, we strive to prioritise responsible AI by implementing practices aimed at ensuring our models are transparent, fair, and ethically sound, so that we can confidently move forward with drug candidates that are not only effective but also designed with the highest standards of ethical consideration.
What are the potential ethical concerns associated with AI hallucinations in the drug discovery process?
This is a great and very timely question. While AI is great, we should also consider how authentic it is, because we are talking about drug design that could save lives. Therefore, it is important to assess how we can mitigate some of these potential harms.
AI hallucinations, where models generate results or predictions not grounded in real-world data, can range from minor inaccuracies to completely false outputs. Just as with underperforming traditional ML models, hallucinations can lead to the identification of drug candidates that appear promising in silico (in computer simulations) but fail in real-world testing. This can result in wasted resources, time and effort, not to mention the potential for false hope in the early stages of drug development.
Another concern surrounds ethical issues. There is a significant safety risk if an AI model hallucinates a drug candidate that is predicted to be safe and effective but turns out to be toxic. It highlights the importance of rigorous validation and cross-checking of AI-generated results before progressing to any stage of clinical testing.
Also, AI models are only as good as the data they are trained on. If the training data is biased or incomplete, the AI can generate skewed or misleading results. ML models can be checked to identify and mitigate biases using independent, well-curated test data. Additionally, you can examine every single dimension of your training dataset to determine whether there are any problems that may introduce biases.
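One simple way to look for such biases on independent test data is to compare a model's accuracy across subgroups of that data. The following is a minimal sketch of the idea, not Cresset's method: the function name `accuracy_by_group` and the record layout are assumptions made for illustration.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Return accuracy per subgroup of a held-out test set.

    `records` is an iterable of (group, true_label, predicted_label)
    tuples. The grouping (for example by chemical series or assay
    source) is a hypothetical choice made for this sketch.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, true_label, predicted_label in records:
        total[group] += 1
        correct[group] += int(true_label == predicted_label)
    # A large accuracy gap between groups suggests that the training
    # data under-represents or misrepresents some of them.
    return {group: correct[group] / total[group] for group in total}
```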
This is slightly different when we are talking about a Large Language Model (LLM). Since most LLMs were trained on large amounts of data that may contain biases and conflicting information, we have to apply further steps to minimise the risks associated with such biases. This brings me to the accountability dimension of AI hallucinations. If an AI-generated drug candidate leads to adverse outcomes, who is responsible? Is it the developers who created the application, or the end user? Ensuring clear lines of accountability is therefore essential for maintaining trust in AI-driven processes. At Cresset, we address these concerns through our commitment to responsible AI, which is one of our four AI pillars.
Responsible AI involves the implementation of robust validation protocols to ensure that AI-generated candidates are thoroughly vetted before moving forward. This includes cross-referencing with experimental data and expert review.
We also maintain transparency about the capabilities and limitations of our AI models.
This is very important, and no one should claim that an AI model will do everything perfectly. Users should be informed about the confidence levels of AI predictions and the underlying data sources.
We are also going to appoint a Responsible AI Officer within the AI division, ensuring that someone external to the product team checks all processes.
Continuous improvement is another aspect, which is again related to our pillars. AI is a very fast-moving, evolving field. We are committed to ongoing research and development to improve the accuracy and reliability of our models. For example, if there is a new version of GPT-4 that works better for minimising the risks of AI hallucinations, then we will use that.
By addressing these ethical concerns head on, we aim to harness the full potential of AI in drug discovery, while safeguarding against its pitfalls. It is a delicate balance, but one that we are fully committed to achieving.
How does retrieval augmented generation help prevent misleading information caused by AI hallucinations in drug discovery applications?
This is a great question and is one that touches on a very promising approach to mitigating the risk of AI hallucinations in drug discovery.
Retrieval augmented generation (RAG) combines the strengths of two paradigms – retrieval-based models and generation-based models. It enhances a generative model which can create new data based on learned patterns, and it does so with a retrieval mechanism that pulls in relevant factual information from a predefined knowledge base. This hybrid approach ensures that the generated outputs are both creative and grounded in reality.
RAG helps reduce the risk of misleading information by grounding generated outputs in trusted data. The retrieval component ensures that the information the model generates is corroborated by existing validated sources, significantly reducing the risk of the model hallucinating facts. RAG systems can retrieve contextually relevant information, such as specific biochemical pathways, historical data on similar compounds, or clinical trial results, before generating new hypotheses or drug candidates. RAG models can also cross-reference generated candidates with existing databases to validate their potential efficacy and safety, and you can attach any databases of your own to the system for this purpose.
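The retrieve-then-generate pattern described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not a production design: the knowledge-base entries are invented placeholders, the word-overlap scoring stands in for a real embedding search, the names `retrieve` and `build_grounded_prompt` are hypothetical, and the final call to an LLM is deliberately left out.

```python
import math
import re
from collections import Counter

# Invented placeholder entries, used purely for illustration.
KNOWLEDGE_BASE = [
    "Compound A showed liver toxicity signals in an internal assay.",
    "Compound B binds the target kinase with high selectivity.",
    "Pathway X regulates inflammation in the example disease model.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[token] * b[token] for token in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, documents, top_k=2):
    """Return up to top_k documents most similar to the query."""
    query_vec = Counter(tokenize(query))
    scored = [
        (cosine_similarity(query_vec, Counter(tokenize(doc))), doc)
        for doc in documents
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Drop documents that share no words with the query at all.
    return [doc for score, doc in scored[:top_k] if score > 0]


def build_grounded_prompt(query, documents, top_k=2):
    """Build a prompt that asks the model to answer from sources only."""
    passages = retrieve(query, documents, top_k)
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using only the numbered sources below. "
        "If they are insufficient, say so.\n"
        f"Sources:\n{context}\n"
        f"Question: {query}"
    )
```

The prompt returned by `build_grounded_prompt` would then be passed to whichever generative model is in use, so that its answer can be checked against the numbered sources.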
Furthermore, RAG models can enhance transparency and explainability while helping to reduce bias by accessing a diverse range of datasets. These models are less likely to be biased by the limitations of a single dataset, because the idea is that you can create multiple connections in your architecture. RAG strikes a delicate balance between innovation and reliability: by grounding generative outputs in factual, validated data, it significantly reduces the risk of hallucinations. At Cresset, we believe in a multifaceted strategy that incorporates a variety of techniques and frameworks to enhance the accuracy and trustworthiness of our AI models.
Some non-RAG approaches to reduce hallucinations include knowledge graphs. Knowledge graphs provide an invaluable structured framework that links entities and concepts in a manner that mirrors real-world relationships. Developed by Microsoft Research, Graph RAG combines the strengths of RAG and knowledge graphs, making the AI model's responses both contextually accurate and semantically rich. By integrating knowledge graphs into the retrieval augmented generation process, we can achieve a hybrid solution that enhances the reliability and depth of our AI outputs. This is particularly valuable in complex fields like drug discovery, where nuance and contextual accuracy are crucial.
Employing multiple LLMs with a majority voting approach is another technique that may be used to enhance the output reliability. It is essential that these LLMs are truly independent, trained on diverse datasets and using different architectures.
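As a rough illustration of majority voting, the sketch below accepts an answer only when more than half of the models agree on it. It assumes the answers have already been collected from the individual LLMs and reduced to short comparable strings; the function name `majority_vote` and the `min_fraction` threshold are hypothetical choices, not a description of any particular product.

```python
from collections import Counter


def majority_vote(answers, min_fraction=0.5):
    """Return the answer given by a strict majority of models, else None.

    `answers` holds one short answer string per independent model.
    Answers are normalised (trimmed, lowercased) before counting, and
    None is returned when no answer exceeds `min_fraction` of the votes.
    """
    if not answers:
        return None
    normalised = [answer.strip().lower() for answer in answers]
    winner, count = Counter(normalised).most_common(1)[0]
    if count / len(normalised) > min_fraction:
        return winner
    return None
```

Returning `None` rather than the most common answer when there is no clear majority gives the calling application a chance to escalate the question to a human expert instead of passing on a contested result.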
Agentic frameworks are one of my favourite topics. These further improve our handling of AI hallucinations and complex problem solving by breaking down tasks and distributing them among multiple autonomous agents to enhance the accuracy and relevance of the generated response.
In terms of technical approaches, each LLM will come with its own tuneable parameters such as temperature and Top-P sampling. Lowering the temperature reduces randomness to produce more coherent and focused responses. Top-P sampling, which restricts the choice of the next token to the most probable candidates whose cumulative probability reaches P, can be considered alongside agentic frameworks and perhaps majority voting, enabling LLMs to deliver more meaningful responses. Of course, there are also filters on top of everything.
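To make the effect of these two parameters concrete, here is a minimal pure-Python sketch of how temperature and Top-P are commonly applied to a model's raw token scores (logits). The function names and the example values are assumptions for illustration; real LLM APIs expose these settings as parameters and do this work internally.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities; lower temperature sharpens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p=0.9):
    """Keep the smallest set of most probable tokens reaching top_p.

    Returns a dict mapping the kept token indices to probabilities
    renormalised so that they sum to 1.
    """
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = []
    cumulative = 0.0
    for index in order:
        kept.append(index)
        cumulative += probs[index]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}
```

With example logits of `[2.0, 1.0, 0.0]`, lowering the temperature from 1.0 to 0.5 concentrates more probability on the first token, and a Top-P filter then discards the least likely tokens before one is sampled.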
In summary, while RAG is a powerful tool, it’s just one part of a broader strategy to better harness the innovative potential of AI while ensuring the reliability and safety of our drug discovery outcomes.
How can AI software developers balance the innovative potential of AI with the need for reliable and safe drug discovery outcomes?
Balancing the innovative potential of AI with the need for reliable and safe drug discovery outcomes is absolutely possible and is at the very heart of what we do. AI opens up a world of possibilities that were unimaginable just a few years ago. We are currently integrating our products with generative chemistry and AI chatbots. Imagine an AI agent that can suggest new molecular candidates that have a higher likelihood of docking successfully with a target protein.
However, innovation without reliability and safety is like building a castle on sand.
This is where our commitment to responsible AI comes into play. As I mentioned before, at Cresset, we ensure that our AI models are transparent, fair and ethically sound. Our users need to understand how decisions are made and have confidence that these decisions are both fair and safe.
Our DevOps AI pillar means that every AI solution we deploy is thoroughly tested for reliability and scalability, aligning our projects closely with the digital transformation objectives set out by our CEO. Balancing innovation with reliability and safety is a guiding principle for us.
In what ways can the integration of generative AI chatbots improve the predictive analytics capabilities of drug discovery tools?
Chatbots can act as intelligent assistants, providing real-time insights and predictions. For instance, our Flare and Torx copilots will soon be able to run existing product functions, automate routine tasks, and even suggest next steps based on ongoing analyses.
Moreover, these chatbots can facilitate dynamic research. If a researcher receives a list of potential candidates but wants to tweak certain parameters, they can simply communicate this to the chatbot. The generative AI can then rerun the predictive models in the background with the new criteria employed. This level of interactivity allows for a more agile research process, and we are actively exploring this capability.
Another significant benefit is the ability to generate hypotheses and explore new avenues of research. Generative AI chatbots can propose novel molecular structures based on existing data, effectively expanding the pool of potential drug candidates.
Furthermore, these chatbots can assist in identifying patterns and correlations that human researchers might overlook. By continuously learning from new data and feedback, they can refine their predictive capabilities in a way that is tailored specifically to each user, while ensuring that any insights gained remain unique to that user and are not transferred to other users or used to improve our models.
The integration of generative and orchestration AI chatbots into drug discovery tools enhances predictive analytics by making data more accessible and actionable, enabling dynamic research, proposing novel hypotheses and improving efficiency. Ultimately, this accelerates the drug discovery process and opens up new possibilities for innovation and breakthroughs in pharmaceutical research.
About the author
Dr Mutlu Dogruel, Vice President of AI Solutions at Cresset
Dr Mutlu Dogruel is the VP of AI Solutions at Cresset, where he leads the integration of AI across the company’s drug discovery products and services, focusing on productivity, responsible AI practices, systematic AI deployment, and internal optimisations.
Mutlu has a robust academic background that includes a bachelor’s degree in physics, a master’s degree in bioengineering, and a PhD in bioinformatics from the University of Cambridge. His research focused on applying Machine Learning and statistical methods to computational biology problems, including subcellular localisation prediction and protein motif discovery.
Prior to joining Cresset, Mutlu was the lead Pharma AI Architect at Microsoft UK, where he spent almost eight years designing cloud-based solutions for pharmaceutical and life sciences companies. He also held a genomics consultancy role and worked as an R&D engineer in the speech-to-text domain at Hewlett Packard.