From siloed data to breakthroughs: multimodal AI in drug discovery


Posted: 11 June 2025 | Alessio Zoccoli, Carlos N Velez, Remco Jan Geukes Foppen, Vincenzo Gioia

Drug development has long been hindered by fragmented data and complex processes, but a new wave of AI is reshaping the landscape. By integrating genomic, clinical and molecular data, multimodal models are revealing hidden patterns and accelerating more precise advancements in medicine.


Drug development is plagued by complex challenges, but multimodal AI is unlocking new opportunities. By integrating diverse data sources – from genomics to clinical insights – this approach is accelerating drug discovery, improving patient stratification and boosting success rates.

Drug development faces significant challenges: long timelines, high costs, complex processes and a low probability of success (PoS). These are exacerbated by the shift towards more complex molecules, biologics and cell and gene therapies, and they ultimately hinder patient access to vital treatments. The increasing complexity of advanced therapies has reduced approval rates, but AI is opening new opportunities.

While AI has been used in drug discovery for some time, its impact has been limited by siloed datasets. However, the field is evolving towards a more integrated approach that combines large-scale genomic datasets with other data types in multimodal language models (MLMs). This shift is driven by open-access policies for biomedical data and by the advent of next-generation sequencing (NGS), which has revolutionised genomic analysis by enabling the identification of disease-related genetic variants. Clinical genomics, powered by NGS, enables more precise target validation, improved patient stratification and optimised trial design, with the ultimate aim of increasing PoS. By integrating diverse data sources and leveraging AI, the industry hopes to overcome the challenges of complex drug development and accelerate the delivery of effective treatments to patients.


Multimodal language models

Generative AI (GenAI) and related AI models in the pharmaceutical field reached a new peak of attention when Demis Hassabis and John Jumper were awarded a share of the 2024 Nobel Prize in Chemistry for AlphaFold, which predicts protein structures. In drug discovery, these models aim to identify compounds with the optimal balance of safety and efficacy properties for the therapeutic objective. Leveraging patient genomic data to train and refine both generative and predictive models enhances their ability to identify molecules that are not only effective but also safe and suitable for specific patient populations. This approach puts patient selection first, aiming to design drugs with a higher PoS. Ultimately, the success of drug development hinges on identifying the right patient populations for specific therapies.


Many companies have data in silos, and they process data in a very linear fashion, one modality at a time. Traditional drug discovery data architecture is manual, messy, proprietary and inflexible. Unimodality does not allow for mixing data: cell data, images, molecular data, clinical data records, small-molecule descriptors, ADME-Tox data, transcriptomic data, text-based drug and disease representations, clinical trial protocols, publications, patent data and so on. If the data cannot be mixed, the data value chain (from the R&D phase to the production phase) is neither interpretable nor reproducible. Multimodality, however, can detect and connect trends (and, in future, generate content) across different modalities and therefore allows for better interpretability, which builds trust between regulators, researchers and industry stakeholders. A known obstacle is that biomedical data, with its inherent heterogeneity and inconsistencies, makes it difficult to create a unified, high-quality knowledge base to fuel large language models (LLMs).
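To make the silo problem concrete, the sketch below shows the most basic form of data integration: joining per-modality records on a shared patient identifier so that genomic, clinical and imaging fields can be read together. It is a minimal illustration under stated assumptions, not a description of any company's architecture; the silo names, field names (`variant_A`, `biomarker_B`, `tumour_volume_ml`) and values are all hypothetical.

```python
# Hypothetical modality silos, each keyed by the same patient ID.
GENOMICS = {
    "P001": {"variant_A": 1},
    "P002": {"variant_A": 0},
    "P003": {"variant_A": 1},
}
CLINICAL = {
    "P001": {"age": 54, "biomarker_B": 88.0},
    "P002": {"age": 61, "biomarker_B": 21.5},
    # P003 has no record in the clinical silo.
}
IMAGING = {
    "P001": {"tumour_volume_ml": 12.4},
    "P002": {"tumour_volume_ml": 3.1},
    "P003": {"tumour_volume_ml": 7.8},
}


def integrate(*silos: dict[str, dict]) -> dict[str, dict]:
    """Inner-join modality silos on patient ID and merge their fields.

    Only patients present in every silo are kept; if two silos share a
    field name, the value from the later silo overwrites the earlier one.
    """
    if not silos:
        return {}
    shared_ids = set(silos[0]).intersection(*silos[1:])
    return {
        pid: {key: value for silo in silos for key, value in silo[pid].items()}
        for pid in sorted(shared_ids)
    }


patients = integrate(GENOMICS, CLINICAL, IMAGING)
for pid, record in patients.items():
    print(pid, record)
```

The inner join keeps only patients present in every silo (P003 is dropped because it has no clinical record), which is one small, concrete face of the heterogeneity and inconsistency problem described above. Real pipelines also have to reconcile identifiers, units and conflicting field names rather than silently overwriting them.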

Multimodal language models are advanced language models that can handle multiple types of input and, in some cases, generate multiple types of output. Each modality represents a different type of data, such as text, audio, images or video. Widely used examples include GPT-4o (which has powered the free ChatGPT tier), Gemini 1.5 Flash (which has powered the free tier of Google's Gemini app) and Claude 3.5 Sonnet. Internally, the models learn to associate concepts, find patterns and relate text and images (or other modalities) in a shared representation, so that they can be analysed in the same way. This overcomes the limitations of traditional methods that analyse only one modality at a time.
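The idea of relating modalities "so that they can be analysed in the same way" can be illustrated with a toy shared embedding space: each modality is mapped into vectors of the same dimension, and cosine similarity then compares a text item with image items directly. In real MLMs the mapping is performed by large learned encoders, often trained so that matching pairs land close together; here the projection matrices are hand-picked, hypothetical stand-ins, and the feature vectors are invented for illustration.

```python
import math

# Hypothetical stand-ins for learned encoders: fixed linear projections
# that map each modality's raw features into the same 2-dimensional space.
TEXT_PROJECTION = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.8]]  # 3 text features -> 2 dims
IMAGE_PROJECTION = [[0.5, 0.5], [0.1, 0.9]]           # 2 image features -> 2 dims


def project(matrix: list[list[float]], features: list[float]) -> list[float]:
    """Matrix-vector product: embed raw features in the shared space."""
    return [sum(w * x for w, x in zip(row, features)) for row in matrix]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two vectors in the shared space."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical raw features for one text snippet and two images.
text_query = project(TEXT_PROJECTION, [1.0, 0.0, 0.0])
images = {
    "image_a": project(IMAGE_PROJECTION, [1.0, 0.0]),
    "image_b": project(IMAGE_PROJECTION, [0.0, 1.0]),
}

# Once both modalities live in one space, they can be compared directly.
scores = {name: cosine(text_query, vec) for name, vec in images.items()}
best = max(scores, key=scores.get)
print(scores, "best match:", best)
```

The design point is that the comparison step never needs to know which modality a vector came from; all modality-specific work happens in the projection (encoder) stage.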

Image: Integrate diverse data, unlock new cures – multimodal AI in drug development, highlighting data integration (AI-generated image).

This approach allows for the simultaneous integration and analysis of multiple data sources – such as genomic, chemical, clinical, structural and imaging information – to create a holistic view of the problem, overcoming the limitations of traditional methods that focus on a single source of information (ie, unimodality). Furthermore, algorithms can simultaneously refine multiple desired properties of a drug, such as efficacy, safety and bioavailability, a task that would be extremely complex and time-consuming with conventional methods. For example, integrating omics data with specific chemical and clinical features can help identify more robust therapeutic targets and predict clinical responses with greater accuracy. This ability to correlate seemingly disparate data is crucial for tackling complex targets and discovering new treatments for diseases considered difficult to treat with conventional approaches.
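Refining several properties at once is a multi-objective problem. One standard way to keep all the trade-offs visible, instead of collapsing them into a single weighted score, is a Pareto front: keep every candidate that no other candidate beats on all properties simultaneously. The sketch below assumes each candidate already has hypothetical, higher-is-better scores for efficacy, safety and bioavailability; it is not the authors' method, and producing reliable scores is the hard part that multimodal models would address.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    """A molecule with hypothetical, higher-is-better property scores."""
    name: str
    efficacy: float
    safety: float
    bioavailability: float

    def scores(self) -> tuple[float, float, float]:
        return (self.efficacy, self.safety, self.bioavailability)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is at least as good as b on every property and strictly better on one."""
    pa, pb = a.scores(), b.scores()
    return all(x >= y for x, y in zip(pa, pb)) and any(x > y for x, y in zip(pa, pb))


def pareto_front(candidates: list[Candidate]) -> list[Candidate]:
    """Keep candidates that no other candidate dominates (O(n^2) pairwise check)."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


candidates = [
    Candidate("mol_A", 0.90, 0.60, 0.70),
    Candidate("mol_B", 0.70, 0.90, 0.60),
    Candidate("mol_C", 0.65, 0.55, 0.50),  # worse than mol_A on every property
    Candidate("mol_D", 0.80, 0.80, 0.80),
]
front = pareto_front(candidates)
print([c.name for c in front])
```

`mol_C` is discarded because `mol_A` is better on every property, while `mol_A`, `mol_B` and `mol_D` survive because each trades one property against another. The pairwise check is fine for a shortlist but would need a more efficient algorithm for millions of generated structures.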

The integrated approach to drug development makes it possible to identify candidate molecules that simultaneously satisfy a broad range of desired characteristics, and to understand the complex biological interactions and drug-target dynamics more completely. This improves the quality and reliability of drug candidates and can significantly increase PoS in the later stages of development. MLMs are examples of technologies that enable textual, visual and structural data to be analysed in an integrated way, with the ability to explore chemical space rapidly by generating and evaluating millions of potential molecular structures. These models can, for example, simultaneously examine genetic sequences, images of protein structures and clinical data to suggest molecular candidates that satisfy multiple criteria, such as efficacy, safety and bioavailability. A practical example is the use of MLMs to identify correlations between genetic variants and clinical biomarkers, thus improving the stratification of patients for clinical trials and the selection of candidates for clinical phases. These capabilities can far outperform traditional methods in efficiency and speed: they can identify more elusive correlations and patterns, capture biological mechanisms and drug-target interactions more accurately and comprehensively, and improve both the precision of predictions and the quality of the drug candidates identified.
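The variant-biomarker example can be sketched with a simple statistic. With a binary carrier flag, the Pearson correlation is the point-biserial correlation, and a strong association suggests the variant could be used to split patients into strata. Everything here is a hypothetical illustration: the cohort, the `MIN_ABS_CORRELATION` threshold and the two-stratum rule are assumptions, and a real analysis would need significance testing, confounder adjustment and validation on independent data.

```python
import math
import statistics

# Hypothetical cohort: (patient ID, variant carrier flag, biomarker level).
COHORT = [
    ("P01", 1, 8.2), ("P02", 0, 3.1), ("P03", 1, 7.5), ("P04", 0, 2.9),
    ("P05", 1, 6.8), ("P06", 0, 4.0), ("P07", 0, 3.6), ("P08", 1, 9.1),
]

# Hypothetical cut-off for treating the association as strong enough to use.
MIN_ABS_CORRELATION = 0.5


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation; with a 0/1 variable this is the point-biserial r."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / (sx * sy)


def stratify(cohort) -> dict[str, list[str]]:
    """Split patients into carrier and non-carrier strata."""
    strata: dict[str, list[str]] = {"carrier": [], "non_carrier": []}
    for pid, is_carrier, _ in cohort:
        strata["carrier" if is_carrier else "non_carrier"].append(pid)
    return strata


carriers = [float(flag) for _, flag, _ in COHORT]
levels = [level for _, _, level in COHORT]
r = pearson(carriers, levels)

# Only stratify by the variant if the association clears the threshold.
if abs(r) >= MIN_ABS_CORRELATION:
    strata = stratify(COHORT)
else:
    strata = {"all": [pid for pid, _, _ in COHORT]}
print(f"r = {r:.2f}", strata)
```

A single pairwise correlation is the simplest possible version of this idea; the promise of multimodal models is to find such associations across many variants, biomarkers and data types at once.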

The importance of early AI integration in multidisciplinary teams

AI adoption in pharmaceutical research is often treated as an afterthought rather than as a central element from the outset. This approach limits the transformative potential of the technology. Integrating AI experts early in projects ensures a more effective process in which multidisciplinary expertise contributes synergistically to designing optimal solutions. The lack of collaboration between biologists, chemists, engineers and data scientists represents a significant barrier: compartmentalised teams struggle to fully leverage multimodality, leading to suboptimal solutions. A collaborative approach yields more reliable AI tools: robust, explainable models that are less prone to hallucinations. To tackle this challenge, companies must adopt strategies that promote interaction between disciplines, integrating computational skills with clinical and biological expertise. Only in this way can AI reach its full potential in accelerating drug discovery and improving trial success rates.

Multimodal AI: maximising ROI in drug development

Multimodality, in particular, is a significant advance in the application of AI to drug discovery. This technology integrates various types of data – eg, genomic, clinical, chemical – creating a data-driven approach that is more dynamic and efficient than traditional processes, which are typically linear and high-risk. The full potential of multimodality in drug discovery remains untapped due to a lack of multidisciplinary communication. Drug hunters focused on identifying promising targets and compounds, drug developers responsible for optimising and testing those compounds, and data scientists skilled in analysing complex datasets each specialise in their own data domains and struggle to collaborate effectively and integrate their knowledge. This disconnect hinders the efficient translation of research findings into new therapies, slows down the drug discovery process and risks missing crucial insights hidden within the combined data.

About the authors

Remco Jan Geukes Foppen, PhD, is an AI and life sciences expert specialising in the pharmaceutical sector. With a global perspective, he integrates and implements AI-driven strategies that impact business decisions, always considering the human element. His leadership has driven international commercial success in areas including image analysis, data management, bioinformatics, advanced clinical trial data analysis leveraging machine learning and federated learning. Remco Jan Geukes Foppen’s academic background includes a PhD in biology and a master’s degree in chemistry, both from the University of Amsterdam.


Vincenzo Gioia is an AI innovation strategist and a business and technology executive, with a 20-year focus on quality and precision in the commercialisation of innovative tools. Vincenzo specialises in artificial intelligence applied to image analysis, business intelligence and excellence. His focus on the human element of technology applications has led to high rates of solution implementation. He holds a master's degree in political sciences and marketing from the University of Salerno.


Alessio Zoccoli applies AI for a sustainable future. His deep understanding of industry applications and technical expertise drives innovation in AI-powered solutions for complex business challenges. He specialises in cutting-edge advancements in natural language processing, computer vision, and generative AI. He is a senior data scientist and holds a master’s degree from Roma Tre University in computer engineering, where he also held the role of research fellow.


Carlos N Velez, PhD, MBA, is a pharmaceutical and biotechnology strategic advisor with 25 years' experience in consulting, venture capital, corporate strategy and entrepreneurship. Carlos specialises in helping pharmaceutical and biotechnology companies develop their in- and out-licensing strategies, with additional expertise and experience in portfolio assessment and prioritisation, drug candidate valuation and related services. He also develops and presents customised training programmes (both live and virtual) for companies seeking to improve their in- and out-licensing processes. He holds a PhD in pharmacy from the University of North Carolina at Chapel Hill and an MBA from the Rochester Institute of Technology.


