Despite rapid advances in AI, many drug discovery models still struggle to translate computational predictions into clinical outcomes. Thomas Clozel explains how Owkin is training AI on large-scale patient-derived data while integrating experimental and clinical validation directly into model development.

Artificial intelligence is now used across multiple stages of pharmaceutical R&D, including target identification, biomarker discovery, molecule design and clinical trial analysis. However, despite advances in generative AI and predictive modelling, relatively few systems have demonstrated clear clinical impact.
One of the major challenges is that many AI models are still trained primarily on laboratory datasets or controlled experimental systems, which may not fully reflect the biological complexity seen in patients. As a result, predictions that perform well in computational or preclinical settings do not always translate successfully into clinical outcomes.
Researchers are now exploring whether training AI systems on large-scale patient-derived data, combined with experimental and clinical validation, could improve how these models identify disease mechanisms, stratify patients and support drug development decisions.
Dr Thomas Clozel, Co-founder and CEO of Owkin, believes patient-derived data will be essential for improving the biological relevance of drug discovery models. Over the past decade, Owkin has built multimodal patient datasets collected from a network of more than 800 hospitals, which the company uses across research and clinical applications.
Building on this work, Owkin earlier this year announced a new agentic AI Scientist trained on patient-derived data, alongside partnerships with NVIDIA and Anthropic focused on integrating pathology analysis tools into healthcare AI workflows.
The limits of current AI models
Clozel believes many AI systems used in drug discovery perform well on curated datasets but struggle when applied to real-world clinical settings.
“So far none of these AI models have allowed us to cure cancer – so bluntly, we’ve failed,” said Clozel.
He explained that many existing systems still lack sufficient understanding of the underlying biology of disease, limiting their ability to generalise across patient heterogeneity, evolving clinical practice and multimodal clinical data.
This can create situations where AI systems generate promising predictions or novel molecules without adequately accounting for target validity, translational relevance or patient stratification.
“AI needs a reality check, or it risks spinning out more and more predictions that sound good, but fall apart on contact with the real world,” he added.
To address this, Owkin is integrating experimental and clinical validation directly into model development. Clozel said the company tests predictions using patient-derived cells and organoids, while findings from clinical studies, including Owkin’s INVOKE oncology trial, are also fed back into model training workflows.
Why patient-derived data matters
A major part of Owkin’s strategy involves training models on real-world patient data rather than relying primarily on laboratory or brokered datasets.
Traditional drug discovery workflows often begin with in vitro systems and experimental models designed to simulate aspects of human disease. However, these systems frequently fail to capture the biological complexity observed in patients.
Real-world patient data can instead provide information on disease heterogeneity, co-morbidities, prior treatments and biological variation across multiple scales, including tissue morphology, molecular profiles and clinical outcomes.
According to Clozel, this level of biological fidelity may help researchers identify more meaningful biomarkers, define clinically relevant patient subgroups and improve clinical trial design.
From predictive software to agentic systems
Owkin is also developing what it describes as an agentic AI Scientist designed to support multi-step scientific workflows rather than simply generate isolated predictions.
Clozel described this transition as moving from “software that predicts” towards “systems that act”.
This AI Scientist can retrieve multimodal datasets, perform analyses, test assumptions and generate outputs designed for use within research and clinical development workflows. Researchers may also be able to analyse patient subgroups, interrogate raw datasets and generate visualisations without requiring multiple separate analytical steps or external bioinformatics support.
As part of the AI Scientist, Owkin is also developing specialised, interoperable AI tools, including pathology analysis systems capable of identifying cellular features directly from pathology slide images.
Clozel said the goal is to reduce interruptions within research workflows and allow scientists to iteratively explore hypotheses while interacting directly with underlying biological data.
Extracting biological information from pathology slides
One area of focus for Owkin involves extracting additional biological information from routine pathology slides.
Clozel noted that standard haematoxylin and eosin (H&E) slides contain large amounts of underused biological information relating to tumour microenvironments, spatial organisation and cellular composition.
However, large-scale pathology analysis has historically been limited by cost, time and analytical complexity. Clozel said Owkin’s Pathology Explorer AI tool is designed to automate parts of this process, helping researchers analyse pathology images more efficiently.
Testing biological reasoning in AI systems
Clozel said Owkin’s long-term goal is the development of what the company describes as Biological Artificial Super Intelligence (BASI), referring to AI systems designed to reason across biological systems rather than simply generate predictions from datasets.
For Clozel, one of the most important milestones will be demonstrating that AI-generated predictions can be validated experimentally and clinically.
This includes testing model outputs using patient-derived organoids and evaluating whether AI-informed approaches improve clinical trial design or patient response prediction.
Owkin is also working towards what Clozel described as an “autonomous AI scientist”, capable of independently generating hypotheses, testing predictions and iteratively refining its understanding of biological systems.
“We want to build an AI scientist that can be proactive in its exploration and development of hypotheses,” said Clozel.
Ultimately, Clozel believes the strongest evidence for these systems will come when they can generate novel biological hypotheses that prove experimentally and clinically valid.
“The greatest milestone, though, will be when we have created a system that can bring you a biological hypothesis… that is surprising, new and turns out to be true,” he concluded.









