How real-world data is accelerating drug discovery
Posted: 20 October 2025 | Vish Srivastava (Co-Founder & CEO of Century Health)
Vish Srivastava explains how expanding the role of real-world data in drug discovery could deliver better therapies, faster and with a greater chance of success.


The path from identifying a promising biological target to delivering a safe, effective therapy is long, complex and costly. Despite decades of scientific progress and major investment, the attrition rate in early-stage drug development remains high, with many candidates failing before they reach Phase I trials. Often, this is because the targets chosen in the lab do not fully reflect the complexity of disease in the real world, where patients present with diverse phenotypes, comorbidities and treatment histories that rarely match the narrow inclusion criteria of clinical studies.
Historically, real-world data (RWD) – clinical information collected outside of controlled trials, such as electronic health records (EHRs), imaging and laboratory results – has been used primarily in the later stages of the pipeline. Post-market safety monitoring, health economics and outcomes research, and reimbursement strategy have been its main domains.1 However, recent advances in AI, data analytics and natural language processing are changing that. Today, disease-specific, longitudinal registries enriched with deep clinical features, and designed to be robust enough for early discovery work, are beginning to influence target discovery, validation and translational research, giving scientists a more representative view of disease biology long before the first patient is enrolled in a trial.
This shift matters because the biology observed in tightly controlled trial settings can differ significantly from that seen in daily clinical practice. In respiratory diseases, such as chronic obstructive pulmonary disease (COPD) and asthma, for example, patient trajectories can vary widely depending on environmental exposures, adherence patterns, comorbidities and socioeconomic factors.2 Laboratory and animal models cannot fully capture this diversity, and neither can the narrow patient populations in Phase II or III studies. As a result, a target that appears promising under ideal conditions may falter when exposed to the heterogeneity of the real world.
Integrating RWD earlier in the pipeline can help close this gap. By examining rich clinical datasets during the preclinical and translational phases, researchers can identify subpopulations with unique disease trajectories, refine biomarker strategies and even detect unexpected treatment effects that could point to novel mechanisms of action. In asthma, for example, correlating eosinophil counts with exacerbation frequency and response to biologics across thousands of patients can guide both target validation and patient selection strategies.3 Such correlations are rarely visible in the literature or trial datasets alone.
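As a rough illustration of the eosinophil analysis described above, the Python sketch below groups patients into blood eosinophil bands, reports the mean annual exacerbation rate in each band, and computes a Pearson correlation across the cohort. The band cut-offs (150 and 300 cells/µL), the helper names (`eos_band`, `pearson`, `summarise`) and the toy records are all hypothetical; they are not taken from the cited study, and a real analysis would also adjust for confounders such as smoking status and inhaled corticosteroid dose.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical eosinophil bands in cells/µL (cut-offs chosen for illustration).
BANDS = [(0.0, 150.0, "<150"), (150.0, 300.0, "150-299"), (300.0, float("inf"), ">=300")]


def eos_band(count):
    """Return the label of the band that contains an eosinophil count."""
    for low, high, label in BANDS:
        if low <= count < high:
            return label
    raise ValueError(f"invalid eosinophil count: {count}")


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def summarise(patients):
    """Mean exacerbation rate per eosinophil band and the overall correlation.

    `patients` is a list of (eosinophils_cells_per_ul, exacerbations_per_year).
    """
    rates_by_band = defaultdict(list)
    for eos, rate in patients:
        rates_by_band[eos_band(eos)].append(rate)
    band_means = {band: sum(r) / len(r) for band, r in rates_by_band.items()}
    r = pearson([eos for eos, _ in patients], [rate for _, rate in patients])
    return band_means, r


# Made-up toy records used only to exercise the code; not real patient data.
toy = [(90, 0.5), (120, 1.0), (210, 1.5), (260, 2.0), (340, 2.5), (520, 3.5)]
band_means, r = summarise(toy)
print(band_means, round(r, 3))
```

A correlation like this only flags an association worth testing; extending the same grouping to biologic-treated patients would let the band-level response to therapy be compared in the same way.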
The key is not just having access to RWD, but having fit-for-purpose, disease-specific registries that offer both depth and breadth. These differ from large but low-resolution datasets, such as insurance claims, which capture diagnosis and procedure codes but typically lack the underlying clinical detail. A COPD registry designed for research might include spirometry results over multiple years, detailed records of exacerbations and hospitalisations, radiology reports describing emphysema distribution, and longitudinal medication histories, including switches, discontinuations and off-label use. When harmonised across multiple care sites and combined with structured and unstructured EHR data, such registries can provide a multidimensional view of disease progression.4 Emerging data companies are working to supply this level of granularity, closing critical gaps for researchers and pharmaceutical companies.
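One way to picture a research-grade COPD registry record is as a small set of typed, time-stamped structures. The sketch below is a hypothetical schema: the class names, fields and the per-site unit table (`SITE_FEV1_UNITS`) are assumptions, not a published data standard. It shows how spirometry exported in different units by different sites might be harmonised to litres, and how medication switches and discontinuations can be derived from longitudinal medication episodes.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Tuple

# Hypothetical table of the unit each site uses when exporting FEV1.
SITE_FEV1_UNITS = {"site_a": "L", "site_b": "mL"}


@dataclass
class Spirometry:
    measured_on: date
    fev1_l: float  # FEV1 in litres, after harmonisation


@dataclass
class MedicationEpisode:
    drug: str
    start: date
    stop: Optional[date] = None  # None means the prescription is ongoing


@dataclass
class CopdRecord:
    patient_token: str  # pseudonymous ID, never a raw identifier
    site: str
    spirometry: List[Spirometry] = field(default_factory=list)
    exacerbations: List[date] = field(default_factory=list)
    medications: List[MedicationEpisode] = field(default_factory=list)


def harmonise_fev1(site: str, value: float) -> float:
    """Convert a site's FEV1 value to litres using the site's declared unit."""
    unit = SITE_FEV1_UNITS[site]
    if unit == "mL":
        return value / 1000.0
    if unit == "L":
        return value
    raise ValueError(f"unknown FEV1 unit {unit!r} for {site}")


def build_record(token: str, site: str, raw_spirometry: List[Tuple[date, float]],
                 exacerbations: List[date],
                 medications: List[MedicationEpisode]) -> CopdRecord:
    """Assemble a chronologically ordered record with harmonised spirometry."""
    spiro = sorted(
        (Spirometry(d, harmonise_fev1(site, v)) for d, v in raw_spirometry),
        key=lambda s: s.measured_on,
    )
    return CopdRecord(token, site, spiro, sorted(exacerbations),
                      sorted(medications, key=lambda m: m.start))


def count_switches(record: CopdRecord) -> int:
    """Number of times consecutive medication episodes change drug."""
    meds = record.medications
    return sum(1 for prev, nxt in zip(meds, meds[1:]) if nxt.drug != prev.drug)


def is_discontinued(record: CopdRecord) -> bool:
    """True if the most recent medication episode has a stop date."""
    return bool(record.medications) and record.medications[-1].stop is not None


# Usage with made-up values from a site that exports FEV1 in millilitres.
rec = build_record(
    "tok-001", "site_b",
    [(date(2021, 3, 1), 1850.0), (date(2020, 3, 1), 1950.0)],
    [date(2021, 1, 10)],
    [MedicationEpisode("LAMA", date(2019, 1, 1), date(2020, 6, 1)),
     MedicationEpisode("LAMA/LABA", date(2020, 6, 1))],
)
print([s.fev1_l for s in rec.spirometry], count_switches(rec), is_discontinued(rec))
```

Declaring units per site, rather than guessing from magnitudes, keeps the conversion auditable when new sites join the registry.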
These insights are particularly valuable for target validation. In addition to linking biomarkers to outcomes, RWD can illuminate real-world signals that strengthen or challenge a mechanistic hypothesis. Patterns of off-label prescribing that lead to measurable patient benefit can hint at relevant biological pathways. Natural history data from untreated or differently managed cohorts can reveal the true course of disease, helping scientists assess whether modulating a given target is likely to have meaningful impact. Clustering of comorbidities, such as the overlap of COPD with cardiovascular disease, may point to shared molecular mechanisms that are worth exploring.5
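A simple first pass at comorbidity clustering is to compare how often two conditions co-occur with how often they would co-occur by chance. The sketch below computes that observed-to-expected ratio (often called lift) for every pair of conditions in a cohort; a value above 1 means the pair appears together more often than independence would predict. The function names and the toy cohort are hypothetical, and a high ratio is only a prompt for further study: shared risk factors such as smoking and age can produce it without any shared molecular mechanism.

```python
from itertools import combinations


def comorbidity_lift(patients, a, b):
    """Observed / expected co-occurrence of conditions a and b.

    `patients` is a list of sets of condition labels. The expected
    co-occurrence assumes the two conditions are independent in this cohort.
    """
    n = len(patients)
    n_a = sum(a in p for p in patients)
    n_b = sum(b in p for p in patients)
    n_ab = sum(a in p and b in p for p in patients)
    if n_a == 0 or n_b == 0:
        return float("nan")
    return (n_ab / n) / ((n_a / n) * (n_b / n))


def rank_pairs(patients):
    """All condition pairs, sorted from highest to lowest lift."""
    conditions = sorted(set().union(*patients))
    scores = {
        (a, b): comorbidity_lift(patients, a, b)
        for a, b in combinations(conditions, 2)
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy cohort of eight made-up patients (labels are illustrative).
cohort = [{"COPD", "CVD"}, {"COPD", "CVD"}, {"COPD"}, {"CVD"},
          {"diabetes"}, {"diabetes"}, set(), set()]
ranked = rank_pairs(cohort)
print(ranked[0])
```

In a registry of realistic size, the same ranking would usually be followed by stratified or regression analyses before any pair is treated as a mechanistic lead.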
RWD can also uncover unmet needs invisible in controlled studies. Registry analyses in asthma have shown that a substantial proportion of patients with frequent exacerbations remain untreated with biologics, despite being eligible under clinical guidelines.6 Identifying and quantifying such therapeutic gaps helps R&D teams prioritise areas where new interventions could have the greatest benefit and where patient recruitment for trials is likely to be both feasible and impactful.
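Quantifying a therapeutic gap of the kind described above amounts to counting patients who meet eligibility criteria but are not receiving the therapy. In the sketch below, the thresholds (at least two exacerbations in 12 months and blood eosinophils of at least 300 cells/µL), the `AsthmaPatient` fields and the helper names are placeholder assumptions; actual guideline and label criteria differ between biologics and regions.

```python
from dataclasses import dataclass

# Placeholder eligibility thresholds, for illustration only.
MIN_EXACERBATIONS_12M = 2
MIN_EOS_CELLS_PER_UL = 300.0


@dataclass
class AsthmaPatient:
    exacerbations_12m: int
    eos_cells_per_ul: float
    on_biologic: bool


def is_eligible(p: AsthmaPatient) -> bool:
    """Hypothetical criteria: frequent exacerbations plus raised eosinophils."""
    return (p.exacerbations_12m >= MIN_EXACERBATIONS_12M
            and p.eos_cells_per_ul >= MIN_EOS_CELLS_PER_UL)


def treatment_gap(patients):
    """Return (eligible, eligible but untreated, untreated share of eligible)."""
    eligible = [p for p in patients if is_eligible(p)]
    untreated = [p for p in eligible if not p.on_biologic]
    share = len(untreated) / len(eligible) if eligible else 0.0
    return len(eligible), len(untreated), share


# Made-up cohort used only to exercise the code.
cohort = [
    AsthmaPatient(3, 400, False),
    AsthmaPatient(2, 350, True),
    AsthmaPatient(4, 500, False),
    AsthmaPatient(1, 600, False),  # too few exacerbations
    AsthmaPatient(3, 100, False),  # eosinophils below threshold
]
print(treatment_gap(cohort))
```

Keeping the criteria in one function makes it easy to rerun the count under alternative definitions, which matters because the size of the gap depends heavily on how eligibility is operationalised.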
The benefits extend beyond preclinical work into the design of early clinical studies. Registry-derived insights can inform inclusion and exclusion criteria that reflect the populations most likely to respond, rather than the most convenient to recruit. They can help refine endpoints so that they capture outcomes meaningful to both patients and regulators. In COPD, for instance, analysing longitudinal lung function decline in various subgroups can guide endpoint selection and trial duration to maximise the likelihood of detecting a treatment effect.7
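For the COPD example, a minimal sketch of the subgroup analysis is to fit a least-squares slope of FEV1 against time for each patient, summarise those slopes by subgroup, and feed the resulting variability into the standard two-sample normal-approximation sample-size formula, n = 2((z(1 − α/2) + z(power)) · σ / Δ)² per arm, where σ is the between-patient SD of slopes and Δ the difference in mean slope to be detected. The data layout, subgroup labels, helper names and the 25% target effect are assumptions. Real trial planning would typically use mixed-effects models that handle measurement error and dropout; longer follow-up also shrinks the noise in each patient's slope, which is one reason trial duration matters.

```python
import math
from collections import defaultdict
from statistics import NormalDist, mean, stdev


def ols_slope(times, values):
    """Ordinary least-squares slope of values against times."""
    n = len(times)
    mt, mv = sum(times) / n, sum(values) / n
    sxx = sum((t - mt) ** 2 for t in times)
    sxy = sum((t - mt) * (v - mv) for t, v in zip(times, values))
    return sxy / sxx


def decline_by_subgroup(patients):
    """Mean and SD of per-patient FEV1 slopes (mL/year) for each subgroup.

    `patients` holds (subgroup, years_since_baseline, fev1_ml) tuples, with
    at least two measurements at distinct times per patient.
    """
    slopes = defaultdict(list)
    for group, years, fev1_ml in patients:
        slopes[group].append(ols_slope(years, fev1_ml))
    return {g: (mean(s), stdev(s) if len(s) > 1 else 0.0) for g, s in slopes.items()}


def n_per_arm(sd, delta, alpha=0.05, power=0.8):
    """Two-sample normal-approximation sample size to detect a difference
    `delta` in mean slope, given the between-patient slope SD `sd`."""
    z = NormalDist().inv_cdf
    return math.ceil(2 * ((z(1 - alpha / 2) + z(power)) * sd / delta) ** 2)


# Made-up toy measurements at baseline, year 1 and year 2.
toy = [
    ("frequent", [0, 1, 2], [2000, 1940, 1880]),
    ("frequent", [0, 1, 2], [2100, 2020, 1940]),
    ("infrequent", [0, 1, 2], [2200, 2170, 2140]),
    ("infrequent", [0, 1, 2], [2300, 2260, 2220]),
]
summary = decline_by_subgroup(toy)
mean_slope, sd_slope = summary["frequent"]
# Hypothetical target: a therapy that slows decline by 25%.
print(summary, n_per_arm(sd_slope, 0.25 * abs(mean_slope)))
```

Comparing `n_per_arm` across subgroups shows how enriching for faster decliners can change the number of patients needed to see an effect.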
Until recently, using RWD in this way was technically and operationally challenging. Clinical notes, imaging reports and lab data were siloed in disparate systems and formats, and extracting meaningful variables often required months of manual chart review. Today, AI-powered abstraction tools can parse unstructured notes at scale, harmonise data across institutions and preserve patient privacy through de-identification and secure linkage. Organisations are now applying these technologies to deliver curated, analysis-ready datasets in weeks, fast enough to inform active preclinical and translational programmes.8
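Secure linkage across institutions is often implemented with keyed hashing: each site normalises a patient's identifiers and computes an HMAC with a shared secret key, so records can be joined on the resulting token without exchanging names or dates of birth. The sketch below uses Python's standard `hmac` and `hashlib` modules; the choice of identifiers, the normalisation rules, the helper names and the key handling are assumptions, and commercial tokenisation services use their own schemes. Tokens like these are pseudonymous rather than anonymous, so the key must be protected and the linked dataset still needs de-identification review.

```python
import hashlib
import hmac
import unicodedata


def _clean(text: str) -> str:
    """Strip accents, case, spaces and punctuation from an identifier."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return "".join(ch for ch in no_accents.lower() if ch.isalnum())


def link_token(secret_key: bytes, first: str, last: str, dob_iso: str) -> str:
    """Keyed HMAC-SHA256 token over normalised identifiers.

    Every participating site must use the same key and normalisation; the
    key should be held by a trusted party, never shared with analysts.
    """
    message = f"{_clean(first)}|{_clean(last)}|{dob_iso}".encode("utf-8")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()


def link(site_a: dict, site_b: dict) -> dict:
    """Join two {token: record} mappings on tokens present at both sites."""
    return {t: (site_a[t], site_b[t]) for t in site_a.keys() & site_b.keys()}


KEY = b"hypothetical-demo-key"  # placeholder; use a securely managed secret
t1 = link_token(KEY, "José", "García", "1960-04-02")
t2 = link_token(KEY, "jose", "GARCIA ", "1960-04-02")
site_a = {t1: "spirometry 2019-2024"}
site_b = {t2: "CT report: centrilobular emphysema", "other-token": "unrelated"}
print(t1 == t2, link(site_a, site_b))
```

Using a keyed HMAC rather than a plain hash matters here: without the secret key, an outsider cannot simply hash candidate names and birth dates to reverse the tokens.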
The opportunity is clear. If RWD is integrated from the earliest stages of the pipeline, it can help de-risk development, accelerate timelines and improve the chances of clinical and commercial success. Disease-specific registries, built with both scientific and clinical questions in mind, can serve as a shared evidence base for discovery teams, translational scientists and clinical developers.
Drug discovery will always require the controlled precision of the lab and the rigour of clinical trials, but in an era where rich, representative, longitudinal patient data is within reach, relying solely on experimental models and published studies leaves valuable insights untapped. By bringing the real world into the earliest stages of R&D, we can better align scientific hypotheses with patient realities and bring more effective therapies to those who need them.
References
1. Sherman RE, Anderson SA, Dal Pan GJ, et al. Real-world evidence – what is it and what can it tell us? N Engl J Med. 2016;375(23):2293–2297.
2. Barnes PJ. Inflammatory mechanisms in patients with chronic obstructive pulmonary disease. J Allergy Clin Immunol. 2016;138(1):16–27.
3. Bleecker ER, FitzGerald JM, Chanez P, et al. Eosinophilic asthma: clinical characteristics and response to therapy. Eur Respir J. 2015;46(3):823–835.
4. Corrigan-Curay J, Sacks L, Woodcock J. Real-world evidence and real-world data for evaluating drug safety and effectiveness. JAMA. 2018;320(9):867–868.
5. Mannino DM, Thorn D, Swensen A, Holguin F. Prevalence and outcomes of diabetes, hypertension and cardiovascular disease in COPD. Eur Respir J. 2008;32(4):962–969.
6. Price D, Brusselle G, Roche N, et al. Real-world management of asthma: a review of current evidence. J Asthma Allergy. 2015;8:47–56.
7. Celli BR, Locantore N, Yates J, et al. Inflammatory biomarkers improve clinical prediction of mortality in chronic obstructive pulmonary disease. Am J Respir Crit Care Med. 2012;185(10):1065–1072.
8. Wang SV, Schneeweiss S, Berger ML, et al. Reporting to improve reproducibility and facilitate validity assessment for healthcare database studies. Value Health. 2017;20(8):1009–1022.