Making sense of AI: bias, trust and transparency in pharma R&D
Posted: 11 September 2025 | Alessio Zoccoli, Marcella Zucca, Remco Jan Geukes Foppen, Vincenzo Gioia
AI is increasingly used in drug discovery, but hidden bias and ‘black box’ models threaten trust and transparency. This article explores how explainable AI can turn opaque predictions into clear, accountable insights.


The integration of artificial intelligence (AI) into drug discovery has revolutionised R&D, dramatically accelerating the identification of new drug targets and the prediction of compound efficacy. AI models exhibit tremendous predictive capabilities, but their complexity also creates a significant challenge: the ‘black box’ problem. These state-of-the-art AI models often produce outputs without revealing the reasoning behind their decisions, making it difficult for researchers to understand or verify their predictions. This opacity is a critical barrier in drug discovery, where knowing why a model makes a certain prediction is as important as the prediction itself.
The pursuit of explainable AI (xAI) starts not with algorithms but with an acknowledgment of ambiguity – that is, the uncertainty inherent to AI outputs and the complexity of implementing AI effectively. Rather than viewing ambiguity as a deficiency, researchers are compelled to develop techniques that ‘fill in the gaps’ of understanding, thereby improving trustworthiness and scientific insight. The goal of xAI is to foster better decision-making and innovative solutions in drug discovery.
This conceptual shift has moved the field from black-box AI towards more interpretable models. Researchers are developing xAI tools that enable greater transparency, such as counterfactual explanations. These enable scientists to ask ‘what if’ questions, such as ‘how would the model’s prediction change if certain molecular features or protein domains were different?’ This way, biological insights can be extracted directly from AI models, helping to refine drug design, predict off-target effects and reduce risks in drug development pipelines.
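To make the idea concrete, the sketch below shows a minimal counterfactual probe against a toy activity model: flip one molecular fingerprint bit and compare the predicted probability of activity before and after. The model, data and bit indices are all illustrative placeholders under assumed conditions, not any specific xAI library or production pipeline.

```python
# A minimal counterfactual 'what if' sketch, assuming binary molecular
# fingerprints and a pre-trained activity classifier; all names and bit
# indices are illustrative, not from a real drug discovery dataset.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Toy stand-in data: 500 compounds, 64 fingerprint bits, binary activity label.
X = rng.integers(0, 2, size=(500, 64))
y = (X[:, 3] & X[:, 17]).astype(int)  # activity driven by two bits, for illustration

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

def counterfactual_probe(model, x, feature_idx):
    """Flip one fingerprint bit and report how the predicted
    probability of activity changes."""
    x_cf = x.copy()
    x_cf[feature_idx] = 1 - x_cf[feature_idx]
    p_before = model.predict_proba(x.reshape(1, -1))[0, 1]
    p_after = model.predict_proba(x_cf.reshape(1, -1))[0, 1]
    return p_before, p_after

compound = X[0]
for bit in (3, 17, 42):
    before, after = counterfactual_probe(model, compound, bit)
    print(f"bit {bit}: P(active) {before:.2f} -> {after:.2f}")
```

A large shift in predicted activity when a single bit is flipped flags that feature as influential, which is precisely the ‘what if’ reasoning the text describes.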
Regulatory landscape: the EU AI Act and explainable AI
Different countries and regions are taking varied approaches to AI regulation, but on 02 August 2025, a significant phase of the EU AI Act came into force. This section of the Act primarily focuses on governance and general-purpose AI (GPAI) models, marking a major step in its implementation. A core principle of the Act is its classification of certain AI systems, especially those in healthcare and drug development, as “high-risk.” This is critical since it mandates strict requirements for transparency and accountability. High-risk systems must be “sufficiently transparent” so that users can correctly interpret their outputs.
This is where xAI becomes essential. In a medical context, healthcare professionals must understand the reasoning behind an AI’s recommendations for diagnosis or treatment. They cannot simply trust a black-box algorithm without a clear rationale. Furthermore, providers of GPAI models – such as large language models (LLMs) – are also now subject to new rules. For healthcare AI, this directly relates to xAI by requiring transparency about training data, algorithm methodology and the factors that influence their results.
It is important to note that the Act also includes exemptions. As stated in a legal analysis by the European Federation of Pharmaceutical Industries and Associations (EFPIA), AI systems used “for the sole purpose of scientific research and development” are generally excluded from the Act’s scope. This means that many AI-enabled drug discovery tools used in early-stage research may not be classified as high-risk, as they are not directly used in the final clinical management of patients. AI applications in the Act are regulated based on their specific use case and risk level, with a key distinction made between research tools and clinical or diagnostic systems. Nonetheless, transparency is key to enabling human oversight and identifying potential biases within the system.
Addressing bias in datasets
Bias in datasets is a profound challenge in AI-driven drug discovery and healthcare. AI models depend heavily on the quality, diversity and representativeness of their training data. When datasets are biased – whether through underrepresentation of certain demographic groups or the fragmentation of data across silos – AI predictions become skewed. This can lead to unfair or inaccurate outcomes, perpetuating healthcare disparities and undermining patient stratification.
For example, if clinical or genomic datasets insufficiently represent women or minority populations, AI models may poorly estimate drug efficacy or safety in these groups. In drug discovery, where complex biological data are combined, such biases can lead to drugs that do not perform well universally or fail to reveal critical safety concerns. Furthermore, data silos restrict the training inputs, exacerbating these limitations and sometimes causing AI ‘hallucinations’ – incorrect or misleading outputs arising from biased or inconsistent data.
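A minimal sketch of how such skew can be surfaced in practice: train on a deliberately imbalanced toy cohort and compare model discrimination per subgroup. All column names and data are illustrative placeholders, not a real clinical dataset, and the imbalance is built in on purpose so that the gap is visible.

```python
# A minimal subgroup-audit sketch, assuming tabular patient features,
# a binary outcome and a 'sex' column; everything here is synthetic
# and illustrative only.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 2000
df = pd.DataFrame({
    "sex": rng.choice(["F", "M"], size=n, p=[0.2, 0.8]),  # deliberate imbalance
    "biomarker": rng.normal(size=n),
    "dose": rng.normal(size=n),
})
# In this toy setup the outcome depends on sex, so the underrepresented
# group is harder to model from the pooled data.
signal = np.where(df["sex"] == "M", df["biomarker"], -df["biomarker"])
df["response"] = (signal + 0.5 * rng.normal(size=n) > 0).astype(int)

X = pd.get_dummies(df[["sex", "biomarker", "dose"]], drop_first=True)
X_tr, X_te, y_tr, y_te, sex_tr, sex_te = train_test_split(
    X, df["response"], df["sex"], random_state=0)

model = LogisticRegression().fit(X_tr, y_tr)

# Report discrimination per subgroup: a large gap flags biased performance.
for group in ("F", "M"):
    mask = (sex_te == group).to_numpy()
    auc = roc_auc_score(y_te[mask], model.predict_proba(X_te[mask])[:, 1])
    print(f"sex={group}: n={mask.sum()}, AUC={auc:.2f}")
```

Because the pooled model is dominated by the majority group, the minority subgroup’s AUC collapses – exactly the kind of silent failure a routine per-group audit is meant to catch.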


Explainability as a solution to bias
xAI emerges as a promising strategy to uncover and mitigate dataset biases. By increasing transparency into model decision-making, xAI highlights which features most influence predictions and reveals when bias may be corrupting results. In drug discovery, xAI empowers researchers to dissect the biological and clinical signals that drive predictions, enabling targeted interventions such as rebalancing datasets or refining algorithms to ensure fairness. Moreover, xAI enables stakeholders – including researchers, clinicians and policymakers – to audit AI systems, identify gaps in data coverage, and adjust both data collection strategies and model design. Techniques like preprocessing to balance training samples, integrating multiple complementary datasets and continuous monitoring with xAI frameworks assist in improving fairness and generalisability. This ensures AI models deliver equitable healthcare insights across diverse patient groups.
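As a hedged illustration of such an audit, the sketch below ranks features by scikit-learn’s permutation importance, reusing the toy model and held-out data from the previous sketch; a dedicated xAI library such as SHAP could be substituted in the same role. None of this is a prescribed method from the article, just one common way to see which features drive predictions.

```python
# Feature-attribution audit: permutation importance measures how much the
# model's AUC drops when each feature is shuffled. Reuses model, X_te, y_te
# from the subgroup-audit sketch above (illustrative toy objects).
from sklearn.inspection import permutation_importance

result = permutation_importance(
    model, X_te, y_te, scoring="roc_auc", n_repeats=20, random_state=0)

# If a demographic column dominates the ranking, predictions may be
# leaning on group membership rather than biology - a cue to rebalance
# the data or refine the model.
for name, mean, std in sorted(
        zip(X_te.columns, result.importances_mean, result.importances_std),
        key=lambda t: -t[1]):
    print(f"{name:>12}: {mean:+.3f} +/- {std:.3f}")
```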
The reproduction of systemic bias by AI
The rise of generative AI and LLMs in healthcare has magnified the challenge of bias. These models learn from massive but inherently imperfect datasets and are neither aware of nor able to correct biases independently. Instead, they replicate and sometimes amplify these biases in their recommendations or discoveries.
In pharmaceutical applications, LLMs used for molecule generation, drug interaction predictions, or clinical trial simulations risk producing outputs less effective for underrepresented groups if the datasets lack appropriate biological and demographic diversity. Even when AI-generated conclusions appear accurate, they may be misleading or biased, posing ethical and clinical challenges. This shifts the problem from mere data representation to concerns about equitable access to personalised medicine driven by AI.
Industry responsibility and mitigation strategies
Technology vendors, pharmaceutical companies and data providers play a pivotal role in tackling dataset bias. They must commit to inclusive data practices, ensure fairness in data collection, educate teams about diversity and bias, and implement ongoing algorithmic audits. Advanced approaches like ‘data augmentation,’ where datasets are enriched or synthetically balanced to improve representation, provide additional means to address imbalance. For example, carefully generated synthetic data can mimic underrepresented biological scenarios, helping to reduce bias during model training without compromising patient privacy.
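The sketch below illustrates the simplest form of that idea: upsampling the underrepresented group until group sizes match, reusing the toy frame from the earlier audit sketch. SMOTE-style interpolation or generative models would slot into the same place; this is an assumption-laden sketch of the mechanics, not a production recipe.

```python
# Rebalancing an underrepresented group by simple oversampling with
# scikit-learn; 'df' and 'pd' come from the earlier toy audit sketch
# and are illustrative only.
from sklearn.utils import resample

majority = df[df["sex"] == "M"]
minority = df[df["sex"] == "F"]

# Upsample the minority group with replacement until the groups match,
# then shuffle the combined frame.
minority_up = resample(
    minority, replace=True, n_samples=len(majority), random_state=0)
balanced = pd.concat([majority, minority_up]).sample(frac=1, random_state=0)

print(balanced["sex"].value_counts())  # now 50/50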
Consequences of the gender data gap
Inequalities in datasets are more than a technical flaw; they represent a structural problem with significant implications for treatment and healthcare outcomes. A prominent example is the gender data gap in life sciences AI: women remain underrepresented in many training datasets. This creates AI systems that work better for men than women, jeopardising the promise of personalised medicine.
For instance, drugs developed with predominantly male data may carry dosage recommendations that are inappropriate for women, resulting in higher rates of adverse drug reactions among them.
Studies in oncology also reveal sex-based differences in treatment responses, emphasising the need for sex-disaggregated data in both research and AI modelling. Without addressing these gaps, AI risks perpetuating existing healthcare disparities.
Explainable AI’s role in overcoming bias
AI-assisted drug discovery offers the prospect of identifying and correcting these biases more effectively than traditional methods. xAI provides transparency into how predictions are made, enabling detection when models disproportionately favour one sex or demographic. This enables focused strategies – like targeted data augmentation or model retraining – to enhance generalisability.
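One hedged sketch of the retraining half of that strategy: inverse-frequency sample weights give the underrepresented group equal influence during fitting. This continues the earlier toy example; in a real pipeline the weighting scheme would be validated against subgroup metrics rather than assumed to help.

```python
# Retraining with inverse-frequency sample weights so each subgroup
# contributes equally; sex_tr, X_tr, y_tr etc. come from the earlier
# toy audit sketch and are illustrative only.
weights = sex_tr.map(1.0 / sex_tr.value_counts(normalize=True))

reweighted = LogisticRegression().fit(X_tr, y_tr, sample_weight=weights)

# Re-run the per-group audit to see whether the gap has narrowed.
for group in ("F", "M"):
    mask = (sex_te == group).to_numpy()
    auc = roc_auc_score(y_te[mask], reweighted.predict_proba(X_te[mask])[:, 1])
    print(f"sex={group}: reweighted AUC={auc:.2f}")
```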
Ultimately, dataset bias undermines AI’s potential to transform healthcare by skewing outcomes and excluding populations. xAI is the critical pathway to exposing, understanding and addressing these biases. Making AI’s decision-making process transparent fosters the development of safer, fairer and more trustworthy AI systems. These advances will be essential in realising equitable, precise and effective AI applications in drug discovery and personalised medicine. Semantic-level xAI – explanations expressed in biological and clinical terms rather than raw model features – now represents a pivotal step towards building AI systems that can reason and communicate in a manner that is both understandable and verifiable by human experts, fostering the trust and regulatory compliance necessary for widespread adoption in the pharmaceutical industry.
Literature
- Geukes Foppen RJ, Gioia V, Zoccoli A, Velez CN. From siloed data to breakthroughs: multimodal AI in drug discovery. Drug Target Review. 11 June 2025. https://www.drugtargetreview.com/article/160597/from-siloed-data-to-breakthroughs-multimodal-ai-in-drug-discovery/
- Geukes Foppen RJ, Gioia V, Zoccoli A, Velez CN. Early evidence and emerging trends: how AI is shaping drug discovery and clinical development. Drug Target Review. April 2025. https://www.drugtargetreview.com/article/158593/early-evidence-and-emerging-trends-how-ai-is-shaping-drug-discovery-and-clinical-development/
- The EU Artificial Intelligence Act. https://artificialintelligenceact.eu/
- Pharma AI readiness: how the 50 largest companies by market cap stack up. CB Insights. 2025. https://www.cbinsights.com/research/ai-readiness-index-pharma-2025/
- Wagner AD, Oertelt-Prigione S, Adjei A, et al. Gender medicine and oncology: report and consensus of an ESMO workshop. Annals of Oncology. 2019. https://doi.org/10.1093/annonc/mdz414
- Geukes Foppen RJ, Gioia V, Gupta S, et al. Methodology for safe and secure AI in diabetes management. Journal of Diabetes Science and Technology. 2025. https://doi.org/10.1177/19322968241304434