
Making sense of AI: bias, trust and transparency in pharma R&D

AI is increasingly used in drug discovery, but hidden bias and ‘black box’ models threaten trust and transparency. This article explores how explainable AI can turn opaque predictions into clear, accountable insights.


The integration of artificial intelligence (AI) into drug discovery has revolutionised R&D, dramatically accelerating the identification of new drug targets and the prediction of compound efficacy. AI models exhibit tremendous predictive capabilities, but their complexity also creates a significant challenge: the ‘black box’ problem. These state-of-the-art AI models often produce outputs without revealing the reasoning behind their decisions, making it difficult for researchers to understand or verify their predictions. This opacity is a critical barrier in drug discovery, where knowing why a model makes a certain prediction is as important as the prediction itself.

The pursuit of explainable AI (xAI) starts not with algorithms but with an acknowledgement of ambiguity – the uncertainty and complexity inherent in AI outputs and in implementing AI effectively. Rather than viewing ambiguity as a deficiency, researchers are compelled to develop techniques that ‘fill in the gaps’ of understanding, thereby improving trustworthiness and scientific insight. The goal of xAI is to foster better decision-making and innovative solutions in drug discovery.

This conceptual shift has moved the field from black-box AI towards more interpretable models. Researchers are developing xAI tools that enable greater transparency, such as counterfactual explanations. These enable scientists to ask ‘what if’ questions, such as ‘how would the model’s prediction change if certain molecular features or protein domains were different?’ This way, biological insights can be extracted directly from AI models, helping to refine drug design, predict off-target effects and reduce risks in drug development pipelines.
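
To make this concrete, the short Python sketch below shows how such a counterfactual-style ‘what if’ probe might look in practice: a toy property-prediction model is queried twice, once with a compound’s original descriptor values and once with a single descriptor (lipophilicity) altered. The descriptors, data and model are illustrative placeholders rather than any particular discovery pipeline.

```python
# Minimal counterfactual-style probe of a toy property-prediction model.
# Descriptors, data and model are hypothetical placeholders, not a real pipeline.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Toy training set: three molecular descriptors -> binary "active" label
X = rng.normal(size=(500, 3))              # columns: logP, scaled MW, scaled TPSA
y = (X[:, 0] + 0.5 * X[:, 2] > 0).astype(int)

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# A candidate compound and a 'what if' variant with reduced lipophilicity
candidate = np.array([[1.2, 0.3, -0.4]])
counterfactual = candidate.copy()
counterfactual[0, 0] -= 1.0                # what if logP were one unit lower?

p_orig = model.predict_proba(candidate)[0, 1]
p_cf = model.predict_proba(counterfactual)[0, 1]
print(f"P(active) original: {p_orig:.2f}, with lower logP: {p_cf:.2f}")
```

If lowering lipophilicity flips the predicted activity, the model is revealing which lever it believes matters – the kind of interrogable insight counterfactual explanations aim to provide.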


Regulatory landscape: the EU AI Act and explainable AI

Different countries and regions are taking varied approaches to AI regulation, but on 02 August 2025, a significant phase of the EU AI Act came into force. This section of the Act primarily focuses on governance and general-purpose AI (GPAI) models, marking a major step in its implementation. A core principle of the Act is its classification of certain AI systems, especially those in healthcare and drug development, as “high-risk.” This is critical since it mandates strict requirements for transparency and accountability. High-risk systems must be “sufficiently transparent” so that users can correctly interpret their outputs.


This is where xAI becomes essential. In a medical context, healthcare professionals must understand the reasoning behind an AI’s recommendations for diagnosis or treatment. They cannot simply trust a black-box algorithm without a clear rationale. Furthermore, providers of GPAI models – such as large language models (LLMs) – are also now subject to new rules. For healthcare AI, this directly relates to xAI by requiring transparency about training data, algorithm methodology and the factors that influence their results.

It is important to note that the Act also includes exemptions. As stated in a legal analysis by the European Federation of Pharmaceutical Industries and Associations (EFPIA), AI systems used “for the sole purpose of scientific research and development” are generally excluded from the Act’s scope. This means that many AI-enabled drug discovery tools used in early-stage research may not be classified as high-risk, as they are not directly used in the final clinical management of patients. AI applications under the Act are regulated based on their specific use case and risk level, with a key distinction made between research tools and clinical or diagnostic systems. Nonetheless, transparency is key to enabling human oversight and identifying potential biases within the system.

Addressing bias in datasets

Bias in datasets is a profound challenge in AI-driven drug discovery and healthcare. AI models depend heavily on the quality, diversity and representativeness of their training data. When datasets are biased – whether through underrepresentation of certain demographic groups or the fragmentation of data across silos – AI predictions become skewed. This can lead to unfair or inaccurate outcomes, perpetuating healthcare disparities and undermining patient stratification.

For example, if clinical or genomic datasets insufficiently represent women or minority populations, AI models may poorly estimate drug efficacy or safety in these groups. In drug discovery, where complex biological data are combined, such biases can lead to drugs that do not perform well universally or fail to reveal critical safety concerns. Furthermore, data silos restrict the training inputs, exacerbating these limitations and sometimes causing AI ‘hallucinations’ – incorrect or misleading outputs arising from biased or inconsistent data. 


Explainability as a solution to bias

xAI emerges as a promising strategy to uncover and mitigate dataset biases. By increasing transparency into model decision-making, xAI highlights which features most influence predictions and reveals when bias may be corrupting results. In drug discovery, xAI empowers researchers to dissect the biological and clinical signals that drive predictions, enabling targeted interventions such as rebalancing datasets or refining algorithms to ensure fairness. Moreover, xAI enables stakeholders – including researchers, clinicians and policymakers – to audit AI systems, identify gaps in data coverage, and adjust both data collection strategies and model design. Techniques like preprocessing to balance training samples, integrating multiple complementary datasets and continuous monitoring with xAI frameworks assist in improving fairness and generalisability. This ensures AI models deliver equitable healthcare insights across diverse patient groups.
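
As a rough illustration of such an audit, the Python sketch below uses permutation importance on a synthetic dataset to ask which inputs a model actually relies on, including a demographic column that should carry little weight for the task. The feature names, data and model are invented for illustration and stand in for whatever a real xAI framework would report.

```python
# Sketch of an xAI-style audit: which features drive a model's predictions,
# and does a demographic column (here a synthetic 'sex' flag) carry undue weight?
# Data, features and model are hypothetical.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(1)
n = 1000
sex = rng.integers(0, 2, size=n)                 # 0 = male, 1 = female (synthetic)
biomarker = rng.normal(size=n)
dose = rng.normal(size=n)
# Outcome driven mainly by the biomarker, weakly entangled with sex
response = (biomarker + 0.3 * sex + rng.normal(scale=0.5, size=n) > 0).astype(int)

X = np.column_stack([sex, biomarker, dose])
names = ["sex", "biomarker", "dose"]
model = GradientBoostingClassifier(random_state=0).fit(X, response)

result = permutation_importance(model, X, response, n_repeats=20, random_state=0)
for name, score in zip(names, result.importances_mean):
    print(f"{name:10s} importance: {score:.3f}")
# A large 'sex' importance on a task where it should be irrelevant would flag
# a potential bias to investigate in the training data.
```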

The reproduction of systemic bias by AI

The rise of generative AI and LLMs in healthcare has magnified the challenge of bias. These models learn from massive but inherently imperfect datasets and are neither aware of nor able to correct biases independently. Instead, they replicate and sometimes amplify these biases in their recommendations or discoveries.


In pharmaceutical applications, LLMs used for molecule generation, drug interaction predictions, or clinical trial simulations risk producing outputs less effective for underrepresented groups if the datasets lack appropriate biological and demographic diversity. Even when AI-generated conclusions appear accurate, they may be misleading or biased, posing ethical and clinical challenges. This shifts the problem from mere data representation to concerns about equitable access to personalised medicine driven by AI.

Industry responsibility and mitigation strategies

Technology vendors, pharmaceutical companies and data providers play a pivotal role in tackling dataset bias. They must commit to inclusive data practices, ensure fairness in data collection, educate teams about diversity and bias, and implement ongoing algorithmic audits. Advanced approaches like ‘data augmentation,’ where datasets are enriched or synthetically balanced to improve representation, provide additional means to address imbalance. For example, carefully generated synthetic data can mimic underrepresented biological scenarios, helping to reduce bias during model training without compromising patient privacy.
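
A minimal sketch of the rebalancing idea, assuming a simple oversampling approach (real programmes may prefer generative synthetic-data methods or reweighting), might look like this:

```python
# Minimal sketch of rebalancing an imbalanced training set by oversampling an
# underrepresented subgroup before model training. The data are synthetic
# placeholders; real pipelines might instead use synthetic-data generators.
import numpy as np
from sklearn.utils import resample

rng = np.random.default_rng(2)
n_major, n_minor = 900, 100
X_major = rng.normal(size=(n_major, 4))            # well-represented group
X_minor = rng.normal(loc=0.5, size=(n_minor, 4))   # underrepresented subgroup

# Oversample the minority subgroup to match the majority count
X_minor_up = resample(X_minor, replace=True, n_samples=n_major, random_state=0)
X_balanced = np.vstack([X_major, X_minor_up])
print(X_major.shape, X_minor.shape, "->", X_balanced.shape)
```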

Consequences of the gender data gap

Inequalities in datasets are more than a technical flaw; they represent a structural problem with significant implications for treatment and healthcare outcomes. A prominent example is the gender data gap in life sciences AI: women remain underrepresented in many training datasets. This creates AI systems that work better for men than women, jeopardising the promise of personalised medicine.

For instance, drugs developed with predominantly male data may have inappropriate dosage recommendations for women, resulting in higher adverse reaction rates among females.

Studies in oncology also reveal sex-based differences in treatment responses, emphasising the need for sex-disaggregated data in both research and AI modelling. Without addressing these gaps, AI risks perpetuating existing healthcare disparities.

Explainable AI’s role in overcoming bias

AI-assisted drug discovery offers the prospect of identifying and correcting these biases more effectively than traditional methods. xAI provides transparency into how predictions are made, enabling detection when models disproportionately favour one sex or demographic. This enables focused strategies – like targeted data augmentation or model retraining – to enhance generalisability.
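
One way such detection might be implemented is the sex-disaggregated evaluation sketched below, in which the same toy model is scored separately for each subgroup so that any performance gap becomes visible and actionable. The data, model and metric are hypothetical stand-ins for a real efficacy or safety predictor.

```python
# Sketch of a sex-disaggregated performance check: the same model is scored
# separately per subgroup so a gap becomes visible. Data and model are
# synthetic stand-ins for a real efficacy/safety predictor.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
n = 2000
sex = rng.integers(0, 2, size=n)               # 0 = male, 1 = female (synthetic)
X = rng.normal(size=(n, 5))
# Outcome signal is weaker for one group, mimicking a data gap
signal = X[:, 0] * np.where(sex == 0, 1.0, 0.4)
y = (signal + rng.normal(scale=1.0, size=n) > 0).astype(int)

X_tr, X_te, y_tr, y_te, sex_tr, sex_te = train_test_split(
    X, y, sex, test_size=0.3, random_state=0
)
model = LogisticRegression().fit(X_tr, y_tr)

for label, mask in (("male", sex_te == 0), ("female", sex_te == 1)):
    auc = roc_auc_score(y_te[mask], model.predict_proba(X_te[mask])[:, 1])
    print(f"AUC ({label}): {auc:.2f}")
# A consistent AUC gap between subgroups would trigger targeted augmentation
# or retraining, as described above.
```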

Ultimately, dataset bias undermines AI’s potential to transform healthcare by skewing outcomes and excluding populations. xAI is the critical pathway to exposing, understanding and addressing these biases. Making AI’s decision-making process transparent fosters the development of safer, fairer and more trustworthy AI systems. These advances will be essential in realising equitable, precise and effective AI applications in drug discovery and personalised medicine. Semantic-level xAI now represents a pivotal step towards building AI systems that can reason and communicate in a manner that is both understandable and verifiable by human experts, fostering the trust and regulatory compliance necessary for widespread adoption in the pharmaceutical industry.


About the authors

Marcella Zucca is Head of Generative AI and Sustainability at Capgemini Italy. She serves on the Program Advisory Committee of Bologna Business School for AI-focused executive master’s programmes. With a background in corporate performance management, she combines technical innovation with ESG strategy. She holds a master’s from LUISS in administration and controlling, and a master’s in sustainability and reporting from Tor Vergata. Her work promotes responsible AI adoption, balancing innovation with ethics, legal frameworks and business transformation.


Vincenzo Gioia is an AI innovation strategist and founder of Explainambiguity. He is a business and technology executive with a 20-year focus on quality and precision in the commercialisation of innovative tools. Vincenzo specialises in artificial intelligence applied to image analysis, business intelligence and excellence. His focus on the human element of technology applications has led to high rates of solution implementation. He holds a master’s degree from the University of Salerno in political sciences and marketing.


Alessio Zoccoli applies AI for a sustainable future. His deep understanding of industry applications and technical expertise drives innovation in AI-powered solutions for complex business challenges. He specialises in cutting-edge advancements in natural language processing, computer vision and generative AI. He is a senior data scientist and holds a master’s degree in computer engineering from Roma Tre University, where he also held the role of research fellow.


Remco Jan Geukes Foppen, PhD, is an AI and life sciences expert specialising in the pharmaceutical sector and founder of Explainambiguity. With a global perspective, he integrates and implements AI-driven strategies that impact business decisions, always considering the human element. His leadership has driven international commercial success in areas including image analysis, data management, bioinformatics and advanced clinical trial data analysis leveraging machine learning and federated learning. His academic background includes a PhD in biology and a master’s degree in chemistry, both from the University of Amsterdam.

