Making sense of AI: bias, trust and transparency in pharma R&D
Posted: 11 September 2025 | Alessio Zoccoli, Marcella Zucca, Remco Jan Geukes Foppen, Vincenzo Gioia
AI is increasingly used in drug discovery, but hidden bias and ‘black box’ models threaten trust and transparency. This article explores how explainable AI can turn opaque predictions into clear, accountable insights.


The integration of artificial intelligence (AI) into drug discovery has revolutionised R&D, dramatically accelerating the identification of new drug targets and the prediction of compound efficacy. AI models exhibit tremendous predictive capabilities, but their complexity also creates a significant challenge: the ‘black box’ problem. These state-of-the-art AI models often produce outputs without revealing the reasoning behind their decisions, making it difficult for researchers to understand or verify their predictions. This opacity is a critical barrier in drug discovery, where knowing why a model makes a certain prediction is as important as the prediction itself.
The pursuit of explainable AI (xAI) starts not with algorithms but with an acknowledgment of ambiguity – that is, the uncertainty inherent to AI outputs and the complexity of implementing AI effectively. Rather than viewing ambiguity as a deficiency, researchers are compelled to develop techniques that ‘fill in the gaps’ of understanding, thereby improving trustworthiness and scientific insight. The goal of xAI is to foster better decision-making and innovative solutions in drug discovery.
This conceptual shift has moved the field from black-box AI towards more interpretable models. Researchers are developing xAI tools that enable greater transparency, such as counterfactual explanations. These enable scientists to ask ‘what if’ questions, such as ‘how would the model’s prediction change if certain molecular features or protein domains were different?’ This way, biological insights can be extracted directly from AI models, helping to refine drug design, predict off-target effects and reduce risks in drug development pipelines.
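To make the idea concrete, the sketch below shows a minimal counterfactual probe against a toy activity model: flip one molecular fingerprint bit and compare the predicted probability of activity before and after. The model, data and bit indices are all illustrative placeholders under assumed conditions, not any specific xAI library or production pipeline.

```python
# A minimal counterfactual 'what if' sketch, assuming binary molecular
# fingerprints and a pre-trained activity classifier; all names and bit
# indices are illustrative, not from a real drug discovery dataset.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Toy stand-in data: 500 compounds, 64 fingerprint bits, binary activity label.
X = rng.integers(0, 2, size=(500, 64))
y = (X[:, 3] & X[:, 17]).astype(int)  # activity driven by two bits, for illustration

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

def counterfactual_probe(model, x, feature_idx):
    """Flip one fingerprint bit and report how the predicted
    probability of activity changes."""
    x_cf = x.copy()
    x_cf[feature_idx] = 1 - x_cf[feature_idx]
    p_before = model.predict_proba(x.reshape(1, -1))[0, 1]
    p_after = model.predict_proba(x_cf.reshape(1, -1))[0, 1]
    return p_before, p_after

compound = X[0]
for bit in (3, 17, 42):
    before, after = counterfactual_probe(model, compound, bit)
    print(f"bit {bit}: P(active) {before:.2f} -> {after:.2f}")
```

A large shift in predicted activity when a single bit is flipped flags that feature as influential, which is precisely the ‘what if’ reasoning the text describes.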
Regulatory landscape: the EU AI Act and explainable AI
Different countries and regions are taking varied approaches to AI regulation, but on 02 August 2025, a significant phase of the EU AI Act came into force. This section of the Act primarily focuses on governance and general-purpose AI (GPAI) models, marking a major step in its implementation. A core principle of the Act is its classification of certain AI systems, especially those in healthcare and drug development, as “high-risk.” This is critical since it mandates strict requirements for transparency and accountability. High-risk systems must be “sufficiently transparent” so that users can correctly interpret their outputs.
This is where xAI becomes essential. In a medical context, healthcare professionals must understand the reasoning behind an AI’s recommendations for diagnosis or treatment. They cannot simply trust a black-box algorithm without a clear rationale. Furthermore, providers of GPAI models – such as large language models (LLMs) – are also now subject to new rules. For healthcare AI, this directly relates to xAI by requiring transparency about training data, algorithm methodology and the factors that influence their results.
It is important to note that the Act also includes exemptions. As stated in a legal analysis by the European Federation of Pharmaceutical Industries and Associations (EFPIA), AI systems used “for the sole purpose of scientific research and development” are generally excluded from the Act’s scope. This means that many AI-enabled drug discovery tools used in early-stage research may not be classified as high-risk, as they are not directly used in the final clinical management of patients. AI applications in the Act are regulated based on their specific use case and risk level, with a key distinction made between research tools and clinical or diagnostic systems. Nonetheless, transparency is key to enabling human oversight and identifying potential biases within the system.
Addressing bias in datasets
Bias in datasets is a profound challenge in AI-driven drug discovery and healthcare. AI models depend heavily on the quality, diversity and representativeness of their training data. When datasets are biased – whether through underrepresentation of certain demographic groups or the fragmentation of data across silos – AI predictions become skewed. This can lead to unfair or inaccurate outcomes, perpetuating healthcare disparities and undermining patient stratification.
For example, if clinical or genomic datasets insufficiently represent women or minority populations, AI models may poorly estimate drug efficacy or safety in these groups. In drug discovery, where complex biological data are combined, such biases can lead to drugs that do not perform well universally or fail to reveal critical safety concerns. Furthermore, data silos restrict the training inputs, exacerbating these limitations and sometimes causing AI ‘hallucinations’ – incorrect or misleading outputs arising from biased or inconsistent data.
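A minimal sketch of how such skew can be surfaced in practice: train on a deliberately imbalanced toy cohort and compare model discrimination per subgroup. All column names and data are illustrative placeholders, not a real clinical dataset, and the imbalance is built in on purpose so that the gap is visible.

```python
# A minimal subgroup-audit sketch, assuming tabular patient features,
# a binary outcome and a 'sex' column; everything here is synthetic
# and illustrative only.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 2000
df = pd.DataFrame({
    "sex": rng.choice(["F", "M"], size=n, p=[0.2, 0.8]),  # deliberate imbalance
    "biomarker": rng.normal(size=n),
    "dose": rng.normal(size=n),
})
# In this toy setup the outcome depends on sex, so the underrepresented
# group is harder to model from the pooled data.
signal = np.where(df["sex"] == "M", df["biomarker"], -df["biomarker"])
df["response"] = (signal + 0.5 * rng.normal(size=n) > 0).astype(int)

X = pd.get_dummies(df[["sex", "biomarker", "dose"]], drop_first=True)
X_tr, X_te, y_tr, y_te, sex_tr, sex_te = train_test_split(
    X, df["response"], df["sex"], random_state=0)

model = LogisticRegression().fit(X_tr, y_tr)

# Report discrimination per subgroup: a large gap flags biased performance.
for group in ("F", "M"):
    mask = (sex_te == group).to_numpy()
    auc = roc_auc_score(y_te[mask], model.predict_proba(X_te[mask])[:, 1])
    print(f"sex={group}: n={mask.sum()}, AUC={auc:.2f}")
```

Because the pooled model is dominated by the majority group, the minority subgroup’s AUC collapses – exactly the kind of silent failure a routine per-group audit is meant to catch.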


Explainability as a solution to bias
xAI emerges as a promising strategy to uncover and mitigate dataset biases. By increasing transparency into model decision-making, xAI highlights which features most influence predictions and reveals when bias may be corrupting results. In drug discovery, xAI empowers researchers to dissect the biological and clinical signals that drive predictions, enabling targeted interventions such as rebalancing datasets or refining algorithms to ensure fairness. Moreover, xAI enables stakeholders – including researchers, clinicians and policymakers – to audit AI systems, identify gaps in data coverage, and adjust both data collection strategies and model design. Techniques like preprocessing to balance training samples, integrating multiple complementary datasets and continuous monitoring with xAI frameworks assist in improving fairness and generalisability. This ensures AI models deliver equitable healthcare insights across diverse patient groups.
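As a hedged illustration of such an audit, the sketch below ranks features by scikit-learn’s permutation importance, reusing the toy model and held-out data from the previous sketch; a dedicated xAI library such as SHAP could be substituted in the same role. None of this is a prescribed method from the article, just one common way to see which features drive predictions.

```python
# Feature-attribution audit: permutation importance measures how much the
# model's AUC drops when each feature is shuffled. Reuses model, X_te, y_te
# from the subgroup-audit sketch above (illustrative toy objects).
from sklearn.inspection import permutation_importance

result = permutation_importance(
    model, X_te, y_te, scoring="roc_auc", n_repeats=20, random_state=0)

# If a demographic column dominates the ranking, predictions may be
# leaning on group membership rather than biology - a cue to rebalance
# the data or refine the model.
for name, mean, std in sorted(
        zip(X_te.columns, result.importances_mean, result.importances_std),
        key=lambda t: -t[1]):
    print(f"{name:>12}: {mean:+.3f} +/- {std:.3f}")
```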
The reproduction of systemic bias by AI
The rise of generative AI and LLMs in healthcare has magnified the challenge of bias. These models learn from massive but inherently imperfect datasets and are neither aware of nor able to correct biases independently. Instead, they replicate and sometimes amplify these biases in their recommendations or discoveries.
In pharmaceutical applications, LLMs used for molecule generation, drug interaction predictions, or clinical trial simulations risk producing outputs less effective for underrepresented groups if the datasets lack appropriate biological and demographic diversity. Even when AI-generated conclusions appear accurate, they may be misleading or biased, posing ethical and clinical challenges. This shifts the problem from mere data representation to concerns about equitable access to personalised medicine driven by AI.
Industry responsibility and mitigation strategies
Technology vendors, pharmaceutical companies and data providers play a pivotal role in tackling dataset bias. They must commit to inclusive data practices, ensure fairness in data collection, educate teams about diversity and bias, and implement ongoing algorithmic audits. Advanced approaches like ‘data augmentation,’ where datasets are enriched or synthetically balanced to improve representation, provide additional means to address imbalance. For example, carefully generated synthetic data can mimic underrepresented biological scenarios, helping to reduce bias during model training without compromising patient privacy.
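The sketch below illustrates the simplest form of that idea: upsampling the underrepresented group until group sizes match, reusing the toy frame from the earlier audit sketch. SMOTE-style interpolation or generative models would slot into the same place; this is an assumption-laden sketch of the mechanics, not a production recipe.

```python
# Rebalancing an underrepresented group by simple oversampling with
# scikit-learn; 'df' and 'pd' come from the earlier toy audit sketch
# and are illustrative only.
from sklearn.utils import resample

majority = df[df["sex"] == "M"]
minority = df[df["sex"] == "F"]

# Upsample the minority group with replacement until the groups match,
# then shuffle the combined frame.
minority_up = resample(
    minority, replace=True, n_samples=len(majority), random_state=0)
balanced = pd.concat([majority, minority_up]).sample(frac=1, random_state=0)

print(balanced["sex"].value_counts())  # now 50/50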
Consequences of the gender data gap
Inequalities in datasets are more than a technical flaw; they represent a structural problem with significant implications for treatment and healthcare outcomes. A prominent example is the gender data gap in life sciences AI: women remain underrepresented in many training datasets. This creates AI systems that work better for men than women, jeopardising the promise of personalised medicine.
For instance, drugs developed with predominantly male data may carry dosage recommendations that are inappropriate for women, resulting in higher rates of adverse drug reactions among them.
Studies in oncology also reveal sex-based differences in treatment responses, emphasising the need for sex-disaggregated data in both research and AI modelling. Without addressing these gaps, AI risks perpetuating existing healthcare disparities.
Explainable AI’s role in overcoming bias
AI-assisted drug discovery offers the prospect of identifying and correcting these biases more effectively than traditional methods. xAI provides transparency into how predictions are made, enabling detection when models disproportionately favour one sex or demographic. This enables focused strategies – like targeted data augmentation or model retraining – to enhance generalisability.
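One hedged sketch of the retraining half of that strategy: inverse-frequency sample weights give the underrepresented group equal influence during fitting. This continues the earlier toy example; in a real pipeline the weighting scheme would be validated against subgroup metrics rather than assumed to help.

```python
# Retraining with inverse-frequency sample weights so each subgroup
# contributes equally; sex_tr, X_tr, y_tr etc. come from the earlier
# toy audit sketch and are illustrative only.
weights = sex_tr.map(1.0 / sex_tr.value_counts(normalize=True))

reweighted = LogisticRegression().fit(X_tr, y_tr, sample_weight=weights)

# Re-run the per-group audit to see whether the gap has narrowed.
for group in ("F", "M"):
    mask = (sex_te == group).to_numpy()
    auc = roc_auc_score(y_te[mask], reweighted.predict_proba(X_te[mask])[:, 1])
    print(f"sex={group}: reweighted AUC={auc:.2f}")
```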
Ultimately, dataset bias undermines AI’s potential to transform healthcare by skewing outcomes and excluding populations. xAI is the critical pathway to exposing, understanding and addressing these biases. Making AI’s decision-making process transparent fosters the development of safer, fairer and more trustworthy AI systems. These advances will be essential in realising equitable, precise and effective AI applications in drug discovery and personalised medicine. Semantic-level xAI – explanations expressed in biological and clinical terms rather than raw model features – now represents a pivotal step towards building AI systems that can reason and communicate in a manner that is both understandable and verifiable by human experts, fostering the trust and regulatory compliance necessary for widespread adoption in the pharmaceutical industry.
Literature
- Geukes Foppen RJ, Gioia V, Zoccoli A, Velez CN. From siloed data to breakthroughs: multimodal AI in drug discovery. Drug Target Review. 11 June 2025. https://www.drugtargetreview.com/article/160597/from-siloed-data-to-breakthroughs-multimodal-ai-in-drug-discovery/
- Geukes Foppen RJ, Gioia V, Zoccoli A, Velez CN. Early evidence and emerging trends: how AI is shaping drug discovery and clinical development. Drug Target Review. April 2025. https://www.drugtargetreview.com/article/158593/early-evidence-and-emerging-trends-how-ai-is-shaping-drug-discovery-and-clinical-development/
- The EU Artificial Intelligence Act. https://artificialintelligenceact.eu/
- Pharma AI readiness: how the 50 largest companies by market cap stack up. CB Insights. 2025. https://www.cbinsights.com/research/ai-readiness-index-pharma-2025/
- Wagner AD, Oertelt-Prigione S, Adjei A, et al. Gender medicine and oncology: report and consensus of an ESMO workshop. Annals of Oncology. 2019. https://doi.org/10.1093/annonc/mdz414
- Geukes Foppen RJ, Gioia V, Gupta S, et al. Methodology for safe and secure AI in diabetes management. Journal of Diabetes Science and Technology. 2025. https://doi.org/10.1177/19322968241304434