From fragments to maps: scaling drug–target interaction data

Posted: 31 March 2026 | Drug Target Review

Most drug–target data were never designed to be compared at scale. Pharmome mapping takes a different approach, building a shared dataset intended to support more predictable discovery.

Abstract network of interconnected nodes representing drug–target interactions in a biological system.

Drug discovery has never lacked ideas. What it has lacked is reliable, comparable and comprehensive data. For decades, the industry has generated vast volumes of drug–target interaction data, yet much of it remains fragmented, inconsistent and inaccessible. The result is a system in which promising hypotheses struggle to translate into predictable outcomes in humans.

A collaboration between EvE Bio, Convergent Research, DrugBank and Hugging Face is focused on generating a systematic, standardised map of how approved drugs interact with human druggable targets. Described by its creators as pharmome mapping, the effort aims to transform decades of scattered pharmacological observations into a coherent public dataset for drug discovery.

A focused approach to hard scientific problems

The pharmome-mapping project sits within a broader organisational experiment led by Convergent Research. Founded to address scientific bottlenecks that fall between traditional academic funding and commercial incentives, Convergent designs and launches Focused Research Organisations, or FROs. These are time-limited, technically ambitious entities created to deliver clearly defined scientific outputs.

Anastasia Gamick, President and co-founder of Convergent Research, explains why this structure is particularly well suited to large public data efforts such as pharmome mapping. As she puts it:

“Large-scale public datasets occupy an awkward middle ground in science funding. They’re often too applied and infrastructure-heavy for traditional academic grants, which favour hypothesis-driven research and novel discoveries. But they’re also too pre-competitive for industry, which has little incentive to fund resources that benefit competitors equally.”

FROs are designed specifically for this kind of work, combining start-up-style execution with a public-benefit mandate. Rather than operating as permanent institutions, they are built around a defined goal, a realistic timeline and a clear theory of impact, allowing teams to focus entirely on delivery rather than continuous grant-seeking or commercial positioning.

EvE Bio is one of almost a dozen FROs launched by Convergent since 2022. Built specifically to generate a large-scale, open dataset of drug–protein interactions, it brings together full-time scientists, engineers and data specialists working towards a single deliverable on a fixed timeline.

What pharmome mapping actually means

EvE Bio’s focus is not the full human proteome, but the network of functional interactions between drugs and human druggable targets. This network is what the organisation refers to as the pharmome.

Elaine McVey Houskeeper, CEO and co-founder of EvE Bio, frames the problem in terms of translation. While most experimental compounds never reach humans, approved drugs represent a uniquely valuable resource. They have been tested in clinical trials, used in real-world settings and, in many cases, prescribed for years. Yet the molecular understanding of how these drugs interact across biological systems remains incomplete.

As Houskeeper explains, the issue is not a lack of information, but the nature of what is available:

“The field has accumulated decades of drug-target interaction data, but it’s limited to particular targets of interest, generated under inconsistent conditions, missing confirmed negative activity and largely sequestered inside corporations. What is publicly available is staggeringly incomplete. It’s data, but it’s not a dataset.”

EvE Bio’s pharmome-mapping project is designed to change that by generating interaction data at scale under consistent experimental conditions. The organisation is systematically testing FDA-approved small molecule drugs against hundreds of validated human druggable targets, recording both activity and confirmed inactivity. The goal is to create a dataset that is comprehensive enough and standardised enough to be genuinely useful as infrastructure.

How the pharmome data is generated

Producing hundreds of thousands of drug–target interaction measurements requires more than high throughput. Reliability depends on assay quality, experimental consistency and careful data processing.

EvE Bio’s platform is structured around three stages. First, an assay development group optimises and validates assays for each target class, ensuring consistent formats across pharmacological modes and signalling pathways. These assays are then passed to a quantitative screening team, which runs multi-concentration screens with replicates to capture the full spectrum of activity rather than binary outcomes. Finally, a data science team applies a uniform processing pipeline to every interaction tested, active or inactive.

This structure allows EvE Bio to scale without sacrificing comparability. Each data point is generated under defined conditions, using consistent methods and accompanied by detailed metadata. The resulting dataset grows incrementally, with new releases made publicly available as soon as they are validated.
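The idea of applying one processing rule to every tested pair, active or inactive, can be sketched in a few lines. This is a minimal illustration, not EvE Bio's pipeline: the function name, the normalised 0 to 1 response scale, the use of a median across replicates and the 0.5 activity threshold are all assumptions made for the example.

```python
from statistics import median

# Hypothetical cut-off on a normalised 0..1 response scale.
# This is an illustrative value, not one taken from EvE Bio.
ACTIVITY_THRESHOLD = 0.5


def summarise_interaction(readings, threshold=ACTIVITY_THRESHOLD):
    """Collapse replicates per concentration and label one drug-target pair.

    readings maps a concentration (micromolar) to a list of replicate
    responses, each already normalised to the range 0..1.
    """
    if not readings:
        raise ValueError("no concentrations tested")
    # Take the median across replicates at each concentration.
    per_conc = {conc: median(reps) for conc, reps in sorted(readings.items())}
    active_concs = [c for c, resp in per_conc.items() if resp >= threshold]
    return {
        "n_concentrations": len(per_conc),
        "max_response": max(per_conc.values()),
        "lowest_active_conc_uM": min(active_concs) if active_concs else None,
        # Inactive pairs get an explicit record instead of being dropped.
        "label": "active" if active_concs else "inactive",
    }


# Toy data for two made-up pairs, three concentrations, three replicates each.
screen = {
    ("drug_A", "target_1"): {
        0.1: [0.02, 0.05, 0.03],
        1.0: [0.40, 0.45, 0.38],
        10.0: [0.90, 0.88, 0.93],
    },
    ("drug_B", "target_1"): {
        0.1: [0.01, 0.00, 0.02],
        1.0: [0.03, 0.02, 0.04],
        10.0: [0.05, 0.06, 0.04],
    },
}

# The same function runs on every pair, so results stay comparable.
results = {pair: summarise_interaction(r) for pair, r in screen.items()}
for pair, summary in results.items():
    print(pair, summary["label"], summary["lowest_active_conc_uM"])
```

The point of the sketch is that the inactive pair still produces a full record, which is what makes confirmed negatives available downstream.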

How the data is shared

From the outset, pharmome mapping has been conceived as a public good. As a non-profit FRO, EvE Bio releases its data openly rather than reserving it for commercial advantage. This approach reflects a belief that foundational datasets deliver the greatest value when they are widely used, combined and scrutinised by the community.

To support that goal, EvE Bio has focused on distribution through platforms researchers already rely on. The dataset is available programmatically via Hugging Face, enabling immediate access for machine learning practitioners, and is also being integrated directly into DrugBank’s intelligence platform.

Turning interaction data into decisions

For DrugBank, the integration of pharmome data represents a significant expansion of its role in drug discovery workflows. Long recognised as a trusted source of structured drug data, DrugBank is now embedding large-scale interaction data alongside information on biology, diseases, clinical trials and sponsors.

Lisa Downey, a life sciences and health-data executive who joined DrugBank as CEO this past October, describes the value of this integration in practical terms.

“What ties these together is that isolated interaction data has limited utility. The value emerges when you can immediately ask follow-up questions: what’s the clinical precedent, what else do we know about this target, who’s already exploring this space?”

By linking EvE Bio’s systematic interaction measurements to DrugBank’s curated operating system, users can move from isolated observations to informed prioritisation. This has implications for safety pharmacology, drug repurposing, polypharmacology and computational modelling, particularly in early discovery where uncertainty is highest.

Ground truth for machine learning

The pharmome dataset also addresses a longstanding challenge in AI-driven drug discovery: the lack of high-quality public training data. Machine learning models are highly sensitive to bias, noise and missing negatives, all of which have plagued historical interaction datasets.

Georgia Channing, AI for Science Lead at Hugging Face, highlights the importance of EvE Bio’s standardised approach and the opportunities it creates for the research community:

“Machine learning models in drug discovery are only as good as their training data. Public datasets have historically been plagued by inconsistent methods and missing negatives. EvE’s data, with its standardised protocols and rigorous reporting of inactivity, provides the kind of clean, reproducible ground truth that model builders need.”

Hugging Face’s role is to ensure that this data is easy to access and combine with other resources. By hosting it in standard formats and supporting one-line loading, the platform lowers technical barriers and encourages collaboration between domain scientists and machine learning researchers.
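In practice, one-line loading from the Hub usually means a call to `load_dataset` from the `datasets` library. The sketch below assumes that library is installed. It deliberately takes the repository id as an argument, because the article does not give one, and the `"train"` split and `"outcome"` column are placeholders, not the dataset's actual schema.

```python
def load_interactions(repo_id, split="train"):
    """Load one split of a Hub-hosted dataset via the `datasets` library.

    repo_id is whatever identifier the dataset is published under; the
    article does not name it, so none is hard-coded here.
    """
    # Imported lazily so this module still imports without `datasets`.
    from datasets import load_dataset

    return load_dataset(repo_id, split=split)


def count_labels(rows, label_column):
    """Count the values of one column across an iterable of dict-like rows."""
    counts = {}
    for row in rows:
        value = row[label_column]
        counts[value] = counts.get(value, 0) + 1
    return counts


# Offline demonstration with made-up rows; "outcome" is a placeholder column.
toy_rows = [
    {"drug": "drug_A", "target": "target_1", "outcome": "active"},
    {"drug": "drug_B", "target": "target_1", "outcome": "inactive"},
    {"drug": "drug_C", "target": "target_1", "outcome": "inactive"},
]
print(count_labels(toy_rows, "outcome"))

# With network access, the same helper could be pointed at the real data:
#   ds = load_interactions("<organisation>/<dataset-name>")
#   print(count_labels(ds, "<label column>"))
```

Checking how many inactive rows a dataset contains is a quick first test of whether negatives were actually reported.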

Early use cases and the road ahead

Although still expanding, the pharmome map is already being used in advanced AI research. EvE Bio’s data has served as a ground truth dataset for training and evaluating Ether0, a 24-billion-parameter chemistry reasoning model developed by FutureHouse, an early sign of uptake beyond the dataset’s original development context.

The dataset continues to grow on a bi-monthly release cadence. At the time of the DrugBank and Hugging Face announcements, it comprised 385,572 tested interactions across 159 targets. Subsequent releases have expanded coverage to more than 476,000 interactions across 207 targets, all tested against a library of 1,397 FDA-approved compounds.
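The earlier release figures can be sanity-checked with simple arithmetic. The check below uses only numbers quoted above. It assumes the 1,397-compound library also applied to the earlier release, and the reading that every compound was tested in every assay is an inference from the even division, not something the source states.

```python
compounds = 1_397               # size of the FDA-approved compound library
release_interactions = 385_572  # tested interactions at announcement time
release_targets = 159           # targets at announcement time

# If every compound was tested in every assay, the interaction count
# should be an exact multiple of the library size.
assays, remainder = divmod(release_interactions, compounds)
print(assays, remainder)  # 276 0

# Average number of assay conditions per target under that reading.
assays_per_target = round(assays / release_targets, 2)
print(assays_per_target)  # 1.74
```

On this reading, the earlier release would correspond to 276 assay conditions across 159 targets, or a little under two per target, which fits the description of consistent formats across pharmacological modes and signalling pathways.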

Looking ahead to 2026, EvE Bio plans to significantly expand coverage of GPCRs and protein kinases, including pathway-specific data to support modelling of biased signalling. A second library of drugs and metabolites is also in development. By the end of its five-year timeline, the organisation aims to deliver a comprehensive, standardised map across major druggable protein families, with data released openly as it is generated and validated. DrugBank, for its part, intends to keep expanding its coverage across compounds, targets and assay types, and to increase the value this delivers to its users. It plans to do so by embedding the interaction data, alongside clinical trial history, outcomes and competitive activity data, into guided, AI-assisted workflows that support drug, target and disease prioritisation.

Infrastructure for more predictable discovery

Pharmome mapping will not eliminate failure from drug discovery. But by replacing fragmented, selective data with a shared, standardised foundation, it offers a way to reduce uncertainty earlier in the process. The collaboration between EvE Bio, Convergent Research, DrugBank and Hugging Face illustrates how open infrastructure, built with discipline and intent, can change what is possible for the entire field.

Rather than asking whether a compound interacts with a single target, researchers can begin to ask broader, more relevant questions about molecular behaviour, safety and translational potential.

Related organisations
Convergent Research, DrugBank, EvE Bio, FutureHouse, Hugging Face
