Scaling drug–target interaction data

Most drug–target data were never designed to be compared at scale. Pharmome mapping takes a different approach, building a shared dataset intended to support more predictable discovery.

Drug discovery has never lacked ideas. What it has lacked is reliable, comparable and comprehensive data. For decades, the industry has generated vast volumes of drug–target interaction data, yet much of it remains fragmented, inconsistent and inaccessible. The result is a system in which promising hypotheses struggle to translate into predictable outcomes in humans.

A collaboration between EvE Bio, Convergent Research, DrugBank and Hugging Face is focused on generating a systematic, standardised map of how approved drugs interact with human druggable targets. Described by its creators as pharmome mapping, the effort aims to transform decades of scattered pharmacological observations into a coherent public dataset for drug discovery.

A focused approach to hard scientific problems

The pharmome-mapping project sits within a broader organisational experiment led by Convergent Research. Founded to address scientific bottlenecks that fall between traditional academic funding and commercial incentives, Convergent designs and launches Focused Research Organisations, or FROs. These are time-limited, technically ambitious entities created to deliver clearly defined scientific outputs.

Anastasia Gamick, President and co-founder of Convergent Research, explains why this structure is particularly well suited to large public data efforts such as pharmome mapping. As she puts it:

“Large-scale public datasets occupy an awkward middle ground in science funding. They’re often too applied and infrastructure-heavy for traditional academic grants, which favour hypothesis-driven research and novel discoveries. But they’re also too pre-competitive for industry, which has little incentive to fund resources that benefit competitors equally.”

FROs are designed specifically for this kind of work, combining start-up-style execution with a public-benefit mandate. Rather than operating as permanent institutions, they are built around a defined goal, a realistic timeline and a clear theory of impact, allowing teams to focus entirely on delivery rather than continuous grant-seeking or commercial positioning.

EvE Bio is one of almost a dozen FROs launched by Convergent since 2022. Built specifically to generate a large-scale, open dataset of drug–protein interactions, it brings together full-time scientists, engineers and data specialists working towards a single deliverable on a fixed timeline.

What pharmome mapping actually means

EvE Bio’s focus is not the full human proteome, but the network of functional interactions between drugs and human druggable targets. This network is what the organisation refers to as the pharmome.

Elaine McVey Houskeeper, CEO and co-founder of EvE Bio, frames the problem in terms of translation. While most experimental compounds never reach humans, approved drugs represent a uniquely valuable resource. They have been tested in clinical trials, used in real-world settings and, in many cases, prescribed for years. Yet the molecular understanding of how these drugs interact across biological systems remains incomplete.

As Houskeeper explains, the issue is not a lack of information, but the nature of what is available:

“The field has accumulated decades of drug-target interaction data, but it’s limited to particular targets of interest, generated under inconsistent conditions, missing confirmed negative activity and largely sequestered inside corporations. What is publicly available is staggeringly incomplete. It’s data, but it’s not a dataset.”

EvE Bio’s pharmome-mapping project is designed to change that by generating interaction data at scale under consistent experimental conditions. The organisation is systematically testing FDA-approved small-molecule drugs against hundreds of validated human druggable targets, recording both activity and confirmed inactivity. The goal is to create a dataset that is comprehensive enough and standardised enough to be genuinely useful as infrastructure.

How the pharmome data is generated

Producing hundreds of thousands of drug–target interaction measurements requires more than high throughput. Reliability depends on assay quality, experimental consistency and careful data processing.

EvE Bio’s platform is structured around three stages. First, an assay development group optimises and validates assays for each target class, ensuring consistent formats across pharmacological modes and signalling pathways. These assays are then passed to a quantitative screening team, which runs multi-concentration screens with replicates to capture the full spectrum of activity rather than binary outcomes. Finally, a data science team applies a uniform processing pipeline to every interaction tested, active or inactive.
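To make the idea of a uniform, multi-concentration activity call more concrete, here is a minimal sketch. It assumes each drug–target pair yields a percent-response reading at several concentrations, each with replicates, and it uses a hypothetical 50% threshold with log-linear interpolation. EvE Bio's actual curve fitting and activity criteria are not described in this article; a production pipeline would typically fit a full concentration–response model instead.

```python
import math
from statistics import mean

# Hypothetical activity threshold (% response); not EvE Bio's published criterion.
THRESHOLD = 50.0


def summarise_pair(readings: dict[float, list[float]], threshold: float = THRESHOLD) -> dict:
    """Apply the same processing to one drug-target pair, whether active or not.

    `readings` maps concentration (molar) to a list of replicate responses (%).
    """
    concs = sorted(readings)
    # Average replicates at each concentration to reduce well-to-well noise.
    means = [mean(readings[c]) for c in concs]
    status = "active" if max(means) >= threshold else "inactive"

    c_at_threshold = None
    if status == "active":
        # Interpolate on a log10 concentration scale between the first pair of
        # tested concentrations whose mean responses cross the threshold upwards.
        # If no such crossing exists (e.g. the lowest concentration already
        # exceeds the threshold), the value stays None.
        for i in range(len(concs) - 1):
            lo, hi = means[i], means[i + 1]
            if lo < threshold <= hi:
                frac = (threshold - lo) / (hi - lo)
                log_lo, log_hi = math.log10(concs[i]), math.log10(concs[i + 1])
                c_at_threshold = 10 ** (log_lo + frac * (log_hi - log_lo))
                break

    # Inactive pairs are returned too: confirmed inactivity is part of the record.
    return {
        "status": status,
        "mean_response": dict(zip(concs, means)),
        "c_at_threshold": c_at_threshold,
    }


if __name__ == "__main__":
    # Illustrative numbers only, not real screening data.
    example = {1e-8: [2.0, 4.0], 1e-7: [10.0, 14.0], 1e-6: [40.0, 44.0], 1e-5: [80.0, 84.0]}
    print(summarise_pair(example))
```

The point of the design is that the same function runs on every tested pair, so an inactive result is produced by exactly the same steps as an active one rather than simply being absent.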

This structure allows EvE Bio to scale without sacrificing comparability. Each data point is generated under defined conditions, using consistent methods and accompanied by detailed metadata. The resulting dataset grows incrementally, with new releases made publicly available as soon as they are validated.

How the data is shared

From the outset, pharmome mapping has been conceived as a public good. As a non-profit FRO, EvE Bio releases its data openly rather than reserving it for commercial advantage. This approach reflects a belief that foundational datasets deliver the greatest value when they are widely used, combined and scrutinised by the community.

To support that goal, EvE Bio has focused on distribution through platforms researchers already rely on. The dataset is available programmatically via Hugging Face, enabling immediate access for machine learning practitioners, and is also being integrated directly into DrugBank’s intelligence platform.

Turning interaction data into decisions

For DrugBank, the integration of pharmome data represents a significant expansion of its role in drug discovery workflows. Long recognised as a trusted source of structured drug data, DrugBank is now embedding large-scale interaction data alongside information on biology, diseases, clinical trials and sponsors.

Lisa Downey, a life sciences and health-data executive who joined DrugBank as CEO this past October, describes the value of this integration in practical terms.

“What ties these together is that isolated interaction data has limited utility. The value emerges when you can immediately ask follow-up questions: what’s the clinical precedent, what else do we know about this target, who’s already exploring this space?”

By linking EvE Bio’s systematic interaction measurements to DrugBank’s curated operating system, users can move from isolated observations to informed prioritisation. This has implications for safety pharmacology, drug repurposing, polypharmacology and computational modelling, particularly in early discovery where uncertainty is highest.
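The follow-up-question pattern described above amounts to joining each measured interaction with contextual records about the same target. The sketch below assumes two hypothetical in-memory tables keyed by placeholder identifiers; the class and field names are illustrative only and are not DrugBank's schema or API.

```python
from dataclasses import dataclass, field


@dataclass
class Interaction:
    drug_id: str
    target_id: str
    status: str  # "active" or "inactive"


@dataclass
class TargetContext:
    # Hypothetical annotation fields; not DrugBank's actual data model.
    clinical_precedent: list[str] = field(default_factory=list)
    sponsors: list[str] = field(default_factory=list)


def annotate_hits(interactions, context_by_target):
    """Attach target-level context to every active interaction.

    Targets with no context entry get an empty TargetContext, so gaps in
    knowledge stay visible instead of silently dropping the hit.
    """
    return [
        (hit, context_by_target.get(hit.target_id, TargetContext()))
        for hit in interactions
        if hit.status == "active"
    ]


if __name__ == "__main__":
    interactions = [
        Interaction("drug_a", "target_1", "active"),
        Interaction("drug_b", "target_1", "inactive"),
        Interaction("drug_c", "target_2", "active"),
    ]
    context = {"target_1": TargetContext(clinical_precedent=["trial_x"], sponsors=["sponsor_y"])}
    for hit, ctx in annotate_hits(interactions, context):
        print(hit.drug_id, hit.target_id, ctx)
```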

Ground truth for machine learning

The pharmome dataset also addresses a longstanding challenge in AI-driven drug discovery: the lack of high-quality public training data. Machine learning models are highly sensitive to bias, noise and missing negatives, all of which have plagued historical interaction datasets.

Georgia Channing, AI for Science Lead at Hugging Face, highlights the importance of EvE Bio’s standardised approach and the opportunities it creates for the research community:

“Machine learning models in drug discovery are only as good as their training data. Public datasets have historically been plagued by inconsistent methods and missing negatives. EvE’s data, with its standardised protocols and rigorous reporting of inactivity, provides the kind of clean, reproducible ground truth that model builders need.”
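Why reported inactivity matters for model training can be sketched in a few lines. In the hypothetical setup below, a measurement table holds only pairs that were actually tested, each marked active or confirmed inactive; every other drug–target pair is kept aside as untested rather than silently labelled negative, which is the assumption a model builder is forced into when a dataset reports only hits. The drug and target names are placeholders, not dataset entries.

```python
from enum import Enum


class Outcome(Enum):
    """Result of a tested drug-target pair (hypothetical encoding)."""
    ACTIVE = 1
    INACTIVE = 0  # tested and confirmed to show no activity


def split_pairs(drugs, targets, measured):
    """Separate labelled training pairs from untested ones.

    `measured` maps (drug, target) -> Outcome and contains tested pairs only.
    Returns (labelled, untested): labelled pairs carry a 1/0 label, while
    untested pairs get no label instead of being assumed inactive.
    """
    labelled, untested = [], []
    for drug in drugs:
        for target in targets:
            outcome = measured.get((drug, target))
            if outcome is None:
                untested.append((drug, target))
            else:
                labelled.append((drug, target, outcome.value))
    return labelled, untested


if __name__ == "__main__":
    measured = {
        ("drug_a", "target_1"): Outcome.ACTIVE,
        ("drug_b", "target_1"): Outcome.INACTIVE,
    }
    labelled, untested = split_pairs(["drug_a", "drug_b"], ["target_1", "target_2"], measured)
    print("labelled:", labelled)
    print("untested:", untested)
```

In a hits-only dataset the `drug_b`/`target_1` pair would land in the untested bucket, leaving a model nothing but assumed negatives to learn from.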

Hugging Face’s role is to ensure that this data is easy to access and combine with other resources. By hosting it in standard formats and supporting one-line loading, the platform lowers technical barriers and encourages collaboration between domain scientists and machine learning researchers.

Early use cases and the road ahead

Although still expanding, the pharmome map is already being used in advanced AI research. EvE Bio’s data has served as a ground truth dataset for training and evaluating Ether0, a 24-billion-parameter chemistry reasoning model developed by FutureHouse. This example illustrates how the dataset is already being used beyond its original development context.

The dataset continues to grow on a bi-monthly release cadence. At the time of the DrugBank and Hugging Face announcements, it comprised 385,572 tested interactions across 159 targets. Subsequent releases have expanded coverage to more than 476,000 interactions across 207 targets, all tested against a library of 1,397 FDA-approved compounds.

Looking ahead to 2026, EvE Bio plans to significantly expand coverage of GPCRs and protein kinases, including pathway-specific data to support modelling of biased signalling. A second library of drugs and metabolites is also in development. By the end of its five-year timeline, the organisation aims to deliver a comprehensive, standardised map across major druggable protein families, with data released openly as it is generated and validated.

DrugBank, for its part, intends to keep expanding its coverage of compounds, targets and assay types, and with it the value it offers users. It plans to do this by embedding the data, together with clinical trial history, outcomes and competitive activity data, into guided, AI-assisted workflows that support drug, target and disease prioritisation.

Infrastructure for more predictable discovery

Pharmome mapping will not eliminate failure from drug discovery. But by replacing fragmented, selective data with a shared, standardised foundation, it offers a way to reduce uncertainty earlier in the process. The collaboration between EvE Bio, Convergent Research, DrugBank and Hugging Face illustrates how open infrastructure, built with discipline and intent, can change what is possible for the entire field.

Rather than asking whether a compound interacts with a single target, researchers can begin to ask broader, more relevant questions about molecular behaviour, safety and translational potential.

About the authors

Anastasia Gamick, President and Co-Founder of Convergent Research

Anastasia Gamick is the Co-Founder and President of Convergent Research, where she helped pioneer the Focused Research Organization (FRO) model: mission-driven, time-bound teams designed to solve large-scale scientific bottlenecks that neither academia nor industry is well-structured to tackle. Anastasia grew Convergent from an idea into a nonprofit incubator supporting multiple FROs across fields like neuroscience, synthetic biology and climate. She sits on the boards of Forest Neurotech, EvE Bio, Cultivarium, Dragonfly, [C]Worthy, Unitary Fund and others, advising bold scientific efforts with public-good missions. Her work blends startup execution with institutional design, helping build the infrastructure and partnerships needed for science at scale.

Prior to Convergent, she held roles at Neuralink, Creator, Segovia, and Curative, with a career spanning frontier tech, global health and science operations.

Elaine McVey Houskeeper, CEO and Co-Founder of EvE Bio

Elaine Houskeeper (née McVey) is a co-founder of EvE Bio. She holds a B.A. in Neuroscience from Amherst College and a Master of Statistics from North Carolina State University. She spent her early career as a scientist in protein folding and cell biology labs, after which she served at Becton Dickinson’s R&D center as a data scientist, working across cell therapy, diabetes technology and smart device programmes, where she contributed to several publications and patents. Most recently she built and led data science teams at a number of early-stage startups, helping grow them from pre-Series A to successful exits. Outside of the data world, she keeps busy camping, mountain biking, trail running and trying to keep up with her two sons. She writes about developments in science, data, tech, and AI at www.datawoman.org.

Georgia Channing, AI for Science Lead at Hugging Face

Georgia Channing is the AI for Science Lead at Hugging Face, working at the intersection of machine learning and the natural sciences. She read for her Master’s and PhD in computer science at the University of Oxford, with a focus on applying AI to scientific discovery. Her work has spanned a wide range of AI-for-science areas, including remote sensing, biophysics and materials design. She now focuses on building open tools, models and communities that make scientific research more accessible, collaborative and reproducible.

Lisa Downey, CEO of DrugBank

Lisa Downey is a life sciences and health data executive with nearly two decades of experience building and scaling commercial, data and platform businesses across the RWD, genomics and broader pharma ecosystem. Her career has included leadership roles at organisations such as Clarivate and GlobalData, where she focused on helping life sciences companies make faster, more confident decisions, from early discovery through to commercialisation, by operationalising complex data and leading high-impact, cross-functional teams.

Most recently, she led Clarivate’s genomic and rare disease data business, building a new platform from the ground up, establishing strategic partnerships and working closely with pharma leaders navigating the next wave of AI-driven innovation.

Lisa joined DrugBank this past October as CEO, at an incredibly transformative moment for drug discovery. DrugBank has long been recognised as a pioneer in structured drug data, trusted across industry and academia with more than 58,000 citations, and is now expanding that leadership into a new frontier: a trusted intelligence operating system built on the most comprehensive drug knowledgebase, connecting biomedical data and commercial insight for next-generation drug discovery.