From leaderboards to lab notebooks: AI designs reach preclinical testing

For years, AI drug discovery has been judged on benchmark performance. Now, a set of studies shows what happens when AI-generated designs are made and tested in preclinical settings.

For years, the AI drug discovery field has measured progress primarily against computational benchmarks – molecular property prediction accuracy, docking score improvements, generative validity metrics. While these benchmarks served a purpose during the technology’s formative period, they also created a dangerous feedback loop in which models were optimised for in silico performance with no guarantee that their outputs would survive contact with a wet lab.

In Q1 2026, a notable shift is underway. A cluster of peer-reviewed publications report AI-designed molecules and biological tools that have been experimentally validated in preclinical settings – not just scored on held-out test sets. The design-to-validation loop in drug discovery is closing and the results are revealing both what works and where critical gaps remain.

The most striking example comes from CAMPER, a mechanistic AI platform for antimicrobial peptide design published in Nature Communications.¹ The system designed WP-CAMPER1, a 12-mer peptide active against Staphylococcus aureus MW2 with a minimal inhibitory concentration (MIC) of 4 µg/mL. A topical formulation reduced MRSA burden by 2.5 log10 in a murine prophylactic skin infection model, while its D-enantiomer achieved a 1.37 log10 reduction in an established biofilm infection model.¹ This is not a prediction that a molecule might be active. It is a designed molecule, validated through synthesis, in vitro assays and in vivo testing, that addresses one of the most urgent unmet needs in infectious disease. Traditional pharma economics have left antimicrobial resistance largely neglected; AI-driven design that can produce validated candidates at lower cost may be the only route to replenishing the antimicrobial pipeline at the pace required.
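
Log-scale figures such as 2.5 log10 and 1.37 log10 are easy to misread, so the short sketch below converts them into fold changes and percentage reductions. It is illustrative arithmetic only: it assumes each reported reduction is in colony-forming units (CFU) relative to the study's comparator group (typically a vehicle control, which is an assumption here, not a detail taken from the paper), and the function names are hypothetical.

```python
# Illustrative arithmetic only: converts reported log10 reductions in
# bacterial burden into fold changes. Assumes each reduction is measured
# relative to the study's comparator group (an assumption, not from the paper).

def fold_reduction(log10_reduction: float) -> float:
    """Fold decrease in CFU implied by a log10 reduction."""
    return 10 ** log10_reduction


def percent_reduction(log10_reduction: float) -> float:
    """Percentage fewer CFU than the comparator group."""
    return 100 * (1 - 10 ** (-log10_reduction))


# Values reported for WP-CAMPER1 and its D-enantiomer in the text above.
reported = {
    "prophylactic skin model": 2.5,
    "established biofilm model (D-enantiomer)": 1.37,
}

for model, log_red in reported.items():
    print(f"{model}: {log_red} log10 = ~{fold_reduction(log_red):.0f}-fold, "
          f"{percent_reduction(log_red):.1f}% fewer CFU")
# prophylactic skin model: 2.5 log10 = ~316-fold, 99.7% fewer CFU
# established biofilm model (D-enantiomer): 1.37 log10 = ~23-fold, 95.7% fewer CFU
```

The two figures come from different infection models (prophylactic versus established biofilm), so the conversion clarifies the magnitude of each result rather than ranking the two peptides against each other.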

A parallel result emerged from work using the GenSLM language model for enzyme engineering, also published in Nature Communications.² Researchers used GenSLM to design novel variants of TrpB (the β-subunit of tryptophan synthase) that express in Escherichia coli, are stable and are catalytically active. Many of the generated TrpBs showed significant substrate promiscuity, accepting non-canonical substrates typically inaccessible to natural TrpBs, and several outperformed both natural and laboratory-evolved variants.² For medicinal chemists, this has immediate practical implications. Biocatalysis is increasingly central to pharmaceutical synthesis, and the ability to computationally generate functional enzyme starting points could shortcut what has historically been the most time-consuming step in biocatalyst development: the search for an active scaffold to optimise.

In the gene editing space, Profluent’s Protein2PAM model delivered experimentally confirmed results that are difficult to dismiss. The evolution-informed protein language model was trained on over 45,000 CRISPR-Cas protospacer adjacent motif (PAM) associations and used to computationally evolve Nme1Cas9 variants.³ The engineered proteins displayed broadened PAM recognition and up to 50-fold increases in DNA cleavage rates compared with wild type, validated in human cell lysate assays performed with the Kleinstiver Lab at Massachusetts General Hospital.³ No laboratory evolution. No structural modelling as input. The model learnt protein–DNA interaction rules from sequence data alone and generated functional variants entirely in silico.

On the protein therapeutics side, NVIDIA’s Proteina-Complexa generative model, launched at GTC 2026 as part of the BioNeMo platform, is being used by Novo Nordisk, Viva Biotech and Manifold Bio to design protein binders for therapeutic targets.⁴ Critically, these companies report having experimentally tested the generated designs, moving beyond computational scoring to physical validation. The simultaneous expansion of the AlphaFold Protein Structure Database by roughly 30 million AI-predicted protein complex structures⁴ provides structural context that makes such design efforts increasingly tractable.

Taken together, these results mark a qualitative shift in the AI drug discovery claims that can be supported by experimental evidence:

A designed antimicrobial peptide active in a mouse infection model¹
Engineered enzymes that outperform both natural and laboratory-evolved variants²
CRISPR variants with up to 50-fold activity gains confirmed in human cell lysate³
Protein binders moving from generative model to experimental assay⁴

Each of these required closing the loop between computation and experiment – and each was published with the experimental data to support the claim.

However, honest analysis demands acknowledging the distance that remains between preclinical validation and therapeutic utility. An antimicrobial peptide active in a mouse model is not a drug; in the antimicrobial space, the high rate at which candidates fail to translate from animal models to human efficacy is well documented.⁵ An enzyme that outperforms natural variants in a biochemical assay still requires formulation, stability engineering and process development before it contributes to pharmaceutical manufacturing. A cleavage-rate gain of up to 50-fold in cell lysate does not guarantee equivalent performance in cells or in vivo, where chromatin accessibility, delivery efficiency and off-target effects dominate. The history of AI in drug discovery is littered with computational successes that failed to translate, and the current generation of validated results, while genuinely more advanced than prior cycles, has not yet crossed the translational chasm.

A second development worth tracking is the emergence of multi-agent LLM systems designed to sit alongside discovery scientists rather than replace their workflows. CLADD, published in the AAAI 2026 proceedings by researchers at Genentech Research, deploys specialised teams of LLM agents (for knowledge graph retrieval, molecular annotation and prediction synthesis) that collaborate through retrieval-augmented generation to answer drug discovery questions without domain-specific fine-tuning.⁶

CLADD outperforms both general-purpose and domain-specific LLMs on tasks including drug-target interaction prediction and toxicity classification. AstraZeneca’s ChatInvent takes a similar agentic approach, integrated directly into the company’s discovery pipeline for molecular design and synthesis planning.⁷ These are not autonomous discovery engines but decision-support tools that augment the medicinal chemist’s existing workflow, and that distinction may prove more important than raw predictive accuracy.

The convergence of these two trends – experimentally validated AI-designed molecules and agentic decision-support tools deployed in active programmes – suggests that the field is entering a more mature phase. The benchmark era rewarded novelty in model architecture. The validation era rewards the harder, less publishable work of synthesis, assay development and honest reporting of experimental results. For discovery scientists evaluating whether AI tools merit adoption in their programmes, the question is no longer ‘does the model beat a baseline on a public dataset?’ It is ‘has anything this model designed been made, tested and shown to work in a relevant preclinical system?’

In Q1 2026, for the first time, a meaningful number of published answers to that question are yes. Whether the current generation of validated AI outputs can survive the transition from preclinical proof-of-concept to IND-enabling studies remains the next – and far harder – test.

References

[1] Shehadeh F, Mishra B, Ferrer-Espada R, et al. CAMPER: mechanistic artificial intelligence for designing peptides that target MRSA persisters. Nature Communications (2026). DOI: 10.1038/s41467-026-70348-9. https://www.nature.com/articles/s41467-026-70348-9

[2] Lambert T, Tavakoli A, Dharuman G, et al. Sequence-based generative AI design of versatile tryptophan synthases. Nature Communications 17, 1680 (2026). DOI: 10.1038/s41467-026-68384-6. https://www.nature.com/articles/s41467-026-68384-6

[3] Nayfach S, et al. Customizing CRISPR-Cas PAM specificity with protein language models. Nature Biotechnology (2026). DOI: 10.1038/s41587-025-02995-0. https://www.nature.com/articles/s41587-025-02995-0

[4] NVIDIA Newsroom. NVIDIA Expands Open Model Families to Power the Next Wave of Agentic, Physical and Healthcare AI. 16 March 2026. https://nvidianews.nvidia.com/news/nvidia-expands-open-model-families-to-power-the-next-wave-of-agentic-physical-and-healthcare-ai

[5] Theuretzbacher U, et al. The global preclinical antibacterial pipeline. Nature Reviews Microbiology 18, 275–285 (2020). DOI: 10.1038/s41579-019-0288-0. https://doi.org/10.1038/s41579-019-0288-0

[6] Lee N, De Brouwer E, et al. RAG-Enhanced Collaborative LLM Agents for Drug Discovery (CLADD). Proceedings of the AAAI Conference on Artificial Intelligence (2026). https://ojs.aaai.org/index.php/AAAI/article/view/37020

[7] AstraZeneca. Democratising real-world drug discovery through agentic AI (ChatInvent). Drug Discovery Today (2026). DOI: 10.1016/j.drudis.2026.104605. https://www.sciencedirect.com/science/article/pii/S1359644626000103