Genome-wide association studies have linked thousands of genetic variants to disease, yet most remain disconnected from drug-relevant biology. Neville Sanjana, Professor at New York University and Core Faculty Member at the New York Genome Center, explains how scalable CRISPR screens systematically link noncoding variants to causal genes and therapeutic targets.

Genome-wide association studies (GWAS) have transformed human genetics by linking thousands of variants to disease risk. For drug discovery, however, this expanding body of data presents a persistent challenge: determining which variants are causal, how they exert their effects and which downstream mechanisms are most amenable to therapeutic intervention. This challenge is particularly acute for common diseases, where most associated variants lie outside protein-coding genes.

Professor Neville Sanjana is a Core Faculty Member at the New York Genome Center and Professor of Biology and Neuroscience at New York University. His research develops scalable genome engineering and functional genomics approaches designed to systematically connect genetic variants to molecular mechanisms and therapeutic targets.

“For almost two decades, I’ve been fascinated by how the human genome works to influence different human traits and diseases,” he says.

Rather than studying individual genes in isolation, Sanjana’s group develops scalable CRISPR-based methods to target thousands of genes or genomic regions in a single experiment. These approaches are designed to identify which genetic variants are most important to focus on for a given disease and to connect causal variants to drug-relevant target genes.

Scalable perturbation biology

Rather than focusing on single, hypothesis-driven experiments, Sanjana’s group uses systematic perturbation of large genomic regions to study complex disease genetics at scale.

“My group develops methods to edit and engineer the human genome and interpret the functional effects of genetic variants,” he explains. “One key aspect of our work is that we don’t pursue single targets in a purely hypothesis-driven fashion; instead, we develop scalable methods to target thousands of genes or regions of the genome in a single experiment.”

This strategy has led to the development of a portfolio of technologies including pooled CRISPR screens, OverCITE-seq, CRISPore-seq, STING-seq and MultiPerturb-seq, enabling functional evaluation of genetic variants across multiple disease contexts.

“Our work spans several disease areas, including oncology, autoimmunity, neural development, neuropsychiatry, metabolic dysfunction and ageing,” he says.

For drug discovery teams, this breadth means that approaches developed in one disease area can be transferred to others. That matters particularly for GWAS, where the large volume of association data calls for experimental methods capable of resolving causality at scale.

Addressing the limitations of GWAS

Despite their power, GWAS provide limited insight into causality or underlying mechanism, leaving uncertainty around the functional drivers of disease.

“A key problem in the age of GWAS, when we are collecting thousands of genetic variants and disease associations, is knowing which variants are most important to focus on for a particular disease,” Sanjana points out. “GWAS is fantastic at finding genetic associations but, as we’ve all learnt, correlation does not equal causation.”

The problem is compounded by the genomic location of many variants. “For common diseases like heart disease or diabetes, most of the GWAS signal is in noncoding regions of the human genome, where we have a much harder time predicting or interpreting the effect of a genetic variant,” he continues.

STING-seq was developed to address these limitations by combining pooled CRISPR perturbation of GWAS-linked variants with single-cell sequencing.

“STING-seq is a scalable method that combines CRISPR pooled screens targeting thousands of GWAS variants with single-cell sequencing to quickly identify which variants are most likely causal and, for those noncoding variants, which genes they likely modulate.”

By linking variants to downstream target genes in a disease-agnostic way, STING-seq enables systematic target discovery across indications.
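The underlying idea can be illustrated with a minimal Python sketch. It is not the STING-seq analysis pipeline: the cell records, gene and variant names, fold-change threshold and pseudocount below are all hypothetical, and a real analysis would rely on proper statistical testing rather than a simple cut-off. The sketch only shows the logic of comparing cells that received a guide against a given variant with non-targeting control cells, then keeping the variant-gene pairs whose expression shifts.

```python
import math
from statistics import mean

# Hypothetical toy data: each cell records which variant its guide targeted
# (None = non-targeting control) and the expression of two nearby genes.
cells = [
    {"guide": None, "expr": {"GENE_A": 10.0, "GENE_B": 5.0}},
    {"guide": None, "expr": {"GENE_A": 12.0, "GENE_B": 5.0}},
    {"guide": "variant_1", "expr": {"GENE_A": 5.0, "GENE_B": 5.0}},
    {"guide": "variant_1", "expr": {"GENE_A": 6.0, "GENE_B": 5.0}},
    {"guide": "variant_2", "expr": {"GENE_A": 11.0, "GENE_B": 5.0}},
    {"guide": "variant_2", "expr": {"GENE_A": 11.0, "GENE_B": 5.0}},
]


def link_variants_to_genes(cells, min_abs_log2fc=0.5, pseudocount=1.0):
    """Return {(variant, gene): log2 fold change} for pairs passing the cut-off.

    The threshold and pseudocount are illustrative assumptions only.
    """
    controls = [c for c in cells if c["guide"] is None]
    variants = sorted({c["guide"] for c in cells if c["guide"] is not None})
    genes = sorted(controls[0]["expr"])
    links = {}
    for variant in variants:
        perturbed = [c for c in cells if c["guide"] == variant]
        for gene in genes:
            # Mean expression in control and perturbed cells, with a
            # pseudocount so the ratio is defined for unexpressed genes.
            ctrl = mean(c["expr"][gene] for c in controls) + pseudocount
            pert = mean(c["expr"][gene] for c in perturbed) + pseudocount
            log2fc = math.log2(pert / ctrl)
            if abs(log2fc) >= min_abs_log2fc:
                links[(variant, gene)] = log2fc
    return links


links = link_variants_to_genes(cells)
for (variant, gene), log2fc in links.items():
    print(f"{variant} -> {gene}: log2FC = {log2fc:.2f}")
```

In this toy example only the first variant is linked to a gene, because perturbing the second leaves expression unchanged relative to controls.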

Genome-wide association studies (GWAS) identify genetic variants linked to disease risk, but functional approaches such as CRISPR screening are needed to determine which variants drive disease biology and represent viable drug targets. Image credit: Shutterstock / CI Photos

Flexible target selection

In some cases, STING-seq reveals effects that extend beyond individual genes. “These are the really exciting hits in STING-seq,” Sanjana says. “These are cases where we can connect a GWAS variant not just to a single gene but an entire regulatory network.”

For drug discovery, this network-level view supports more flexible target selection strategies. “In this way, we can pick and choose whether the best drug target is the regulatory element, like a transcription factor, or the downstream targets of that transcription factor,” Sanjana explains. “In these cases, we get a better picture of the many molecular changes occurring in cells and also enlarge our target space.”

While STING-seq focuses on identifying causal variants and their target genes, understanding how those genes drive disease often requires deeper insight into gene regulation and chromatin state.

Integrating RNA and chromatin biology

Many diseases, particularly in oncology and neurodevelopment, are driven by dysregulation of gene expression and chromatin state. Traditional functional screens often focus on a single readout, limiting their ability to resolve causal mechanisms.

“There are many diseases that involve gene regulation and chromatin. In our MultiPerturb-seq work, we focused on one of these – a rare paediatric brain tumour that stems from dysregulation of SWI/SNF chromatin remodelling,” shares Sanjana.

The SWI/SNF complex is a multi-protein chromatin remodelling complex that regulates gene expression by repositioning nucleosomes and controlling access to DNA. Disruption of this complex is a well-established driver of several cancers, including paediatric brain tumours.

“The changes in chromatin are really the proximal effects of the causal genetic mutation and thus make a great phenotype for our genetic screen,” he says.

MultiPerturb-seq captures RNA expression and chromatin accessibility simultaneously in single cells, enabling integrated analysis of these regulatory layers.

“In general, changes in gene expression as captured by RNA sequencing and changes in chromatin state should be related but these are separate layers of information.”

This dual readout enhances target selection by identifying perturbations that correct disease-relevant phenotypes at multiple levels.

“Capturing both in the same single cells gives us a much richer picture of the cell – especially in diseases of chromatin. We can find which drug targets might directly address the disease-relevant phenotype,” Sanjana explains.

ZNHIT1 as a reprogramming target in AT/RT

A key finding from the MultiPerturb-seq study was the identification of ZNHIT1 as a potential target in atypical teratoid rhabdoid tumours (AT/RT), a rare and aggressive paediatric brain cancer. When analysed using individual molecular readouts, several perturbations appeared promising.

“When looking at gene expression (RNA) or chromatin accessibility (DNA) alone, we found several potential drug targets that restored AT/RT tumour cells to something more similar to normal brain tissues,” says Sanjana.

However, ZNHIT1 emerged as uniquely effective when both readouts were considered together. “But when we asked which perturbations led to a normal state in both gene expression and chromatin, ZNHIT1 stood out from every other perturbation we examined.”
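A short Python sketch shows why requiring rescue in both readouts can change the ranking. It is an illustration under stated assumptions, not the MultiPerturb-seq analysis: the two-dimensional profiles, the perturbation names and the choice of scoring each perturbation by its worst (largest) distance to a normal reference are all hypothetical.

```python
import math

# Hypothetical reference profile for normal tissue in each readout.
normal = {"rna": [1.0, 0.0], "atac": [0.0, 1.0]}

# Hypothetical post-perturbation profiles of tumour cells.
perturbations = {
    "target_X": {"rna": [1.0, 0.1], "atac": [1.0, 0.0]},  # RNA near normal only
    "target_Y": {"rna": [0.0, 1.0], "atac": [0.1, 1.0]},  # chromatin near normal only
    "target_Z": {"rna": [0.9, 0.1], "atac": [0.1, 0.9]},  # near normal in both
}


def rank_by_rescue(perturbations, normal, readouts=("rna", "atac")):
    """Rank perturbations from closest to furthest from the normal state.

    Each perturbation is scored by its largest Euclidean distance to the
    normal reference across the chosen readouts, so a high rank requires
    being close to normal in every readout considered.
    """
    scores = {
        name: max(math.dist(profile[r], normal[r]) for r in readouts)
        for name, profile in perturbations.items()
    }
    return sorted(scores, key=scores.get)


print("RNA only:      ", rank_by_rescue(perturbations, normal, readouts=("rna",)))
print("Chromatin only:", rank_by_rescue(perturbations, normal, readouts=("atac",)))
print("Both readouts: ", rank_by_rescue(perturbations, normal))
```

In this toy data a different perturbation tops each single-readout ranking, while only the third is near normal in both, which mirrors the selection logic described above.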

The resulting therapeutic hypothesis is based on cellular reprogramming rather than cytotoxicity. “We believe that inhibition of ZNHIT1 can serve as a reprogramming therapy to move AT/RT cells into a more normal state, although nuclear proteins like ZNHIT1 can be challenging drug targets,” he concedes.

Beyond protein-coding targets, these approaches also raise the question of how noncoding disease drivers can be systematically interrogated.

Targeting noncoding transcripts with Cas13

While proteins remain the dominant class of drug targets, advances in RNA-targeting technologies are enabling more systematic interrogation of RNA species. Sanjana sees particular potential in applying Cas13 to functional analysis of noncoding RNAs.

“There has been well-deserved attention on proteins as drug targets but our recent work on essential long noncoding RNAs and their potential role as drivers in cancer suggests an exciting and underexplored space of noncoding transcripts.”

The main barrier, as Sanjana explains, is the lack of systematic functional data. “There is so little data right now. We are trying to fix that by developing better reagents (pooled libraries) and datasets to connect noncoding transcripts to key phenotypes in cancer and metabolism.”

Once causality is established, multiple therapeutic modalities become feasible.

“If we can identify noncoding transcripts with causal roles in disease biology, then we can use many different approaches, such as RNAi or antisense, to modulate their expression,” he says.

Similar challenges apply to noncoding regulatory DNA elements, where disease-associated variants influence gene control rather than transcript sequence.

Regulatory elements as drug targets

CRISPR tiling studies show that many disease-relevant genes are regulated by multiple enhancers. For drug discovery, this complicates decisions about whether a regulatory element should be targeted directly or used to identify a downstream gene target.

“In some cases – like the intronic enhancer of BCL11A, the repressor of foetal haemoglobin – the CRISPR screen led us to a therapy that can cure sickle-cell anaemia.” However, Sanjana emphasises that this direct targeting of regulatory elements will not always be appropriate. “But I don’t think that will always be the case. For noncoding elements, I think making the causal link is really of prime importance.” 

One risk is misattribution, where a disease-associated variant is assumed to act through the nearest gene. This problem is illustrated by the fat mass and obesity-associated (FTO) locus, a genomic region initially linked to obesity risk through GWAS. “Here the presumption was that a noncoding variant in FTO impacted that gene, whereas, in reality, this variant was working via a different, neighbouring gene.”

As a result, regulatory elements often serve as a route to target identification rather than targets in their own right. “In most cases, I think we will need to connect regulatory elements to genes and those genes will be the drug target,” reflects Sanjana.

The role of AI in future target discovery

Looking ahead, Sanjana believes artificial intelligence (AI) will play a central role in extending the impact of CRISPR-based discovery platforms. 

“I’m excited about the convergence of AI trained on large CRISPR perturbation datasets and experimental data from approaches like STING-seq,” he says.

Experimental constraints mean not all variants, cell types or genetic backgrounds can be tested directly. “This is beyond the scope of what we can probe experimentally and so we will need excellent AI-driven approaches to fill in these large gaps.”

Advances in DNA language models suggest this integration is already underway. “Already, we’ve seen tremendous improvements in DNA language models with variant effect prediction and I expect that to continue.” Ultimately, this convergence could enable virtual experimentation that accelerates early drug discovery. “This would massively accelerate drug discovery efforts across many diseases and this would be amazing. I hope and believe that we’ll be there faster than we think.”