New algorithm identifies a novel CRISPR-Cas system

Posted: 24 November 2023

Findings of rare CRISPR-linked gene modules and a novel CRISPR-Cas system have promising implications for genomic therapeutics.


Using a new algorithm named FLSHclust (“flash clust”), scientists from the Howard Hughes Medical Institute at the Massachusetts Institute of Technology (MIT) have found 188 rare and previously unknown CRISPR-linked gene modules, including a novel type VII CRISPR-Cas system, among billions of protein sequences. Their approach and its discoveries offer new opportunities for utilising CRISPR systems and understanding the vast functional diversity of microbial proteins.

CRISPR systems have been leveraged to develop biomolecular approaches, such as CRISPR-Cas-mediated genome editing. The discovery of previously unknown CRISPR systems has the potential to drive further development of these biotechnologies, for example safer and more effective genomic therapeutics.

Computational searches of protein sequence databases have expanded CRISPR’s usefulness, but the algorithmic approaches commonly used have become impractical for mining exponentially growing datasets containing billions of proteins. To overcome this issue, Han Altae-Tran and his colleagues developed the FLSHclust (fast locality-sensitive hashing-based clustering) algorithm. FLSHclust clusters proteins by sequence similarity and, unlike currently available methods, can quickly and efficiently analyse vast protein sequence databases.
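To illustrate the general idea behind locality-sensitive hashing-based clustering, here is a minimal Python sketch. It is not the authors' FLSHclust implementation: it swaps in MinHash signatures over k-mers with banding, a common locality-sensitive hashing scheme, and every function name and parameter (kmers, minhash_signature, lsh_cluster, num_hashes, bands, k) is a hypothetical choice made for this example. The point it shows is that similar sequences tend to land in the same hash bucket, so clusters can be formed without comparing every pair of proteins.

```python
import hashlib
from collections import defaultdict


def kmers(seq, k=3):
    """Return the set of overlapping k-mers in a protein sequence."""
    if len(seq) < k:
        return {seq}  # very short sequence: treat it as a single token
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def _hash(token, seed):
    """Deterministic 64-bit hash of a k-mer, varied by an integer seed."""
    digest = hashlib.blake2b(
        token.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(seq, num_hashes=32, k=3):
    """MinHash signature: the smallest hash of any k-mer, for each seed."""
    tokens = kmers(seq, k)
    return [min(_hash(t, seed) for t in tokens) for seed in range(num_hashes)]


def lsh_cluster(seqs, num_hashes=32, bands=8, k=3):
    """Group sequences whose signatures agree on at least one whole band.

    Illustrative assumption: sharing a bucket is enough to merge two
    sequences. A real pipeline would normally verify candidates further.
    """
    rows = num_hashes // bands
    parent = list(range(len(seqs)))  # union-find forest over sequence indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    buckets = {}  # (band index, band values) -> first sequence seen there
    for idx, seq in enumerate(seqs):
        sig = minhash_signature(seq, num_hashes, k)
        for b in range(bands):
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            first = buckets.setdefault(key, idx)
            if first != idx:
                parent[find(idx)] = find(first)  # merge the two clusters

    clusters = defaultdict(list)
    for idx in range(len(seqs)):
        clusters[find(idx)].append(idx)
    return sorted(clusters.values())


if __name__ == "__main__":
    proteins = [
        "MKTAYIAKQRQISFVKSHFSRQ",
        "MKTAYIAKQRQISFVKSHFSRQ",  # exact duplicate of the first
        "GGGGGGGGGGGGWWWWWWWWWW",  # shares no 3-mers with the others
    ]
    print(lsh_cluster(proteins))
```

In this toy input the two identical sequences always fall into the same cluster, while the third shares no k-mers with them and stays on its own. Each sequence is hashed once, so the work grows roughly in proportion to the number of sequences rather than the number of pairs, which is the property that makes hashing attractive for databases holding billions of proteins.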

To evaluate their approach, the team used FLSHclust to search for rare CRISPR systems in an 8.8-terabase-pair metagenomic database containing eight billion proteins and 10.2 million CRISPR arrays. This analysis revealed 188 previously unknown CRISPR-linked gene modules. The authors also identified and characterised a new Cas14-containing CRISPR system, designated type VII, which acts on RNA.

The newly identified systems were rare, and many comprised only a single cluster out of the almost 130,000 CRISPR-linked clusters revealed by FLSHclust. The authors of the study said: “The discovery of previously unknown cas genes and CRISPR systems substantially expands the known CRISPR diversity, emphasising the functional versatility of CRISPR whereby previously undiscovered proteins and domains are often recruited, either replacing preexisting components or conferring newly identified functions to the preexisting scaffold of Cas proteins.”

They concluded: “Taken together, the results of the work reveal unprecedented organisational and functional flexibility and modularity of CRISPR systems but also demonstrates that most variants are rare and only found in relatively unusual bacteria and archaea.”

This study was published in Science.