Algorithm to more effectively identify antibiotic compounds
An international team of scientists has developed a candidate identification technique in the form of an algorithm that reduces the chances of rediscovering known compounds.
Researchers from Carnegie Mellon University; the University of California, San Diego; and St. Petersburg State University in Russia have described a new means of searching vast repositories of compounds produced by microbes. By analysing the compounds’ mass spectra, they were able to identify the known compounds and eliminate them from further analysis. This enabled them to focus instead on the unknown variants – the proverbial needles within the haystack – that might prove to be better or more efficient antibiotics, anticancer drugs or other pharmaceuticals.
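The filtering idea can be illustrated with a minimal Python sketch. It is not the Dereplicator+ code: the `Spectrum` class, the shared-peak score, the 0.02 tolerance and the 0.7 cut-off are all assumptions chosen for illustration. Each query spectrum is compared against a small reference library, and anything that matches a known compound closely enough is set aside.

```python
from dataclasses import dataclass


@dataclass
class Spectrum:
    """Hypothetical container: a label plus a list of peak positions (m/z)."""
    name: str
    peaks: list[float]


def shared_peaks(query, reference, tol=0.02):
    """Count query peaks lying within `tol` of at least one reference peak."""
    return sum(any(abs(q - r) <= tol for r in reference) for q in query)


def split_known_unknown(spectra, library, min_fraction=0.7, tol=0.02):
    """Label a spectrum 'known' if enough of its peaks match a library entry."""
    known, unknown = [], []
    for s in spectra:
        n = max(len(s.peaks), 1)  # guard against empty spectra
        best = max(
            (shared_peaks(s.peaks, ref.peaks, tol) / n for ref in library),
            default=0.0,
        )
        (known if best >= min_fraction else unknown).append(s.name)
    return known, unknown


# Toy data (invented values, not real measurements).
library = [Spectrum("reference", [100.0, 150.5, 200.25])]
samples = [
    Spectrum("a", [100.01, 150.49, 200.25]),  # close to the reference
    Spectrum("b", [90.0, 130.0, 210.0]),      # no matching peaks
]
known, unknown = split_known_unknown(samples, library)
print(known, unknown)
```

In this toy run, sample "a" is set aside as known and sample "b" is left for further study. A real search engine would use a statistically grounded score rather than a simple peak count.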
In just a week, running on 100 computers, the algorithm (called Dereplicator+) analysed one billion mass spectra in the Global Natural Products Social Molecular Networking (GNPS) repository at UC San Diego. It identified more than 5,000 promising, unknown compounds that merit further investigation, said Hosein Mohimani, assistant professor in CMU’s Computational Biology Department and first author of the paper describing the work.
The algorithm that powers this molecular search engine is now available for use by any investigator to study additional repositories.
In the past, mass spectrometry data repositories have been underused because they are difficult to search and because searches have suffered from high rates of rediscovering known compounds.
Commenting on this problem, which plagues drug discovery researchers, Mohimani said: “It’s unbelievable how many times people have rediscovered penicillin”.
Analysing the compounds’ mass spectra – essentially, a measurement of the masses within a sample that has been ionised – is a relatively inexpensive way to identify possible new pharmaceuticals. However, existing techniques are largely limited to peptides, which have simple structures such as chains and loops.
“We were only looking at the tip of the iceberg,” explained Mohimani.
To analyse the larger number of complex compounds that have entangled structures and numerous loops and branches, the researchers developed a method for predicting how a mass spectrometer would break apart those molecules. Beginning with the weakest rings, the method simulated what would happen as the molecules came apart. Using 5,000 known compounds and their mass spectra, the researchers trained a computer model that could then be used to predict how other compounds would break down.
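To see why branched and looped molecules are harder to handle than simple chains, consider a rough Python sketch of simulated fragmentation. It treats a molecule as a graph of atoms and bonds, removes one or two bonds at a time, and records the masses of the pieces. This is an illustration only, not the published method: the function names and the toy molecule are invented, and the sketch ignores bond strengths, charges and the trained model described above.

```python
from itertools import combinations


def components(n_atoms, bonds):
    """Connected components of an undirected graph, found by depth-first search."""
    adj = {i: set() for i in range(n_atoms)}
    for a, b in bonds:
        adj[a].add(b)
        adj[b].add(a)
    seen, comps = set(), []
    for start in range(n_atoms):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def fragment_masses(atom_masses, bonds, max_cuts=2):
    """Masses of every piece obtainable by removing up to `max_cuts` bonds."""
    masses = set()
    for k in range(1, max_cuts + 1):
        for cut in combinations(bonds, k):
            remaining = [b for b in bonds if b not in cut]
            comps = components(len(atom_masses), remaining)
            if len(comps) > 1:  # the molecule actually came apart
                for comp in comps:
                    masses.add(round(sum(atom_masses[i] for i in comp), 4))
    return sorted(masses)


# Toy molecule (invented): a three-atom ring (atoms 0, 1, 2) with a tail atom 3.
atom_masses = [12.0, 12.0, 14.0, 16.0]
bonds = [(0, 1), (1, 2), (2, 0), (0, 3)]

one_cut = fragment_masses(atom_masses, bonds, max_cuts=1)
two_cuts = fragment_masses(atom_masses, bonds, max_cuts=2)
print(one_cut)   # only the tail can be detached with a single cut
print(two_cuts)  # opening the ring takes two cuts and yields more fragments
```

A single cut only detaches the tail, because breaking one bond in a ring leaves the ring connected. Allowing two cuts opens the ring and produces several more fragment masses, which hints at how quickly the possibilities multiply for molecules with many loops and branches.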
Mohimani said Dereplicator+ can not only identify known compounds that don’t require further investigation but also find less common variants of known compounds that would likely go undetected within a sample.
The research was published in the journal Nature Communications.