Scientists have developed a machine learning system that can predict whether complex chemical reactions will produce the correct molecular form needed for medicines.


The process of drug discovery revolves around fitting atoms together and adjusting the pieces until a useful molecule forms. However, finding better molecules this way typically demands vast amounts of time and money.

Now researchers have developed a machine learning system designed to speed up this work dramatically while cutting costs. The approach offers a more efficient way to predict how molecules will form during chemical reactions.

“Sometimes we use sophisticated, physics-based computational chemistry tools to understand novel reactions. However, these tools are too expensive to make predictions on thousands of potential new molecules,” said Simone Gallarati, the study’s co-lead author and a joint postdoctoral researcher at the University of Utah and the University of California, Los Angeles (UCLA). “We wanted to train statistical models that were ‘smart’ enough to make accurate predictions on untested reactions but also as cheap as possible.”

Predicting molecular ‘handedness’

A key challenge in chemistry is that molecules can exist as mirror images of one another, a property known as handedness, or chirality. One version of a molecule may have beneficial medical effects while its mirror image could be ineffective or even harmful.


Chemists therefore aim to design reactions that produce the correct ‘hand’ of a molecule. To achieve this, they must carefully select the right combination of catalysts, ligands and substrates.

The new system works as a high-tech filtering tool that can screen tens of thousands of chemical structures and predict how their components will combine. It converts the elements of a reaction into numerical data, which a computer can analyse and which form the basis for machine learning predictions.
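To make the idea of turning a reaction into numbers a little more concrete, here is a minimal Python sketch. It assumes each ligand and substrate can be summarised by a few numeric descriptors and uses ordinary least-squares linear regression as a stand-in model; the article does not say which descriptors or model the team used, and the component names (L1 to L4, S1 to S3), descriptor values and selectivity scores are all hypothetical.

```python
from itertools import product

# Hypothetical descriptor tables: every name and number here is invented for
# illustration and does not come from the study.
LIGANDS = {  # (hypothetical size descriptor, hypothetical electronic descriptor)
    "L1": (1.0, 0.2),
    "L2": (1.5, 0.4),
    "L3": (0.8, 0.9),
    "L4": (1.2, 0.1),
}
SUBSTRATES = {"S1": 0.5, "S2": 0.1, "S3": 0.3}  # hypothetical substrate descriptor


def featurize(ligand, substrate):
    """Turn one ligand/substrate pairing into a numeric vector (leading 1.0 is the intercept)."""
    return [1.0, *LIGANDS[ligand], SUBSTRATES[substrate]]


def fit_linear(rows, targets):
    """Ordinary least squares by solving the normal equations (X^T X) w = X^T y."""
    n = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    b = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(a[k][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for k in range(col + 1, n):
            factor = a[k][col] / a[col][col]
            for j in range(col, n):
                a[k][j] -= factor * a[col][j]
            b[k] -= factor * b[col]
    # Back substitution.
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (b[i] - sum(a[i][j] * w[j] for j in range(i + 1, n))) / a[i][i]
    return w


def predict(w, ligand, substrate):
    """Predicted selectivity score for a pairing, tested or not."""
    return sum(wi * xi for wi, xi in zip(w, featurize(ligand, substrate)))


# A small, invented training set: (ligand, substrate) -> measured selectivity score.
TRAINING = {
    ("L1", "S1"): 2.45,
    ("L2", "S2"): 3.13,
    ("L3", "S3"): 1.29,
    ("L4", "S1"): 2.95,
    ("L1", "S2"): 2.33,
    ("L3", "S1"): 1.35,
}

weights = fit_linear([featurize(lig, sub) for lig, sub in TRAINING], list(TRAINING.values()))

# Screen every pairing the model has not seen and rank by predicted selectivity.
untested = [pair for pair in product(LIGANDS, SUBSTRATES) if pair not in TRAINING]
ranked = sorted(untested, key=lambda pair: predict(weights, *pair), reverse=True)
for lig, sub in ranked[:3]:
    print(lig, sub, round(predict(weights, lig, sub), 2))
```

Because the invented scores follow an exact linear rule, this toy fit recovers that rule and ranks the untested pairing of L2 with S1 highest. Real measurements are noisy, so a practical workflow would check such predictions against reactions run in the lab.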

Surprisingly, the model required relatively little training data to perform well. Even with limited input, it could reliably predict how reaction components would behave, significantly reducing the time, energy and expense normally required for laboratory testing.

“Most AI requires enormous amounts of data to train models on. That’s a problem in chemistry, where obtaining high-quality, large datasets from experimental work is very expensive and extremely time consuming,” said Matthew Sigman, a chemist at the University of Utah and co-author of the study. “The coolest thing about this tool is that it allows someone to collect smaller bits of data, build reasonably good models and make accurate predictions for known reactions and also transfer predictions to reactions that the models haven't seen yet.”

Many molecules used in medicines have a property known as handedness: they contain the same atoms connected in the same order, but their 3D arrangements are mirror images that cannot be superimposed. The body reacts very differently to the 'right-handed' and 'left-handed' versions of the molecule. Credit: Erin Bucci/UCLA.

A high-tech filter for complex reactions

The research focused on asymmetric cross-coupling reactions, which are widely used in drug development. These reactions combine two carbon-based molecular fragments using a metal catalyst to create more complex compounds.

They are described as asymmetric because they are designed to favour one of the two mirror-image forms of a molecule. Without careful control, experiments may produce an even 50/50 mixture of mirror images. In contrast, an effective asymmetric reaction might generate 95 percent of the desired form and only 5 percent of the unwanted mirror image.
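Chemists often summarise a split like 95:5 as an enantiomeric excess (ee), the percentage by which the major mirror image outweighs the minor one. The article itself does not use the term; the short Python sketch below simply works through that standard arithmetic for the two mixtures described above. The function names are illustrative, not taken from the study.

```python
def enantiomeric_excess(major, minor):
    """Percent ee: how far the major mirror image outweighs the minor one."""
    return 100 * (major - minor) / (major + minor)


def enantiomeric_ratio(major, minor):
    """Ratio of major to minor mirror image; 19.0 means 19:1."""
    return major / minor


print(enantiomeric_excess(50, 50))  # 0.0, an even (racemic) mixture
print(enantiomeric_excess(95, 5))   # 90.0
print(enantiomeric_ratio(95, 5))    # 19.0
```

In other words, the 95/5 reaction in the example produces 19 molecules of the desired form for every one of its mirror image, a 90 percent ee, whereas the uncontrolled 50/50 mixture has no excess at all.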

Reactions like this typically require three components: a metal catalyst, a ligand and substrates. The metal catalyst drives the reaction by joining carbon-based molecules. The ligand attaches to the metal and controls which side of the molecule reacts, ultimately influencing the three-dimensional orientation of the final product. As a result, the ligand is often the most important factor in determining molecular handedness.


To train their model, Gallarati and colleagues analysed data from four academic studies of asymmetric reactions that used nickel-based catalysts with different ligands. The team then challenged the system to predict the outcomes of hypothetical reactions involving components it had not previously encountered.

The predictions were experimentally tested in the laboratory of study co-author and chemist Abigail Doyle at UCLA, with doctoral student Erin Bucci leading the effort.

“As a lab-based chemist, this tool is extremely valuable for saving time spent running experiments,” said Bucci. “For example, instead of running 50-60 reactions, we are now able to run 5-10, potentially saving weeks or months. Each reaction component we test in the lab needs to either be purchased or made from scratch – this tool greatly cuts the amount of money I would typically spend on materials.”

Potential impact on the pharmaceutical industry

Beyond the specific nickel-based reactions examined in the study, the researchers say the workflow could be applied broadly across chemistry and may even help scientists deepen their understanding of chemical processes.


“One of the nice things about the workflow is – it’s not a black box,” said Doyle. “We can learn something about the chemistry from the predictions even if they’re off. We apply our chemistry expertise to help learn something we wouldn't have learned without the tool.”

The pharmaceutical industry could benefit particularly quickly from this system, according to Sigman. Drug developers often need to adapt known reactions to produce large quantities of specific compounds that are needed for clinical trials.

“This is where this tool could be highly applicable,” Sigman said. “Optimising a reaction and the time-cost is the value proposition when you build a drug. This streamlined process could make the difference when they need to take a molecule from Phase I to Phase II.”