Unlocking the power of machine learning for small molecule drug discovery

Rick Wagner of ZebiAI and Patrick Riley of Google Accelerated Science (GAS) discuss the development and benefits of a new machine learning drug discovery platform.

Machine learning for drug discovery

A collaborative study between ZebiAI, Google Accelerated Science (GAS) and X-Chem has used the power of machine learning to improve the drug discovery process.

The paper, published in the Journal of Medicinal Chemistry, describes a machine learning platform that can accelerate drug discovery using DNA-encoded small molecule library (DEL) selection data. According to the researchers, their findings demonstrate that the platform can predict highly potent small molecule inhibitors within a virtual library of compounds for three diverse protein targets.

“We envision artificial intelligence (AI) and machine learning will be a leading source of novel, small molecule drug candidates. These technologies will become indispensable as a means for leveraging large datasets to understand disease biology and identify the best candidates to address intractable diseases,” said Founder and Director of ZebiAI, Rick Wagner, when speaking to Drug Target Review.

How the platform works

According to the researchers, every small molecule in the library has a unique DNA barcode attached to it, allowing the molecules to be easily catalogued. The library is then used to find which small molecules bind to proteins of interest, by mixing the DEL molecules and proteins. DNA sequencing methods are subsequently used to determine the DNA barcode of the molecules that are bound to the protein target, thereby identifying the compounds.
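As a rough illustration of the barcode readout described above, the following Python sketch counts sequencing reads per compound and compares a target selection with a no-target control. The barcodes, compound names, the use of a control selection and the pseudocount-based enrichment ratio are all assumptions made for this example; the article does not describe how the researchers processed their sequencing data.

```python
from collections import Counter

# Hypothetical barcode-to-compound lookup (illustrative values only).
BARCODE_TO_COMPOUND = {
    "ACGT": "compound_A",
    "TTAG": "compound_B",
    "GGCA": "compound_C",
}


def count_compounds(reads, lookup):
    """Count sequencing reads per compound, skipping unrecognised barcodes."""
    counts = Counter()
    for read in reads:
        compound = lookup.get(read)
        if compound is not None:
            counts[compound] += 1
    return counts


def enrichment(target_counts, control_counts, pseudocount=1.0):
    """Ratio of smoothed read frequency with the target vs. without it.

    The pseudocount avoids division by zero for compounds that are
    absent from one of the two selections.
    """
    compounds = set(target_counts) | set(control_counts)
    n = len(compounds)
    target_total = sum(target_counts.values()) + pseudocount * n
    control_total = sum(control_counts.values()) + pseudocount * n
    scores = {}
    for c in compounds:
        target_freq = (target_counts.get(c, 0) + pseudocount) / target_total
        control_freq = (control_counts.get(c, 0) + pseudocount) / control_total
        scores[c] = target_freq / control_freq
    return scores


# Toy reads: one unknown barcode ("NNNN") is ignored during counting.
target_reads = ["ACGT"] * 8 + ["TTAG"] + ["GGCA"] + ["NNNN"]
control_reads = ["ACGT"] * 2 + ["TTAG"] * 4 + ["GGCA"] * 4

target_counts = count_compounds(target_reads, BARCODE_TO_COMPOUND)
control_counts = count_compounds(control_reads, BARCODE_TO_COMPOUND)
scores = enrichment(target_counts, control_counts)
best = max(scores, key=scores.get)
```

In this toy data, `best` is the compound whose barcode is over-represented when the protein target is present, which is the kind of signal a DEL selection is designed to surface.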

Data on the thousands of molecules that bind to a protein target in a DEL screen provide a chemical imprint of the target. This makes it possible to derive a machine learning model that can predict which compounds in virtual libraries will be active against the protein of interest, opening up unlimited chemical space.
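To make the idea of learning from DEL hits and then scoring a virtual library more concrete, here is a deliberately small Python sketch. It is not the researchers' model: the article does not describe their architecture, and this example swaps in a simple linear scorer built from smoothed per-feature log-odds. The substructure labels, molecule names and data are hypothetical.

```python
import math


def fit_feature_weights(binders, non_binders, alpha=1.0):
    """Learn one log-odds weight per structural feature.

    A feature seen more often among binders than non-binders gets a
    positive weight. `alpha` is a Laplace smoothing term.
    """
    features = set().union(*binders, *non_binders)
    weights = {}
    for f in features:
        p_bind = (sum(f in m for m in binders) + alpha) / (len(binders) + 2 * alpha)
        p_non = (sum(f in m for m in non_binders) + alpha) / (len(non_binders) + 2 * alpha)
        weights[f] = math.log(p_bind / p_non)
    return weights


def score(molecule, weights):
    """Sum the weights of the features present; unseen features count as 0."""
    return sum(weights.get(f, 0.0) for f in molecule)


def rank_virtual_library(library, weights):
    """Return virtual compound names ordered from highest to lowest score."""
    return sorted(library, key=lambda name: score(library[name], weights), reverse=True)


# Hypothetical training data: molecules as sets of substructure labels.
binders = [
    {"amide", "pyridine"},
    {"amide", "sulfonamide"},
    {"amide", "pyridine", "fluoro"},
]
non_binders = [
    {"ester", "furan"},
    {"ester", "pyridine"},
    {"furan", "fluoro"},
]

# Hypothetical virtual library that was never physically screened.
virtual_library = {
    "v1": {"amide", "sulfonamide", "thiazole"},
    "v2": {"ester", "furan"},
    "v3": {"pyridine", "fluoro"},
}

weights = fit_feature_weights(binders, non_binders)
ranked = rank_virtual_library(virtual_library, weights)
```

The top of `ranked` would be the shortlist sent for purchase or synthesis and physical testing. A production model would use far richer molecular representations and much larger training sets than this toy scorer.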

The researchers highlight that there are currently not enough small molecule probes available for drug discovery, with only an estimated four percent of the human proteome having a usable probe. Most screening methods are limited by the scope of chemical space to which they provide access. However, DELs combined with machine learning present a new solution.

Therefore, broader and deeper study of the biology of intractable diseases using this approach will accelerate the discovery of novel therapeutics, ultimately improving human health.

The new paper also details the identification of active compounds outside of the DEL that are structurally different from the molecules used in training. The researchers say these results indicate that, at least for certain targets, machine learning applied to DEL data enables access to unlimited chemical space in a time- and cost-effective manner.

The benefits of the model

Speaking to Drug Target Review, Principal Software Engineer at Google, Patrick Riley, said: “What we have shown is that the combination of physical screening data from a high quality DEL allows you to build a surprisingly effective virtual screening model. This allows a much more cost-effective way to search through chemical space. When combined with ever increasing low costs and on demand chemical libraries, you have a much cheaper way to find hits across a larger chemical space. Those hits are great starting points for chemical probes or further drug discovery efforts.”

“Our machine learning approach allows for the discovery of complex patterns that would be difficult to impossible for a scientist to detect by direct examination of hundreds of millions of data points derived from DEL selection data. By generating models of molecules that bind targets of interest, our technology can extrapolate data significantly beyond the chemistry in the DEL and provide insight into molecules that have specific properties, are easily synthesised or are procured at little expense,” added Wagner.

The Chemome Initiative

With this process established, ZebiAI and GAS have formed the ‘Chemome Initiative’ programme, allowing them to apply their platform.

According to the companies, they will develop chemical probe molecules for the academic community across thousands of novel targets, driving deeper understanding of the biology of intractable diseases.

“The Chemome Initiative will transform our understanding of biology by rapidly providing chemical probes to academic researchers exploring new targets. Access to high quality probes will allow scientists to test new hypotheses in the biological system of their choice. We expect that these initial results will, in some cases, lead to therapeutic hypotheses that will drive new drug discovery programmes,” said Wagner.

“We are excited about this work not just because of the technical interest, but because of the Chemome Initiative. Working with ZebiAI to use this technology for chemical probes means that we can have a broad impact on the early biological research process and we are looking forward to seeing that through,” commented Riley.

However, the researchers note that expanding the initiative to cover the broadest possible range of target proteins remains a challenge. They say the key issue will be establishing biological test systems across what will ultimately be thousands of protein targets.

“We continue to reach out to new academic institutions and foundations to build our network to access a broadening range of protein targets, assay capabilities and expertise. We have learned a great deal from our partnership with the Structural Genomics Consortium and continue to refine our approach to projects and enhance the technology to drive the impact of the Chemome efforts,” commented Wagner.

Conclusion

“Many in the field are talking about the use of AI in drug discovery and we think this is a great trend. It is important that we focus on high quality evaluations of new AI approaches in drug discovery like we did in this work. As the community utilises this kind of evaluation and learns where AI makes a meaningful difference, I believe we will see further adoption of AI in the right places,” said Riley.

“We see the expanded value of these models in providing predictions for hit-to-lead and lead optimisation. We believe these predictive models in combination with traditional medicinal chemistry and computational approaches will accelerate the drug discovery process to find safer, more effective therapeutics,” summarised Wagner.