Artificial intelligence in drug discovery: how long before we see the real impact?
A new race is well underway between big pharma and big data companies to see who can most effectively mine massive new datasets using artificial intelligence (AI). The aim: reducing costs through targeted in silico analysis, reducing in vitro and in vivo screening, and reviewing huge quantities of preclinical and clinical image data. A key question remains: can AI effectively and accurately predict the properties of new drug candidates?
All major pharma companies (AstraZeneca, GSK, Merck, Johnson & Johnson, and Pfizer) are embracing AI. What’s really exciting is the shift beyond machine learning strategies (workhorse tools that free experts from reviewing repetitive data) towards deep learning approaches that uncover new and previously unknown connections within the data. Who wins this new data race won’t be determined merely by how much money is spent, but by how effectively and collaboratively companies deploy these new data mining tools on their projects.
So will it all be plain sailing? Simply put, no. Risks and opportunities abound. Pharma already has access to vast quantities of data. However, for AI to work, and for the resulting algorithms to be extended to new endeavours, the tools must be built and trained on sound footings. Bad data, or poorly annotated images or metadata in training sets, will influence and ultimately limit the quality and accuracy of any analysis. While it’s easy to say we want to mine terabytes of data in search of unknown connections and new insights, we face massive computational hurdles. Key to success will be the effective use of modern computational approaches to reduce the dimensionality of the data using memory-efficient tools such as PCA, accelerated t-SNE, NMF, PLSA and MAF, allowing subsequent clustering and machine learning to tease out useful information.
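To make the pipeline above concrete, here is a minimal sketch of the "reduce, then cluster" idea: PCA (computed via SVD) projects high-dimensional data down to a few components, and a simple k-means loop then groups the projected samples. The data are synthetic and all names are placeholders; a real analysis would use optimised libraries and far larger datasets.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "assay" data: two populations of samples in 50 dimensions,
# standing in for high-dimensional screening or imaging features.
group_a = rng.normal(loc=0.0, scale=1.0, size=(100, 50))
group_b = rng.normal(loc=4.0, scale=1.0, size=(100, 50))
X = np.vstack([group_a, group_b])

# --- PCA via SVD: project onto the top 2 principal components ---
X_centred = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centred, full_matrices=False)
X_reduced = X_centred @ Vt[:2].T  # shape (200, 2)

# --- Minimal k-means on the reduced data (k = 2) ---
k = 2
centroids = X_reduced[rng.choice(len(X_reduced), k, replace=False)]
for _ in range(20):
    # Assign each point to its nearest centroid
    dists = np.linalg.norm(X_reduced[:, None] - centroids[None], axis=2)
    labels = dists.argmin(axis=1)
    # Recompute each centroid as the mean of its assigned points
    centroids = np.array(
        [X_reduced[labels == i].mean(axis=0) for i in range(k)]
    )

print(X_reduced.shape)  # → (200, 2)
print(sorted(set(labels)))  # the cluster labels found
```

Working in the reduced space is what makes the clustering step tractable: distances are computed over 2 components rather than 50 (or, in practice, thousands) of raw features.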
A key area where we can anticipate rapid uptake of AI is histopathology. The field is ideally placed to benefit from deep learning, as image analysis and machine learning are already being used both clinically and preclinically.1 At AstraZeneca, for example, deep learning models have been successfully combined with image feature extraction to analyse immunohistochemistry (IHC) stained human breast cancer tissue for expression of HER2.2 We are rapidly moving into a world of integrated multimodal digital pathology, where we have an opportunity to incorporate powerful new molecular imaging technologies such as mass spectrometry imaging, multiplex IHC or tissue transcriptomics. Applying AI to interrogate multimodal molecular imaging data from a wide range of imaging platforms and technologies is no simple task.
Professor Josephine Bunch is leading a Cancer Research UK Grand Challenge consortium that is attempting exactly this in order to investigate tumour metabolism. “Without machine learning and AI our consortium would not be able to fully explore the truly vast data we are collecting from preclinical and clinical samples as part of the Cancer Research Grand Challenge,” said Prof Bunch. “At the National Physical Laboratory, we strive to apply our expertise in data metrology to the rapidly expanding AI field.”
What else might slow the rise of the AI machines in drug discovery? Getting the right people working on the right project. There is huge demand for experts in AI, and many have been snapped up by the large tech companies. However, for labs looking to expand their in-house capabilities, the next generation of data scientists will also come from more diverse backgrounds. We can expect to see more scientists with experience in fields such as finance or astrophysics wearing lab coats in the near future.
- Yu K-H, et al. Predicting non-small cell lung cancer prognosis by fully automated microscopic pathology image features. Nat Commun. 2016;7:12474. doi: 10.1038/ncomms12474.
- Vandenberghe ME, et al. Relevance of deep learning to facilitate the diagnosis of HER2 status in breast cancer. Sci Rep. 2017;7:45938. doi: 10.1038/srep45938.