Artificial intelligence-aided screening could boost speed of new drug discovery

Garibay, Ozlem Ozmen; Tayebi, Aida

Posted: 15 December 2022 | Aida Tayebi (University of Central Florida), Dr Ozlem Ozmen Garibay (University of Central Florida)

Using a natural language-inspired technique, researchers at the University of Central Florida, US, developed an interpretable and generalisable drug target interaction model that achieves 97 percent accuracy in identifying drug candidates for a broad variety of target proteins. Here, Dr Ozlem Ozmen Garibay and Aida Tayebi, who worked on the study, outline their work and how their findings could shape drug discovery.

Drug-target interaction (DTI) prediction performed in vitro can be expensive and time consuming. In silico approaches reduce both the cost and the time of discovery by virtually screening previously known drugs for new treatments and new purposes, a strategy known as drug repurposing. Virtual screening narrows the vast molecular interaction landscape so that further discovery can focus on potentially promising candidates. It can also accelerate drug discovery for a new target or disease, because repurposed drugs have already passed clinical trials assessing their effectiveness, safety and side effects and have therefore been approved by the US Food and Drug Administration (FDA). Computational screening thus shortens the list of candidate drugs taken forward to in vitro and laboratory experiments.

A new artificial intelligence (AI)-based DTI model developed by researchers at the University of Central Florida has sped up the drug screening process against the virus that causes COVID-19. This research, published in Briefings in Bioinformatics,¹ was conducted through an interdisciplinary collaboration between computer scientists and material scientists. The model, known as AttentionSiteDTI, is inspired by models developed for sentence classification in the field of natural language processing (NLP). It is also the first model to treat the drug-target pair as a biochemical sentence, with relational meaning between protein pockets and drug molecules; this is the key to capturing the most valuable contextual semantic or relational information in the sentence. Furthermore, AttentionSiteDTI is an end-to-end graph convolutional neural network that learns embeddings from the graphs of small molecules and proteins. These embeddings are not fixed and are sensitive to context, as in NLP.

The model outperformed other state-of-the-art approaches in predicting drug-target interactions, and the researchers identified candidates by using deep learning with a self-attention mechanism to extract the features that most strongly govern the complex interaction. The self-attention mechanism provides high interpretability by focusing on the parts of the protein that interact with the drug compounds (the binding sites), that is, those that contribute most to the interaction. High generalisability comes from the protein input representation, which uses protein pockets in the form of graphs.
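
As a rough illustration of how self-attention weights can be read as an importance score over binding sites, the sketch below applies plain scaled dot-product self-attention to a handful of hypothetical embeddings. The embedding values, the absence of learned projection matrices and the function names are all assumptions made for illustration; this is not the AttentionSiteDTI implementation.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(embeddings):
    """Scaled dot-product self-attention without learned projections.

    Returns (outputs, weights), where weights[i][j] is how strongly
    item i attends to item j.
    """
    dim = len(embeddings[0])
    weights = []
    for query in embeddings:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in embeddings
        ]
        weights.append(softmax(scores))
    outputs = [
        [sum(w * value[d] for w, value in zip(row, embeddings)) for d in range(dim)]
        for row in weights
    ]
    return outputs, weights

# Hypothetical embeddings: one drug vector followed by three pocket vectors.
sequence = [
    [1.0, 0.0, 1.0],  # drug
    [0.9, 0.1, 0.8],  # pocket A
    [0.0, 1.0, 0.0],  # pocket B
    [0.2, 0.3, 0.1],  # pocket C
]
outputs, weights = self_attention(sequence)

# Attention paid by the drug (row 0) to each pocket (columns 1 to 3).
drug_to_pockets = weights[0][1:]
most_attended = max(range(len(drug_to_pockets)), key=drug_to_pockets.__getitem__)
print("pocket attention:", [round(w, 3) for w in drug_to_pockets])
print("most attended pocket index:", most_attended)
```

In this toy setting the pocket whose embedding is most similar to the drug vector receives the largest weight, which is the sense in which attention weights can point to the binding sites that matter most.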

Knowing which biological properties of a compound govern the interaction is a critical step in the design and development of new drugs. According to the study, a benefit of graph convolutional networks is their robustness to different orientations of the three-dimensional (3D) structures of proteins; a drawback is the need to find high-quality 3D protein structures.
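
The robustness to orientation follows from the fact that a graph records only which atoms are connected, not where the molecule sits in space. The sketch below illustrates this with hypothetical co-ordinates and a hypothetical distance cut-off for deciding edges (an assumption for illustration, not necessarily how the study builds its graphs): rotating the structure leaves the edge set unchanged.

```python
import math

def rotate_z(point, angle):
    """Rotate a 3D point about the z-axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)

def edges_within(coords, cutoff):
    """Connect every pair of atoms closer than `cutoff` (hypothetical rule)."""
    edges = set()
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) < cutoff:
                edges.add((i, j))
    return edges

# Hypothetical atom positions and cut-off, chosen only for the demonstration.
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.4, 0.0), (4.0, 4.0, 1.0)]
CUTOFF = 2.0

original = edges_within(coords, CUTOFF)
rotated = edges_within([rotate_z(p, math.radians(73)) for p in coords], CUTOFF)
print("same graph after rotation:", original == rotated)
```

Because rotation preserves inter-atomic distances, any model that consumes only the graph sees the same input whichever way the structure is oriented.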

In this study, the 3D protein structures were extracted from the Protein Data Bank (PDB), which holds structures determined by experimental methods such as nuclear magnetic resonance (NMR), X-ray diffraction and cryogenic electron microscopy (cryo-EM). The binding sites were extracted through a previously studied docking-based model, which provides bounding box co-ordinates for each binding site of a protein. These co-ordinates are then used to convert the protein structure into a set of peptide fragments. Next, the protein graph is constructed, with each atom acting as a node and the connections between atoms acting as edges. Each atom's feature vector is built from a one-hot encoding of atom type, atom degree, total number of hydrogen atoms and implicit valence of the atom.

The Simplified Molecular-Input Line-Entry System (SMILES) strings of the drug compounds were also converted into graphs, in which each atom of the small molecule is a node and the connections between atoms are edges. Here, each atom's feature vector is built from a one-hot encoding of atom type, atom degree, formal charge, number of radical electrons, hybridisation, aromaticity and total number of hydrogens of the atom.
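
The graph construction described above can be sketched in a few lines. The example below is a simplified illustration, not the study's pipeline: the molecule is written out by hand rather than parsed from a SMILES string (in practice a cheminformatics toolkit would do this), the atom-type vocabulary and maximum degree are hypothetical, only three of the listed features are included, and the hydrogen count is kept as a plain integer.

```python
ATOM_TYPES = ["C", "N", "O", "S"]  # hypothetical vocabulary
MAX_DEGREE = 4                     # hypothetical upper bound on heavy-atom degree

def one_hot(value, choices):
    """Return a list with a single 1 at the position of `value`."""
    return [1 if value == choice else 0 for choice in choices]

def build_graph(atoms, bonds):
    """Build node features and an adjacency list.

    atoms: list of (symbol, total_hydrogens) tuples
    bonds: list of (i, j) index pairs between atoms
    """
    degree = [0] * len(atoms)
    adjacency = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        degree[i] += 1
        degree[j] += 1
        adjacency[i].append(j)
        adjacency[j].append(i)

    features = []
    for index, (symbol, hydrogens) in enumerate(atoms):
        features.append(
            one_hot(symbol, ATOM_TYPES)
            + one_hot(degree[index], list(range(MAX_DEGREE + 1)))
            + [hydrogens]
        )
    return features, adjacency

# Ethanol (SMILES "CCO"), heavy atoms only, written out by hand.
atoms = [("C", 3), ("C", 2), ("O", 1)]
bonds = [(0, 1), (1, 2)]
features, adjacency = build_graph(atoms, bonds)

for index, vector in enumerate(features):
    print(index, atoms[index][0], vector, "neighbours:", adjacency[index])
```

The same node-and-edge layout applies to the protein pocket graphs; only the feature list differs.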

One-dimensional representation is insufficient for complex interactions, particularly for proteins, which are much larger and more complex molecules than drugs. The improved performance of this model is attributed to its graph representations, a richer form of feature representation that helps the model capture the structural information of molecules. According to the study, traditional machine learning and deep learning methods that use string representations cannot learn the complex non-linear relationships in drug-target interaction. The self-attention mechanism helps AttentionSiteDTI extract features automatically and learn higher-order non-linear relationships. The team used three benchmark datasets, DUD-E, Human and BindingDB, to compare the new model with state-of-the-art graph-based models. AttentionSiteDTI performs comparably with state-of-the-art DTI prediction models when tested on a target protein that the models were trained on. However, when the target protein is changed to one that the models have not been trained on, the performance of AttentionSiteDTI remains robust while that of the other models decreases significantly, which indicates that the new model generalises better. This is important because it suggests that AttentionSiteDTI can be used for a broad variety of protein targets with high performance.

This study is significant because it could help other researchers accelerate drug design by identifying the functional properties of binding sites. Drug designers can use AI to respond quickly to new diseases and pandemics such as COVID-19, focusing on the most important binding sites of the virus's proteins. They can screen many variations of the protein and small molecules using AI to obtain accurate binding predictions before carrying out any laboratory experiments.
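
The seen-versus-unseen target comparison described above depends on how the data are split. A minimal sketch of such a split, assuming each record is a hypothetical (drug, target, label) tuple, holds out whole proteins so that no test target appears in training. The function name, record layout and example records are illustrative assumptions, not the study's evaluation code.

```python
import random

def split_by_target(pairs, test_fraction, seed=0):
    """Hold out whole proteins so that test targets never appear in training."""
    targets = sorted({target for _, target, _ in pairs})
    rng = random.Random(seed)
    rng.shuffle(targets)
    n_test = max(1, round(len(targets) * test_fraction))
    test_targets = set(targets[:n_test])
    train = [p for p in pairs if p[1] not in test_targets]
    test = [p for p in pairs if p[1] in test_targets]
    return train, test

# Hypothetical interaction records: (drug, target, label).
pairs = [
    ("drug_1", "protein_A", 1),
    ("drug_2", "protein_A", 0),
    ("drug_1", "protein_B", 0),
    ("drug_3", "protein_B", 1),
    ("drug_2", "protein_C", 1),
    ("drug_3", "protein_C", 0),
    ("drug_1", "protein_D", 1),
    ("drug_4", "protein_D", 0),
]
train, test = split_by_target(pairs, test_fraction=0.25)

train_targets = {target for _, target, _ in train}
test_targets = {target for _, target, _ in test}
print("train targets:", sorted(train_targets))
print("held-out targets:", sorted(test_targets))
```

A random split over individual pairs would let the same protein appear on both sides, which tends to overstate how well a model generalises to new targets.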

Furthermore, the team evaluated the binding between the spike protein (along with the ACE2 protein) of the SARS-CoV-2 virus and seven candidate compounds (N-acetyl-neuraminic acid, 3α,6α-mannopentaose, N-glycolylneuraminic acid, 2-keto-3-deoxyoctonate, N-acetyllactosamine, cytidine-5-monophospho-N-acetylneuraminic acid sodium salt and darunavir) using a binding inhibition assay kit. The strength of the interaction between each drug-target pair was measured in laboratory experiments as the IC50 (half maximal inhibitory concentration). In this study, candidate molecules were used as inhibitors of spike protein-ACE2 complex formation, and the activity threshold was set at 15 nM to identify the best compounds. The comparison showed high agreement between the computational predictions and the experimental results. This shows the potential of AttentionSiteDTI as an effective tool for drug designers to pre-screen small molecules in drug repurposing applications, both for the current pandemic, as drugs to treat COVID-19 are still of interest, and in preparation for possible future pandemics.
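
To make the comparison step concrete, the sketch below labels compounds as active when their IC50 falls at or below the 15 nM threshold and measures how often a predicted label matches the assay-derived one. The compound names, IC50 values and predictions are placeholders for illustration only and are not results from the study; treating "at or below" as active is also an assumption.

```python
ACTIVITY_THRESHOLD_NM = 15.0  # threshold quoted in the article

def is_active(ic50_nm, threshold_nm=ACTIVITY_THRESHOLD_NM):
    """Lower IC50 means stronger inhibition; at or below threshold counts as active (assumed)."""
    return ic50_nm <= threshold_nm

def agreement(predicted, measured_ic50):
    """Fraction of compounds whose predicted label matches the assay-derived label."""
    matches = sum(
        1
        for name, label in predicted.items()
        if label == is_active(measured_ic50[name])
    )
    return matches / len(predicted)

# Placeholder values for illustration only; NOT measurements from the study.
measured_ic50 = {
    "compound_1": 4.0,
    "compound_2": 12.5,
    "compound_3": 40.0,
    "compound_4": 90.0,
}
predicted = {
    "compound_1": True,
    "compound_2": True,
    "compound_3": False,
    "compound_4": True,
}
score = agreement(predicted, measured_ic50)
print("agreement:", score)
```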

About the authors

Dr Ozlem Ozmen Garibay is an Assistant Professor of Industrial Engineering and Management Systems at the University of Central Florida, where she directs the Human-Centered Artificial Intelligence Research Lab (Human-CAIR Lab). Prior to that, she served as the Director of Research Technology. Her areas of research are big data, social media analysis, social cybersecurity, artificial social intelligence, human-machine teams, social and economic networks, network science, STEM education analytics, higher education economic impact and engagement, artificial intelligence, evolutionary computation and complex systems.

Aida Tayebi is a second-year PhD student at the University of Central Florida. Her current research interests include algorithmic fairness and bias mitigation techniques in DTI.

Reference

Yazdani-Jahromi M, Yousefi N, Tayebi A, et al. AttentionSiteDTI: An interpretable graph-based model for drug-target interaction prediction using NLP sentence-level relation classification. Briefings in Bioinformatics. 2022;23(4).

