Artificial intelligence analyses how viruses evade the immune system

A natural language processing model trained on viral protein sequence data was able to predict promising targets for vaccines against HIV, influenza and coronaviruses.


Researchers have developed a computer model that can predict which sections of viral surface proteins are more or less likely to mutate in a way that would disguise the virus from the immune system. So far, they have used the system to study and suggest potential vaccine targets for HIV, influenza and SARS-CoV-2 (which causes COVID-19).

One of the problems that has confounded the development of an effective HIV or universal flu vaccine is that these viruses mutate their surface proteins very rapidly. As a result, the antibodies produced in response to a vaccine quickly become unable to bind to their intended target, and the immunity provided by the vaccine is lost. The process by which viruses adapt their surface proteins to avoid recognition by the immune system is known as viral escape.

To predict which mutations would allow viral escape, the team from MIT, US, trained a natural language processing (NLP) model to analyse patterns found in genetic sequences. Such models were originally developed to analyse patterns and make suggestions in language. According to the team, the NLP model was ideally suited to this purpose because some of the rules governing language are analogous to those governing protein structure and function.

When used for linguistic analysis, models are trained to analyse patterns in language, specifically the frequency with which certain words occur together. The models then predict which words could be used to complete a sentence. The chosen word must be both grammatically correct and have the right meaning.
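The idea of learning from how often words occur together can be illustrated with a deliberately tiny example. The sketch below is not the researchers' model: it is a simple word-pair counter, and the function names and the three-sentence corpus are invented purely for illustration.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count how often each word is immediately followed by each other word."""
    follows = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            follows[prev][nxt] += 1
    return follows


def predict_next(follows, word, k=3):
    """Return up to k words most frequently seen after `word`."""
    counts = follows.get(word.lower(), Counter())
    return [w for w, _ in counts.most_common(k)]


# Invented toy corpus, for illustration only.
corpus = [
    "the virus evades the immune system",
    "the virus infects the cell",
    "the vaccine trains the immune system",
]
model = train_bigrams(corpus)
suggestions = predict_next(model, "immune")
print(suggestions)
```

Real language models are far more sophisticated than this, but the principle is similar: suggestions come from patterns seen during training, so a word the model has never encountered yields no suggestion at all.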

In the new system, grammar is analogous to the rules that determine whether the protein encoded by a particular sequence is functional, and semantic meaning is analogous to whether the protein can take on a new shape that helps it evade antibodies. Training an NLP model with genetic sequences therefore allows it to predict new sequences that still follow the biological rules of protein structure but have a different appearance.
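To make the analogy concrete, suppose each candidate mutation has already been given two scores: one for how well it obeys the "grammar" (the protein stays functional) and one for how much it changes the "meaning" (the protein looks different to antibodies). The sketch below ranks candidates that score highly on both. The mutation names and numbers are made-up placeholders rather than results, and adding the two ranks together is an assumption chosen for simplicity, not a description of how the researchers combined their scores.

```python
def rank_escape_candidates(scores):
    """Order candidates so those high on BOTH scores come first.

    scores maps a name to (grammaticality, semantic_change); higher means more.
    Summing the two ranks is an illustrative assumption.
    """
    names = list(scores)
    # Rank 0 = lowest score, so a larger rank means a stronger candidate.
    by_grammar = sorted(names, key=lambda n: scores[n][0])
    by_semantic = sorted(names, key=lambda n: scores[n][1])
    combined = {n: by_grammar.index(n) + by_semantic.index(n) for n in names}
    return sorted(names, key=lambda n: combined[n], reverse=True)


# Placeholder values, invented for illustration only.
toy_scores = {
    "mutation_A": (0.9, 0.8),  # functional and looks different
    "mutation_B": (0.2, 0.9),  # looks different but probably non-functional
    "mutation_C": (0.8, 0.1),  # functional but looks the same
    "mutation_D": (0.1, 0.2),  # neither
}
ranked = rank_escape_candidates(toy_scores)
print(ranked)
```

Under these placeholder scores, only mutation_A is strong on both counts, which mirrors the point of the analogy: a mutation allows escape only if the protein both still works and looks different.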

The researchers said the benefits of using NLP models for this application include that they can be trained using only genetic sequence information, which is much easier to obtain than protein structures, and that this training requires a relatively small amount of information. In their study, they used 60,000 HIV sequences, 45,000 influenza sequences and 4,000 coronavirus sequences.

Predicting promising vaccine targets

Once the model was trained, the researchers used it to predict sequences of the coronavirus Spike (S) protein, HIV envelope protein and influenza hemagglutinin (HA) protein that would be more or less likely to generate escape mutations.

The model suggested that:

  • In influenza, the sequences least likely to mutate are in the stalk of the HA protein. Unfortunately, most people infected with the flu or vaccinated against it do not develop antibodies against the HA stalk.
  • In coronaviruses, a part of the S protein called the S2 subunit is least likely to generate escape mutations.
  • In HIV, the V1-V2 hypervariable region of the envelope protein has many possible escape mutations. The researchers also identified some sequences that would have a lower probability of escape.

The researchers are now working with others to use their model to identify possible targets for cancer vaccines that stimulate the immune system to destroy tumours. They said it could also be used to design small-molecule drugs that might be less likely to provoke treatment resistance, for diseases such as tuberculosis.

“There are so many opportunities and the beautiful thing is all we need is sequence data, which is easy to produce,” concluded Bryan Bryson, an assistant professor of biological engineering at MIT, a member of the Ragon Institute of MGH, MIT and Harvard, and one of the senior authors of the paper, which was published in Science.