Breakthrough in AI protein structure and folding prediction

An artificial intelligence (AI) system called AlphaFold has been developed to effectively predict protein structures and folding.


Researchers have successfully used an artificial intelligence (AI) system to predict protein structures and folding. They say this development could significantly accelerate biological research over the long term, opening new possibilities in disease understanding and drug discovery.

The AI system, dubbed AlphaFold and developed by a team at DeepMind, can determine highly accurate structures within days. The results were validated by the independent Critical Assessment of protein Structure Prediction (CASP).

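CASP assesses accuracy using the Global Distance Test (GDT), a metric ranging from 0 to 100 that, roughly speaking, measures the percentage of amino acid residues lying within a threshold distance of their correct positions. The researchers report that the new AlphaFold system achieved a median score of 92.4 GDT across all targets.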
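To make the metric concrete, here is a minimal Python sketch of GDT_TS, the total-score form of the test, which averages the fraction of residues lying within 1, 2, 4 and 8 ångströms of their reference positions and scales the result to 0 to 100. It assumes the predicted and experimental C-alpha coordinates have already been superimposed; the official calculation searches over many superpositions to maximise the count at each cut-off, so this simplified version can only match or underestimate the true score. The function name `gdt_ts` is illustrative.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]

# Distance cut-offs (in ångströms) used by the GDT_TS variant of the test.
CUTOFFS = (1.0, 2.0, 4.0, 8.0)


def gdt_ts(predicted: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """Simplified GDT_TS for two equal-length lists of C-alpha coordinates.

    Assumption: the structures are already superimposed. A single fixed
    superposition is used, unlike the official search over superpositions.
    """
    if len(predicted) != len(reference) or not reference:
        raise ValueError("structures must be non-empty and the same length")
    # Per-residue deviation between predicted and reference positions.
    deviations = [math.dist(p, r) for p, r in zip(predicted, reference)]
    # Fraction of residues within each cut-off.
    fractions = [
        sum(d <= cutoff for d in deviations) / len(deviations)
        for cutoff in CUTOFFS
    ]
    # Average over the four cut-offs and scale to the 0-100 range.
    return 100.0 * sum(fractions) / len(fractions)
```

For example, four residues deviating by 0.5, 1.5, 3.0 and 10.0 Å give fractions of 0.25, 0.5, 0.75 and 0.75 at the four cut-offs, and therefore a GDT_TS of 56.25.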

DeepMind developed new deep learning architectures for CASP14, the latest round of the assessment, drawing inspiration from the fields of biology, physics and machine learning, as well as the work of many scientists in the protein folding field over the past half-century.

The researchers say that a folded protein can be thought of as a “spatial graph”, where residues are the nodes and edges connect residues in close proximity. This graph is important for understanding the physical interactions within proteins, as well as their evolutionary history. For the latest version of AlphaFold, used at CASP14, DeepMind created an attention-based neural network system, trained end-to-end, that attempts to interpret the structure of this graph while reasoning over the implicit graph it is building. It refines this graph using evolutionarily related sequences, arranged as a multiple sequence alignment (MSA), together with a representation of amino acid residue pairs.
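The “spatial graph” idea can be illustrated with a short Python sketch that builds such a graph explicitly from known coordinates: each residue is a node, and an edge joins two residues whose C-alpha atoms lie within a distance cut-off. The 8 Å cut-off is an assumption borrowed from common contact-map conventions, and `spatial_graph` is a hypothetical helper. AlphaFold itself does not construct the graph this way, since it has to infer the graph from sequence information rather than read it off a known structure.

```python
import math
from typing import Dict, Sequence, Set, Tuple

Coord = Tuple[float, float, float]


def spatial_graph(ca_coords: Sequence[Coord], threshold: float = 8.0) -> Dict[int, Set[int]]:
    """Build an undirected residue graph as an adjacency dictionary.

    Nodes are residue indices; an edge joins two residues whose C-alpha atoms
    are at most `threshold` ångströms apart (8 Å is an assumed convention).
    """
    graph: Dict[int, Set[int]] = {i: set() for i in range(len(ca_coords))}
    for i in range(len(ca_coords)):
        for j in range(i + 1, len(ca_coords)):
            # Residues in close proximity become neighbours in the graph.
            if math.dist(ca_coords[i], ca_coords[j]) <= threshold:
                graph[i].add(j)
                graph[j].add(i)
    return graph
```

Applied to a toy chain with C-alpha atoms at 0, 3.8, 7.6 and 20 Å along one axis, the first three residues form a fully connected triangle while the fourth residue is isolated.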

By iterating this process, the system develops strong predictions of the underlying physical structure of the protein. Additionally, AlphaFold can predict which parts of each predicted protein structure are reliable using an internal confidence measure.
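The article does not describe how the internal confidence measure is reported, so the following Python sketch assumes a hypothetical per-residue score on a 0 to 100 scale and an illustrative reliability cut-off of 70. It shows one way such scores could be used: scanning them to pick out contiguous stretches of a predicted structure that are likely to be trustworthy. The function `reliable_segments` is hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def reliable_segments(confidence: Sequence[float], threshold: float = 70.0) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs, end exclusive, for contiguous runs of
    residues whose confidence is at or above `threshold`.

    Assumption: one score per residue on a 0-100 scale; the cut-off of 70 is
    illustrative, not a value taken from the article.
    """
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, score in enumerate(confidence):
        if score >= threshold and start is None:
            start = i  # a reliable run begins at this residue
        elif score < threshold and start is not None:
            segments.append((start, i))  # the run ended at the previous residue
            start = None
    if start is not None:
        segments.append((start, len(confidence)))  # run extends to the last residue
    return segments
```

For scores of 95, 88, 40, 30, 72, 91 and 50, the sketch reports residues 0 to 1 and 4 to 5 as reliable, returned as `[(0, 2), (4, 6)]`.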

The system was trained on publicly available data consisting of approximately 170,000 protein structures from the Protein Data Bank (PDB), using a relatively modest amount of compute by modern machine learning standards.

The AlphaFold team is now working with a few specialist groups to explore how protein structure predictions could contribute to the understanding of specific diseases.

As with its earlier CASP13 AlphaFold system, DeepMind is planning to submit a paper detailing the workings of this system to a peer-reviewed journal and is simultaneously exploring how best to provide broader access to the system in a scalable way.

Dr Demis Hassabis, Founder and Chief Executive Officer (CEO) of DeepMind, said: “The ultimate vision behind DeepMind has always been to build AI and then use it to help further our knowledge about the world around us by accelerating the pace of scientific discovery. For us AlphaFold represents a first proof point for that thesis. This advance is our first major breakthrough in a long-standing grand challenge in science, which we hope will have a big real-world impact on disease understanding and drug discovery.”

“Protein biology is fantastically complex and defies simple characterisation. Our team’s work demonstrates that machine learning techniques are finally able to meet the complexity of describing these incredible protein machines, and we are truly excited to see what new breakthroughs in both human health and fundamental biology it will bring,” said Dr John Jumper, AlphaFold Lead at DeepMind.

Dr Kathryn Tunyasuvunakool, Science Engineer at DeepMind, said: “The ability to predict high accuracy protein structures with AI could change how we approach biology, with potential applications in drug design and bioremediation. Particularly for experimentally challenging proteins, good predictive techniques could make a huge difference.”
