Date: 2021-08-18

When most people think of the molecules of life, they think of DNA and overlook proteins. In a sense, DNA is a manager that gets the limelight, while proteins toil away behind the scenes.

Proteins are the molecular machines of all living cells. They speed up the reactions that make life possible and give form to life. The three-dimensional shape of a protein determines what it does and how well it does its job. This is true for haemoglobin, which transports oxygen in our blood, and for the muscle proteins that allow us to move. It is true for antibodies, which help stop viruses and bacteria. It is also true for the proteins that copy DNA and help make more proteins. When certain human proteins fail to fold into their proper shapes, they can give rise to diseases such as Alzheimer’s and Parkinson’s. In addition, many drugs work by attaching to the business end of proteins.

Proteins are made up of building blocks called amino acids. The order of amino acids in a protein is determined by the sequence of letters in the gene that gives rise to it. One of the great achievements in molecular biology in the past century was finding out how the four-letter text of DNA, which all living things share, gets “translated” into myriad three-dimensional proteins. It was the elucidation of the genetic code that won Robert W. Holley, Har Gobind Khorana and Marshall W. Nirenberg the Nobel Prize in Physiology or Medicine in 1968.

From DNA, we can figure out the order of amino acids in a given protein. But the string of amino acids folds into the same precise shape every single time, and that shape is hard to predict. For the past 50 years, a reliable way to deduce the structure of a protein from just its amino acid sequence has been elusive. Known as the protein folding problem, it has been one of the “grand challenges” of biology.

In the intervening decades, scientists have relied on “growing” proteins and experimentally figuring out their structures. This has worked well for around 170,000 proteins spanning the gamut of life. But experimentally determining structures is costly and time-consuming. And it doesn’t always work.

For example, the human genome contains over 20,000 protein-coding genes; after decades, we know the shapes of around 17% of the proteins reasonably well.

This creates a problem of scalability. Through rapid and cheap sequencing of DNA, sequences of millions of genes that give rise to proteins are known, but we don’t have a solid grasp of what most of the proteins look like. An analogy is that we have millions of sheets of paper with lines on how to fold them into origami, but no accurate way to know what shapes they fold into.

Now, it looks like DeepMind, an artificial intelligence (AI) laboratory of Alphabet, has mostly solved the protein folding problem. At a conference last year where participants used computers to predict the structures of proteins from their sequences, the program AlphaFold beat the entire field by a wide margin. More impressively, it was able to predict most of the protein structures to greater than the target 90% score, which is comparable to high-quality experimental work in a lab.

In a paper published in Nature on July 15, the DeepMind team provided details on how AlphaFold works. AlphaFold uses a neural network, a computing system loosely modelled on the brain. It was fed amino acid sequences that gave rise to known protein structures. Researchers also gave it some basic rules of chemistry and biology to limit the possibilities. The computer then made its own associations and “learned” how to predict protein shapes.

On July 22, DeepMind researchers followed up with another report in Nature. They predicted nearly every single protein structure from humans (plus those from 20 other important organisms) using AlphaFold. Anyone with internet access can search for these structures from a public database. This truly is Google for proteins.

A reasonable expectation is that a PhD student might experimentally solve one or two protein structures in five or six years. DeepMind plans to deposit a whopping 100 million structures by the end of the year. It is safe to say that what AlphaFold has achieved is beyond the capability of human intelligence.

Some commentators think that AlphaFold will usher in a new era of drug discovery. It is possible, but pharmaceutical companies already have many proteins that they can target. Often the bottleneck isn’t protein structure, but getting drugs to work in trials and through the regulatory process. For example, with the coronavirus, we have structures of many viral proteins, but new drugs that target these proteins have not yet been approved.

This leads me to a second point. Proteins are shapeshifters. They can change shape when they attach to chemical compounds, cell membranes, and other proteins. A protein structure is a snapshot in time. AlphaFold cannot account for all of this variation yet, and for important proteins it might not ultimately replace human validation either.

What excites me most, and where I think AlphaFold can immediately contribute, is in finding out what many of the millions of poorly studied proteins actually do. There is huge potential here to discover proteins that help in environmental remediation and in biotechnological processes. For example, we could identify proteins in soil bacteria that help to degrade plastics. We could also find proteins that help certain animals live longer or adapt to specific niches. And as the planet warms, we could identify proteins that help crop plants adapt to rising temperatures.

AlphaFold is an exciting step in which AI plays a meaningful role in biology. Over time, we can expect humans and machines to work together in solving many other grand challenges.

Anirban Mahapatra, a microbiologist by training, is the author of COVID-19: Separating Fact From Fiction

The views expressed are personal

