Scientifically Speaking | AI can now design the most important molecules of life
Can AI hallucinate proteins that work at least as well as the ones life has given us? An article published in Nature Biotechnology demonstrates that for a well-studied class of proteins, it can.
In the court of public opinion, DNA gets all the credit. But whether it’s a virus, a banana, a whale, or a human, the class of biological molecules known as proteins does most of the work. Proteins are workhorses that are made according to specific instructions in the DNA blueprint. And life has had billions of years to create proteins from scratch or repurpose them for every function in every kind of life on this planet.
Though proteins are usually made out of only twenty letters known as amino acids, the arrangement of these letters in lengthy sequences of hundreds of letters leads to master molecules with distinct shapes. These three-dimensional shapes are important to how each protein works, so structural biologists have spent a lot of time painstakingly working out each one.
In 2021, I wrote a column about DeepMind’s AI-based programme, AlphaFold, that predicted the structure of every known protein at the time. It was truly a monumental scientific advancement that promised to speed up research on proteins that weren’t well understood.
But knowing what a protein looks like is only half the battle. We want to be able to make new ones that do different things, like fighting off viruses or degrading plastics.
Using natural proteins as a starting point, scientists have changed some of the amino acids to try to make new proteins that do their jobs more effectively. Some scientists have also used the iterative power of evolution to directly engineer proteins with newish functions. But we’ve not been able to come up with new proteins from scratch in any meaningful way.
Can AI do a better job at making designer proteins than nature and people? Well, we don’t know that yet, but before we even attempt to answer that question, we need to answer a more fundamental one. Can AI hallucinate proteins that work at least as well as the ones life has given us? A recent research article published in Nature Biotechnology on January 26 demonstrates that for a well-studied class of proteins, it can.
Since last November, ChatGPT, a large language model developed by OpenAI, has taken the world by storm. ChatGPT is a chatbot that takes a text query as input and provides a text answer as output. It’s been trained on a huge dataset and fine-tuned by humans to remove improper answers.
Similarly, natural language processing could be used to create proteins. In that case, the input would be the name of the protein and the output would be the letters that make up the protein sequence. In fact, this is exactly what researchers led by a team at Salesforce Research have done in their new Nature Biotechnology paper.
The research team describes a new AI programme, ProGen, that can come up with completely new sequences for artificial proteins.
To create ProGen, researchers trained it on 280 million different protein sequences (tagged with their known functions). Then they homed in on one specific protein, a lysozyme, which is relatively small at about 300 amino acids. And they fine-tuned the ProGen model with 56,000 variants of lysozymes from different lifeforms.
The model created a whopping million different artificial protein sequences, of which the scientists picked 100 that they thought might work based on biological characteristics.
The scientists then made a few artificial lysozymes from the AI-generated sequences and compared them to natural ones. The artificial proteins don’t look quite like natural proteins, but the punchline is that they were actually quite good at the job that they were supposed to be doing.
So, does this mean that we have a ChatGPT-like interface where we can ask an AI program to come up with a completely new protein? Something like a protein that will break down a plastic that isn’t degradable in nature? That would truly be awesome, but we’re not quite there yet. The sequences and training sets all rely on existing proteins of known functions to come up with artificial ones. But it’s a crucial first step and we’re now well on the way to making designer proteins dreamed up by AI.
Anirban Mahapatra is a scientist by training and the author of a book on COVID-19
The views expressed are personal