By Christian J Meier
Proteins are now familiar even to many laypeople: the coronavirus spike is also known as the “spike protein”, and Novavax’s protein-based vaccine was recently approved in Europe. But proteins are part of everyday life in other ways too: in food, or in the form of enzymes that control metabolism.
“Proteins, proteins everywhere” is the title Holden Thorp gave an editorial in the scientific journal Science. As the magazine’s editor-in-chief, Thorp names as the “Breakthrough of 2021” an algorithm that could soon close a huge knowledge gap in the study of proteins: the three-dimensional shape of most proteins in the animal and plant kingdoms is still unknown, because determining it experimentally takes a long time.
By contrast, an artificial intelligence called Alphafold calculates 3D structures purely on the computer. The software, from Google’s British sister company Deepmind, has now done this for all the proteins in the human body. In 2022, the AI is expected to deliver the 3D shapes of all of the approximately 100 million proteins known to biology.
“Alphafold makes our life much easier,” says Christian Löw of the European Molecular Biology Laboratory in Hamburg. The AI acts as a turbocharger for the life sciences, which can now study proteins faster and, as a result, find new active substances faster.
Why are proteins so versatile? Proteins are chains made up of hundreds of individual building blocks called amino acids. The human body uses 21 different amino acids. This results in countless possible combinations, called “sequences”. Because different amino acids attract or repel each other, the chain folds into a complex shape. This three-dimensional shape determines the protein’s function. For example, it forms binding pockets into which only very specific molecules, such as active substances, fit, like a key in a lock. If the key fits, the protein performs its function, for example transporting a bound nutrient molecule into the cell interior.
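To give a feel for how “countless” these combinations are, here is a minimal Python sketch. It assumes only what the paragraph above states: each position in the chain can hold any one of 21 amino acids (20 is the more commonly cited standard count). The function names are invented for this illustration.

```python
# Figure quoted in the article; 20 is the commonly cited standard set.
AMINO_ACID_TYPES = 21


def count_sequences(chain_length: int, alphabet_size: int = AMINO_ACID_TYPES) -> int:
    """Number of distinct chains when every position can hold any amino acid."""
    return alphabet_size ** chain_length


def order_of_magnitude(n: int) -> int:
    """Power of ten of a positive integer, i.e. its number of digits minus one."""
    return len(str(n)) - 1


if __name__ == "__main__":
    for length in (3, 10, 100, 300):
        total = count_sequences(length)
        print(f"chain of {length} amino acids: roughly 10^{order_of_magnitude(total)} sequences")
```

Even a short chain of 100 amino acids allows more than 10^132 different sequences, which is why trying out every possibility is not an option.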
The interplay within proteins is so complex that even supercomputers capitulate
To understand the function of these molecular jacks-of-all-trades, researchers therefore need to know their shape. But proteins are submicroscopic. Their shape can only be determined using elaborate methods such as electron microscopy, and not always successfully. The sequence, on the other hand, is easy to determine. So it makes sense to try to calculate the shape directly from the sequence using a computer. But the physical and chemical interactions between the individual amino acids are so complex that even supercomputers give up.
Alphafold does things differently. The software relies on pattern recognition, for which it uses neural networks. This type of AI can, for example, be trained on images whose content is known and then learn to recognize similar content, such as faces or cars, in new images.
Protein sequences also contain patterns, and Alphafold exploits them. Similar sequences form similar 3D structural elements in different species. The AI learned these and other patterns by training on known protein structures that are openly available in a central database. With great success: Alphafold sometimes predicts unknown shapes with atomic precision.
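What “similar sequences” means can be illustrated with a simple measure: the percentage of positions at which two chains of equal length carry the same amino acid. The Python sketch below is only an illustration of that idea and is not how Alphafold itself works; the function name and the two short sequences are invented for the example.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Share of positions (in percent) where two equal-length sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


if __name__ == "__main__":
    # Two made-up fragments in one-letter amino acid code, differing at one position.
    fragment_species_1 = "MKTAYIAKQR"
    fragment_species_2 = "MKTAHIAKQR"
    print(percent_identity(fragment_species_1, fragment_species_2))
```

Real comparisons first have to align sequences of different lengths, a step this sketch leaves out.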
“Alphafold has revolutionized the way we work,” says biochemist Christian Löw. His team studies a protein called PepT1, which sits in the membrane of intestinal cells and transports nutrients and drugs, such as antihypertensives, into the cells. It works like a sluice that first opens outwards, takes in its cargo, and then releases it into the interior of the cell. “We didn’t know the inward-open state,” says Löw. Alphafold predicted it. The benefit: “If you understand PepT1, you can design drugs in such a way that they are absorbed into the bloodstream more efficiently.” A smaller dose could then have the same effect.
“It used to take an entire PhD project to determine the 3D structure of a protein,” says Löw. Alphafold needs only a few hours. “Now we can start with a structural model and get early indications of how the protein works.”
Alphafold cannot capture protein dynamics at all
Löw’s colleague Jan Kosinski agrees. The researcher studies the complex of hundreds of proteins that forms the pores in the envelope of the cell nucleus. Until recently, researchers knew the structure of only a small part of this complex, but thanks to Alphafold, the shapes of a large part are now known. “That would normally take up to ten years and occupy many researchers,” says Kosinski. Alphafold is transforming the entire field of research. “It dominates the discussions at conferences,” says Kosinski.
Alphafold will not replace experimental methods, Löw emphasizes, because the software has its limitations. It completely ignores one important biological factor: time. “But proteins are very dynamic,” explains Löw. They change shape when they combine with other proteins to form a larger complex or when an active substance binds. Some proteins have “tails” that are flexible “like spaghetti,” Kosinski adds. As a result, Alphafold’s model of the nuclear pore remains incomplete, the researcher admits.
Another drawback is the varying accuracy with which Alphafold gives the position of individual atoms within a protein. In binding pockets it is often low, says Christofer Tautermann, head of computational chemistry at pharmaceutical company Boehringer Ingelheim. Tautermann is investigating whether Alphafold can ease the lengthy search for new active ingredients. “The candidate has to fit perfectly in the pocket,” explains the chemist and mathematician. He is enthusiastic about Alphafold, Tautermann says, but the software is not a magic bullet. Still, he is confident: “With more training data from experimentally explored binding pockets, AI could predict them with reliable accuracy.”