Research in biochemistry or molecular medicine often produces huge amounts of data. But, as with the decoding of the human genome, collecting the data is only the beginning of the work: scientists obtain key information only by analyzing and linking this data. Interpreting these mountains of data, however, is not easy. “Without efficient processes in bio- and medical informatics, it will not be possible to use them,” says Hans-Werner Mewes of the Technical University of Munich.
A gene in the search list
Several tens of thousands of genes determine the structure and function of our body and are therefore often the cause of certain diseases. For example, women with a family history of breast cancer may find it worthwhile to have their genome searched for the risk genes BRCA-1 and BRCA-2. If these genes are mutated, the risk of developing breast cancer increases by 60 to 80 percent.
But that’s easier said than done: “A biologist can’t interpret massive amounts of data alone,” says Mewes. This task is therefore performed by intelligent computer networks that can scan the genome for these two genes in the shortest possible time, so that preventive measures against breast cancer can be taken at an early stage.
Being able to test all of a person’s genes so quickly is also useful for identifying previously unknown risk genes. Computer analyses of the genomes of people suffering from the same, unexplained disease can reveal genetic similarities between them. These shared features are then candidate causes of the disease and can point to therapeutic approaches.
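The idea of looking for genetic similarities among patients can be illustrated with a small Python sketch. It is only a toy under stated assumptions: each genome is reduced to a set of variant labels, the labels (such as `GENE_A:c.100A>G`) and the function name `shared_variants` are hypothetical, and real analyses rely on statistical tests over far larger cohorts.

```python
def shared_variants(patient_variants, control_variants=()):
    """Return variants found in every patient and in none of the controls."""
    if not patient_variants:
        return set()
    # Keep only the variants that all patients have in common.
    common = set.intersection(*(set(v) for v in patient_variants))
    # Discard variants that also occur in healthy controls.
    for control in control_variants:
        common -= set(control)
    return common


# Hypothetical variant labels, invented purely for illustration.
patients = [
    {"GENE_A:c.100A>G", "GENE_B:c.5del", "GENE_C:c.77C>T"},
    {"GENE_A:c.100A>G", "GENE_C:c.77C>T", "GENE_D:c.9G>A"},
    {"GENE_A:c.100A>G", "GENE_C:c.77C>T"},
]
controls = [{"GENE_C:c.77C>T", "GENE_E:c.1A>T"}]

candidates = shared_variants(patients, controls)
print(sorted(candidates))  # ['GENE_A:c.100A>G']
```

A variant that survives this filter is only a candidate: it still has to be checked in the laboratory before it can be called a cause of disease.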
The vast amounts of data generated during genetic analysis can only be used in a meaningful way thanks to computers.
Artificial intelligence as a diagnostic assistant
In addition to “classic” computer analyses, artificial intelligence such as neural networks is also used in bioinformatics. Such systems are trained with data that has already been evaluated so that they can later make independent decisions and perform analyses. For example, if an AI system needs to learn to recognize breast cancer, it first receives a set of mammogram images in which the tumors are labeled.
“You have to tell the computer which cell in the image is a cancer cell so that it can learn from it and eventually recognize the cancer cells itself,” explains Shadi Albarqouni from the Technical University of Munich. Using these training images, the system then learns to recognize typical cancer structures and distinguish them from healthy tissue. Later, artificial intelligence can independently search mammogram images for these suspicious structures and display them. Some such systems already work as well as human radiologists in diagnosing breast cancer, skin cancer and other tumors.
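The principle of learning from labeled examples can be sketched in a few lines of Python. This is not how the systems described here work internally: they use neural networks on image pixels, whereas the sketch swaps in a much simpler nearest-centroid classifier. The two-number feature vectors and the function names `train_centroids` and `predict` are assumptions made for illustration.

```python
def train_centroids(samples):
    """Training step: average the feature vectors of each label."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, features):
    """Assign the label whose average training example is closest."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(centroids[label], features))


# Hypothetical labeled training data: (feature vector, label assigned by a human).
labeled = [
    ((0.9, 0.8), "tumor"),
    ((0.8, 0.9), "tumor"),
    ((0.2, 0.1), "healthy"),
    ((0.1, 0.3), "healthy"),
]

model = train_centroids(labeled)
print(predict(model, (0.85, 0.75)))  # tumor
print(predict(model, (0.15, 0.20)))  # healthy
```

The human-supplied labels play the same role as the marked tumors in the training mammograms: without them, the program has nothing to learn from.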
In a computer game, volunteers can knock down cancer cells – the computer learns from this how to recognize those tumor cells
A computer game helps train AI
But teaching AI such diagnostic skills can be a time-consuming task. That’s why Albarqouni and his colleagues developed a computer game that volunteers can use to train adaptive computer systems.
The game is about killing as many “bad” cancer cells as possible and thus training the computer for tissue analysis.
AI to decipher protein folding
AI’s ability to learn is also used in another area of bioinformatics: protein structure determination.
Knowing the function of proteins is important for understanding diseases and for developing drugs or vaccines. Protein function, in turn, is based on the three-dimensional structure of these biomolecules, but working out this structure is unimaginably complex: a protein 150 amino acids long, for example, would have 2^150 different ways of folding. Therefore, only computer systems can determine the correct 3D structure and thus the function of a protein.
The recently developed AI AlphaFold from Google’s research center DeepMind, for example, was trained with 170,000 protein sequences and their elucidated structures, so that it can recognize regularities in protein folding. When the trained neural network receives an unknown amino acid sequence, it assigns the protein structure that most closely matches the previously learned rules.
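To get a feeling for the size of this number, a short Python calculation helps. It assumes the figure is read as 2 to the power of 150, that is, two possible states per amino acid. The testing rate of one trillion conformations per second is a purely hypothetical value chosen for illustration.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def conformations(length, states_per_residue=2):
    """Number of conformations if every residue independently takes one of a few states."""
    return states_per_residue ** length


def years_to_enumerate(length, rate_per_second=1e12):
    """Years needed to try every conformation at a hypothetical, assumed rate."""
    return conformations(length) / rate_per_second / SECONDS_PER_YEAR


n = conformations(150)
print(f"{n:.2e} possible conformations")  # roughly 1.4e45
print(f"{years_to_enumerate(150):.1e} years to try them all")
```

Even at this generous rate, the search would take vastly longer than the age of the universe, which is why trying every possibility one by one is not an option.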