Rosetta@home

Area: biochemistry
Target: Prediction of protein structures
Operator: University of Washington
Country: United States
Platform: BOINC
Website: boinc.bakerlab.org/rosetta
Project status
Status: active
Start: September 16, 2005
End: still active

Rosetta@home is a non-commercial volunteer computing project that uses distributed computing to predict protein structures and protein binding from amino acid sequences.

In the process, algorithms that enable reliable structure prediction are developed and tested. Accurate prediction of protein structures could prove very useful in the development of cures for AIDS, cancer, malaria, Alzheimer's and viral diseases, for example.

The computer program used is developed in the Baker Lab at the University of Washington under the direction of David Baker.

The project was officially launched on September 16, 2005. The calculations are based on the BOINC software from the University of California, Berkeley.

The project is active on more than 44,000 computers and delivers around 1,200 teraFLOPS of computing power (as of March 2020), a figure that fluctuates from day to day. The number of active computers rose disproportionately during the COVID-19 pandemic, when, among others, the Canadian government provided hundreds of ProLiant devices.

Background, scientific relevance and possible applications

Proteins are the most important functional molecules in the body. They are long chains of amino acids linked to one another. Biologists and biochemists have known for about 40 years that the shape a protein takes in the living cell is determined primarily by the sequence and type of the amino acids it contains. This shape, in turn, determines which function the protein can perform.

Which proteins the body can produce is encoded in the genome, the DNA, which was completely mapped in the course of the Human Genome Project. In principle, therefore, the amino acid sequences of all proteins in the body are known. Theoretically, it should be possible to deduce the shape of these proteins from their sequences and thus determine their function.

The calculation of the structure of CASP6 target T0281 was the first ab initio structure prediction to achieve atomic resolution. The structure calculated by Rosetta (magenta) is shown here in comparison with an experimentally determined crystal structure (blue).

Until recently, the best methods for determining protein structure have been X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy. Both, however, are extremely time-consuming and costly, not free of error, and not (yet) feasible for some proteins. Researchers therefore try to predict the protein structure computationally from the amino acid sequence. The underlying idea is that, of all conceivable structures, the one with the lowest energy is also the structure that the protein adopts in nature.
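The lowest-energy idea can be written compactly as a minimization problem. The notation below is only an illustrative assumption and does not come from the project: s stands for the amino acid sequence, \mathcal{X}(s) for the set of all conceivable structures of that sequence, and E for an energy function.

```latex
\[
  x_{\text{native}} \;\approx\; \operatorname*{arg\,min}_{x \,\in\, \mathcal{X}(s)} E(x;\, s)
\]
```

Read this way, structure prediction is a search for the minimum of E over a very large set of candidate structures.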

The difficulty is the enormous number of different structures that a chain of amino acids can form: it grows exponentially with the number of amino acids, and many proteins consist of hundreds or thousands of them. Trying out all possible structures is therefore pointless, because only a vanishingly small fraction of them could ever be examined and the probability of hitting the right one would be extremely low.
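A rough calculation illustrates the exponential growth. The sketch below rests on two simplifying assumptions that are not taken from the project: each amino acid can independently adopt one of three local shapes, and a machine can check 10^15 structures per second (roughly one structure per floating-point operation at the computing power quoted above, which is far too optimistic). The function names are hypothetical.

```python
import math


def conformation_count(n_residues: int, states_per_residue: int = 3) -> int:
    """Number of chain structures if every residue independently
    adopts one of `states_per_residue` local shapes (an assumption)."""
    return states_per_residue ** n_residues


def years_to_enumerate(n_residues: int,
                       states_per_residue: int = 3,
                       rate_per_second: float = 1e15) -> float:
    """Years needed to check every structure at an assumed checking rate."""
    seconds = conformation_count(n_residues, states_per_residue) / rate_per_second
    return seconds / (365.25 * 24 * 3600)


if __name__ == "__main__":
    for n in (10, 50, 100, 300):
        exponent = math.log10(conformation_count(n))
        print(f"{n:4d} residues: about 10^{exponent:.0f} structures, "
              f"about {years_to_enumerate(n):.1e} years to enumerate")
```

Even under these generous assumptions, a chain of only 100 amino acids already yields about 5 × 10^47 structures, which is why exhaustive enumeration is not an option.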

The strategy of the Rosetta software is to derive the structures of short sections of the protein from known proteins whose amino acid sequences match in those sections, and then to join these short sections together with the stretches in between. Random spatial arrangements of these sections are then generated and their energy is calculated. This happens in two phases: the “jump phase”, in which large sections are moved, and a subsequent “relaxation phase”, in which the lowest-energy structure from the jump phase is changed only minimally in order to move slowly to the lowest point in the “energy landscape” that surrounds the original model.
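The two-phase idea can be sketched with a toy model. This is not Rosetta's code: the chain is reduced to a list of angles, `toy_energy` is an invented function rather than Rosetta's energy function, and random perturbations of several neighbouring angles stand in for the insertion of sections taken from known proteins. All names and parameter values are assumptions chosen for illustration.

```python
import math
import random


def toy_energy(angles):
    """Invented rugged energy with many local minima (not Rosetta's)."""
    return sum(math.cos(3 * a) + 0.5 * (a - 1.0) ** 2 for a in angles)


def monte_carlo(angles, steps, window, step_size, temperature, rng):
    """Metropolis search that perturbs `window` neighbouring angles per move
    and returns the lowest-energy state seen, starting state included."""
    current = list(angles)
    current_e = toy_energy(current)
    best, best_e = list(current), current_e
    for _ in range(steps):
        trial = list(current)
        start = rng.randrange(len(trial) - window + 1)
        for i in range(start, start + window):
            trial[i] += rng.uniform(-step_size, step_size)
        delta = toy_energy(trial) - current_e
        # Always accept downhill moves; accept uphill moves with
        # Boltzmann probability so the search can leave shallow valleys.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_e = trial, current_e + delta
            if current_e < best_e:
                best, best_e = list(current), current_e
    return best, toy_energy(best)


def two_phase_search(n_angles=12, seed=0):
    rng = random.Random(seed)
    start = [rng.uniform(-math.pi, math.pi) for _ in range(n_angles)]
    # Phase 1 ("jump"): large moves applied to whole sections.
    coarse, coarse_e = monte_carlo(start, 2000, window=3, step_size=1.5,
                                   temperature=1.0, rng=rng)
    # Phase 2 ("relaxation"): tiny moves on single angles at low temperature.
    _, refined_e = monte_carlo(coarse, 2000, window=1, step_size=0.05,
                               temperature=0.01, rng=rng)
    return toy_energy(start), coarse_e, refined_e


if __name__ == "__main__":
    start_e, coarse_e, refined_e = two_phase_search()
    print(f"start {start_e:.3f} -> jump {coarse_e:.3f} -> relax {refined_e:.3f}")
```

Because each phase returns the best state it has seen, the energy never rises from one phase to the next; the relaxation phase only polishes the valley that the jump phase found.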

Each participating computer creates several randomly selected models for each starting molecule (from a few to a few hundred, depending on computing power and protein size) and runs each of them through the phases described above. Each such attempt is roughly like starting anywhere on a map and looking for the lowest point by slowly working your way down along streams or paths. In this way you only ever find the lowest point in a particular neighbourhood. Only by repeating the procedure many times from different starting places are you likely to find the lowest point on the whole map. In the end, each computer finds, for each molecule it was sent, the structure with the lowest energy in the region it investigated and transmits that structure to the project. Of all the structures transmitted, the one with the lowest energy overall is most likely the one that best matches the natural arrangement. Each participant has, so to speak, searched one or more individual maps from a large collection covering a much larger total area, and the project receives only the location of the lowest point on each map.
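The map analogy corresponds to a multi-start local search, which can be sketched as follows. Everything here is a toy under stated assumptions: `landscape` is an invented two-dimensional height function, a simple downhill walk replaces the two phases, and `volunteer_result` and `project_best` are hypothetical names, not part of BOINC or Rosetta.

```python
import math
import random


def landscape(x, y):
    """Invented 2D 'map' with many separate valleys."""
    return math.sin(3 * x) * math.cos(3 * y) + 0.1 * (x * x + y * y)


def descend(x, y, rng, steps=500, step_size=0.05):
    """Walk downhill from (x, y): only moves that lower the height are kept,
    so the walk ends near the bottom of the valley it started in."""
    height = landscape(x, y)
    for _ in range(steps):
        nx = x + rng.uniform(-step_size, step_size)
        ny = y + rng.uniform(-step_size, step_size)
        nh = landscape(nx, ny)
        if nh < height:
            x, y, height = nx, ny, nh
    return height, (x, y)


def volunteer_result(seed, n_models=20):
    """One volunteer computer: several random starting points,
    of which only the lowest result is reported back."""
    rng = random.Random(seed)
    results = [descend(rng.uniform(-5, 5), rng.uniform(-5, 5), rng)
               for _ in range(n_models)]
    return min(results)


def project_best(n_volunteers=10):
    """The project collects one report per volunteer and keeps the lowest."""
    reports = [volunteer_result(seed) for seed in range(n_volunteers)]
    return min(reports), reports


if __name__ == "__main__":
    best, reports = project_best()
    print("lowest height reported by each volunteer:",
          [round(height, 3) for height, _ in reports])
    print("lowest height overall:", round(best[0], 3), "at", best[1])
```

Individual volunteers usually report different heights because their walks end in different valleys; taking the minimum over all reports is what makes it likely that the lowest valley on the whole map is found.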

The aim of Rosetta is to predict the correct structure not just frequently but always, and with a high degree of accuracy in the arrangement of the individual atoms. Only then can the structure of a protein be used to reliably determine its function. In addition to Rosetta, there are a number of other computer programs that try to predict the structure of proteins from the amino acid sequence. However, there is still no algorithm that can do this reliably with a reasonable amount of effort. Rosetta@home tests various algorithms in order to enable reliable predictions.

Successful structure prediction would make it possible not only to determine the structure of natural proteins but also to artificially produce proteins with a very specific shape and thus function. This technique is called protein design. It would open up groundbreaking opportunities in the fight against many diseases such as AIDS, cancer and Alzheimer's. A number of diseases arise, for example, because proteins do not fold into their proper, natural shape. Alzheimer's is one example: proteins that should occur individually clump together to form so-called amyloid plaques and disrupt the function of the brain.

Another example is viral infections: viruses penetrate our cells and hijack their protein factories. They make the cells produce thousands of copies of the viral proteins and genetic material, which assemble into new viruses and eventually kill the cell. Many thousands of new viruses are then released into the body, where they in turn infect new cells.

If central viral proteins could be blocked with the help of small, precisely fitting proteins, the infection would be stopped as well. One could, for example, prevent the formation of the virus envelope or even the reading of the viral genetic material by human cells. This is exactly what protein design aims at: particularly suitable points of attack in the genome or on the proteins of the viruses are to be identified and blocked by specifically developed molecules.

Research competition for protein structure prediction

From May to August 2008, Rosetta@home took part in CASP, the biennial competition for protein structure prediction. Baker had already taken part in earlier editions of this competition with the Rosetta software and had shown that Rosetta is one of the best tools for predicting protein structure. CASP 8 also showed that, with sufficient computing power, a fairly reliable prediction of small to medium-sized proteins is possible. Rosetta@home is expected to take part in future CASP competitions as well, in order to compare the quality of its predictions with those of the other participants.

Baker Lab

The Baker Laboratory is located at the University of Washington.

The lead scientist is David Baker, professor of biochemistry at the University of Washington and researcher at the Howard Hughes Medical Institute, who was elected a member of the United States National Academy of Sciences in April 2006.

Literature

  • S. J. Fleishman, T. A. Whitehead, D. C. Ekiert, C. Dreyfus, J. E. Corn, E.-M. Strauch, I. A. Wilson, D. Baker: Computational design of proteins targeting the conserved stem region of influenza hemagglutinin. In: Science. Volume 332, Number 6031, May 2011, pp. 816-821, ISSN 1095-9203. doi:10.1126/science.1202617. PMID 21566186. PMC 3164876 (free full text).
  • S. Raman, O. F. Lange, P. Rossi, M. Tyka, X. Wang, J. Aramini, G. Liu, T. A. Ramelot, A. Eletsky, T. Szyperski, M. A. Kennedy, J. Prestegard, G. T. Montelione, D. Baker: NMR Structure Determination for Larger Proteins Using Backbone-Only Data. In: Science. Volume 327, Number 5968, February 2010, pp. 1014-1018, ISSN 1095-9203. doi:10.1126/science.1183649. PMID 20133520. PMC 2909653 (free full text).
  • Andrew Leaver-Fay, Michael Tyka, Steven M. Lewis, Oliver F. Lange, James Thompson, Ron Jacak, Kristian Kaufman, David Baker and others: ROSETTA3: an object-oriented software suite for the simulation and design of macromolecules. In: Methods in Enzymology. Volume 487, 2011, pp. 545-574, ISSN 0076-6879. doi:10.1016/B978-0-12-381270-4.00019-6. PMID 21187238.

References

  1. BOINCstats
  2. Rosetta@home, boinc.bakerlab.org/rosetta
  3. Oliver Peckham: Rosetta@home Rallies a Legion of Computers Against the Coronavirus