GC content

from Wikipedia, the free encyclopedia
Schematic representation of base pairs in the DNA double strand. The hydrogen bonds between AT and GC pairs are shown as dashed lines. In this example the GC content is 50%.

The GC content is a property of nucleic acid molecules such as DNA or RNA. It gives the proportion of guanine (G) and cytosine (C) among all nucleobases in the molecule, expressed as a percentage.

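For a DNA molecule, the measure refers to the four DNA bases guanine, cytosine, adenine (A) and thymine (T), where each letter stands for the number of the respective base in the molecule:

$$\text{GC content} = \frac{G + C}{A + T + G + C} \times 100\,\%$$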

Since only these four bases are usually found in a DNA strand, the AT content can be calculated from the GC content and vice versa:

$$\text{AT content} = 100\,\% - \text{GC content}$$

A GC content of 64%, for example, thus corresponds to an AT content of 36%.
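
For a known sequence, this calculation can be sketched in a few lines of Python. The function names and the 25-base example sequence are hypothetical and chosen only for illustration. The sketch assumes the input contains nothing but the letters A, C, G and T (or U for RNA).

```python
def gc_content(sequence: str) -> float:
    """Return the GC content of a nucleotide sequence in percent."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    # Count guanine and cytosine, then relate them to all bases.
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq)


def at_content(sequence: str) -> float:
    """Return the AT content, assuming only A, T, G and C occur."""
    return 100.0 - gc_content(sequence)


# Hypothetical 25-base example: 16 of the bases are G or C.
example = "GCGCGCGCGCGCGCGC" + "ATATATATA"
print(gc_content(example))  # 64.0
print(at_content(example))  # 36.0
```

Ambiguity codes such as N would count towards the total length here. A more careful implementation would have to decide how to treat them.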

GC content and stability of the DNA double helix

Spatial model of a DNA double helix - hydrogen bonds are formed between matching base pairs in the double strand

In the double-stranded DNA molecule, complementary bases are linked to one another by hydrogen bonds: A with T (or T with A) and G with C (or C with G). Adenine-thymine pairs always form two hydrogen bonds, guanine-cytosine pairs three. Contrary to what was initially assumed, the energy gained through these hydrogen bonds is negligible, because the bases can form hydrogen bonds of similar quality with the surrounding water. The hydrogen bonds of a GC base pair contribute little to the stability of the double helix, while those of an AT base pair even have a destabilizing effect.

In contrast, the stacking interactions between successive bases lying on top of one another in the double helix have a stabilizing effect: an energetically favorable dipole-induced dipole interaction occurs between the aromatic ring systems of the heterocyclic bases. The formation of the first base pair is therefore quite unfavorable, owing to the low energy gain and the loss of entropy, whereas lengthening the helix (elongation) is energetically favorable because Gibbs energy is released when paired bases are stacked.

However, the stacking interactions are sequence-dependent and energetically most favorable for stacked GC pairs, while they are less favorable for stacked AT pairs. The differences in the stacking interactions mainly explain the fact that GC-rich DNA segments are thermodynamically more stable than AT-rich sequences, while hydrogen bond formation plays a subordinate role here.

Determination of the GC content

The GC content of DNA can be determined experimentally by various methods. The simplest is to measure the melting temperature (Tm value) of the DNA double helix with a photometer. DNA absorbs ultraviolet light at a wavelength of 260 nm. When heated, the double strand denatures ("melts") into two single strands, and its light absorption increases by about 40%. This effect is known as hyperchromicity.
The Tm value is defined as the temperature at which 50% of the double helix is in the denatured state, that is, present as two single polynucleotide strands instead of the original double helix. It depends directly on the GC content: the more GC base pairs a DNA molecule contains, the higher its Tm value. From the photometrically determined melting temperature, the GC content in percent can be estimated with the empirical formula GC content [%] = (Tm [°C] - 69.4 °C) × 2.44.

The Tm value also depends on the ionic strength and the type of ions present in the DNA solvent. The melting temperature therefore has to be determined in standard saline citrate (SSC) for the formula to apply.
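
The empirical relation between melting temperature and GC content can be written as a small Python sketch. The function names are hypothetical. The constants 69.4 and 2.44 are those of the formula quoted above and are only meaningful for melting temperatures measured in standard saline citrate.

```python
def gc_from_tm(tm_celsius: float) -> float:
    """Estimate the GC content (percent) from a melting temperature in deg C."""
    return (tm_celsius - 69.4) * 2.44


def tm_from_gc(gc_percent: float) -> float:
    """Invert the same empirical relation to estimate Tm from the GC content."""
    return 69.4 + gc_percent / 2.44


# Hypothetical measurement: a melting temperature of 90 deg C.
print(round(gc_from_tm(90.0), 1))  # 50.3
```

Because the relation is linear, each additional degree of melting temperature corresponds to 2.44 percentage points of GC content in this approximation.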

Determining the GC content by gas chromatography is much more precise. If the sequence of the DNA molecule is known, the GC content can simply be calculated by counting the G and C bases and dividing by the total number of bases.

GC content and taxonomy

The GC content of the genome is used as a taxonomic characteristic to classify organisms, especially bacteria. The values here range from approximately 20% to almost 80%. Bacteria with a high GC content are found mainly among the Actinobacteria, but Deltaproteobacteria such as the myxobacteria are also GC-rich. Thermophilic organisms also have elevated GC contents, which is usually attributed to the greater thermal stability of GC-rich DNA.

GC contents of some model organisms:

| Species | Phylogenetic group | GC content |
|---|---|---|
| Streptomyces coelicolor | Actinobacterium | 72% |
| Myxococcus xanthus | Deltaproteobacterium | 68% |
| Halobacterium sp. | Archaeon | 67% |
| Saccharomyces cerevisiae (baker's yeast) | Ascomycete (fungus) | 38% |
| Arabidopsis thaliana (thale cress) | Flowering plant | 36% |
| Methanosphaera stadtmanae | Archaeon | 27% |
| Plasmodium falciparum (malaria pathogen) | Protozoon | ≈20% |

For comparison: the average GC content in humans is 41% (see also CpG island). Because of the structure of the genetic code, it is practically impossible for an organism to build its genome from only two bases (G and C, or A and T) and thus reach a GC content of 100% or 0%. With two bases, triplet codons allow only 2 × 2 × 2 = 8 combinations, which is not enough to encode all 20 amino acids.
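
The codon count in this argument can be checked by enumeration. The following Python sketch lists every triplet over a given alphabet. The helper name `codons` is hypothetical.

```python
from itertools import product


def codons(alphabet: str, length: int = 3) -> list[str]:
    """List every possible codon of the given length over an alphabet."""
    return ["".join(p) for p in product(alphabet, repeat=length)]


print(len(codons("GC")))    # 8
print(len(codons("ACGT")))  # 64
```

Eight triplets fall short of the 20 amino acids, whereas the 64 triplets of the four-base code are more than sufficient.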

GC contents of individual DNA segments

The proportion of base pairs GC and AT also varies within a genome.

AT-rich (and therefore GC-poor) regions are often found at those points in the genome where the double helix must be easy to separate, for example at the sites where replication of the DNA molecule begins. Human chromosomes also contain regions whose GC content deviates significantly from 50%. These sections are mostly involved in maintaining the spatial structure of the chromosomes.

In addition, the GC content of the DNA segments that code for a gene is often higher than that of other regions (for example introns or regulatory sequences). This property is used to search for the actual genes in sequenced genomes. A genome sequence initially consists of nothing but a sequence of millions of bases. The genes, that is, their start and end points in the genome, are annotated with the help of gene-finding programs (e.g. GLIMMER), which use compositional signals such as GC-rich sections to identify possible genes.
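
The idea of locating GC-rich sections can be illustrated with a sliding-window scan. This is only a minimal sketch and not how GLIMMER or any other gene finder is actually implemented, since such programs use more elaborate statistical models. The function name, window size and toy sequence are hypothetical.

```python
def windowed_gc(sequence: str, window: int, step: int) -> list[tuple[int, float]]:
    """Return (start position, GC percent) for each full window of the sequence."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    seq = sequence.upper()
    results = []
    # Only windows that fit completely into the sequence are evaluated.
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = sum(1 for base in chunk if base in "GC")
        results.append((start, 100.0 * gc / window))
    return results


# Hypothetical toy sequence: an AT-rich stretch followed by a GC-rich one.
print(windowed_gc("ATATATATGCGCGCGC", window=8, step=8))
# [(0, 0.0), (8, 100.0)]
```

With real genomes, the window would typically span hundreds or thousands of bases, and windows whose GC content exceeds the genome average would be candidates for closer inspection.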

If the study of an organism reveals functional genes whose GC content differs significantly from that of its other genes, this is often interpreted as an indication that these genes were acquired only recently by horizontal gene transfer or originate from a retrovirus.
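
Such a comparison can be sketched as a simple outlier check: compute the GC content of every gene and flag those that lie far from the median. The function names, the gene names, the toy sequences and the threshold of 10 percentage points are all hypothetical. A deviation found this way is only a hint, not proof of horizontal gene transfer.

```python
from statistics import median


def gc_percent(sequence: str) -> float:
    """Return the GC content of a non-empty sequence in percent."""
    seq = sequence.upper()
    return 100.0 * sum(1 for base in seq if base in "GC") / len(seq)


def flag_atypical_genes(genes: dict[str, str], threshold: float = 10.0) -> list[str]:
    """Return the names of genes whose GC content differs from the
    median of all genes by more than `threshold` percentage points."""
    values = {name: gc_percent(seq) for name, seq in genes.items()}
    typical = median(values.values())
    return [name for name, gc in values.items() if abs(gc - typical) > threshold]


# Hypothetical toy data: three GC-rich genes and one AT-rich gene.
genes = {
    "geneA": "GCGCGCATGC",
    "geneB": "GCGCATGCGC",
    "geneC": "GCGGCCATGC",
    "geneD": "ATATATATGC",
}
print(flag_atypical_genes(genes))  # ['geneD']
```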
