CpG island

from Wikipedia, the free encyclopedia

CpG islands (abbreviated CGIs, sometimes also called CG islands) are regions in the genomes of eukaryotes with a statistically elevated density of CpG dinucleotides. This density is assessed relative to the individual nucleotide and dinucleotide frequencies of the genome section under consideration.

"CpG" denotes a two-base sequence motif: a cytosine followed by a guanine on the same DNA strand, linked by a phosphate group. The "p" is often included, for example to distinguish the C-G dinucleotide within one DNA strand from the C-G base pair of a DNA double strand (see CpG site).

Typical definitions of a CpG island require a genome section at least 400 to 500 bp long that has an average G+C content of at least 50% and an observed-to-expected CpG ratio of at least 0.6 (60%). By comparison, the G+C content of the human genome as a whole is approximately 42%, significantly lower than that of CpG islands.
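The three criteria above can be checked with a few lines of Python. The following is a minimal sketch, not a reference implementation: the function names are hypothetical, and the observed-to-expected ratio is computed with the commonly used formula (number of CpG × N) / (number of C × number of G), which is an assumption here because the text does not spell the formula out.

```python
def cpg_stats(seq: str) -> dict:
    """Return length, G+C fraction and observed/expected CpG ratio of a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    # str.count is non-overlapping, which is exact here because "CG" cannot overlap itself.
    cpg = seq.count("CG")
    gc_fraction = (c + g) / n if n else 0.0
    # Expected CpG count if C and G were distributed independently (assumed formula).
    expected = c * g / n if n else 0.0
    ratio = cpg / expected if expected else 0.0
    return {"length": n, "gc_fraction": gc_fraction, "obs_exp_cpg": ratio}


def looks_like_cpg_island(seq: str, min_len: int = 400,
                          min_gc: float = 0.5, min_ratio: float = 0.6) -> bool:
    """Apply the length, G+C and observed/expected thresholds quoted in the text."""
    stats = cpg_stats(seq)
    return (stats["length"] >= min_len
            and stats["gc_fraction"] >= min_gc
            and stats["obs_exp_cpg"] >= min_ratio)


if __name__ == "__main__":
    print(cpg_stats("ACGT" * 100))
    print(looks_like_cpg_island("AT" * 250))
```

Real tools usually slide a window along the genome and merge neighbouring hits; this sketch only evaluates one candidate section at a time.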

CpG islands arise from mechanisms connected with the use of the genetic material as an information carrier. As a result, they are important landmarks, for example in genetics, medicine and bioinformatics.

They should not be confused with the GC box, which lies 60–100 bp upstream of the transcription start site.

Properties

Figure: In a CpG island, a CpG site occurs about every 10 nucleotides (frequency approx. 1:10), illustrated by a gene promoter region with the ATG start codon highlighted (left). Elsewhere, a CpG site occurs about every 100 nucleotides (frequency approx. 1:100), illustrated by a "normal" genome section that is usually methylated (right).

In mammals, depending on the species, between 2% and 7% of the cytosines in a cell are methylated. About 70 to 85% of CpG dinucleotides in mammals are methylated, whereas CpG islands are predominantly unmethylated; this difference contributes to the epigenetic regulation of gene expression. About 5% of CpG dinucleotides lie within one of the approximately 20,000 CpG islands of a mammalian genome. Half of the CpG islands in mammals are located in housekeeping genes. About 40% of mammalian promoters contain a CpG island.

Usually, the cytosines of 5'-CpG-3' dinucleotides carry a methyl group on both complementary DNA strands, which creates a palindromic methylation pattern. When both cytosines in this arrangement are methylated, they jointly alter the three-dimensional structure of the major groove of the double-stranded DNA.

The average G+C content in humans is 42%, so the CpG dinucleotide should theoretically occur in the genome with a frequency of around 4% (0.21 × 0.21 ≈ 0.044). In fact, CpG dinucleotides are strongly underrepresented at 0.8%, mainly because 5-methylcytosine is converted to thymine comparatively readily by spontaneous deamination (see the explanation and scheme below). As a result, the CpG dinucleotide density in CpG islands is 10 to 20 times higher than in the rest of an average vertebrate genome. Compared with other dinucleotides such as GpC, ApT or TpA, the CpG dinucleotide has a special status in many eukaryotic organisms because its frequency defines the CpG islands.
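The arithmetic behind the expected and observed CpG frequencies can be reproduced as follows. This is a small sketch under two stated assumptions: C and G are taken to be equally frequent (each half of the G+C content), and neighbouring bases are treated as independent. The function name is hypothetical.

```python
def expected_cpg_frequency(gc_content: float) -> float:
    """Expected CpG dinucleotide frequency if bases were independent.

    Assumes C and G each make up half of the G+C content.
    """
    p_c = p_g = gc_content / 2
    return p_c * p_g


expected = expected_cpg_frequency(0.42)   # human G+C content quoted in the text
observed = 0.008                          # observed CpG frequency quoted in the text
depletion = observed / expected           # fraction of the expected CpGs actually present

if __name__ == "__main__":
    print(f"expected: {expected:.4f}, observed: {observed:.4f}, ratio: {depletion:.2f}")
```

Under these assumptions, the genome as a whole retains only about a fifth of the CpG dinucleotides that base composition alone would predict.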

Functions of CpG islands

Since their discovery, CGIs have been associated with a variety of fundamental processes, including these three functions:

  • DNA replication: CGIs can act as origins of replication; the sequences themselves may be genomic footprints left on the chromosome by replication events.
  • Genomic imprinting: CGIs can be methylated differently depending on the allele.
  • Transcriptional regulation: CGIs function primarily as sites for the recruitment of RNA polymerase II (RNA Pol II) and the initiation of transcription.

CpG islands play a major role in the third function, transcriptional gene regulation. In vertebrates they are often found near promoters, especially those of housekeeping genes.

Methylation of the CpG sites within a CpG island usually means that the associated gene is not transcribed. About 40–45% of all human genes have CpG islands in their promoter regions.

Methylation of CpG islands plays a role both in the development of cancer (as a mechanism for silencing tumor suppressor genes) and in genomic imprinting. Tumors often show global hypomethylation of the cytosines in CpG dinucleotides together with hypermethylation of the CpG islands of certain tumor suppressor genes.

CG suppression and formation of the CpG islands

The two cytosines of a CpG site (one on each strand of the DNA double strand) are mostly methylated in the human genome (DNA methylation). In some regions, methylation is permanently suppressed. These regions are often CpG islands and frequently lie upstream of genes (in the so-called promoter regions). Methylated CpG sites are subject to a mutational pressure known as "CG suppression", which is described below:

Cytosines can undergo deamination in the cell (the amino group -NH2 is replaced by =O). Hydrolytic deamination of bases can occur without a catalyst, but it can also be induced enzymatically. Methylated cytosine becomes thymine, whereas unmethylated cytosine (e.g. in CpG islands) becomes uracil. While thymine is a "normal" nucleobase in DNA, uracil does not belong in DNA. Uracil, which is actually an RNA base, is recognized very efficiently and replaced by cytosine; the cell's DNA repair mechanisms use the guanine on the opposite DNA strand as the template for error correction. In methylated CpG dinucleotides, however, deamination produces thymine. This "mistake" is tolerated much more often than uracil and leads to a permanent mutation. A significant part of this difference in efficiency is due to enzymes such as the uracil-DNA glycosylases, which can excise uracil (base excision) but cannot act on erroneously formed thymine.

The following scheme shows the possible mutations caused by deamination and the consequences of either DNA repair or permanent fixation of the mutation.

                   1.                   2.                        3.
                                                                |
     Methylated:                                                |
       m                                                        |     m
a)   --CpG--  Deamination    --TpG--  often        --CpG--      | → --CpG--
     --GpC--                 --GpC--               --GpC--      |   --GpC--
         m                       m                     m        |       m
                                                                |
                                                                |
b)                                    seldom       --TpG--      | → --TpG--
                                                   --ApC--      |   --ApC--
                                                       m        |
     Unmethylated:                                              |
                                                                |
c)   --CpG--  Deamination    --UpG--  very often   --CpG--      |
     --GpC--                 --GpC--               --GpC--      |
                                                                |
                                                                |
                                                                |
d)                                    very rarely  --UpG--      | → --TpG--
                                                   --ApC--      |   --ApC--
                                                                |

Legend to the scheme: Two CpG sites are shown, one located in a methylated region [a) and b)] and the other in an unmethylated region, e.g. a CpG island [c) and d)]. The "conspicuous" nucleobases are highlighted in bold.

1. Deamination produces a new base, so that complementary base pairing at this position (marked in bold) is lost.

2. Two variants are available for the subsequent restoration of complementary base pairing, each with a different probability. The difference between a) and b), often versus seldom, arises because the opposite strand carries a methylated CpG. As a result, the DNA repair system treats this strand as the "older", conserved strand in this region. The larger difference between c) and d), very often versus very rarely, is due to the fact that uracil is not a DNA base.

3. Following the mutational events, incorrect methylation marks or nucleobases may be replaced.

Bioinformatic analysis

Different algorithms for the identification of CpG islands have been described.

Finding CpG islands with the help of Markov chains

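Let $c^{+}_{st}$ denote the number of st pairs (base s immediately followed by base t) within CpG islands and $c^{-}_{st}$ the number elsewhere (outside CpG islands). The transition probabilities are estimated by maximum likelihood:

$$a^{+}_{st} = \frac{c^{+}_{st}}{\sum_{t'} c^{+}_{st'}} \quad \text{and} \quad a^{-}_{st} = \frac{c^{-}_{st}}{\sum_{t'} c^{-}_{st'}}$$

The estimation is based on sequence sections that are known to be CpG islands or not. Now consider an unknown sequence $X = x_1 \dots x_L$. Question: "Is it a CpG island?" Notation:

  • P(+ | X): probability that X is a CpG island
  • P(- | X): probability that X is not a CpG island

In addition, a score function is defined (the contribution of the first base $x_1$ is taken to be the same in both models):

$$\mathrm{Score}(X) = \log \frac{P(+ \mid X)}{P(- \mid X)} = \log \frac{P(+)}{P(-)} + \sum_{i=2}^{L} \log \frac{a^{+}_{x_{i-1} x_i}}{a^{-}_{x_{i-1} x_i}}$$

The total length of all CpG islands relative to the total length of the genome is used as the "prior" P(+).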
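The Markov chain approach described above can be sketched in Python as follows. This is an illustrative sketch rather than a published implementation: the function names, the toy training sequences and the pseudocount are assumptions. The pseudocount departs from pure maximum likelihood and is only there to avoid zero probabilities on small training sets.

```python
import math

BASES = "ACGT"


def train_transitions(seqs, pseudocount=1.0):
    """Estimate transition probabilities a_st from counts of adjacent base pairs.

    pseudocount is an assumption of this sketch; set it to 0 for pure
    maximum likelihood (which fails if some row has no observations).
    """
    counts = {s: {t: pseudocount for t in BASES} for s in BASES}
    for seq in seqs:
        seq = seq.upper()
        for s, t in zip(seq, seq[1:]):
            if s in counts and t in counts:
                counts[s][t] += 1
    return {s: {t: counts[s][t] / sum(counts[s].values()) for t in BASES}
            for s in BASES}


def log_odds_score(x, a_plus, a_minus, prior_plus=0.5):
    """Log posterior odds that x is a CpG island rather than background.

    prior_plus is the prior P(+); the text suggests the fraction of the
    genome covered by CpG islands. Non-ACGT symbols are skipped.
    """
    x = x.upper()
    score = math.log(prior_plus / (1.0 - prior_plus))
    for s, t in zip(x, x[1:]):
        if s in a_plus and t in a_plus:
            score += math.log(a_plus[s][t] / a_minus[s][t])
    return score


# Toy training data (hypothetical, far too small for real use).
islands = ["CGCGGCGCCGCG", "GCGCGCCGGCGC"]
background = ["ATTATAATGCAT", "TTAACATGTTAA"]
a_plus = train_transitions(islands)
a_minus = train_transitions(background)

if __name__ == "__main__":
    for query in ("CGCGCG", "ATATAT"):
        print(query, round(log_odds_score(query, a_plus, a_minus), 3))
```

A positive score favours the island model and a negative score the background model. With a realistic prior well below 0.5, a sequence needs correspondingly stronger evidence to be called an island.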

Finding CpG islands using the hidden Markov model

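The bases (G, C, A, T) at the respective positions of the DNA sequence $X = x_1 \dots x_L$ are the visible states. The hidden state $z_i$ at each position indicates whether this base is part of a CpG island or not (+, -). There are 4 possible transition probabilities between hidden states:

$$p_{++},\quad p_{+-},\quad p_{-+},\quad p_{--}$$

Every hidden state s produces a visible state b (a base) with an emission probability:

$$e_s(b) = P(x_i = b \mid z_i = s)$$

The probability that the visible sequence X was emitted by a hidden state sequence $Z = z_1 \dots z_L$ is:

$$P(X \mid Z) = \prod_{i=1}^{L} e_{z_i}(x_i)$$

with (see Markov chain):

$$P(Z) = P(z_1) \prod_{i=2}^{L} p_{z_{i-1} z_i}$$

This results in:

$$P(Z \mid X) = \frac{P(X \mid Z)\, P(Z)}{P(X)} \propto P(z_1)\, e_{z_1}(x_1) \prod_{i=2}^{L} p_{z_{i-1} z_i}\, e_{z_i}(x_i)$$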

Since the cost of maximizing P(Z | X) by enumerating all possible hidden state sequences grows exponentially with the length of the sequence, the recursive Viterbi algorithm is well suited to solving the problem.
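A minimal Viterbi sketch for the two-state model described above (hidden states + and -, visible states A, C, G, T) might look like this. The function name and all probabilities in `start`, `trans` and `emit` are hypothetical illustration values, not estimates from real data. The computation is done in log space to avoid numerical underflow on long sequences.

```python
import math

STATES = "+-"


def viterbi(x, start, trans, emit):
    """Return the most probable hidden state path for x as a string of '+'/'-'.

    start[s]    : probability of beginning in hidden state s
    trans[r][s] : probability of moving from hidden state r to s
    emit[s][b]  : probability that hidden state s emits base b
    Raises KeyError for symbols other than A, C, G, T.
    """
    x = x.upper()
    if not x:
        return ""
    # v[s]: best log probability of any path that ends in state s at the current position
    v = {s: math.log(start[s]) + math.log(emit[s][x[0]]) for s in STATES}
    backpointers = []
    for base in x[1:]:
        new_v, ptr = {}, {}
        for s in STATES:
            best_prev = max(STATES, key=lambda r: v[r] + math.log(trans[r][s]))
            new_v[s] = v[best_prev] + math.log(trans[best_prev][s]) + math.log(emit[s][base])
            ptr[s] = best_prev
        backpointers.append(ptr)
        v = new_v
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: v[s])
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    return "".join(reversed(path))


# Hypothetical parameters: island state favours C/G, background favours A/T.
start = {"+": 0.5, "-": 0.5}
trans = {"+": {"+": 0.9, "-": 0.1}, "-": {"+": 0.1, "-": 0.9}}
emit = {"+": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
        "-": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}}

if __name__ == "__main__":
    print(viterbi("ATATATATCGCGCGCG", start, trans, emit))
```

The running time is proportional to the sequence length times the square of the number of hidden states, instead of exponential in the sequence length. Because each state here emits single bases independently, the sketch responds to G+C richness rather than to CpG dinucleotides specifically; capturing dinucleotide dependence would need a larger state space.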
