from Wikipedia, the free encyclopedia
Schematic representation of a gene as a section of the DNA double helix.
Shown is a eukaryotic gene containing introns and exons, with the DNA double strand condensed into a chromosome in the background (real exons and introns contain more base pairs than shown).

A gene is usually a section of DNA that contains basic information for the development of an individual's characteristics and for the production of a biologically active RNA. In the process of transcription, a complementary copy in the form of an RNA is produced from the codogenic strand of this DNA section.
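The complementary copying described above can be illustrated with a minimal sketch. It assumes standard base pairing (A with U, T with A, G with C, C with G) and a codogenic strand written in 3'->5' direction, so that the RNA comes out in 5'->3' direction. The function name and the example sequence are hypothetical and chosen only for illustration.

```python
# Base pairing used when RNA is copied from a DNA template:
# adenine pairs with uracil in RNA, thymine with adenine, guanine with cytosine.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Return the RNA (5'->3') complementary to a template strand given 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5.upper())


template = "TACGGCATT"  # hypothetical codogenic strand, written 3'->5'
print(transcribe(template))  # AUGCCGUAA
```

Real transcription also involves promoters, RNA polymerase and RNA processing, none of which this sketch models.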

There are different types of RNA. During translation, one step of protein biosynthesis, the amino acid sequence of a protein is read from the mRNA. Proteins take on specific functions in the body through which characteristics can be expressed. The activity state of a gene, its expression, can be regulated differently in individual cells.

Genes, which are visible only under an electron microscope, are also commonly referred to as hereditary factors located at specific places on the chromosomes, because they carry the genetic information that is passed on to offspring through reproduction. Research into the structure, function and inheritance of genes is the subject of genetics. The entire genetic information of a cell is called the genome.

Research history

In 1854 Johann Gregor Mendel began to study the inheritance of traits in peas. He was the first to suggest the existence of factors that are passed on from parents to offspring. From his crossing experiments he described that traits can be inherited independently of one another and that there are dominant and recessive traits. He developed the hypothesis that there can be homozygous and heterozygous states and thus laid the basis for distinguishing between genotype and phenotype.

1900 is considered the year of the "rediscovery" of the Mendelian rules, when the botanists Hugo de Vries, Erich Tschermak and Carl Correns took up the finding that there are quantifiable rules according to which the factors responsible for the expression of characteristics are passed on to offspring. Correns coined the term "disposition" or "hereditary disposition". In Mendel's Principles of Heredity (1902), William Bateson pointed out that there are two variants of the hereditary factors in every cell. He named the second element allelomorph, after the Greek word for 'other', and thus coined the term allele. Archibald Garrod, a British doctor, had studied metabolic diseases and found that they ran in families. Garrod realized that the laws were also valid for humans and assumed that the genetic makeup is the basis for the chemical individuality of humans. In 1902 Walter Sutton suspected that the factors Mendel had called "elements" are located on the chromosomes.

In his lectures on the theory of descent in 1904, August Weismann presented the finding that there is a difference between body cells and germ cells and that only the latter are capable of producing new organisms. According to him, germ cells contain a "hereditary substance" made up of individual elements that he called determinants. These determinants were supposed to be responsible, for example, for the visible expression of the limbs.

The term "gene" (German "Gen") was first used in 1909 by the Dane Wilhelm Johannsen. He named the objects that the theory of inheritance deals with after the Greek noun γένος genos for "offspring". For him, however, they were only a unit of calculation. Three years earlier, William Bateson had named the science of heredity genetics, after the Greek adjective γεννητικός gennetikos for "producing". At this point the chemical nature of genes was still completely unclear.

In the first years of the 20th century, geneticists turned from various plants to insects and later birds in order to test the laws of inheritance. Combined with the chromosomes, which had been discovered in 1842 and named in 1888, this gave rise to the chromosome theory of inheritance. Improved staining techniques had shown that chromosomes first double and then divide along with the cells. They were therefore regarded as carriers of the genetic makeup. During this time there was a controversy between the proponents of Johannsen's and Mendel's hypothesis that genes are material and their critics, who dismissed a connection between genes and chromosomes as "physicalism" and "Mendelism" and continued to regard genes as abstract entities.

Thomas Hunt Morgan was also convinced that physical units could not be responsible for the various characteristics, and tried to refute Mendelism. In 1910 he began crossing experiments with black-bellied fruit flies (Drosophila melanogaster). However, his work produced the opposite: the definitive proof that genes lie on chromosomes and are therefore material in nature. Together with his colleagues, including Calvin Bridges, Alfred Sturtevant and Hermann Muller, he found many natural mutations and examined in innumerable crosses the probability that two traits are inherited together. They were able to show that genes are located at specific points on the chromosomes and are lined up one behind the other. Working together for years, the group created the first gene map. Since crossing over could also be observed under the microscope, it was known that chromosomes can exchange sections. The closer two genes are to each other on the chromosome, the greater the likelihood that they will be inherited together and not separated by a crossing-over event. This made it possible to state the distance between two genes, which is given in centimorgans (cM), a unit named after Morgan.
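The relationship between joint inheritance and gene distance can be sketched numerically. The example assumes the usual convention that 1% recombinant offspring corresponds to about 1 cM, which holds only for closely linked genes, because double crossovers make larger distances appear shorter. The function name and the offspring counts are hypothetical.

```python
def map_distance_cm(recombinant_offspring: int, total_offspring: int) -> float:
    """Estimate the distance between two linked genes in centimorgans.

    Assumption: 1% recombinant offspring is treated as 1 cM, a
    simplification that is reasonable only for short distances.
    """
    if total_offspring <= 0:
        raise ValueError("total_offspring must be positive")
    if not 0 <= recombinant_offspring <= total_offspring:
        raise ValueError("recombinant_offspring must lie between 0 and the total")
    return 100.0 * recombinant_offspring / total_offspring


# Hypothetical cross: 170 of 1000 offspring show a new trait combination.
print(map_distance_cm(170, 1000))  # 17.0
```

A smaller share of recombinant offspring gives a smaller distance, which matches the statement that closely neighbouring genes are usually inherited together.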

Some time later, Hermann Muller began to experiment with X-rays and was able to show that irradiating flies greatly increases their mutation rate. This finding from 1927 was a sensation, as it was actually shown for the first time that genes are physical objects that can be influenced from outside.

In 1928, Frederick Griffith demonstrated for the first time, in what is now known as Griffith's experiment, that genes can be transferred from one organism to another. The process he demonstrated was transformation. In 1941, George Wells Beadle and Edward Lawrie Tatum showed that mutations in genes are responsible for defects in metabolic pathways, indicating that specific genes encode specific proteins. These findings led to the "one gene, one enzyme" hypothesis, which was later refined into the "one gene, one polypeptide" hypothesis. Oswald Avery, Colin MacLeod and Maclyn McCarty showed in 1944 that DNA contains the genetic information. In 1953 James D. Watson and Francis Crick deciphered the structure of DNA, building on the work of Rosalind Franklin and Erwin Chargaff, and proposed the model of the DNA double helix. In 1969, Jonathan Beckwith was the first to isolate a single gene.

The definition of what exactly a gene is has changed constantly and has been adapted to new findings. To arrive at a current definition, 25 scientists from the Sequence Ontology Consortium at the University of California, Berkeley needed two days in early 2006 to reach a version that everyone could live with. According to it, a gene is "a locatable region of genomic sequence, corresponding to a unit of inheritance, which is associated with regulatory regions, transcribed regions and/or other functional sequence regions".

This definition is not final either. The ENCODE (ENCyclopedia Of DNA Elements) project, which mapped the transcriptional activity of the genome, found new, complex regulatory patterns. It turned out that the transcription of non-coding RNA is much more widespread than previously assumed. The definition proposed on this basis is therefore: "A gene is a union of genomic sequences encoding a coherent set of potentially overlapping functional products".


Structure

At the molecular level, a gene consists of two different areas:

  1. A segment of DNA from which a single-stranded RNA copy is produced by transcription.
  2. All additional DNA segments that are involved in the regulation of this copying process.

The structure of genes differs between groups of living beings. The drawing shows the structure of a typical eukaryotic gene that encodes a protein.


Schematic structure of a eukaryotic gene

Regulatory elements such as enhancers or promoters are located upstream of the transcription unit or within the exons (light blue and dark blue) and introns (pink and red). Depending on their sequence, various proteins, such as transcription factors and RNA polymerase, bind to them. The pre-mRNA (immature mRNA), which initially arises in the cell nucleus during transcription, is processed into mature mRNA. In addition to the directly protein-coding open reading frame, the mRNA also contains untranslated, i.e. non-coding, regions: the 5' untranslated region (5' UTR) and the 3' untranslated region (3' UTR). These regions serve to regulate the initiation of translation and the activity of the ribonucleases that break the RNA down again.

Prokaryotic genes differ in structure from eukaryotic genes in that they have no introns. In addition, several different RNA-forming gene segments can lie very close together one behind the other (these are then called polycistronic genes), and their activity can be regulated by a common regulatory element. Such gene clusters are transcribed together but translated into different proteins. This unit of regulatory element and polycistronic genes is called an operon. Operons are typical of prokaryotes.

Genes encode not only the mRNA from which proteins are translated, but also rRNA and tRNA as well as other ribonucleic acids that have other tasks in the cell, for example in protein biosynthesis or gene regulation. A gene that codes for a protein contains a description of the amino acid sequence of that protein. This description is written in a chemical language, the genetic code, in the form of the nucleotide sequence of the DNA. The individual "chain links" (nucleotides) of the DNA, read in groups of three (triplets, codons), represent the "letters" of the genetic code. The coding region, i.e. all nucleotides that are directly involved in describing the amino acid sequence, is called the open reading frame. A nucleotide consists of one phosphate, one deoxyribose (sugar) and one base. The base is adenine, thymine, guanine or cytosine.
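How triplets act as the "letters" of the genetic code can be shown with a short sketch. It assumes the standard genetic code, written for the coding DNA strand with one-letter amino acid symbols and `*` for stop codons. The helper names and the example sequence are hypothetical.

```python
from itertools import product

# Standard genetic code in the conventional T, C, A, G ordering of codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def split_codons(sequence: str) -> list[str]:
    """Split a coding sequence into complete triplets, dropping leftover bases."""
    usable = len(sequence) - len(sequence) % 3
    return [sequence[i:i + 3] for i in range(0, usable, 3)]


def translate(open_reading_frame: str) -> str:
    """Translate codon by codon until the first stop codon."""
    protein = []
    for codon in split_codons(open_reading_frame.upper()):
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":  # stop codon ends the reading frame
            break
        protein.append(amino_acid)
    return "".join(protein)


orf = "ATGGCCAAATGA"  # hypothetical open reading frame on the coding strand
print(split_codons(orf))  # ['ATG', 'GCC', 'AAA', 'TGA']
print(translate(orf))     # MAK
```

The table is built from a compact string rather than typed out codon by codon. This keeps the sketch short, but it depends on the T, C, A, G ordering staying in step with the amino acid string.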

Genes can mutate, i.e. change spontaneously or as a result of external influences (e.g. radioactivity). These changes can occur at different points in the gene. As a result, after a series of mutations, a gene can exist in different states called alleles. A DNA sequence can also contain several overlapping genes. Genes copied by gene duplication can be identical in sequence but still be regulated differently and thus lead to different amino acid sequences without being alleles.

Ratio of introns to exons

In general, the ratio between introns and exons varies greatly from gene to gene. Some genes have no introns, while others consist of more than 95% introns. The dystrophin gene, the largest human gene at 2.5 million base pairs, encodes a protein of 3685 amino acids. At three base pairs per amino acid this corresponds to 11,055 coding base pairs, so the proportion of coding base pairs is 0.44%.

The following table lists some proteins and the respective coding gene.

Protein         Number of amino acids   Gene    Number of base pairs   Number of coding base pairs   Coding portion
Dystrophin      3685                    DMD     2,500,000              11,055                        0.44%
FOXP2           715                     FOXP2   603,000                2145                          0.36%
Neurofibromin   2838                    NF1     280,000                8514                          3.0%
BRCA2           3418                    BRCA2   84,000                 10,254                        12.2%
BRCA1           1863                    BRCA1   81,000                 5589                          6.9%
Survivin        142                     BIRC5   15,000                 426                           2.9%
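The coding portion in the table follows from simple arithmetic, sketched below for two of the rows. The sketch assumes three base pairs per amino acid and, like the table, does not count the stop codon. The function name is hypothetical, and the gene lengths are the rounded values from the table.

```python
def coding_share(amino_acids: int, gene_length_bp: int) -> tuple[int, float]:
    """Return the coding base pairs and their share of the gene in percent."""
    coding_bp = 3 * amino_acids  # one codon of three base pairs per amino acid
    percent = 100.0 * coding_bp / gene_length_bp
    return coding_bp, percent


for gene, amino_acids, length_bp in [("DMD", 3685, 2_500_000), ("BRCA2", 3418, 84_000)]:
    coding_bp, percent = coding_share(amino_acids, length_bp)
    print(f"{gene}: {coding_bp} coding bp, {percent:.2f}% of the gene")
# DMD: 11055 coding bp, 0.44% of the gene
# BRCA2: 10254 coding bp, 12.21% of the gene
```

Because the gene lengths are rounded, the resulting percentages are approximate as well.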

Gene Activity and Regulation

Genes are "active" when their information is transcribed into RNA, that is, when transcription takes place. Depending on the function of the gene, mRNA, tRNA or rRNA is produced. In the case of mRNA, a protein can subsequently be translated from it, but does not have to be. The articles Gene Expression and Protein Biosynthesis offer an overview of these processes.

The activity of individual genes is regulated and controlled by a large number of mechanisms. One way is to control the rate of their transcription into hnRNA. Another is to break down the mRNA before it is translated, for example via siRNA. In the short term, gene regulation occurs through the binding and release of proteins, so-called transcription factors, at specific regions of the DNA, the "regulatory elements". In the long term, it is achieved through methylation or the "packaging" of DNA segments in histone complexes. The regulatory elements of DNA are themselves subject to variation. The influence of changes in gene regulation, including the control of alternative splicing, is thought to be comparable to that of mutations in protein-coding sequences. With classical genetic methods, i.e. the analysis of inheritance patterns and phenotypes, these effects normally cannot be separated from one another; only molecular biology can provide information here. An overview of the regulatory processes of genes is given in the article Gene Regulation.

Organization of genes

In all living beings, only part of the DNA codes for defined RNAs. The rest is called non-coding DNA. It has functions in gene regulation, for example in regulating alternative splicing, and influences the architecture of the chromosomes.

The position on a chromosome at which a gene lies is called the gene locus. Genes are not evenly distributed on the chromosomes, but sometimes occur in so-called clusters. Gene clusters can consist of genes that merely happen to lie close to one another, or of groups of genes that code for functionally related proteins. Genes whose proteins have a similar function can, however, also lie on different chromosomes.

Some sections of DNA code for several different proteins. This is due to overlapping open reading frames.
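Overlapping open reading frames can be made concrete with a small sketch that scans the three reading frames of one strand. It assumes that a reading frame starts at ATG and ends at the first in-frame stop codon (TAA, TAG or TGA), and it ignores the reverse strand. The function names and the example sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(sequence: str) -> list[tuple[int, int, int]]:
    """Return (frame, start, end) for each ATG...stop stretch on the given strand.

    Positions are 0-based and 'end' is exclusive, so sequence[start:end]
    includes the stop codon.
    """
    sequence = sequence.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(sequence) - 2, 3):
            codon = sequence[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens a reading frame
            elif start is not None and codon in STOP_CODONS:
                orfs.append((frame, start, i + 3))
                start = None  # look for the next reading frame in this frame
    return orfs


def overlaps(a: tuple[int, int, int], b: tuple[int, int, int]) -> bool:
    """True if two reading frames share at least one nucleotide position."""
    return a[1] < b[2] and b[1] < a[2]


dna = "ATGAAATGCCCTAGGTGA"  # hypothetical sequence with two overlapping frames
orfs = find_orfs(dna)
print(orfs)                      # [(0, 0, 18), (2, 5, 14)]
print(overlaps(orfs[0], orfs[1]))  # True
```

In this example the same nucleotides are read as ATG AAA TGC CCT AGG TGA in one frame and as ATG CCC TAG in another, so one stretch of DNA describes two different amino acid sequences.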

Genetic Variation and Genetic Variability

Genetic variation refers to the occurrence of genetic variants (alleles, genes or genotypes) in individual organisms. It arises through mutations, but also through processes during meiosis ("crossing over"), by which the grandparents' genes are distributed differently among the sex cells. New genes can also arise through mutations or de novo.

Genetic variability, on the other hand, is the ability of an entire population to produce individuals with different genetic makeup. Not only genetic processes play a role here, but also mechanisms of mate choice. Genetic variability plays a crucial role in the ability of a population to survive under changed environmental conditions and is an important factor in evolution.

Special genes

RNA genes in viruses

Although genes are present as DNA segments in all cell-based life forms, there are some viruses whose genetic information is in the form of RNA. RNA viruses infect a cell, which then immediately starts producing proteins according to the instructions in the RNA; transcription from DNA to RNA is not necessary. Retroviruses, on the other hand, transcribe their RNA into DNA upon infection, with the help of the enzyme reverse transcriptase.


Pseudogenes

A gene in the narrower sense is usually a nucleotide sequence that contains the information for a protein that is immediately functional. In contrast, pseudogenes are gene copies that do not encode a full-length functional protein. They are often the result of gene duplications and/or mutations that subsequently accumulate in the pseudogene without selection, so that it has lost its original function. However, some seem to play a role in regulating gene activity. The human genome contains around 20,000 pseudogenes. The Human Genome Project was founded with the aim of fully deciphering the human genome.

Jumping genes

Jumping genes, also known as transposons, are mobile sections of genetic material that can move freely within the DNA of a cell. They cut themselves out of their original location in the genome and insert themselves again at any other point. Biologists led by Fred Gage of the Salk Institute for Biological Studies in La Jolla (USA) have shown that these jumping genes occur not only in germ-line cells, as previously assumed, but are also active in neural progenitor cells. Research by Eric Lander et al. (2007) shows that transposons have an important function: as a creative factor in the genome, they can rapidly spread important genetic innovations within it.


Orphan genes

Orphan genes (also called ORFans, especially in the microbial literature) are genes with no detectable homologues in other lineages. Orphan genes are a subset of taxonomically restricted genes, which are unique to a certain taxonomic level (e.g. plant-specific). They are usually considered unique to a very narrow taxon, even a single species. Orphan genes differ in that they are lineage-specific and have no known history of common duplication and rearrangement outside of their specific species or group. For example, humans have 634 genes that the chimpanzee lacks; conversely, humans lack 780 chimpanzee genes.

Typical genome sizes and number of genes

Organism / biological system                       Number of genes   Base pairs total
Common water flea                                  30,907            2 · 10^8
Thale cress (Arabidopsis thaliana, model plant)    > 25,000          10^8 to 10^11
Human                                              ~ 22,500          3 · 10^9
Drosophila melanogaster (fly)                      12,000            1.6 · 10^8
Baker's yeast (Saccharomyces cerevisiae)           6,000             1.3 · 10^7
Bacterium                                          180 to 7,000      10^5 to 10^7
Escherichia coli                                   ~ 5,000           4.65 · 10^6
Carsonella ruddii                                  182               160,000
DNA virus                                          10 to 300         5,000 to 200,000
RNA virus                                          1 to 25           1,000 to 23,000
Viroid                                             0                 246 to 401



References

  1. Wilhelm Ludwig Johannsen: Elements of the exact theory of heredity with basic features of the biological statistics of variation. 1913.
  2. Helen Pearson: What is a gene? In: Nature. Volume 441, May 2006, pp. 398-401, PMID 16724031.
  3. M. Gerstein, C. Bruce, J. Rozowsky, D. Zheng, J. Du, J. Korbel, O. Emanuelsson, Z. Zhang, S. Weissman, M. Snyder: What is a gene, post-ENCODE? History and updated definition. In: Genome Research. Volume 17, No. 6, June 2007, pp. 669-681, PMID 17567988.
  4. E. H. McConkey: How the Human Genome Works. Jones & Bartlett, 2004, ISBN 0-7637-2384-3, p. 5.
  5. N. Shiga et al.: Disruption of the Splicing Enhancer Sequence within Exon 27 of the Dystrophin Gene by a Nonsense Mutation Induces Partial Skipping of the Exon and Is Responsible for Becker Muscular Dystrophy. In: J. Clin. Invest. Volume 100, 1997, pp. 2204-2210, PMID 9410897.
  6. M. Matsuo: Duchenne muscular dystrophy. In: Southeast Asian J Trop Med Public Health. Volume 26, 1995, pp. 166-171, PMID 8629099.
  7. A. F. Wright, N. Hastie: Genes and Common Diseases. Genetics in Modern Medicine. Cambridge University Press, 2007, ISBN 0-521-83339-6.
  8. I. Bottillo et al.: Functional analysis of splicing mutations in exon 7 of NF1 gene. In: BMC Medical Genetics. Volume 8, 2007, PMID 17295913.
  9. B. Górski et al.: Usefulness of polymorphic markers in exclusion of BRCA1/BRCA2 mutations in families with aggregation of breast/ovarian cancers. In: J. Appl. Genet. Volume 44, 2003, pp. 419-423, PMID 12923317.
  10. UniProt O15392.
  11. M. Kappler: Molecular characterization of the IAP survivin in soft tissue sarcoma. Significance for prognosis and establishment of new therapy strategies. Dissertation, University of Halle-Wittenberg, 2005.
  12. A. R. Carvunis, T. Rolland et al.: Proto-genes and de novo gene birth. In: Nature. Volume 487, No. 7407, July 2012, pp. 370-374, doi:10.1038/nature11184, PMID 22722833, PMC 3401362.
  13. Eric Lander et al.: Genome of the marsupial Monodelphis domestica reveals innovation in non-coding sequences. In: Nature. Volume 447, May 10, 2007, pp. 167-177.
  14. Jorge Ruiz-Orera, Jessica Hernandez-Rodriguez, Cristina Chiva, Eduard Sabidó, Ivanela Kondova: Origins of De Novo Genes in Human and Chimpanzee. In: PLOS Genetics. Volume 11, No. 12, December 31, 2015, ISSN 1553-7404, e1005721, doi:10.1371/journal.pgen.1005721, PMID 26720152, PMC 4697840.
  15. John K. Colbourne et al.: The Ecoresponsive Genome of Daphnia pulex. In: Science. Volume 331, No. 6017, February 4, 2011, pp. 555-561, doi:10.1126/science.1197761.
  16. Mihaela Pertea, Steven L. Salzberg: Between a chicken and a grape: Estimating the number of human genes. In: Genome Biology. Volume 11, 2010, 206.
This version was added to the list of articles worth reading on December 31, 2005.