Protein domain

from Wikipedia, the free encyclopedia

A protein domain is a region of a protein with a stable, mostly compact folded structure that is functionally and structurally (quasi-)independent of neighboring sections.


A protein can consist of a single domain or of several. A domain usually corresponds to a contiguous section of the amino acid sequence. Exceptions are bipartite and multipartite domains, e.g. the POU domain. Not all of a protein chain is organized into domains. A domain is often made up of bundles of secondary structure elements such as α-helices and β-sheets, connected by turns. Small domains are often stabilized by coordinated metal ions or by disulfide bridges. Typical structural motifs are often found within a protein domain.

The length of domains varies between 30 and more than 400 amino acids and is typically between 100 and 200 amino acids. Domain length is presumably limited by constraints on protein folding, since correct folding becomes more difficult as the chain gets longer. This partly explains why proteins are built modularly from several domains.

Because their tertiary structure follows from their own primary and secondary structure, protein domains usually remain functional even when they are excised from the larger protein of which they are part. The tertiary structure of the whole protein is composed of its successive domains. Protein domains are either rigidly joined to one another or connected by flexible sections (linkers) of variable fold, which allow them to move relative to one another as at a joint or hinge. These regions between the domains often correspond to a constriction or groove in the outer contour of the protein. In many cases these regions are partially fixed by further segments that extend like an arm from one domain to the next. Within the same protein, a domain can occur several times in a row, or different domains can be combined with one another. Several domains are often required for a specific function, e.g. substrate binding. In some cases the boundaries of domains correspond exactly to exons of the DNA, for example in immunoglobulins, so that domains can also be defined as genetic units. However, this is not always the case.

Many proteins are built modularly from a combination of different protein domains, which can perform the specific function of the protein only in combination. As a rule, transcription factors consist of at least one DNA-binding domain and a transactivation domain, which is involved in the initiation of transcription. Cell-cell and cell-matrix interaction proteins are a further example: here, different binding domains, in partly variable combinations, give rise to a particular substrate specificity.

A given protein domain can occur in more than a hundred different proteins, which differ from one another in the combination of their functional domains. In evolutionary terms, this speeds up the emergence of new proteins, since existing building blocks can be recombined quickly. Two main mechanisms are at work here: non-allelic homologous recombination and transposon-mediated insertion of a DNA segment at another position in the genome.
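The idea that the same domain recurs in many proteins with different domain combinations can be made concrete with a small data structure. The following is a minimal sketch in Python, not a real database schema: the protein and domain names are hypothetical placeholders, and the helper functions `index_by_domain` and `shared_domains` are introduced here only for illustration.

```python
from collections import defaultdict

# Hypothetical domain architectures: protein name -> ordered list of domains.
# All names are illustrative placeholders, not real database entries.
ARCHITECTURES = {
    "protein_A": ["DNA_binding", "transactivation"],
    "protein_B": ["DNA_binding", "ligand_binding", "transactivation"],
    "protein_C": ["kinase", "ligand_binding"],
    "protein_D": ["kinase", "kinase", "DNA_binding"],  # repeated domain
}


def index_by_domain(architectures):
    """Map each domain to the sorted list of proteins that contain it."""
    index = defaultdict(set)
    for protein, domains in architectures.items():
        for domain in domains:
            index[domain].add(protein)
    return {domain: sorted(proteins) for domain, proteins in index.items()}


def shared_domains(architectures, first, second):
    """Return the set of domains that two proteins have in common."""
    return set(architectures[first]) & set(architectures[second])


domain_index = index_by_domain(ARCHITECTURES)
print(domain_index["DNA_binding"])
print(shared_domains(ARCHITECTURES, "protein_B", "protein_C"))
```

Storing each architecture as an ordered list keeps both the order of domains along the chain and repeated occurrences of the same domain, while the inverted index answers the question of which proteins reuse a given building block.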

Domains of Unknown Function (DUFs)

Many protein domains have no known function. They are called domains of unknown function (DUF). Such domains are surprisingly common. For example, around 2700 different DUFs have been identified in bacteria. There are around 1500 DUFs in eukaryotes, of which around 800 are also found in bacteria (as of 2013). Goodacre et al. (2013) also identified 238 essential DUFs (eDUFs) in bacteria, the removal of which was found to be fatal for the cells.

Protein domain databases


Pfam is a collection of protein domain families. By comparing the sequence of an uncharacterized protein with known domains, the user can infer a similar function or an evolutionary relationship.


ProDom contains protein domains derived from sequences from SWISS-PROT and TrEMBL. Furthermore, the domain structure of a protein can be represented graphically.


SMART is short for Simple Modular Architecture Research Tool and is a database of families of protein domains. The user can get information about function, important amino acids, phylogenetic development and the tertiary structure.


CDD stands for Conserved Domain Database and is a database in which domains and the associated sequence alignment can be queried. The entries here are derived from Pfam, SMART and COG.


The HITS database can be used to query protein domains.


A description of the function of the protein family, literature references and cross-references are available via InterPro. Information is compiled by integrating various databases such as PROSITE, PRINTS, Pfam and ProDom.

Identification of domains


With the help of 2ZIP, predictions about leucine zipper domains can be made.
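What a sequence-based search for leucine zippers looks for can be illustrated with a naive pattern scan. The sketch below is not the 2ZIP method; it only searches for the heptad leucine repeat that characterizes leucine zippers (a leucine at every seventh position, four times in a row), and a pattern match alone will produce many false positives. The function name `find_leucine_heptads` and the test sequence are made up for illustration.

```python
import re

# Naive heptad pattern L-x(6)-L-x(6)-L-x(6)-L: four leucines, each separated
# by six arbitrary residues. The lookahead also reports overlapping matches.
LEUCINE_HEPTAD = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")


def find_leucine_heptads(sequence):
    """Return (start, segment) pairs; start positions are 0-based."""
    return [
        (match.start(), match.group(1))
        for match in LEUCINE_HEPTAD.finditer(sequence.upper())
    ]


# Made-up toy sequence, not a real protein.
toy = "MK" + "LAAAAAA" * 3 + "LGG"
hits = find_leucine_heptads(toy)
print(hits)
```

A realistic predictor would combine such a repeat search with further evidence, for example a check that the region is likely to form a coiled coil.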


This database contains definitions of protein domains.

DALI Domain Dictionary

The DALI Domain Dictionary provides an automatic classification of protein domains on the basis of sequence matches. It enables the user to compare 3-D protein structures and to identify structural domains that are similar in two different proteins even though their sequences differ.


References

  1. Jane S. Richardson (2007): The Anatomy and Taxonomy of Protein Structure. Extended web version of J. Richardson (1981): The Anatomy and Taxonomy of Protein Structure. Advances in Protein Chemistry 34: 167-339.
  2. N. Dekker, M. Cox, R. Boelens, C. P. Verrijzer, P. C. van der Vliet, R. Kaptein: Solution structure of the POU-specific DNA-binding domain of Oct-1. In: Nature (1993), Volume 362, Issue 6423, pp. 852-855. PMID 8479524.
  3. Jeremy M. Berg, John L. Tymoczko, Lubert Stryer: Biochemistry. 6th edition, Spektrum Akademischer Verlag, Heidelberg 2007. ISBN 978-3-8274-1800-5. P. 63.
  4. Tom Strachan, Andrew Read: Human Molecular Genetics. Garland Science, 4th edition, 2011. P. 315. ISBN 978-0-8153-4149-9.
  5. N. F. Goodacre, D. L. Gerloff, P. Uetz: Protein domains of unknown function are essential in bacteria. In: mBio. Volume 5, Number 1, 2013, ISSN 2150-7511. doi:10.1128/mBio.00744-13. PMID 24381303.


  • Donald Voet, Judith G. Voet: Biochemistry. 3rd edition, John Wiley & Sons, New York 2004. ISBN 0-471-19350-X.
  • E. Buxbaum: Fundamentals of Protein Structure and Function. Springer, New York 2007. ISBN 978-0-387-26352-6.
  • Bastien D. Gomperts, Ijsbrand M. Kramer, Peter E. R. Tatham: Signal Transduction. Academic Press, 2009. ISBN 978-0-12-369441-6.