TRANSFAC


TRANSFAC (TRANScription FACtor database) is a manually curated database of eukaryotic transcription factors, their genomic binding sites and their DNA-binding profiles. Its contents can be used with appropriate software to predict potential transcription factor binding sites (TFBS).

History

The data collection underlying the database was first published in 1988 by Edgar Wingender. It essentially comprised three tables: one on transcription factor binding sites (TFBS) in genes, one on transcription factors (TF) and one on DNA-binding domains of the zinc finger type. From this collection, a locally executable database named TRANSFAC was first created. It was then developed into a resource available on the Internet as part of one of the first publicly funded bioinformatics projects in Germany, carried out at what was then the Society for Biotechnological Research (GBF, now the Helmholtz Centre for Infection Research, HZI) in Braunschweig. In 1997, the database was transferred to a company founded for this purpose (BIOBASE GmbH) to secure long-term financing. Older versions of the database nevertheless remain freely accessible to non-commercial users.

Content and structure of the database

Basic structure of the TRANSFAC database: primary data on transcription factors (FACTOR) and transcription factor binding sites (SITE), from which abstractions are derived as TF classes (CLASS) and nucleotide distribution matrices (MATRIX).

The focus of the database is the relationship between transcription factors (TFs) and their DNA binding sites (TFBS). For each TF, the structural and functional properties are described as far as they are documented in the scientific literature. TFs are grouped into families, classes and superclasses according to the properties of their DNA-binding domains, which yields a classification scheme for transcription factors. The binding of a TF to a specific site in a gene is documented together with the exact location of the site, its nucleotide sequence and the method by which it was detected. Binding sites that belong to one TF (or to a group of closely related TFs) are aligned and combined into nucleotide distribution matrices (count matrices; position-specific scoring matrices, PSSM). Many of the matrices in TRANSFAC's matrix library were created by the annotation team; others were taken from the scientific literature.
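The step from aligned binding sites to a matrix can be illustrated with a minimal sketch. It assumes ungapped sites of equal length, a uniform background and a simple pseudocount; the example sequences and the function names `count_matrix` and `log_odds_matrix` are made up for illustration and do not reflect how the TRANSFAC annotation team actually builds its matrices.

```python
import math
from collections import Counter

BASES = "ACGT"

def count_matrix(sites):
    """Count how often each base occurs at each position of aligned sites."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all aligned sites must have the same length")
    matrix = []
    for pos in range(length):
        column = Counter(site[pos] for site in sites)
        matrix.append({base: column.get(base, 0) for base in BASES})
    return matrix

def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Turn counts into log2 odds scores against a uniform background (assumption)."""
    pssm = []
    for column in counts:
        total = sum(column.values()) + pseudocount * len(BASES)
        pssm.append({
            base: math.log2((column[base] + pseudocount) / total / background)
            for base in BASES
        })
    return pssm

# Made-up aligned binding sites, for illustration only.
sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TTACTCA"]
counts = count_matrix(sites)
pssm = log_odds_matrix(counts)
```

Here `counts` holds one dictionary of base counts per alignment column, and `pssm` holds the corresponding scores; positive scores mark bases that are more frequent at a position than the assumed background.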

Areas of application

The TRANSFAC database can be used as an encyclopedia of eukaryotic transcription factors. The target sequences and regulated genes can be listed for each TF, so that comprehensive data sets of binding sequences for individual TFs can be compiled, for example as test or training sequences for TFBS recognition algorithms. The TF classification allows such data sets to be analyzed in relation to the properties of the DNA-binding domain. Conversely, for a regulated gene, the TFs with documented binding sites in that gene can be retrieved. From the TF-target gene relationships documented in TRANSFAC, transcription regulatory networks have been constructed and analyzed in systems biology studies.

By far the most widespread use of TRANSFAC is the computer-aided prediction of potential transcription factor binding sites. Various algorithms use either the individual TF binding sites or the matrix library for this purpose.
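The lookups in both directions (TF to target genes, gene to regulating TFs) and the compilation of binding sequences per TF can be sketched as follows. The `SiteRecord` type, its fields and the placeholder records are hypothetical and do not mirror the actual TRANSFAC table layout or any access API.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class SiteRecord:
    factor: str    # transcription factor that binds the site
    gene: str      # gene in which the site is located
    sequence: str  # nucleotide sequence of the site

# Placeholder records, not actual database entries.
records = [
    SiteRecord("TF_A", "gene1", "TGACTCA"),
    SiteRecord("TF_A", "gene2", "TGAGTCA"),
    SiteRecord("TF_B", "gene2", "GGGACTTTCC"),
]

def build_network(records):
    """Index site records by factor and by gene."""
    targets = defaultdict(set)     # factor -> regulated genes
    regulators = defaultdict(set)  # gene -> factors with documented sites
    sites = defaultdict(list)      # factor -> binding sequences
    for record in records:
        targets[record.factor].add(record.gene)
        regulators[record.gene].add(record.factor)
        sites[record.factor].append(record.sequence)
    return targets, regulators, sites

targets, regulators, sites_by_factor = build_network(records)
```

The `targets` mapping corresponds to the edges of a simple regulatory network, while `sites_by_factor` is the kind of per-TF sequence set that could serve as test or training data.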

Tools for predicting transcription factor binding sites based on TRANSFAC content are:

  • Patch - analyzes sequence similarities with the binding sites documented in TRANSFAC; is provided together with the database.
  • SiteSeer - analyzes sequence similarities with the binding sites documented in TRANSFAC.
  • Match - identifies potential TFBS using the matrix library; is provided together with the database.
  • TESS (Transcription Element Search System) - analyzes sequence similarities with binding sites from TRANSFAC as well as potential binding sites using the matrix libraries from TRANSFAC and three other sources. TESS also provides a program for the identification of cis-regulatory modules (CRMs, characteristic combinations of TFBS) using TRANSFAC matrices.
  • PROMO - Matrix-based TFBS prediction using the commercial database version
  • TFM Explorer - Identifying common potential TFBS in a set of genes
  • MotifMogul - Matrix-based sequence analysis with different algorithms
  • ConTra - Matrix-based sequence analysis in conserved promoter areas
  • PMS (Poly Matrix Search) - Matrix-based sequence analysis in conserved promoter areas
  • ModuleMaster - Identification of presumably regulating transcription factors for any genes and subsequent identification of cis-regulatory modules (CRMs).
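The general idea behind matrix-based prediction, as used by several of the tools listed above, can be sketched as a sliding-window scan. This is a generic illustration, not the algorithm of Match or of any other listed tool: the scoring matrix, the threshold and the function name `scan` are assumptions, and only the forward strand is searched.

```python
def scan(sequence, pssm, threshold):
    """Return (start, window, score) for every window scoring at or above threshold."""
    width = len(pssm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Skip windows containing ambiguous symbols such as N.
        if any(base not in "ACGT" for base in window):
            continue
        score = sum(pssm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits

# Hypothetical 4-column scoring matrix that favours the motif "TGAC".
pssm = [
    {"A": -1.0, "C": -1.0, "G": -1.0, "T": 1.5},
    {"A": -1.0, "C": -1.0, "G": 1.5, "T": -1.0},
    {"A": 1.5, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": 1.5, "G": -1.0, "T": -1.0},
]

hits = scan("AATGACGGTGATCC", pssm, threshold=3.0)
for start, window, score in hits:
    print(start, window, score)
```

Lowering the threshold reports more candidate sites at the cost of more false positives, which is the usual trade-off when choosing cut-offs for this kind of search.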

Tools for comparing matrices with those in the matrix libraries of TRANSFAC and other sources:

  • T-Reg Comparator - compares individual matrices or groups of matrices with those of TRANSFAC or other matrix libraries.
  • MACO - Matrix comparison with matrix libraries
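One simple way to compare two count matrices is to slide one along the other and average a column-wise similarity over the overlapping positions. The sketch below is an assumption-laden illustration: it uses an ungapped alignment and a similarity based on total variation distance, which is not necessarily the measure used by T-Reg Comparator or MACO, and all function names are hypothetical.

```python
BASES = "ACGT"

def to_frequencies(counts):
    """Convert each column of base counts into relative frequencies."""
    freqs = []
    for column in counts:
        total = sum(column[b] for b in BASES)
        freqs.append([column[b] / total for b in BASES])
    return freqs

def column_similarity(p, q):
    """1 minus total variation distance: 1.0 for identical columns, 0.0 for disjoint ones."""
    return 1.0 - 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def best_alignment(m1, m2, min_overlap=3):
    """Slide m2 along m1 without gaps; return (best mean similarity, offset of m2 in m1)."""
    f1, f2 = to_frequencies(m1), to_frequencies(m2)
    best = (float("-inf"), None)
    for offset in range(-(len(f2) - min_overlap), len(f1) - min_overlap + 1):
        pairs = [(f1[i], f2[i - offset])
                 for i in range(len(f1)) if 0 <= i - offset < len(f2)]
        if len(pairs) < min_overlap:
            continue
        score = sum(column_similarity(p, q) for p, q in pairs) / len(pairs)
        if score > best[0]:
            best = (score, offset)
    return best

def col(base):
    """Helper: a count column in which all four observations are the given base."""
    return {b: (4 if b == base else 0) for b in BASES}

# Made-up matrices: m2 ("GAC") matches columns 1 to 3 of m1 ("TGACT").
m1 = [col(b) for b in "TGACT"]
m2 = [col(b) for b in "GAC"]
best = best_alignment(m1, m2)
```

In a library search, the same comparison would be repeated against every matrix of the library and the results ranked by score.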

Various servers provide genomic annotations that were precomputed with the help of TRANSFAC.

Related data sources

The following sources offer related content or content that overlaps with parts of the TRANSFAC database:

  • JASPAR - Collection of transcription factor binding profiles (matrices) and sequence analysis programs
  • PLACE - cis-regulatory DNA elements in plants; until February 2007.
  • PlantCARE - cis-regulatory elements and transcription factors in plants (2002).
  • PRODORIC - a similar concept to TRANSFAC - but for prokaryotes
  • RegulonDB - Focus on the bacterium Escherichia coli
  • SCPD - specific data and tool collection for yeast ( Saccharomyces cerevisiae ) (1998).
  • TFe - The transcription factor encyclopedia
  • TRRD - Transcription Regulatory Regions Database, mainly about regulatory regions and TF binding sites
