Clustal

Clustal Omega
Basic data

Developer: Des Higgins, Fabian Sievers (Conway Institute, UCD)
Current version: 1.2.1 (February 28, 2014)
Operating system: Unix, Linux, Mac, MS-Windows
Programming language: C++
Category: Bioinformatics tool
License: GNU General Public License, version 2
Website: www.clustal.org/omega/
Clustal
Basic data

Developer: Gibson T. (EMBL), Thompson J. (CNRS), Higgins D. (UCD)
Current version: 2.1 (November 17, 2010)
Operating system: Unix, Linux, macOS, Windows
Programming language: C++
Category: Bioinformatics tool
License: LGPL from version 2.1; previously free for academic users
Website: www.clustal.org

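Clustal is a widely used computer program for multiple sequence alignment. The current version of ClustalW and ClustalX is 2.1. There are three variants of the program:

  • ClustalW: command-line interface
  • ClustalX: graphical user interface
  • Clustal Omega: the newest variant, designed for large numbers of sequences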

Input / output

The program can process a wide range of input formats, including NBRF/PIR, FASTA, EMBL/Swiss-Prot or UniProt, Clustal, GCG/MSF, GCG9 RSF and GDE.

The output can be written in the following formats: Clustal, NBRF/PIR, GCG/MSF, PHYLIP, GDE and NEXUS.
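As an illustration of the Clustal output format, the following minimal sketch reads an alignment into a dictionary. It assumes the common layout (a header line starting with "CLUSTAL", blocks of name and sequence columns, and a conservation line beginning with whitespace); the function name `parse_clustal` and the sample data are made up for this example, and a real file may contain details this sketch does not handle.

```python
def parse_clustal(text):
    """Parse a Clustal-format alignment into {name: aligned_sequence}."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith("CLUSTAL"):
        raise ValueError("missing CLUSTAL header line")
    alignment = {}
    for line in lines[1:]:
        # Blank lines separate blocks; lines starting with whitespace hold
        # the conservation symbols (*, :, .) and carry no sequence data.
        if not line.strip() or line[0].isspace():
            continue
        fields = line.split()
        # fields[0] is the sequence name, fields[1] the aligned chunk;
        # an optional trailing residue count is ignored.
        name, chunk = fields[0], fields[1]
        alignment[name] = alignment.get(name, "") + chunk
    return alignment


# Hypothetical two-sequence alignment split over two blocks.
example = """CLUSTAL 2.1 multiple sequence alignment

seq1      MKT-AYIAK
seq2      MKTQAYLAK
          *** **:**

seq1      QR
seq2      QR
          **
"""

alignment = parse_clustal(example)
for name, seq in alignment.items():
    print(name, seq)
```

For production use, an established parser (for example the one in Biopython) is usually the safer choice.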

Multiple sequence alignment

Clustal performs three main steps:

  1. perform pairwise alignments of all sequences,
  2. create a phylogenetic tree (or use a custom one),
  3. use the phylogenetic tree to build the multiple alignment.

These steps are performed automatically when you select Do Complete Alignment. Further options are Do Alignment from guide tree (carry out the alignment using an existing guide tree) and Produce guide tree only (only create the guide tree).
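The menu entries above belong to the graphical interface. On the command line, a complete alignment can be requested in a similar way. The sketch below only builds the argument list; it assumes the executable is installed under the name `clustalw2` and uses the `-INFILE`, `-ALIGN` and `-OUTFILE` options, which should be checked against the help output of the installed version. The file names are placeholders.

```python
import shlex


def clustalw_command(infile, outfile, executable="clustalw2"):
    """Build the argument list for a complete multiple alignment."""
    return [
        executable,            # assumed binary name; may differ per install
        f"-INFILE={infile}",   # input sequences, e.g. in FASTA format
        "-ALIGN",              # perform the full multiple alignment
        f"-OUTFILE={outfile}", # where the alignment is written
    ]


cmd = clustalw_command("sequences.fasta", "sequences.aln")
print(shlex.join(cmd))

# To actually run it (requires ClustalW to be installed):
# import subprocess
# subprocess.run(cmd, check=True)
```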

Profile alignments

Pairwise alignments are calculated for every sequence against every other sequence, and the resulting similarities are stored in a matrix. This matrix is then converted into a distance matrix, in which each value reflects the evolutionary distance between a pair of sequences.
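The idea can be sketched in a few lines. The example below is not Clustal's actual implementation: it assumes a plain Needleman-Wunsch global alignment with a linear gap cost and arbitrary scores (match +1, mismatch -1, gap -2), and it defines the distance as one minus the fraction of identical residues over the aligned, gap-free columns.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment; returns the two aligned strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def distance(a, b):
    """1 - fraction of identical residues over gap-free aligned columns."""
    x, y = global_align(a, b)
    pairs = [(p, q) for p, q in zip(x, y) if p != "-" and q != "-"]
    if not pairs:
        return 1.0
    identical = sum(p == q for p, q in pairs)
    return 1.0 - identical / len(pairs)


def distance_matrix(seqs):
    """Symmetric matrix of all-against-all pairwise distances."""
    n = len(seqs)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = distance(seqs[i], seqs[j])
    return d


matrix = distance_matrix(["ACGT", "ACGT", "ACGA"])
```

Here the first two sequences are identical, so their distance is 0, while the third differs from them in one of four aligned positions.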

From this distance matrix, a guide tree (or phylogenetic tree) is constructed using the neighbor-joining clustering algorithm. The tree specifies the order in which pairs of sequences are aligned and combined with previous alignments. Sequences are aligned progressively at each branch point, starting with the pair of sequences that are closest to each other.
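A compact sketch of neighbor joining is shown below. It is a textbook version that returns only the tree topology as nested tuples and omits branch lengths; the function name, the taxon labels and the distance values are invented for illustration, and Clustal's own code differs in detail.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Return the neighbor-joining topology as nested tuples.

    names: list of labels; dist: symmetric matrix (list of lists).
    """
    trees = {i: name for i, name in enumerate(names)}
    d = {i: {j: dist[i][j] for j in range(len(names)) if j != i}
         for i in range(len(names))}
    next_id = len(names)
    while len(trees) > 2:
        n = len(trees)
        total = {i: sum(d[i].values()) for i in trees}
        # Pick the pair minimising Q(i, j) = (n - 2) * d(i, j) - r_i - r_j.
        # Ties are resolved in favour of the first pair encountered.
        i, j = min(
            combinations(sorted(trees), 2),
            key=lambda p: (n - 2) * d[p[0]][p[1]] - total[p[0]] - total[p[1]],
        )
        u = next_id
        next_id += 1
        d[u] = {}
        for k in trees:
            if k not in (i, j):
                # Distance from the new internal node u to every other node.
                d[u][k] = d[k][u] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        # Remove the two joined nodes from the distance table.
        del d[i], d[j]
        for row in d.values():
            row.pop(i, None)
            row.pop(j, None)
        trees[u] = (trees.pop(i), trees.pop(j))
    remaining = [trees[k] for k in sorted(trees)]
    return remaining[0] if len(remaining) == 1 else tuple(remaining)


# Hypothetical additive distances: A and B are neighbours, as are C and D.
labels = ["A", "B", "C", "D"]
distances = [
    [0, 2, 5, 5],
    [2, 0, 5, 5],
    [5, 5, 0, 2],
    [5, 5, 2, 0],
]
tree = neighbor_joining(labels, distances)
print(tree)
```

Read from the innermost tuples outward, the result gives one possible order for a progressive alignment: the closest pairs are aligned first, and the resulting groups are then aligned with each other.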

Settings

Users can align sequences with the default settings, but in some cases it makes sense to adjust the parameters.

The main parameters are the gap opening penalty and the gap extension penalty (see sequence alignment).

Accelerated version

An FPGA-based version of the ClustalW algorithm is offered by Progeniq; its processing speed is twenty times higher than that of the software implementation.

Sources

  • J. D. Thompson et al. (1997): The ClustalX windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. In: Nucleic Acids Research. Vol. 25, pp. 4876-4882. PMID 9396791
  • R. Chenna et al. (2003): Multiple sequence alignment with the Clustal series of programs. In: Nucleic Acids Research. Vol. 31, pp. 3497-3500. PMID 12824352
  • M. A. Larkin et al. (2007): Clustal W and Clustal X version 2.0. In: Bioinformatics. Vol. 23, pp. 2947-2948. PMID 17846036
  • F. Sievers et al. (2011): Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. In: Mol Syst Biol 7. 2011 Oct 11. doi:10.1038/msb.2011.75

References

  1. See file COPYING in the source archive, accessed January 15, 2014