BLAST algorithm

Figure 1: Schematic flow of a BLAST query.

BLAST (Basic Local Alignment Search Tool) is the umbrella term for a collection of the world's most widely used programs for analyzing biological sequence data. BLAST compares experimentally determined DNA or protein sequences with sequences already available in a database. The programs return a series of local alignments, i.e. comparisons of segments of the query sequence with similar segments from the database. In addition, BLAST reports how statistically significant the hits are. Searches are run either through a web interface or with stand-alone programs that can be installed locally.

The BLAST program was developed by Stephen Altschul, Warren Gish, David J. Lipman, Webb Miller and Eugene Myers at the National Institutes of Health. Samuel Karlin was also involved in developing the algorithm.

Functionality

The algorithm rests on the observation that significant alignments are very likely to contain short segments of high identity. These segments serve as seeds, which are then extended in the search for better and longer alignments.

Because these segments are short, the query sequence can be preprocessed before the search: a table is built of all words (short subsequences) it contains, together with their positions in the original sequence.

The algorithm creates a list of all neighboring words of fixed length that, when aligned with a word of the query sequence, score at least as high as a selectable threshold parameter. The target database is then scanned for the words in this list, and each hit is extended in both directions to find the longest possible contiguous match.
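The word table described above can be sketched in a few lines of Python. This is a minimal illustration, not the actual BLAST implementation; the function name `build_word_table` and the default word size of 3 are assumptions made for the example.

```python
from collections import defaultdict


def build_word_table(query: str, word_size: int = 3) -> dict[str, list[int]]:
    """Map every word of length word_size in the query to its start positions."""
    table = defaultdict(list)
    for start in range(len(query) - word_size + 1):
        word = query[start:start + word_size]
        table[word].append(start)
    return dict(table)


# Hypothetical toy query; the word "MKV" occurs twice.
table = build_word_table("MKVLAMKV")
print(table["MKV"])
```

Because the table is built once per query, every word read from the database can be checked against the query with a single dictionary lookup.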
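The seed-and-extend procedure can be sketched as follows. This is a simplified illustration under stated assumptions, not the real BLAST code: it uses a four-letter toy alphabet, a hypothetical match/mismatch scoring scheme instead of a substitution matrix, ungapped extension only, and a simple drop-off rule (`x_drop`) to stop extending. All function and parameter names are invented for the example.

```python
from itertools import product

ALPHABET = "ACGT"        # toy alphabet (assumption, chosen for brevity)
MATCH, MISMATCH = 2, -1  # toy scores, not a real substitution matrix


def pair_score(a: str, b: str) -> int:
    return MATCH if a == b else MISMATCH


def neighborhood(query: str, word_size: int, threshold: int) -> dict[str, list[int]]:
    """Return {neighbor word: query positions} for all words scoring >= threshold."""
    words: dict[str, list[int]] = {}
    for pos in range(len(query) - word_size + 1):
        q_word = query[pos:pos + word_size]
        for cand in product(ALPHABET, repeat=word_size):
            if sum(pair_score(a, b) for a, b in zip(q_word, cand)) >= threshold:
                words.setdefault("".join(cand), []).append(pos)
    return words


def extend(query, subject, q_pos, s_pos, word_size, x_drop):
    """Extend a seed without gaps in both directions.

    Returns (score, query start, query end (exclusive), subject start).
    """
    seed = sum(pair_score(query[q_pos + i], subject[s_pos + i]) for i in range(word_size))

    def walk(q, s, step):
        # Track the best cumulative gain; stop once we fall x_drop below it.
        best_gain = gain = best_len = length = 0
        while 0 <= q < len(query) and 0 <= s < len(subject):
            gain += pair_score(query[q], subject[s])
            length += 1
            if gain > best_gain:
                best_gain, best_len = gain, length
            elif best_gain - gain > x_drop:
                break
            q += step
            s += step
        return best_gain, best_len

    right_gain, right_len = walk(q_pos + word_size, s_pos + word_size, +1)
    left_gain, left_len = walk(q_pos - 1, s_pos - 1, -1)
    score = seed + left_gain + right_gain
    return score, q_pos - left_len, q_pos + word_size + right_len, s_pos - left_len


def search(query, subject, word_size=3, threshold=3, x_drop=3):
    """Scan the subject for neighborhood words and extend every hit."""
    words = neighborhood(query, word_size, threshold)
    hits = set()
    for s_pos in range(len(subject) - word_size + 1):
        for q_pos in words.get(subject[s_pos:s_pos + word_size], []):
            hits.add(extend(query, subject, q_pos, s_pos, word_size, x_drop))
    return sorted(hits, reverse=True)


best = search("ACGTACGT", "TTACGTACGTTT")[0]
print(best)
```

With these toy scores, a threshold of 3 admits the exact word and every word with a single mismatch. Several seeds on the same diagonal extend to the same segment, which is why the hits are collected in a set.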

The main application of BLAST is the search for paralogous and orthologous genes and proteins within one or more organisms.

Methods (selection)

Method	Description
blastp	Compares an amino acid sequence against a protein sequence database.
PSI-BLAST	Position-Specific Iterative BLAST: used to identify distant relatives of a protein. First, a list of all very similar proteins is compiled. From these proteins a profile is built, a kind of averaged sequence. This profile is then used as the query for another search of the protein database, which returns a larger group of similar sequences. A new profile can be built from this group and the process repeated as often as desired. Because related proteins are included in the search, PSI-BLAST is much more sensitive in detecting distant relationships than standard protein-protein BLAST.
blastn	Compares a nucleotide sequence against a nucleotide sequence database.
blastx	Compares a nucleotide sequence (translated in all reading frames) against a protein database. This can be used to find a possible translation product of a known nucleotide sequence.
tblastn	Compares a protein sequence against a nucleotide database (dynamically translated in all reading frames).
tblastx	Compares the six-frame translations of a nucleotide sequence against the six-frame translations of a nucleotide sequence database. tblastx cannot be used with the nucleotide database on the BLAST website because it is computationally very expensive.
megablast	megablast is recommended for finding sequences that are identical or nearly identical to the query. It was designed specifically for comparing particularly long sequences with their counterparts in the database. discontiguous megablast is recommended for finding matches between more divergent sequences, e.g. from different organisms, that have a lower degree of identity.
cdart	cdart searches for sequences whose arrangement of protein domains is as similar as possible to that of the query protein. It uses the Conserved Domain Database (CDD), which imports domain models from SMART and Pfam.
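The choice among the five basic programs in the table depends only on the type of the query and the type of the database. A small lookup sketch makes this explicit; the function `choose_program` and its arguments are hypothetical helpers for illustration and not part of any BLAST distribution.

```python
# (query type, database type) -> program, as listed in the table above
PROGRAMS = {
    ("protein", "protein"): "blastp",
    ("nucleotide", "nucleotide"): "blastn",
    ("nucleotide", "protein"): "blastx",
    ("protein", "nucleotide"): "tblastn",
}


def choose_program(query_type: str, database_type: str, translate_both: bool = False) -> str:
    """Pick a BLAST program name from the query and database sequence types."""
    if translate_both:
        # tblastx translates both sides, so both must be nucleotide sequences.
        if (query_type, database_type) != ("nucleotide", "nucleotide"):
            raise ValueError("tblastx needs a nucleotide query and a nucleotide database")
        return "tblastx"
    return PROGRAMS[(query_type, database_type)]


print(choose_program("nucleotide", "protein"))
```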

Search results

The similarity between the query sequence and a database hit, from which homology is inferred, is assessed using the score and the E-value.

The score is a quantitative measure of the similarity of the query sequence to a known sequence (the higher the score, the greater the similarity of the sequences).

The E-value gives the number of hits expected by chance whose score is at least as large as the observed one (the smaller the E-value, the more significant the hit).

The identifiers shown in front of and within the search results have the following formats (selection):
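The relationship between score and E-value is commonly written in the Karlin-Altschul form E = K·m·n·e^(−λS), where S is the raw score, m and n are the lengths of the query and the database, and λ and K are statistical parameters that depend on the scoring system. The text above does not give this formula, so the sketch below rests on that assumption; the parameter values are purely illustrative, and refinements such as effective length corrections are ignored.

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalize a raw score so that it is independent of the scoring system."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(raw_score: float, query_len: int, db_len: int, lam: float, k: float) -> float:
    """Expected number of chance hits scoring at least raw_score."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


# Illustrative values only; lam and k are NOT taken from any real scoring matrix.
LAM, K = 0.3, 0.1
for score in (40, 60, 80):
    print(score, round(bit_score(score, LAM, K), 1), e_value(score, 250, 10_000_000, LAM, K))
```

The E-value falls exponentially as the score rises and grows linearly with the size of the database, which is why the same alignment is less significant in a larger database.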

GenBank	gi-number | gb | accession | locus
EMBL Data Library	gi-number | emb | accession | locus
DDBJ, DNA Database of Japan	gi-number | dbj | accession | locus
NCBI Reference Sequence	gi-number | ref | accession | locus
SWISS-PROT	gi-number | sp | accession | name
General database identifier	database | identifier
Local sequence identifier	identifier

Note: The gi number is a sequence of digits that identifies a database entry at the NCBI.
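The identifier formats listed above can be split into their fields with a short parser. This is a minimal sketch: the function name, the returned keys and the sample identifiers are all made up for illustration, and the handling of a leading literal `gi` tag is an assumption about how such identifiers may appear in output.

```python
# Database tags from the table above
DATABASES = {
    "gb": "GenBank",
    "emb": "EMBL Data Library",
    "dbj": "DDBJ",
    "ref": "NCBI Reference Sequence",
    "sp": "SWISS-PROT",
}


def parse_identifier(text: str) -> dict[str, str]:
    """Split a pipe-separated sequence identifier into named fields."""
    fields = [part.strip() for part in text.split("|")]
    if fields[0] == "gi":  # assumption: a literal "gi" tag may precede the number
        fields = fields[1:]
    if len(fields) >= 3 and fields[0].isdigit() and fields[1] in DATABASES:
        gi, tag, accession = fields[:3]
        name = fields[3] if len(fields) > 3 else ""
        return {"gi": gi, "database": DATABASES[tag], "accession": accession, "name": name}
    if len(fields) == 2:
        return {"database": fields[0], "identifier": fields[1]}
    return {"identifier": text.strip()}


# Made-up identifiers, not real database entries
print(parse_identifier("12345 | gb | ABC123.1 | LOCUS1"))
print(parse_identifier("mydb | seq1"))
print(parse_identifier("seq1"))
```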

Literature

Stephen F. Altschul, Thomas L. Madden, Alejandro A. Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, David J. Lipman: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. In: Nucleic Acids Research. Vol. 25, No. 17, 1997, pp. 3389-3402, doi:10.1093/nar/25.17.3389.
Lewis Y. Geer, Michael Domrachev, David J. Lipman, Stephen H. Bryant: CDART: Protein Homology by Domain Architecture. In: Genome Research. Vol. 12, 2002, ISSN 1088-9051, pp. 1619-1623, PMID 12368255, doi:10.1101/gr.278202.
Ian Korf, Mark Yandell, Joseph Bedell: BLAST. (An Essential Guide to the Basic Local Alignment Search Tool). O'Reilly, Beijing et al. 2003, ISBN 0-596-00299-8.
Scott McGinnis, Thomas L. Madden: BLAST: at the core of a powerful and diverse set of sequence analysis tools. In: Nucleic Acids Research. Vol. 32, Supplement 2, 2004, pp. W20-W25, doi:10.1093/nar/gkh435.
Clare Sansom: Database searching with DNA and protein sequences: an introduction. In: Briefings in Bioinformatics. Vol. 1, No. 1, 2000, ISSN 1467-5463, pp. 22-32, PMID 11466971, doi:10.1093/bib/1.1.22.

Web links

NCBI Blast - National Center for Biotechnology Information
NCBI BLAST+ - European Bioinformatics Institute
BLAT - University of California, Santa Cruz
The Mitrion-C Open Bio Project offers a special FPGA-based version of BLAST on SourceForge.

References

[1] Stephen F. Altschul, Warren Gish, Webb Miller, Eugene W. Myers, David J. Lipman: Basic local alignment search tool. In: Journal of Molecular Biology. Vol. 215, 1990, ISSN 0022-2836, pp. 403-410, doi:10.1016/S0022-2836(05)80360-2.
[2] Sense from Sequences: Stephen F. Altschul on Bettering BLAST. In: sciencewatch.com. 2000, archived from the original on April 23, 2008; retrieved July 7, 2016.
