Shotgun sequencing

From Wikipedia, the free encyclopedia

Shotgun sequencing is a method in molecular biology for sequencing long DNA strands. It was developed between 1979 and 1981. The DNA is first fragmented at random, and the resulting fragments are then sequenced (some sequencing methods also include a step in which the DNA is replicated before sequencing). The original DNA sequence is then reconstructed from the fragments using bioinformatic methods, aiming for as few errors and gaps in the sequence as possible.

Properties

Current sequencing methods can read DNA strands of only about 1,100 bases in one piece; beyond that length the reaction terminates or the sequence data contain too many errors. The human genome, by contrast, is about 3 billion bases long, the genome of the fruit fly about 200 million bases, and the genome of the bacterium Escherichia coli about 4.6 million bases. Genomes therefore cannot simply be sequenced in one piece.

Principle

Shotgun sequencing is divided into several phases:

  • Fragmentation of the DNA and sequencing of the fragments (fragmentation phase)
  • Determination of overlaps between the fragment sequences (overlap phase)
  • Calculation of a multiple alignment of the fragments (layout phase)
  • Determination of the consensus sequence (consensus phase)

Fragmentation

The fragments are generated at random, either with endonucleases (e.g. DNase I, EcoRI, Endo IV or ApeI) or by applying mechanical shear forces to the DNA (e.g. ultrasound). This is the origin of the name shotgun sequencing: like the spread of shot in a target, the fragmentation points are randomly distributed. A sequenced fragment is called a read. Depending on the fragmentation and DNA sequencing method, reads are between 100 and 2,000 nucleotides long.
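
The random fragmentation step can be simulated in a few lines. This is a minimal sketch, not a model of any particular enzyme or shearing protocol: it assumes uniformly distributed breakpoints, several copies of the DNA sheared independently, and that a read is simply the first `read_len` bases of a fragment. The function names `shear` and `take_reads` are hypothetical.

```python
import random

def shear(genome, n_copies, mean_fragment, seed=None):
    """Cut n_copies of `genome` at uniformly random breakpoints.

    Returns (start, fragment) pairs; the fragments of each copy tile the genome.
    """
    rng = random.Random(seed)
    fragments = []
    for _ in range(n_copies):
        # Roughly len(genome) / mean_fragment pieces per copy, at least one cut.
        n_cuts = max(1, len(genome) // mean_fragment - 1)
        cuts = sorted(rng.sample(range(1, len(genome)), n_cuts))
        bounds = [0] + cuts + [len(genome)]
        fragments.extend((s, genome[s:e]) for s, e in zip(bounds, bounds[1:]))
    return fragments

def take_reads(fragments, read_len):
    """Sequence each fragment from one end, up to read_len bases (the read)."""
    return [(start, frag[:read_len]) for start, frag in fragments]

if __name__ == "__main__":
    rng = random.Random(0)
    genome = "".join(rng.choice("ACGT") for _ in range(10_000))
    reads = take_reads(shear(genome, n_copies=5, mean_fragment=500, seed=1), read_len=300)
    print(len(reads), "reads")
```

Because each copy is cut at different positions, fragments from different copies overlap one another, which is what the later overlap phase relies on.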

Overlap

To determine the overlaps between the sequenced fragments, every pair of fragments must be compared; for $n$ fragments this requires $\mathcal{O}(n^2)$ comparisons. With a modified standard dynamic-programming (DP) sequence alignment algorithm, a single comparison takes time in $\mathcal{O}(l^2)$, where $l$ is the maximum fragment length. In practice, therefore, faster heuristic techniques are also used (e.g. BLAST).
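
A minimal sketch of such a modified DP comparison is an overlap (suffix-prefix) alignment: unaligned leading bases of the first read and trailing bases of the second read cost nothing, so the score measures how well the end of one read matches the start of the other. The scoring values (match +1, mismatch -1, gap -2) and the names `overlap_score` and `all_overlaps` are assumptions chosen for illustration, not taken from any particular assembler.

```python
from itertools import permutations

def overlap_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best alignment score of a suffix of read `a` against a prefix of read `b`.

    O(len(a) * len(b)) time, i.e. O(l^2) for reads of length at most l.
    """
    m, n = len(a), len(b)
    # dp[i][j]: best score aligning some suffix of a[:i] with the prefix b[:j].
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for j in range(1, n + 1):
        dp[0][j] = dp[0][j - 1] + gap  # the prefix of b must be fully aligned
    # dp[i][0] stays 0: leading bases of a are skipped for free.
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            dp[i][j] = max(dp[i - 1][j - 1] + s,  # match / mismatch
                           dp[i - 1][j] + gap,    # gap in b
                           dp[i][j - 1] + gap)    # gap in a
    # All of a's end must be used; the rest of b may overhang for free.
    return max(dp[m])

def all_overlaps(reads, min_score=3):
    """Compare every ordered pair of reads: n * (n - 1) DP runs for n reads."""
    overlaps = {}
    for (i, a), (j, b) in permutations(enumerate(reads), 2):
        score = overlap_score(a, b)
        if score >= min_score:
            overlaps[(i, j)] = score
    return overlaps

if __name__ == "__main__":
    print(all_overlaps(["ACGTTGCA", "TGCAAAC"]))
```

The quadratic number of pairs multiplied by the quadratic cost per pair is exactly why heuristic prefilters are attractive for large read sets.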

Layout

The overlap information is used to arrange the fragments so that they overlap one another. This is done automatically by bioinformatics algorithms. Depending on how densely the randomly generated fragments cover the input sequence (the coverage), gaps may remain in this arrangement (the layout). The islands of aligned fragments separated by these gaps are called contigs. The Celera Assembler is one such software package.
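
The layout and consensus phases can be illustrated with a deliberately simple greedy strategy, which is a swapped-in teaching technique and not the algorithm of the Celera Assembler. The sketch assumes error-free reads from one strand, exact suffix-prefix overlaps of at least `min_len` bases, and no read contained in another; `greedy_layout` and `consensus` are hypothetical names. Each contig is returned as a list of (read index, offset) pairs, and the consensus is a per-column majority vote.

```python
from collections import Counter

def suffix_prefix_overlap(a, b, min_len):
    """Length of the longest proper suffix of a equal to a prefix of b (0 if < min_len)."""
    for k in range(min(len(a), len(b)) - 1, min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0

def greedy_layout(reads, min_len=3):
    """Chain reads along their largest overlaps; return contigs as (read, offset) lists."""
    edges = []
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i != j:
                k = suffix_prefix_overlap(a, b, min_len)
                if k:
                    edges.append((k, i, j))
    edges.sort(reverse=True)  # try the largest overlaps first
    succ, pred = {}, {}
    for k, i, j in edges:
        if i in succ or j in pred:
            continue  # each read gets at most one neighbour on each side
        head = i
        while head in pred:  # find the start of i's chain
            head = pred[head]
        if head == j:
            continue  # the edge would close a cycle
        succ[i] = (j, k)
        pred[j] = i
    contigs = []
    for start in range(len(reads)):
        if start in pred:
            continue  # not the first read of a chain
        layout, offset, cur = [], 0, start
        while True:
            layout.append((cur, offset))
            if cur not in succ:
                break
            nxt, k = succ[cur]
            offset += len(reads[cur]) - k
            cur = nxt
        contigs.append(layout)
    return contigs

def consensus(reads, layout):
    """Majority vote over each column of one contig's layout."""
    columns = {}
    for idx, offset in layout:
        for p, base in enumerate(reads[idx]):
            columns.setdefault(offset + p, Counter())[base] += 1
    return "".join(columns[p].most_common(1)[0][0] for p in sorted(columns))

if __name__ == "__main__":
    reads = ["ATGCGTAC", "GTACGTTAG", "TTAGCCATGA"]
    for contig in greedy_layout(reads):
        print(contig, consensus(reads, contig))
```

A read that overlaps no other read becomes its own contig, which mirrors the gaps described above. With real, error-prone reads, the majority vote is what lets overlapping reads correct one another's isolated errors.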

Repetitive sections of the input DNA sequence (repeats) are problematic, because fragments containing parts of a repeat can be placed incorrectly in the layout phase. The resulting consensus sequence may then be compressed, with several copies of a repeat collapsed into one. Statistical methods (e.g. based on the Poisson distribution, as in Lander-Waterman statistics) can detect such regions so that they can be handled separately.
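
The Poisson reasoning can be made concrete. Under the Lander-Waterman model, $N$ reads of length $L$ on a genome of length $G$ give coverage $c = NL/G$; a base is left uncovered with probability $e^{-c}$, and about $N e^{-c(1-\theta)}$ contigs are expected, where $\theta$ is the minimum detectable overlap as a fraction of $L$. For repeat detection, the sketch below assumes read starts in a unique contig of length $\Delta$ are Poisson with mean $\lambda = \Delta N/G$, while a collapsed two-copy repeat attracts reads at rate $2\lambda$; the log-likelihood ratio for $k$ observed reads is then $\lambda - k\ln 2$. This is similar in spirit to the A-statistic reported for the Celera Assembler, but the function names and the two-copy simplification are assumptions for illustration.

```python
import math

def lander_waterman(genome_len, n_reads, read_len, min_overlap=0):
    """Expected coverage statistics under the Lander-Waterman (Poisson) model."""
    c = n_reads * read_len / genome_len   # coverage
    theta = min_overlap / read_len        # detectable overlap as a fraction of read_len
    return {
        "coverage": c,
        "uncovered_fraction": math.exp(-c),                  # P(base hit by no read)
        "expected_contigs": n_reads * math.exp(-c * (1 - theta)),
    }

def repeat_log_odds(contig_len, reads_in_contig, n_reads, genome_len):
    """ln[P(k | unique) / P(k | two collapsed copies)] = lam - k * ln 2.

    Negative values suggest the contig is a compressed repeat.
    """
    lam = contig_len * n_reads / genome_len  # expected reads for a unique contig
    return lam - reads_in_contig * math.log(2)

if __name__ == "__main__":
    # Hypothetical run: E. coli-sized genome, 36,800 reads of 1,000 bases.
    print(lander_waterman(4_600_000, 36_800, 1_000))
    print(repeat_log_odds(10_000, 160, 36_800, 4_600_000))
```

For the hypothetical E. coli example, the coverage is 8, about 0.03 % of the bases are expected to remain uncovered and roughly 12 contigs are expected; a 10,000-base contig holding 160 reads instead of the expected 80 gets a negative score and would be flagged as a likely repeat.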

If gaps remain even at high coverage, they can be closed by other methods such as primer walking.

Variants

A distinction is made between whole-genome shotgun sequencing and clone-by-clone sequencing. Whole-genome shotgun sequencing is also called double-barrel shotgun sequencing, because the randomly generated fragments (longer than 2 × 800 bases) are sequenced from both ends. The two end sequences of a fragment are known as a mate pair. The length and the two end sequences of each fragment are used later, when the fragments are assembled: if the two reads of a mate pair lie in different contigs, those contigs can be placed relative to each other, and the known fragment length indicates the distance between them. The resulting framework of ordered contigs is called a scaffold.
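
How mate pairs link contigs can be sketched as follows, under strong simplifying assumptions: all contigs are in the same orientation, the two reads of a pair point toward each other, the fragment (insert) length is known exactly, and a link is trusted only if at least `min_links` pairs support it. For a pair whose forward read starts at `start_a` in contig A and whose reverse read ends at `end_b` in contig B, the fragment spans `(len_A - start_a) + gap + end_b` bases, which gives the gap estimate. The function `scaffold_links` and its input format are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def scaffold_links(contig_lengths, mate_hits, insert_size, min_links=2):
    """Estimate gaps between contigs from mate pairs.

    contig_lengths: dict contig id -> length
    mate_hits: list of ((contig_a, start_a), (contig_b, end_b)); the forward read
        starts at start_a in contig_a, the reverse read ends at end_b in contig_b.
    Returns dict (contig_a, contig_b) -> mean estimated gap length.
    """
    gaps = defaultdict(list)
    for (ca, start_a), (cb, end_b) in mate_hits:
        if ca == cb:
            continue  # both mates in one contig: says nothing about contig order
        # insert_size = (bases of A after the read start) + gap + (bases of B up to read end)
        gaps[(ca, cb)].append(insert_size - (contig_lengths[ca] - start_a) - end_b)
    return {pair: mean(g) for pair, g in gaps.items() if len(g) >= min_links}

if __name__ == "__main__":
    lengths = {"c1": 5000, "c2": 4000}
    hits = [(("c1", 4000), ("c2", 1500)), (("c1", 3800), ("c2", 1300))]
    print(scaffold_links(lengths, hits, insert_size=3000))
```

Requiring several supporting pairs is a simple guard against chimeric fragments or misplaced reads; real scaffolders also resolve orientation and use the spread of insert sizes, which this sketch ignores.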

In clone-by-clone sequencing, the genome is first cut into several overlapping segments using restriction enzymes. The individual segments are cloned, and a physical map of the clones is created; that is, the order and orientation of the cloned sequences in the genome are determined by testing them for genetic markers (physical mapping). Each clone is then shotgun-sequenced individually, and the complete consensus sequence is derived with the help of the physical map.
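
A very reduced sketch of ordering clones by markers, assuming the markers already have known positions on an existing map (real physical mapping must also infer marker order and clone orientation, which is not attempted here). Clones are sorted by their leftmost marker, and neighbouring clones that share no marker are reported as places where the map does not support an overlap. The function `order_clones` and its inputs are hypothetical.

```python
def order_clones(clone_markers, marker_pos):
    """Order clones by the map position of their leftmost marker.

    clone_markers: dict clone name -> set of marker names detected in that clone
    marker_pos: dict marker name -> position on an existing map (assumed known)
    Returns (order, gaps); gaps lists neighbouring clones that share no marker.
    """
    order = sorted(clone_markers,
                   key=lambda c: min(marker_pos[m] for m in clone_markers[c]))
    gaps = [(a, b) for a, b in zip(order, order[1:])
            if not clone_markers[a] & clone_markers[b]]
    return order, gaps

if __name__ == "__main__":
    pos = {"m1": 1, "m2": 2, "m3": 3, "m4": 4, "m5": 5}
    clones = {"B": {"m3", "m4"}, "A": {"m1", "m2", "m3"}, "C": {"m4", "m5"}}
    print(order_clones(clones, pos))
```

Shared markers between neighbouring clones are the evidence that the clones overlap; each clone would then be shotgun-sequenced on its own and the clone consensus sequences joined in map order.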

Literature

  • R. Merkl, S. Waack: Interactive Bioinformatics. Wiley-VCH, 2003, ISBN 3-527-30662-5, pp. 313-324.
  • Dan Gusfield: Algorithms on Strings, Trees, and Sequences. Cambridge University Press, 1999, ISBN 0-521-58519-8, p. 420 ff. (shotgun sequencing).
  • Rolf Knippers: Molecular Genetics. 8th edition. Georg Thieme Verlag, 2001, ISBN 3-13-477008-3, pp. 465-470.
  • S. B. Primrose, R. M. Twyman: Principles of Gene Manipulation and Genomics. 7th edition. Blackwell Publishing, 2006, ISBN 1-4051-3544-1, pp. 362-371.

References

  1. R. Staden: A strategy of DNA sequencing employing computer programs. In: Nucleic Acids Research (1979), Volume 6, Issue 7, pp. 2601-2610, doi:10.1093/nar/6.7.2601, PMID 461197, PMC 327874 (free full text).
  2. S. Anderson: Shotgun DNA sequencing using cloned DNase I-generated fragments. In: Nucleic Acids Research (1981), Volume 9, Issue 13, pp. 3015-3027, doi:10.1093/nar/9.13.3015, PMID 6269069, PMC 327328 (free full text).
  3. H. Stranneheim, J. Lundeberg: Stepping stones in DNA sequencing. In: Biotechnology Journal (2012), Volume 7, Issue 9, pp. 1063-1073, ISSN 1860-7314, doi:10.1002/biot.201200153, PMID 22887891, PMC 3472021 (free full text).