Physical mapping

Physical mapping describes the method by which the sequence of a chromosome can be obtained by arranging the clones produced in a clone-by-clone shotgun process in the correct order. Thanks to new bioinformatics methods, the speed of gene mapping has increased many times over in recent years.

Problem

The goal of physical mapping is to obtain the complete sequence of a genome. Because of technical limitations, only stretches of roughly 800 base pairs (bp) can be sequenced in a single run. One way to solve this problem is to use additional information to put the sequenced fragments in the correct order and orientation and thus to assemble the complete chromosome from them.

Biological background

To be able to determine the order of the clones at all (in this context, clones are short DNA fragments; they are called clones because they are partial copies of the chromosome), the clones must overlap, and probes must make it possible to determine whether and which clones overlap. To obtain enough clones, the chromosome is amplified and cut with restriction enzymes, which, depending on the method, leads to a partial digest problem or a double digest problem. Before the clones are sequenced, clone-probe hybridization is used to determine which clones hybridize with which probes. Since the probes are chosen so that they (ideally) occur only once in the chromosome, a clone-probe hybridization matrix can be built, from which the overlap and arrangement of the clones can be determined. Once it is known how the clones overlap, not all of them may need to be sequenced.

The error-free case

Assume that

all clones have the same length,
all overlaps are unique (each probe binds at only one place), and
all hybridizations indicate real overlaps, i.e. no probe binds at a wrong place and thereby suggests an overlap where there is none.

Under these assumptions, the order of the clones can be determined by solving the consecutive ones problem, for example with a PQ tree.

The clone-probe hybridization matrix is a two-dimensional matrix in which each row corresponds to one clone and each column to one probe. An entry is 1 if the respective probe hybridizes with the respective clone, and 0 otherwise.

Clone \ Probe	1	2	3	4	5	6
1	1	0	0	1	1	0
2	0	1	1	0	1	0
3	1	1	0	0	1	0
4	0	1	1	0	0	1
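The following Python sketch shows how a matrix of this kind could arise under the idealized assumptions of the error-free case. It is a toy model, not a laboratory procedure: each clone is assumed to be a half-open interval `[start, end)` on the chromosome, each probe a single position (so every probe occurs exactly once), and the function name `hybridization_matrix` and all coordinates are hypothetical.

```python
from typing import List, Tuple


def hybridization_matrix(
    clones: List[Tuple[int, int]], probes: List[int]
) -> List[List[int]]:
    """Return a 0/1 matrix with one row per clone and one column per probe.

    Toy model (assumption): a clone is a half-open interval [start, end)
    on the chromosome and a probe is a single position. Entry [i][j] is 1
    if probe j lies inside clone i, otherwise 0.
    """
    return [
        [1 if start <= pos < end else 0 for pos in probes]
        for start, end in clones
    ]


# Hypothetical layout: three clones of equal length 10 and four unique probes.
# The probes are listed out of chromosomal order, as their positions would be
# unknown in an experiment.
clones = [(0, 10), (6, 16), (12, 22)]
probes = [14, 2, 20, 8]
for row in hybridization_matrix(clones, probes):
    print(row)
# [0, 1, 0, 1]
# [1, 0, 0, 1]
# [1, 0, 1, 0]
```

In the printed matrix the ones are not contiguous in every row, because the columns are not in chromosomal order; reordering the columns to 2, 8, 14, 20 would make each row a single block of ones.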

We are looking for an order of the columns of this matrix M such that the ones in each row form exactly one contiguous block. Such a block is called consecutive. Each block then shows which probes lie next to each other on a clone. Clones that hybridize with some of the same probes overlap; their non-overlapping parts lie to the left and right of the overlap.
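For a matrix as small as the example, a suitable column order can also be found by brute force. The Python sketch below is an illustration only: it tries all column permutations (exponential in the number of probes) instead of building a PQ tree, and the helper names `is_consecutive`, `find_c1p_order` and `clone_order` are hypothetical. Columns are indexed from 0 internally and reported as the 1-based probe numbers of the table.

```python
from itertools import permutations
from typing import List, Optional, Sequence, Tuple

# Example matrix from the table: rows = clones 1-4, columns = probes 1-6.
M = [
    [1, 0, 0, 1, 1, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 1, 0, 0, 1, 0],
    [0, 1, 1, 0, 0, 1],
]


def is_consecutive(row: Sequence[int], order: Sequence[int]) -> bool:
    """True if the ones of `row`, read in column order `order`, form one block."""
    ones = [i for i, c in enumerate(order) if row[c] == 1]
    # A row without ones is trivially consecutive.
    return not ones or ones[-1] - ones[0] + 1 == len(ones)


def find_c1p_order(matrix: List[List[int]]) -> Optional[Tuple[int, ...]]:
    """Brute force: return the first column order that makes every row consecutive."""
    for order in permutations(range(len(matrix[0]))):
        if all(is_consecutive(row, order) for row in matrix):
            return order
    return None  # the matrix has no consecutive ones ordering


def clone_order(matrix: List[List[int]], order: Sequence[int]) -> List[int]:
    """Sort clones (1-based) by the position of their leftmost probe.

    Assumes every clone hybridizes with at least one probe.
    """
    def leftmost(row: Sequence[int]) -> int:
        return min(pos for pos, c in enumerate(order) if row[c] == 1)

    return sorted(range(1, len(matrix) + 1), key=lambda i: leftmost(matrix[i - 1]))


order = find_c1p_order(M)
print([c + 1 for c in order])  # [4, 1, 5, 2, 3, 6]
print(clone_order(M, order))   # [1, 3, 2, 4]
```

For this matrix the probe order is unique up to reversal (4, 1, 5, 2, 3, 6 or 6, 3, 2, 5, 1, 4), and sorting the clones by their leftmost probe gives the clone order 1, 3, 2, 4. A PQ tree reaches the same result without enumerating all permutations.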

To solve this problem, the matrix is converted into a data structure called a PQ tree.
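A PQ tree represents all column orders with the consecutive ones property at once: each leaf is a column (probe), the children of a P-node may be arranged in any order, and the children of a Q-node may only be kept in order or reversed. The Python sketch below models only this data structure and enumerates the leaf orders (frontiers) a given tree represents; it assumes the tree is already known and does not implement the reduction algorithm of Booth and Lueker that builds the tree from the matrix. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from itertools import permutations, product
from typing import Iterator, List, Tuple, Union


@dataclass
class Leaf:
    label: int  # a probe (column) number


@dataclass
class PNode:
    children: List["Node"]  # may be arranged in any order


@dataclass
class QNode:
    children: List["Node"]  # order is fixed up to reversal


Node = Union[Leaf, PNode, QNode]


def frontiers(node: Node) -> Iterator[Tuple[int, ...]]:
    """Yield every left-to-right leaf order that the (sub)tree represents."""
    if isinstance(node, Leaf):
        yield (node.label,)
        return
    if isinstance(node, PNode):
        arrangements = list(permutations(node.children))
    else:  # QNode
        arrangements = [tuple(node.children), tuple(reversed(node.children))]
    for arrangement in arrangements:
        # Combine one frontier of each child, left to right.
        for parts in product(*(list(frontiers(child)) for child in arrangement)):
            yield tuple(label for part in parts for label in part)


# The example matrix admits only one probe order and its reverse,
# so its PQ tree is a single Q-node over the six probes.
example = QNode([Leaf(p) for p in (4, 1, 5, 2, 3, 6)])
print(sorted(frontiers(example)))
# [(4, 1, 5, 2, 3, 6), (6, 3, 2, 5, 1, 4)]

# A mixed tree: probe 1 may sit on either side of the fixed block 2-3-4.
mixed = PNode([Leaf(1), QNode([Leaf(2), Leaf(3), Leaf(4)])])
print(len(list(frontiers(mixed))))  # 4
```

Enumerating frontiers is exponential in general; the point of the PQ tree is that it stores this whole set of valid orders compactly, so a single valid order can be read off directly.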
