Z curve

from Wikipedia, the free encyclopedia

Four iterations of the Z curve

The Z-curve (Lebesgue curve, English Z-order curve ) is a mapping that brings points from the multidimensional space into a linear order , the Z-order or Morton-order , an order with neighborhood-preserving properties. The Z-value of a spatial point is calculated by interlacing the binary coordinate values bit by bit .

With their help, (efficient) processes based on a linear order can be transferred to the multidimensional. This includes binary search , binary search tree , skip list , B-tree , or a B + -tree . In the latter case it is named UB-Tree (Universal B-Tree) after Rudolf Bayer .

In the case of direct access, the Morton order is in competition with an access in which the linear orders of the dimensions are connected one behind the other in different rankings to form a lexicographical order - in the internal memory via a multi-indexed field or. in the external memory via random access . If this can be organized well, it does a little worse. However, it is superior if such an access is followed by a sequential search in which neighborhood relationships can be advantageously used. The Z-curve is popular because of its good neighborhood conservation and the ease with which the Z-values ​​can be calculated. Neighborhood conservation is better for the Hilbert curve , but the calculations are more complicated.

This article deals mainly with the two-dimensional case.

Number theory

Z-curve (gray) of the third iteration
x-bits blue, y-bits red

The adjacent figure shows the Z values ​​in the two-dimensional case for integer coordinates . The bit interleaving of - and values (also Bitwise Gear cutting or Binärbruchpressung called) gives the binary values. If you connect these in their ascending numerical order, a curve ( polygon course ) is created, which is called the Z-curve. The underlying mapping is in its -th iteration with

designated. It can easily be extended to higher dimensions and is reversibly unambiguous ( bijective ).

In the binary representations of the Z values for there are 1 bits at most at binary places with an even number. In the base 4 system , these numbers only consist of digits 0 and 1. These numbers are called Z values ​​in the narrower sense or Moser-de Bruijn numbers . You make up the sequence A000695 in OEIS .

For the following, let this sequence be specified as

,

where the first component is given the index 0. The sums and differences between two components can be formed using bit-wise operations

and
if ,

with and with the operators for bit-wise logical AND and for bit-wise logical OR, each applied to the operands resolved into bit strings .

A formula for generating the next element from the predecessor is

.

Applications in computer science

Leaving out less significant bits, one can use the Z-values ​​for hash tables in which neighborhood searches are possible.

Rectangular search area

In spite of the good neighborhood preservation, an algorithm is required for the multi-dimensional area search in order to determine the next Z-value, the coordinates of which lie in the search area, starting from a point found in the data structure outside the search area.

In the example in the adjacent figure, the search area (x = 2..3, y = 2..6), a 2D interval, is displayed as a dashed rectangle. The highest Z value in it is MAX = 45. Assume that the value F = 19 is encountered in the course of the search, when searching for increasing values. The 1D interval between F and MAX (hatched area) is the superset of the part of the rectangle still to be searched. To speed up the search, we calculate the next Z value in the search area, hereinafter called BIGMIN (36 in the example). Then you only have to search through the interval between BIGMIN and MAX (values ​​shown in bold), so that the majority of the hatched area is skipped. The search for falling values ​​is analogous to this, with LITMAX, the largest Z-value in the search range, which is smaller than F. The problem and its solution were first described in 1981 by Tropf and Herzog. The solution is also used in the UB tree (“GetNextZ-address”).

By using the method hierarchically (according to the data structure used), if necessary after both rising and falling Z-values, a highly efficient multi-dimensional area search is obtained; this is useful in both commercial and technical applications, e.g. B. as a basic function for (closest) neighborhood searches.

BIGMIN source code for the Z curve and the Hilbert curve can be found in the Tropf patent.

The hierarchical application of this method, adapted to the underlying data structure and supporting both ascending and descending search direction, enables a highly efficient area search in commercial and technical applications. The Morton order is one of the few multi-dimensional access methods that have found their way into commercial database systems ( Transbase ).

Analysis

Binary fraction entanglement

The mapping "defined" by infinite iteration of the above rule and normalized to the unit interval

with numbers is not initially well defined because it a reduced fraction with a power of two in the denominator, that is an element , two options are the representation. For example, the fraction is shown with a 0 2 ending

and those with a 1 2 ending

.

This option (equality) is not given for finite items, but it is in the Limes , where it destroys the legal uniqueness of required for a function , since it affects the result of the entanglement. However, legal clarity can be restored, for example by one of the following provisions:

Regulation A:  The binary representation of a fraction always has a 0 2 ending. This corresponds to the usual terminating representation and an approach to from above. This variant of is denoted by.
Regulation B:  The binary representation of 0 is 0 2 . The binary representation of a fraction always has a 1 2 ending. (If so , then the binary representation is to be used.) This corresponds to an approximation from below, i.e. a left-hand limit value formation for every function in which the binary representation is important, for example a limit . This variant of is denoted by.

Each of the two rules is well-defined and is also injective .

but is not surjective . Because with two provisions there is no archetype, as the coordinate necessarily an 0 2 -end and coordinate necessarily a 1 2 -end who would (resp. Vice versa .) The image of the unit square below is

with regulation A resp.
with regulation B.

In both cases it lacks a countable, dense subset of the unit interval. So it is not compact . (Nonetheless, the picture's closed shell .)

is also not continuous , since at the points an infinitesimal change in the argument causes a finite change in the function value. This can already be seen in the above figure "third iteration" , for example at the step of one of the -coordinate from to or at the step of one of the -coordinate from point to point , where the position number in the Z curve by more than a third (32−10 = 22> 64/3) resp. increases by more than one sixth (16−5 = 11> 64/6) of its total length (64).

Careful consideration

If the binary representation of a number does not contain a 1, then it is the 0, and there is no option to represent it with a 1 2 ending in such a way that the same numerical value 0 comes out.

To examine the continuity at a point , one can use the differences between two one-sided limit values. Because if and only if these are all 0 , the function is continuous at the point . Is not yet a breaking off binary fraction, then vote the unilateral limits of agreement and is continuous at . However, if, for example, there is a terminating binary fraction , then the left-hand limit value corresponds to a 1 2 end of , while the right-hand limit value corresponds to a 0 2 end of .

First, will a Binärbruchverschränkung for examined with not abbrechendem and abbrechendem with and .

with 0 2 ending
with 1 2 ending
Carryovers                
difference

The first line contains the entangled binary digits of , if represented with 0 2 end, the second for the same represented with 1 2 end , the third contains the carries of the binary subtraction and the fourth line contains the jump height , i.e. i. the difference "breaking off " of the first two lines in the Limes, ie

.

For with terminating at and and with not terminating is analogous

with 0 2 ending
with 1 2 ending
Carryovers            
difference

The "terminating " difference is in the Limes

.

Is , then the jump heights add up.

Space-filling

The "reverse function"

with is - like the above for the same reason of the lack of legal uniqueness due to ambiguous binary representability - initially not well-defined . As there, the right uniqueness can be established by the rule that a fraction is to be expanded either only with the 0 2 end (rule A) or only with the 1 2 end (rule B). Depending on whether the variant is denoted by with or with . Each of the two regulations is well-defined. It maps the unit interval surjectively ( "space-filling" ) onto the unit square .

But it is not injective because the points from have several archetypes. For example, the point has both

because of as well
because of

to the original image. The reverse of this corresponds exactly to the two one-sided limit values

and
.

Something different is the case in the previous section Binärbruchverschränkung mentioned points and , neither of pixels nor are. Is under

and
,

whereas the one-sided limit values ​​of at the point

and

are.

is not continuous at one point because the left-hand limit value ( at the 1 2 end ) and the right- hand limit value ( at the 0 2 end ) differ in the result (see table with examples).

The sum of all "jump sizes" of the discontinuities (see table) grows exponentially with the number of iterations, because per iteration (see figure "Four iterations" ) the size of the jumps decreases by a factor of 2, but the number of jumps by the factor 4 increases.

Table with examples

Explanation of the columns in the table

  1. The exponent correlates with the iteration number ( ) of the figure "Four iterations" .
  2. To the z-values , there are (at least) a discontinuity of - or coordinate . The column contains the smallest associated z-value, the others are odd multiples of this and are only counted (in the Number column ). For the purposes of the next column , the x bits are colored blue and the y bits are colored red.
  3. is also the right-hand limit value .
  4. The next column shows the same z-value but with the 1 2 end . Here, too, the x bits are colored blue and the y bits are colored red.
  5. According to this coloring, the bits in the column are -end in - and coordinate divided.
  6. brings the left-hand limit value (one of its coordinates is , the other is greater).
  7. Direction indicates which coordinate is changing. For example, ↑ means that the coordinate changes at this point with an infinitesimal increase , namely decreasing and in the figures "four iterations" and "third iteration" upwards.
  8. Width contains the Euclidean distance between the two limit values.
  9. Number contains the number of discontinuities (in the unit square) with this direction and width.
  10. The contribution to the total sum of the jump distances is the product of the distance times the number.

 

-The End
 
-The End RICH
tung
WEI
te
arrival
number
accession
contract
1 1   1
2 2 +1
3 4th +2
4th 8th +2
5 16 +4
Tab. 1: The jump widths depending on the exponent

Other values:

Other limits compared for

Because of the discontinuity of , the image set of is not a curve . But since it lies in the plane, so it is two-dimensional, and with increasing argument always a discontinuity only in the - or the coordinate jumps, their two edges (left-sided and right-sided limit) of discontinuity by a parallel to the axis of the jumping Connect coordinates as the figure “Four iterations” suggests (and as indicated in the figure “Third iteration (example)” ). This creates a continuous mapping , the image of which is called the Z-curve . The continuity implies that left-hand and right-hand limits are the same; with the result that legal clarity exists even without a decision in favor of rule A or rule B. is continuous and surjective, but not injective. Since the finite iterations are nevertheless injective and thus self-evasive, the Z curve belongs to the FASS curves , which in turn are a real subset of the space-filling curves.

Explicit wording
Third iteration Z-curve.
In three exemplary jumps , connecting stretches that make it steady are added, which cross the parallel axis in the Limes.

The following gives an explicit formula for the continuous Z function.

In order to accommodate the connecting lines in the definition set (and image set), the argument is written ternary, i.e. base 3 and with the digits 0, 1, 2. The binary fraction entanglement only takes place with digits 0 and 1. The first occurrence of the number 2 is given the meaning that the following numbers, interpreted as a ternary number, parameterize the connecting route between the adjacent left-hand and right-hand limit values ​​in a straight line.

The line connecting the point of discontinuity in the first iteration runs from point to point and has a length of 1. It can be found in the table below in the three rows with (see also figure “Third iteration (example)” ). To fill the area of ​​the unit square, one can simply start from the interval , because there is no need to allow a 2 for the first ternary digit .

So be and with and a ternary representation . Further be

the smallest index with a digit 2 (consequently ) and

    (only required if ).

Besides, be

(digits only with an even index ) and
(digits only with odd index )

and finally

and
    (only required if ).

Then the function

 

the steadily added iteration of the curve number in the figures "Four iterations" and "third iteration" , the Z curve in the narrower sense. This is surjective, but not injective.

Some examples:


 
RICH
tung

 
2
2
2
3
3
3
4th
4th
4th
Tab. 2: Image points of , including 3 connecting lines with the exponents

See also

Web links

Remarks

  1. Strictly speaking, this mapping is not, but at most its reversal is a space-filling curve .
  2. In the literature, it is a convention to put the bits to the left of the bits. This creates the characteristic Z-shape when (as in the first two figures) grows to the right and down.
  3. The injectivity is lost with (more precisely: im ). The problems with the right uniqueness that also occur have the same cause ( see below ).
  4. The points of the Cantor set contain only digits 0 and 2 in their classical representation for base 3 .
  5. In order to avoid ambiguity or confusion with the comma in the notations for intervals or coordinate pairs, the point is used below as a separator for the positions with negative exponents. We follow M. Bader in this regard as well as in the placement of the base as a prefix for this point.
  6. a b This reason for discontinuity applies completely regardless of the decision whether rule A or rule B.
  7. Continuous and bijective is only possible with the same dimension ( theorem of the invariance of the dimension ).
  8. There are room-filling curves that are constant from the start in the Limes, such as the Hilbert curve and the Peano curve . In this respect, the continuity of the Z curve, which has been improved here, indicates a neighborhood behavior that can be improved.

Individual evidence

  1. Volker Gaede, Oliver Guenther: Multidimensional access methods . (PDF) In: ACM Computing Surveys . 30, No. 2, 1998, pp. 170-231. doi : 10.1145 / 280277.280279 .
  2. ^ GM Morton: A Computer Oriented Geodetic Data Base; and a New Technique in File Sequencing . In: IBM (Ed.): Technical Report . , Ottawa, Canada 1966.
  3. a b Frank Ramsak, Volker Markl, Robert Fenk, Martin circles, Klaus Elhardt, Rudolf Bayer: Integrating the UB tree into a Database System Kernel Archived from the original on March 4, 2016. In: Int. Conf. on Very Large Databases (VLDB) . 2000, pp. 263-272.
  4. a b Jeyarajan Thiyagalingam, Olav Beckmann, Paul HJ Kelly: Is Morton layout competitive for large two-dimensional arrays yet? . In: Concurrency and Computation: Practice and Experience . 18, No. 11, September 2006, pp. 1509-1539. doi : 10.1002 / cpe.v18: 11 .
  5. ^ H. Tropf, H. Herzog: Multidimensional Range Search in Dynamically Balanced Trees . (PDF) In: Angewandte Informatik , 2/1981, pp. 71–77.
  6. Patent US7321890 : Database and method for organizing data elements. Registered on February 18, 2004 , published on January 22, 2008 , inventor: Hermann Tropf (expired).
  7. Is Morton layout competitive for large two-dimensional arrays yet? ( Memento from October 15, 2006 in the Internet Archive )
  8. a b for is
    such as
    .
  9. Michael Bader: Space-Filling Curves. An Introduction to Applications in Scientific Computing (= Timothy J. Barth, Michael Griebel, David E. Keyes, Risto M. Nieminen, Dirk Roose, Tamar Schlick [Eds.]: Texts in Computational Science and Engineering . Volume 9 ). 1st edition. Springer-Verlag, 2013, ISBN 978-3-642-31045-4 , ISSN  1611-0994 , doi : 10.1007 / 978-3-642-31046-1 (English, 278 pages). P. 96.