Bucket algorithm

The bucket algorithm or bucket algorithm is as part of the information integration an algorithm for the efficient collection of views of local data sources ( Local-as-View ) which must be combined for an inquiry to the integrated overall system. The algorithm was introduced in 1996 by Levy, Rajaraman and Ordille.

In principle, the number of combinations to be tested grows exponentially and the problem of testing NP-complete . However, the number of combinations can be reduced through a clever preselection. The basic idea of the bucket algorithm is that each relation of the request receives a bucket. All views that can be used for the relation are added to each bucket. Only all combinations of request rewritings that contain exactly one view from each bucket are checked.

Whether a view can be used for a relation depends on its sub-goals. All query attributes must be present and the predicates must be suitable. A view can appear several times in a bucket if it fits to fulfill several sub-goals.

It can be shown that the bucket algorithm always determines a description of the request that delivers a complete result. An extension of the algorithm determines only the best inquiry plans, so that not all partial plans have to be executed.

Alternatives to the bucket algorithm are the inverse rules algorithm or the MiniCon algorithm .

example

A global scheme is given with the following relations:

Adresse: Ausweisnummer, Ort
Person: Ausweisnummer, Name, Alter

as well as three local data sources with the following schemes:

Q1: Ausweisnummer, Name, Ort
Q2: Name, Ausweisnummer, Alter
Q3: Ausweisnummer, Alter, Beruf

as well as three sources whose local schemas are formulated as views of the global schema (see example under Local-as-View ):

CREATE VIEW S1 AS SELECT Person.Ausweisnummer, Person.Name, Adresse.Ort FROM Person, Adresse WHERE Person.Ausweisnummer = Adresse.Ausweisnummer

CREATE VIEW S2 AS SELECT Ausweisnummer, Name, Alter FROM Person

CREATE VIEW S3 AS SELECT Ausweisnummer, NULL, Alter FROM Person

Now the following request should be reformulated to the global schema:

SELECT Person.Alter, Adresse.Ort FROM Person, Adresse WHERE Person.Ausweisnummer=Adresse.Ausweisnummer

The request can be formulated more clearly in Prolog (programming language)

Q(Alter,Ort) :- Person(Ausweisnummer,_,Alter), Adresse(Ausweisnummer, Ort).

Now two buckets are created for the two relations occurring in the query and filled with the views that can be used for this relation.

Bucket (person): S1, S2, S3
Bucket (address): S1

The following combinations result

S1 and S1
S1 and S2
S1 and S3

The views S1, S2 and S3 in Prolog notation:

S1(Ausweisnummer, Name, Ort) :- Person(Ausweisnummer, Name, _), Adresse(Ausweisnummer, Ort)

S2(Ausweisnummer, Name, Alter) :- Person(Ausweisnummer, Name, Alter)

S3(Ausweisnummer, Alter) :- Person(Ausweisnummer, _, Alter)

The combination of S1 with S1 cannot answer the query because the header of S1 does not contain the query attribute 'age'. In order to answer the query, the other two combinations remain, which are combined. As a description of the request, it results:

SELECT S2.Alter, S1.Ort FROM S1, S2 
WHERE S1.Ausweisnummer=S2.Ausweisnummer
UNION
SELECT S3.Alter, S1.Ort FROM S1, S3 
WHERE S1.Ausweisnummer=S3.Ausweisnummer

literature

Alon Y. Halevy: Answering queries using views: a survey. In: The VLDB Journal. Vol. 10, No. 4, December 2001, ISSN 1066-8888 , pp. 270-294, doi : 10.1007 / s007780100054 .
Alon Y. Levy, Anand Rajaraman, Joann J. Ordille: Querying heterogeneous information sources using source descriptions. (PDF; 1.3 MB). In: TM Vijayaraman, Alejandro P. Buchmann, C. Mohan, Nandlal L. Sarda (Eds.): Proceedings of the 22nd International Conference on Very Large Data Bases. Mumbai (Bombay), India, September 3-6, 1996. Morgan Kaufmann, San Francisco CA 1996, ISBN 1-55860-382-4 , pp. 251-262.