Phantom problem

from Wikipedia, the free encyclopedia

In computer science , the phantom problem ( inconsistent read ) is an error that can occur with several parallel database accesses . If new data records with this property are inserted in a transaction that is running at the same time during a transaction that relates to several data records with a specified property, this can result in inconsistent data in the first transaction.

Examples of the phantom problem

A link should be calculated over a complete column

Specifically, this can be, for example, the formation of the mean value of a column. Transaction 1 first determines the total using the column in the table; then transaction 2 adds a new record. As a third step, transaction 1 calculates the number of records across the columns. At the end, the previously determined sum of all data from our column is divided by the number of data records. The only problem with the calculation is that the sum of the counted data records is higher because a new data record was inserted in the middle. This falsifies the mean value.

time Transaction 1 Transaction 2
1 SELECT SUM (number) FROM inventory;

Result: Total number of goods in the warehouse, e.g. B. for two different articles with one copy each the value 2.

2 INSERT INTO inventory (article, number) VALUES ('Wikipedia: Das Buch', 3);
3 SELECT COUNT (*) FROM inventory;

Result: number of different articles in the warehouse. After the INSERT running in parallel, the returned value is 3. If the result from before is divided by this value to determine the average number of copies, the correct result (5/3) is not calculated, but a value that is too low (2 / 3).

Avoidance

The easiest way to avoid this problem is to lock the entire table in a transaction susceptible to the phantom problem. However, it is sufficient to prevent modifications to the affected column in the affected data records, if at the same time it can be ensured that no new entries can be added or existing entries can be removed that would also be entered later in the transaction.

Databases know the possibility of serializing parallel transactions corresponding to the isolation level serializable of the SQL standard. If this isolation level is used, applications that access the database must be able to deal with the resulting, unsuccessful accesses (serialization errors).

Pitfalls

Many databases have the ability of repeatable reading according to the isolation level repeatable read the SQL standards. This means that when a data record is changed, the time of the change is also saved. A transaction that started before this change will not "see" the change. However, this often does not refer to newly created data records, but only guarantees that a new read process on data that has already been read within a transaction will have the same result.

See also

Web links