Slowly Changing Dimensions

from Wikipedia, the free encyclopedia

The term Slowly Changing Dimensions is used in data warehousing to summarize methods to record changes in dimension tables and, if necessary, to document them historically. Essentially, a distinction is made between three methods, which are divided into types according to Kimball. What they all have in common is that existing data records are linked to new data records via the primary key in order to save changes in the table. Technical keys are not part of the article.

Type 0

The Type 0 is a passive method. A historization in the actual sense is dispensed with. The technical primary key (PK for Primary Key) is used to look up in the dimension table whether the PK of the new data record already exists. If this is the case, nothing is done - the first known values ​​are retained. Otherwise a new data record is added.

Type 1

The Type 1 is the most trivial method. There is no historization. The technical primary key (PK for Primary Key) is used to look up in the dimension table whether the PK of the new data record already exists. If this is the case, the corresponding record is overwritten with the new data. Otherwise a new data record is added.

Type 2

Type 2 is a complex process for historicalizing dimension tables or individual attributes of the table in order to be able to determine the valid values ​​of the table at any time. This is achieved by storing a validity interval for each data record . In order to guarantee the uniqueness of the PK, it must be extended by at least one of the interval attributes. As a rule, an interval closed at the bottom is used, in which the valid sentence is marked as infinitely valid . The basis is the comparison of the existing data records with the new data records from a complete and periodic extraction using the technical primary key without the validity attribute or attributes. There are three different cases:

  • The new record does not yet exist in the dimension.
Procedure: The data record is added.
  • The record in the dimension does not exist in the valid extraction.
Procedure: The data record is set from infinitely valid to valid until yesterday .
  • The new data record can be assigned to a data record from the dimension table via the PK.
Procedure: The attributes to be historicized are compared with one another.
  • No changes are found: The process is finished. The next data record is processed.
  • Changes are detected: The valid dimension record is set to valid until yesterday . The new data record is inserted with valid from today and infinitely valid .

Type 3

With type 3 the table is widened . This means that a new attribute is added. This procedure can be used, for example, when re-coding sales regions or product groups. This method is only advisable for very specific changes.

Examples

For a better understanding, the following example should clarify the different procedures. A dimension table product is given . The GueltVon and GueltBis attributes are only relevant for type 2 and are at least part of the PK.

PNumber (PK) PName Pgroup GueltBy GueltBis
1 Glenfarclas whiskey 01/01/2002 12/31/9999
2 Jim Beam whiskey 01/01/2002 12/31/9999
3 Krombacher beer 01/01/2002 12/31/9999

The following data set will be extracted from the operational system on August 10, 2005:

PNumber (PK) PName Pgroup
2 Jim Beam Whisk e y

after type 0 , the new data record is ignored because the primary key already exists in the target table:

PNumber (PK) PName Pgroup GueltBy GueltBis
1 Glenfarclas whiskey 01/01/2002 12/31/9999
2 Jim Beam whiskey 01/01/2002 12/31/9999
3 Krombacher beer 01/01/2002 12/31/9999

after type 1 the second data record is overwritten:

PNumber (PK) PName Pgroup GueltBy GueltBis
1 Glenfarclas whiskey 01/01/2002 12/31/9999
2 Jim Beam Whisk e y 01/01/2002 12/31/9999
3 Krombacher beer 01/01/2002 12/31/9999

after type 2 , the second data record is historicized:

PNumber (PK) PName Pgroup GueltVon (PK) GueltBis
1 Glenfarclas whiskey 01/01/2002 12/31/9999
2 Jim Beam whiskey 01/01/2002 08/09/2005
2 Jim Beam Whisk e y 08/10/2005 12/31/9999
3 Krombacher beer 01/01/2002 12/31/9999

after type 3 a new attribute newPGroup is entered:

PNumber (PK) PName Pgroup new group GueltBy GueltBis
1 Glenfarclas whiskey whiskey 01/01/2002 12/31/9999
2 Jim Beam whiskey Whisk e y 01/01/2002 12/31/9999
3 Krombacher beer beer 01/01/2002 12/31/9999

Individual evidence

  1. Ralph Kimball, Margy Ross: The Data Warehouse Toolkit. The Complete Guide to Dimensional Modeling. 2nd Edition. John Wiley & Sons, New York a. A. 2002, ISBN 0-471-20024-7

Web links