Snowflake scheme

Snowflake scheme: fact and dimension tables form a snowflake-shaped structure and, in contrast to the star scheme, are further refined and normalized

The snowflake scheme is a continuation of the star scheme used in OLAP and data warehousing .

In the star schema, the dimension tables are denormalized , which results in better processing speed at the expense of data integrity and storage space. In contrast, the snowflake scheme refines the individual dimension tables by classifying or normalizing them. This branching of the data model creates the shape of a snowflake , which is where the name of this design pattern comes from.

Due to this finer structure, the data is less redundant than in a star schema , but additional join operations may be necessary for queries . A snowflake scheme therefore leads to smaller and better structured amounts of data, which, however, have more complex relationships and thus possibly lead to longer loading or query times.

definition

The snowflake scheme is a continuation of the star scheme. With this the fact table remains the same as with the star schema. However, in contrast to the star schema, the dimension tables are different because they no longer contain all dimension elements, but only data about the dimension hierarchies. For this purpose, the dimensions are refined further by being classified or normalized. In any case, the dimension tables are expanded to include the attributes so that each characteristic of a dimension can be represented in its own table. In other words, with the popular snowflake scheme, the data in the dimension tables are saved in the 3rd normal form (3NF) . Normalization creates a separate table for each hierarchy level of a dimension and thus leads to smaller and better structured data volumes. This branching of the data model creates the shape of a snowflake, which is what gives this scheme its name.

properties

Dimension tables
- Primary key to identify the dimension values
- Mapping of the dimension hierarchy using foreign keys
- normalization

Fact tables (same with star schema )
- Foreign keys to the dimension tables, d. H. the lowest level of each dimension is included as a key in the fact table
- Foreign keys on the dimensions form composite primary keys for the facts

Advantages and disadvantages

The advantages and disadvantages of the snowflake scheme compared to the simpler star scheme are shown below :

advantages

Lower memory consumption: Dimension tables do not contain redundant data due to normalization .

Many-to-many relationships between aggregation levels can be resolved using relation tables

optimal support of aggregation formation

Browsing functionality: frequent queries using very large dimension tables result in time savings and speed advantages.

disadvantage

Speed disadvantage: due to additional links in the dimension tables
More complex structuring: Due to the finer structuring, the data is less redundant than in a star scheme , but the relationships are more complex. Multi-level dimension tables therefore have to be linked again using join queries and may lead to longer query times.
Larger number of tables: Due to the more complex structure, a larger number of tables is required.
Reorganization problem: Changes in the semantic model lead to extensive reorganization of the tables and consequently to a higher maintenance effort

Star scheme vs. Snowflake scheme (normalized)

	Star scheme	Snowflake scheme
target	User-friendly query (aggregate access; simple, intuitive data model)	Redundancy minimization through normalization Efficient transaction processing
Result	Simple, local and standardized data model a fact table and a few dimension tables	Complex and specific scheme Many entities and relationships in large data models

example

Example of a snowflake scheme

The adjacent example shows the linked tables which are necessary for a complete description of the product dimension in the Microsoft data warehouse sample project Adventure Works. The category and subcategory of the product in the Product dimension must therefore be included. However, this information is not directly in the main table for the product dimension, but a foreign key relationship between the product dimension and the product subcategory dimension, which in turn has a foreign key relationship to the product category table, enables the information for product categories and subcategories to be included in the dimension table of the product.

In contrast to the star schema , the number of joins used increases linearly with the number of aggregation paths in the snowflake scheme.

literature

H. Kemper, W. Mehanna & C. Unger: Business Intelligence - Basics and Practical Applications: An Introduction to IT-Based Management Support. 2nd edition, Wiesbaden: Vieweg + Teubner, 2006 ISBN 3-8348-0275-1
J. Han & M. Kamber: Data Mining. Concepts and Techniques. 2nd Edition, San Francisco (CA) et al: Morgan Kaufmann, 2006 ISBN 1-558-60901-6
B. Larson: Delivering Business Intelligence with Microsoft SQL Server. 2nd Edition, New York: Mcgraw-Hill Professional, 2009 ISBN 0-071-54944-7
Ralph Kimball, Margy Ross: The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling , Second Edition. Edition, Wiley, New York 2002, ISBN 0-471-20024-7 .
Lynn Langit: Foundations of SQL Server 2005 Business Intelligence , First Edition. Edition, Apress, New York 2007, ISBN 1-590-59834-2 .

Web links

Presentation on the topic of data warehouse design (PDF; 500 kB)
Snowflake scheme vs. Star scheme
Why is the Snowflake Schema a Good Data Warehouse Design? by Mark Levene & George Loizou (PDF; 202 kB)
Schema Modeling Techniques
Optimizing Large Star-Schema Queries with Snowflakes via Heuristic-Based Query Rewriting (PDF; 178 kB)