Data Vault

from Wikipedia, the free encyclopedia

Data Vault is a modeling technique for data warehouses that is particularly well suited to agile data warehousing. It offers a high degree of flexibility for extensions and complete uni-temporal historization of the data, and it allows the data loading processes to run largely in parallel.

History

Data Vault modeling was developed in the 1990s by Dan Linstedt, who was working for the National Security Agency at the time. Following initial publications in 2000, Data Vault began to attract attention from 2002 onward through a series of articles in The Data Administration Newsletter. In 2007, Linstedt gained the support of Bill Inmon, who described Data Vault as the "optimal choice" for his DW 2.0 architecture.

Linstedt published books about Data Vault in 2009, 2011 and 2015, some of them co-authored with others. Since 2013 he has promoted a package of modeling, architecture and methodology approaches under the name Data Vault 2.0. Linstedt's former business partner Hans Hultgren also published a book on Data Vault modeling in 2012, followed in 2019 by a book by the Australian author John Giles on building Data Vault models with patterns.

Data Vault has become particularly popular in the Netherlands.

Modeling

Data Vault combines aspects of relational modeling in third normal form (3NF) with aspects of the star schema. Various authors assign it to a family of modeling techniques they call hyper-normalized or ensemble modeling.

Figure: A simple Data Vault model with two hubs (blue), one link (green) and four satellites (yellow).

In Data Vault modeling, all information belonging to a business concept (such as a customer or a product) is divided into three categories, each stored in its own type of database table. Hultgren calls this procedure "unified decomposition" because the information is spread across different tables but remains linked by a common key.
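To make the decomposition concrete, here is a minimal sketch that splits one flat source record for a customer into a hub part, a link part and a satellite part. It assumes in-memory Python dicts stand in for database tables, and the field names (`customer_no`, `industry_code`, `name`, `birth_date`) are hypothetical. The point it illustrates is that every part carries the same business key, which is what keeps the decomposed information connected.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Decomposed:
    hub: dict[str, Any]          # identity only: the business key
    links: list[dict[str, Any]]  # relationships to other business keys
    satellite: dict[str, Any]    # descriptive attributes


def decompose_customer(record: dict[str, Any]) -> Decomposed:
    """Split a flat (hypothetical) source record into hub, link and satellite parts.

    All three parts carry the same business key, customer_no.
    """
    key = record["customer_no"]
    hub = {"customer_no": key}
    links = [{"customer_no": key, "industry_code": record["industry_code"]}]
    satellite = {
        "customer_no": key,
        "name": record["name"],
        "birth_date": record["birth_date"],
    }
    return Decomposed(hub=hub, links=links, satellite=satellite)


source = {
    "customer_no": "C-1001",
    "industry_code": "RETAIL",
    "name": "Alice Example",
    "birth_date": "1980-04-12",
}
parts = decompose_customer(source)
print(parts.hub)    # {'customer_no': 'C-1001'}
print(parts.links)  # [{'customer_no': 'C-1001', 'industry_code': 'RETAIL'}]
```

In a real warehouse each part would be written to its own table; the sketch only shows how the information is partitioned.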

The first category, "Hub", contains the information that uniquely identifies a business concept, i.e. gives it its identity (e.g. the customer number of a customer). A hub is therefore a list of unique business keys and serves as an integration point for data from various sources.
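The role of the hub as a deduplicated list of business keys fed by several sources can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: the hub is an in-memory dict, the metadata columns `load_ts` and `record_source` are hypothetical, and the MD5 hash of the normalized business key used as the row key is a convention associated with Data Vault 2.0 rather than something the text above requires.

```python
import hashlib
from datetime import datetime, timezone


def hash_key(business_key: str) -> str:
    # Assumption: MD5 of the normalized business key as surrogate key
    # (a Data Vault 2.0 convention, not required by plain hub modeling).
    return hashlib.md5(business_key.strip().upper().encode("utf-8")).hexdigest()


def load_hub(hub: dict[str, dict], keys: list[str], record_source: str,
             load_ts: datetime) -> int:
    """Insert business keys not yet present in the hub; return how many were new.

    Each business key appears exactly once, no matter how many sources
    deliver it; existing rows are never updated.
    """
    inserted = 0
    for key in keys:
        hk = hash_key(key)
        if hk not in hub:
            hub[hk] = {"customer_no": key.strip().upper(),  # hypothetical column
                       "load_ts": load_ts,
                       "record_source": record_source}
            inserted += 1
    return inserted


hub_customer: dict[str, dict] = {}
ts = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(load_hub(hub_customer, ["C-1001", "C-1002"], "CRM", ts))       # 2
print(load_hub(hub_customer, ["c-1002 ", "C-1003"], "WEBSHOP", ts))  # 1
print(len(hub_customer))                                             # 3
```

The second source delivers `C-1002` in a different spelling; normalization before hashing lets the hub recognize it as the same business key, which is what makes the hub usable as an integration point.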

The second category, "Link", covers all kinds of relationships between business concepts (e.g. the assignment of a customer to an industry). These can be hierarchical relationships (e.g. employees reporting to managers), business processes (e.g. a doctor treating a patient in a hospital) or identity relationships (two customer numbers denoting the same customer).
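Two of the relationship kinds named above, a business process spanning three concepts and an identity relationship, can be sketched as link tables that store each distinct combination of business keys once. The sketch assumes Python sets stand in for link tables; the class names, field names and the `resolve` helper are hypothetical.

```python
from typing import NamedTuple


class TreatmentLink(NamedTuple):
    """Hypothetical process link: doctor treats patient in hospital."""
    doctor_no: str
    patient_no: str
    hospital_no: str


class SameAsLink(NamedTuple):
    """Hypothetical identity link: two customer numbers denote the same customer."""
    customer_no: str
    duplicate_customer_no: str


def load_link(link: set, rows: list) -> set:
    # A link holds each distinct key combination once; return only the new rows.
    new_rows = set(rows) - link
    link |= new_rows  # in-place update of the caller's set
    return new_rows


def resolve(customer_no: str, same_as: set[SameAsLink]) -> str:
    # Follow identity links to the surviving customer number (assumes no cycles).
    mapping = {row.duplicate_customer_no: row.customer_no for row in same_as}
    while customer_no in mapping:
        customer_no = mapping[customer_no]
    return customer_no


treatments: set[TreatmentLink] = set()
print(len(load_link(treatments, [TreatmentLink("D1", "P1", "H1"),
                                 TreatmentLink("D1", "P2", "H1")])))  # 2
print(len(load_link(treatments, [TreatmentLink("D1", "P1", "H1")])))  # 0

same_as: set[SameAsLink] = set()
load_link(same_as, [SameAsLink("C-1001", "C-9001")])
print(resolve("C-9001", same_as))  # C-1001
```

Because a link only records that keys belong together, it naturally supports many-to-many relationships and relationships among more than two concepts.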

All attributes that describe a business concept or a relationship (e.g. a customer's name, date of birth or gender) belong to the third category, "Satellite". The uni-temporal historization also takes place in the satellites. A hub or link can have several satellites, split for example by data source or by frequency of change.
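Uni-temporal historization in a satellite can be illustrated with an insert-only table whose single time axis is the load timestamp: a new row is appended only when the attributes differ from the latest known version, and past states are read back with an "as of" query. This is a minimal sketch assuming a Python list of dicts as the satellite table; the function names and columns are hypothetical.

```python
from datetime import datetime
from typing import Optional


def load_satellite(sat: list[dict], key: str, attrs: dict, load_ts: datetime) -> bool:
    """Append a row only if the attributes changed; rows are never updated.

    The load timestamp is the only time axis, hence uni-temporal.
    Returns True if a new row was added.
    """
    history = [row for row in sat if row["key"] == key]
    if history:
        latest = max(history, key=lambda row: row["load_ts"])
        if latest["attrs"] == attrs:
            return False  # unchanged: nothing to record
    sat.append({"key": key, "load_ts": load_ts, "attrs": dict(attrs)})
    return True


def as_of(sat: list[dict], key: str, ts: datetime) -> Optional[dict]:
    """Return the attributes valid for key at load time ts, or None."""
    candidates = [row for row in sat if row["key"] == key and row["load_ts"] <= ts]
    if not candidates:
        return None
    return max(candidates, key=lambda row: row["load_ts"])["attrs"]


sat_customer: list[dict] = []
r1 = load_satellite(sat_customer, "C-1001",
                    {"name": "Alice Example", "gender": "f"}, datetime(2024, 1, 1))
r2 = load_satellite(sat_customer, "C-1001",
                    {"name": "Alice Example", "gender": "f"}, datetime(2024, 2, 1))
r3 = load_satellite(sat_customer, "C-1001",
                    {"name": "Alice Sample", "gender": "f"}, datetime(2024, 3, 1))
print(r1, r2, r3)         # True False True
print(len(sat_customer))  # 2
print(as_of(sat_customer, "C-1001", datetime(2024, 2, 15))["name"])  # Alice Example
```

Splitting attributes into several satellites (for example one per source) would simply mean calling the same loader against separate tables, each with its own history.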

This type of modeling accommodates change flexibly: as a rule, existing tables do not have to be altered; instead, new tables are simply added (e.g. new attributes go into an additional satellite). Because the data loading processes are highly standardized, ETL process templates can be used, so that in the best case only a configuration change is needed to modify or extend a loading process.
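The idea of template-driven loading can be sketched by rendering one generic hub-load statement from per-hub configuration, so that adding a new hub means adding a configuration entry rather than writing new code. The SQL shape, table and column names, and configuration keys below are all hypothetical and not tied to a particular database or ETL tool; Python's standard `string.Template` does the substitution.

```python
from string import Template

# Hypothetical generic hub-load template: only names differ between hubs.
HUB_LOAD_TEMPLATE = Template(
    "INSERT INTO $hub_table ($business_key, load_ts, record_source)\n"
    "SELECT DISTINCT s.$source_column, CURRENT_TIMESTAMP, '$record_source'\n"
    "FROM $source_table s\n"
    "WHERE NOT EXISTS (\n"
    "    SELECT 1 FROM $hub_table h WHERE h.$business_key = s.$source_column\n"
    ");"
)


def render_hub_load(config: dict[str, str]) -> str:
    # substitute() raises KeyError for a missing placeholder, so an
    # incomplete configuration fails early instead of yielding broken SQL.
    return HUB_LOAD_TEMPLATE.substitute(config)


configs = [
    {"hub_table": "hub_customer", "business_key": "customer_no",
     "source_table": "stg_crm_customer", "source_column": "cust_id",
     "record_source": "CRM"},
    {"hub_table": "hub_product", "business_key": "product_no",
     "source_table": "stg_erp_article", "source_column": "article_no",
     "record_source": "ERP"},
]

for cfg in configs:
    print(render_hub_load(cfg))
    print()
```

Plain string substitution is only acceptable here because the configuration is assumed to be trusted, maintained by the warehouse team; it is not a safe way to embed external input in SQL.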

Literature

  • John Giles: The Elephant in the Fridge. Guided Steps to Data Vault Success through Building Business-Centered Models. Technics, Basking Ridge 2019, ISBN 978-1-63462-489-3.
  • Kent Graziano: Better Data Modeling. An Introduction to Agile Data Engineering Using Data Vault 2.0. Data Warrior, Houston 2015.
  • Hans Hultgren: Modeling the Agile Data Warehouse with Data Vault. Brighton Hamilton, Denver et al. 2012, ISBN 978-0-615-72308-2.
  • Dirk Lerner: Data Vault for agile data warehouse architectures. In: Stephan Trahasch, Michael Zimmer (Eds.): Agile Business Intelligence. Theory and practice. dpunkt.verlag, Heidelberg 2016, ISBN 978-3-86490-312-0, pp. 83-98.
  • Daniel Linstedt: Super Charge Your Data Warehouse. Invaluable Data Modeling Rules to Implement Your Data Vault. Linstedt, Saint Albans, Vermont 2011, ISBN 978-1-4637-7868-2.
  • Daniel Linstedt, Michael Olschimke: Building a Scalable Data Warehouse with Data Vault 2.0. Morgan Kaufmann, Waltham, Massachusetts 2016, ISBN 978-0-12-802510-9.
  • Dani Schnider, Claus Jordan et al.: Data Warehouse Blueprints. Business intelligence in practice. Hanser, Munich 2016, ISBN 978-3-446-45075-2, pp. 35-37, 161-173.

Web links

  • Data Vault - the revolutionary data warehouse modeling? Blog post by Markus Bellmann (linkFISH Consulting GmbH), January 19, 2015.
  • Now you can easily model the Data Vault. 6-part webcast series on Data Vault by Michael Müller (MID GmbH), October 2014.
  • Data modeling with Data Vault & ETL in the Data Vault tables and in the Data Mart dimensions. Blog post by Claus Jordan, October 15, 2013.

References

  1. Where did #datavault get it's name?
  2. Data Vault Series 1 - Data Vault Overview.
  3. The new evolution of data modeling.
  4. A short intro to #datavault 2.0.
  5. John Giles: The Elephant in the Fridge. Basking Ridge 2019, ISBN 978-1-63462-489-3.
  6. Data Vault in the Netherlands.
  7. Modeling to Support Agile Data Warehouses: Hyper Normalization and Hyper Generalization.
  8. Ensemble Modeling.
  9. Hans Hultgren: Modeling the Agile Data Warehouse with Data Vault. Denver et al. 2012, ISBN 978-0-615-72308-2, pp. 21-22.
  10. Daniel Linstedt, Michael Olschimke: Building a Scalable Data Warehouse with Data Vault 2.0. Waltham 2016, Chapter 4.3.
  11. Daniel Linstedt, Michael Olschimke: Building a Scalable Data Warehouse with Data Vault 2.0. Waltham 2016, Chapter 4.4.
  12. Daniel Linstedt, Michael Olschimke: Building a Scalable Data Warehouse with Data Vault 2.0. Waltham 2016, Chapter 4.5.