Data profiling

Data profiling is the largely automated process of analyzing existing data sets (e.g. in a database) with different analysis techniques. It validates the existing metadata against the real data and identifies new metadata. In addition, it uncovers existing data quality problems, identifies the data that causes them, and measures the information quality of the analyzed data. Data profiling does not fix any quality problems in the data itself; it only corrects the associated metadata.
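For illustration, the following minimal sketch shows what attribute-level profiling can look like in Python with pandas. The column "customer_age" and the declared metadata are hypothetical and only serve to show how observed metadata is derived and checked against declared metadata; this is not a prescribed implementation.

```python
# Minimal, illustrative sketch: profile one column and compare the observed
# metadata against (hypothetical) declared metadata. Not a full profiler.
import pandas as pd

df = pd.DataFrame({"customer_age": [25, 41, None, 230, 41, 38]})
declared = {"nullable": False, "max": 120}  # existing (declared) metadata

col = df["customer_age"]
observed = {
    "dtype": str(col.dtype),                       # inferred data type
    "null_count": int(col.isna().sum()),           # missing values
    "distinct_count": int(col.nunique(dropna=True)),
    "min": col.min(),
    "max": col.max(),
}

print(observed)
# Validate the declared metadata against the real data:
print("nullable violated:", observed["null_count"] > 0 and not declared["nullable"])
print("max violated:", observed["max"] > declared["max"])
```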

The data profiling process

Data profiling is an iterative process that consists of the following four steps (cf. Apel et al. 2010, p. 110); a schematic sketch of the loop follows the list:

  1. Integrate the data,
  2. analyze the integrated data,
  3. present the results, and
  4. evaluate the results from a domain perspective.
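A bare skeleton of this loop might look as follows; the function names, their placeholder bodies, and the stopping condition are assumptions made for illustration, not part of the source.

```python
# Schematic skeleton of the iterative four-step data profiling process.
# The helper functions are placeholders for illustration only.
def integrate(sources):
    """Step 1: bring the data to be analyzed together."""
    ...

def analyze(data):
    """Step 2: run the profiling methods and collect findings."""
    ...

def display(findings):
    """Step 3: present the findings."""
    ...

def evaluate(findings):
    """Step 4: domain experts assess the findings; return True to iterate again."""
    ...

def profile(sources):
    another_round = True
    while another_round:
        data = integrate(sources)
        findings = analyze(data)
        display(findings)
        another_round = evaluate(findings)  # iterate until the evaluation is satisfactory
```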

The various data profiling methods can be divided into attribute analysis, record analysis and table analysis. Attribute analysis examines all values in a table column (= attribute) and the properties of a table's attributes; record analysis examines all records in a table; and table analysis examines the relationships between different tables. For each of these three types of analysis there are many different data profiling methods.
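The following sketch shows one possible example per analysis level with pandas; the tables, columns and rules are invented for the example and are not taken from the source.

```python
# Illustrative examples for the three analysis levels (attribute, record, table).
import pandas as pd

customers = pd.DataFrame({"customer_id": [1, 2, 3]})
orders = pd.DataFrame({
    "order_id": [10, 11, 12],
    "customer_id": [1, 2, 4],       # order 12 refers to a non-existent customer
    "net": [100.0, 50.0, 80.0],
    "gross": [119.0, 59.5, 70.0],   # order 12: gross < net, an inconsistent record
})

# Attribute analysis: examine all values of one column.
print(orders["net"].describe())

# Record analysis: check a rule spanning several fields of the same record.
print(orders[orders["gross"] < orders["net"]])

# Table analysis: check a relationship between tables (inclusion dependency /
# referential integrity: every order should reference an existing customer).
print(orders[~orders["customer_id"].isin(customers["customer_id"])])
```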

Literature

  • Detlef Apel, Wolfgang Behme, Rüdiger Eberlein, Christian Merighi: Datenqualität erfolgreich steuern. 2nd edition, Hanser Fachbuch, 2010, ISBN 978-3-446-42501-9.