Data mapping
Data mapping is the process of mapping data elements between different data models. It is required as a first step for various information integration tasks:
- Data transformation or data mediation between a data source and a data destination. For example, data mapping can be used to exchange purchase and billing information between different companies, with one company's data mapped to standardized ANSI ASC X12 messages.
- Identification of data relations as part of the analysis of data sources
- Detection of hidden, sensitive data, such as parts of a social security number embedded in the IDs of anonymized data
- Aggregation of different databases into a single database
- Identification of redundant information for consolidation or elimination
Standards
ANSI ASC X12 and EDIFACT are generic standards that enable companies from different industries to exchange data with one another.
Techniques
Data mapping can be implemented in different ways, for example with procedural code, with XSLT transformations, or with graphical mapping tools that automatically generate executable transformation programs.
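The procedural variant can be illustrated with a minimal Python sketch. All field names, the FIELD_MAP table and the map_record function are hypothetical and chosen only for illustration; the sketch assumes every source field is present in each record.

```python
from typing import Any, Callable

# Hypothetical mapping specification: target field -> (source field, converter).
FIELD_MAP: dict[str, tuple[str, Callable[[Any], Any]]] = {
    "customer_name": ("CustName", str.strip),
    "amount_cents": ("Amount", lambda v: round(float(v) * 100)),
    "currency": ("Curr", str.upper),
}


def map_record(source: dict[str, Any]) -> dict[str, Any]:
    """Build a target record by applying each field rule to the source record."""
    return {
        target: convert(source[src])
        for target, (src, convert) in FIELD_MAP.items()
    }


# Example source record in the (assumed) format of the sending system.
order = {"CustName": " Acme Corp ", "Amount": "19.99", "Curr": "eur"}
mapped = map_record(order)
```

Keeping the rules in a table rather than in scattered assignments makes the mapping itself easy to review and change, which is the same idea that graphical tools present visually.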
Graphical tools enable the user to draw lines between the fields of one data structure and the fields of another. The tools can also automatically recognize relationships between fields based on their names and value ranges at the push of a button. From the defined relationships, these programs automatically generate SQL, XSLT, or program code, for example in Java or C++. Such tools are usually part of ETL tools.
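The following Python sketch shows, under simplifying assumptions, how name-based relationship recognition and code generation might fit together. It compares only normalized field names (not value ranges), and the functions normalize, match_fields and generate_sql as well as the table and field names are hypothetical, not taken from any particular tool.

```python
import re


def normalize(name):
    """Lower-case a field name and drop all non-alphanumeric characters."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def match_fields(source_fields, target_fields):
    """Pair each target field with the source field whose normalized name is equal."""
    by_name = {normalize(s): s for s in source_fields}
    return {
        t: by_name[normalize(t)]
        for t in target_fields
        if normalize(t) in by_name
    }


def generate_sql(mapping, source_table, target_table):
    """Emit an INSERT ... SELECT statement for the matched fields.

    Identifiers are inserted as-is; a real generator would have to quote them.
    """
    targets = ", ".join(mapping)
    sources = ", ".join(mapping.values())
    return (
        f"INSERT INTO {target_table} ({targets}) "
        f"SELECT {sources} FROM {source_table};"
    )


mapping = match_fields(
    ["Cust_ID", "FirstName", "ZIP"],      # assumed source fields
    ["custid", "first_name", "city"],     # assumed target fields
)
sql = generate_sql(mapping, "src_customers", "dst_customers")
```

Here "city" stays unmapped because no source field has a matching name; in a graphical tool the user would draw that line manually.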
Semantic mapping is similar to the automatic relationship recognition of the graphical tools mentioned above, with the addition that a metadata dictionary is used to identify synonyms. For example, if one data source lists places of residence and the other lists whereabouts, semantic mapping recognizes that these describe the same thing, provided that place of residence and whereabouts are listed as synonyms in the metadata dictionary. However, semantic mapping only recognizes exact synonyms; it does not recognize transformations, for example between place of residence and zip code.
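A synonym lookup of this kind can be sketched in a few lines of Python. The SYNONYMS list stands in for the metadata dictionary, and its contents, like the functions canonical and semantic_match, are assumptions made for this example.

```python
# Hypothetical metadata dictionary: each set groups field names treated as synonyms.
SYNONYMS = [
    {"place_of_residence", "whereabouts"},
    {"surname", "last_name", "family_name"},
]


def canonical(name):
    """Return a stable representative of the synonym group containing name."""
    for group in SYNONYMS:
        if name in group:
            return min(group)  # alphabetically first member represents the group
    return name  # names without synonyms represent themselves


def semantic_match(source_fields, target_fields):
    """Map each target field to a source field that denotes the same concept."""
    by_concept = {canonical(s): s for s in source_fields}
    return {
        t: by_concept[canonical(t)]
        for t in target_fields
        if canonical(t) in by_concept
    }


matches = semantic_match(
    ["place_of_residence", "surname", "zip_code"],
    ["whereabouts", "last_name", "city"],
)
```

The target field "city" remains unmatched even though it is related to "zip_code", which reflects the limitation described above: only listed synonyms are found, not transformations.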
There are also program libraries that support the mapping of data in memory, such as Dozer and ModelMapper.
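Dozer and ModelMapper are Java libraries; the Python sketch below does not reproduce their APIs. It only illustrates the general idea of in-memory object mapping by copying attributes with identical names from one object to another. The classes CustomerEntity and CustomerDto and the function map_object are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class CustomerEntity:
    id: int
    name: str
    password_hash: str


@dataclass
class CustomerDto:
    id: int
    name: str


def map_object(source, target_type):
    """Create a target_type instance from the source attributes with matching names."""
    values = {
        f.name: getattr(source, f.name)
        for f in fields(target_type)
        if hasattr(source, f.name)
    }
    return target_type(**values)


# The password hash is dropped because CustomerDto has no such field.
dto = map_object(CustomerEntity(7, "Ada", "x1y2"), CustomerDto)
```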
Data-driven mapping is a newer approach. By evaluating the data values of two data sources side by side, it tries to automatically discover complex mappings between them using heuristics and statistics. This approach can recognize, for example, parts of values (substrings), merged values (concatenations), and arithmetic relations. It also identifies exceptions, that is, records that do not conform to the discovered mapping logic.
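As a rough illustration, the Python sketch below tests a small, fixed set of hypotheses (direct copy, concatenation with a space, and multiplication) against sample rows and reports the best-supported rule together with the rows that violate it. The candidate set, the min_support threshold and all names are assumptions made for this example; real data-driven tools use far richer heuristics and statistics.

```python
from itertools import permutations


def candidate_rules(columns):
    """Yield (description, function) pairs for a few simple mapping hypotheses."""
    for c in columns:
        yield f"{c}", lambda r, c=c: r[c]
    for a, b in permutations(columns, 2):
        yield f"concat({a}, ' ', {b})", lambda r, a=a, b=b: f"{r[a]} {r[b]}"
        yield f"{a} * {b}", lambda r, a=a, b=b: r[a] * r[b]


def holds(rule, row, expected):
    """Check one rule on one row; rules that do not fit the types count as misses."""
    try:
        return rule(row) == expected
    except TypeError:
        return False


def discover(source_rows, target_values, min_support=0.8):
    """Return (rule, support, exception_row_indexes) for the best rule, or None."""
    best = None
    for description, rule in candidate_rules(list(source_rows[0])):
        hits = [holds(rule, row, t) for row, t in zip(source_rows, target_values)]
        support = sum(hits) / len(hits)
        if support >= min_support and (best is None or support > best[1]):
            exceptions = [i for i, ok in enumerate(hits) if not ok]
            best = (description, support, exceptions)
    return best


rows = [
    {"first": "Ada", "last": "Lovelace", "qty": 2, "price": 5},
    {"first": "Alan", "last": "Turing", "qty": 3, "price": 4},
    {"first": "Grace", "last": "Hopper", "qty": 1, "price": 9},
    {"first": "Edsger", "last": "Dijkstra", "qty": 2, "price": 7},
    {"first": "Donald", "last": "Knuth", "qty": 4, "price": 2},
]
full_names = ["Ada Lovelace", "Alan Turing", "Grace Hopper", "E. Dijkstra", "Donald Knuth"]
totals = [10, 12, 9, 14, 8]

name_rule = discover(rows, full_names)   # concatenation, with row 3 as an exception
total_rule = discover(rows, totals)      # arithmetic relation without exceptions
```

In this sample, the name column is explained by a concatenation that holds for four of five rows, so the abbreviated "E. Dijkstra" is reported as an exception rather than preventing the rule from being found.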