Data virtualization

from Wikipedia, the free encyclopedia

The term data virtualization covers a set of approaches in data management that form a subset of data integration. They make it possible to query and manipulate data from source systems without the querying system having to know technical details such as the structure of the data source or its physical storage location.

Data virtualization can be seen as an alternative to the data warehouse approach with its ETL processes, in which data is extracted from the source systems, transformed and finally loaded into the analytical system. With data virtualization, by contrast, the data remains in its original systems; the virtualization component accesses it directly and makes it available for further manipulation or for consumption by other applications.

Various abstraction and transformation techniques are used to resolve the heterogeneity of the data (differences in data sources, formats and semantics).
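
As an illustration of such techniques, the following minimal Python sketch maps records from two differently structured sources onto one common schema at query time. The source names (crm_rows, shop_rows), the field names and the formats are hypothetical and chosen only for this example; it is a sketch of the idea, not the mechanism of any particular product.

```python
from datetime import date, datetime

# Hypothetical raw records from two source systems with different
# field names, date formats and currency units.
crm_rows = [{"CustomerName": "Ada", "Signup": "2021-03-01", "RevenueEUR": "12.50"}]
shop_rows = [{"name": "Grace", "signup_date": "01.04.2021", "revenue_cents": 300}]


def from_crm(row):
    """Map a CRM record onto the common schema (ISO dates, euros as text)."""
    return {
        "name": row["CustomerName"],
        "signup": date.fromisoformat(row["Signup"]),
        "revenue_eur": float(row["RevenueEUR"]),
    }


def from_shop(row):
    """Map a shop record onto the common schema (DD.MM.YYYY dates, cents)."""
    return {
        "name": row["name"],
        "signup": datetime.strptime(row["signup_date"], "%d.%m.%Y").date(),
        "revenue_eur": row["revenue_cents"] / 100,
    }


def unified_customers():
    """Yield records in the common schema; the sources are read on demand
    and nothing is copied into a separate store."""
    for row in crm_rows:
        yield from_crm(row)
    for row in shop_rows:
        yield from_shop(row)


customers = list(unified_customers())
```

A consumer of unified_customers() sees one set of field names, one date type and one currency unit, regardless of how each source stores its data.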

Possible advantages of this approach are a reduction in incorrect data and, given a suitable design of the virtualization component, a lower load on the systems involved. It is also possible to write data back to the source systems.

The concept and the corresponding software are typically applied in business intelligence, service-oriented architecture, cloud computing, enterprise search and master data management.

Data virtualization and data warehousing

Many enterprise system landscapes consist of disparate data sources, including multiple data warehouses, data marts and/or data lakes. Data virtualization can bridge these source systems without requiring additional physical data storage. The existing data infrastructure can continue to perform its core functions, while the data virtualization layer merely consumes the data from these sources. This can help increase data availability and usage.

Data virtualization can also be viewed as an alternative to ETL processes and data warehousing. The concept aims to provide insights from several data sources in a timely manner without the need for extensive ETL processes and additional data storage. However, data virtualization can also be expanded and adapted to meet data warehousing requirements. This requires an understanding of the data storage and historization requirements, along with planning and design, in order to select suitable data virtualization, integration and storage strategies and to carry out infrastructure and performance optimizations (e.g. streaming, in-memory, hybrid storage).

Examples

  • The Phone House, the trading name for the European branches of the British mobile phone retail chain Carphone Warehouse, implemented Denodo's data virtualization technology between the transaction systems of its Spanish subsidiary and the web-based systems of the mobile network operators.
  • Novartis implemented a data virtualization tool from Composite Software, enabling its researchers to quickly combine data from internal and external sources into a searchable virtual data store.
  • Primary Data (now Hammer.space) was a virtualization platform that enabled applications, servers and clients to transparently access data while it was intelligently moved between direct attached storage, network attached storage, and private and public cloud storage.
  • Linked data can use a single hyperlink-based Data Source Name (DSN) to provide a connection to a virtual database layer, which in turn is connected to various data sources via ODBC, JDBC, OLE DB, ADO.NET, SOA services and/or REST.
  • Database virtualization can use a single ODBC-based DSN to provide a connection to a virtual database tier.

Functions

Data virtualization solutions provide some or all of the following features:

  • Abstraction - Hiding the technical aspects of the stored data, such as storage location, storage structure, API, query language and storage technology
  • Virtualized data access - Access to various data sources, with the data made available at a common logical access point
  • Transformation - Conversion, data quality improvement, reformatting and aggregation of the source data
  • Data federation - Combination of result sets from several source systems
  • Data delivery - Publication of result sets as views and/or data services that can be accessed by client applications or users
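
The interplay of these features can be sketched in a few lines of Python using the standard sqlite3 module. This is only an illustrative sketch: two in-memory SQLite databases stand in for independent source systems, and the table names, column names and the view name customer_revenue are hypothetical. The view federates both sources, transforms cents into currency units and is delivered to the client as a single logical access point, while the rows themselves stay in their source databases.

```python
import sqlite3

# Two independent in-memory databases stand in for separate source systems.
conn = sqlite3.connect(":memory:")                      # "main" plays a CRM system
conn.execute("ATTACH DATABASE ':memory:' AS billing")   # second source system

conn.execute("CREATE TABLE main.customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE billing.invoices (customer_id INTEGER, amount_cents INTEGER)")
conn.executemany("INSERT INTO main.customers VALUES (?, ?)",
                 [(1, "Ada"), (2, "Grace")])
conn.executemany("INSERT INTO billing.invoices VALUES (?, ?)",
                 [(1, 1250), (1, 750), (2, 300)])

# Federation and transformation in one virtual view. No rows are copied.
# SQLite permits views that span attached databases only as TEMP views.
conn.execute("""
    CREATE TEMP VIEW customer_revenue AS
    SELECT c.name AS customer,
           SUM(i.amount_cents) / 100.0 AS revenue
    FROM main.customers AS c
    JOIN billing.invoices AS i ON i.customer_id = c.id
    GROUP BY c.id, c.name
""")


def fetch_revenue():
    """Data delivery: clients query the view, not the source tables."""
    return conn.execute(
        "SELECT customer, revenue FROM customer_revenue ORDER BY customer"
    ).fetchall()


rows = fetch_revenue()  # [('Ada', 20.0), ('Grace', 3.0)]
```

Because the view is evaluated at query time, a row inserted into billing.invoices afterwards is reflected in the next call to fetch_revenue() without any reload step.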

In addition, data virtualization software can contain functions for development, operation and/or administration.

The following advantages can be achieved when the concept of data virtualization is applied correctly:

  • Reduction of bad data
  • Reduction of the system load by keeping the data in the source system
  • Increased access speeds
  • Reduction of the time required for development and support
  • Increased governance and reduced risk through the application of guidelines
  • Reduction of data storage requirements

Possible disadvantages are:

  • The response times of operational systems can suffer, especially when those systems are not designed to handle unexpected queries.
  • Data virtualization does not enforce a heterogeneous data model, which means that the user has to interpret the data unless it is combined with data federation and a business understanding of the data.
  • Data virtualization requires a defined governance approach in order to avoid budgeting problems with the shared services.
  • Data virtualization is not suitable for historizing data; a data warehouse is better suited for this.
  • Change management is associated with increased effort, since all changes to the virtual data model must be accepted by all consuming applications and users.
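
The historization point in the list above can be made concrete with a small Python sketch. All names here (source_prices, virtual_price, take_snapshot) are hypothetical. The sketch assumes a source system that overwrites values in place: virtual access then only ever sees the current state, whereas a warehouse-style snapshot keeps dated copies.

```python
from datetime import date

# Hypothetical source system that holds only the current value per product.
source_prices = {"widget": 10.0}


def virtual_price(product):
    """Virtual access: always reads the current state from the source."""
    return source_prices[product]


# Warehouse-style history: explicit, dated copies of the source data.
price_history = []


def take_snapshot(day):
    """Copy the current source state into the history store."""
    for product, price in source_prices.items():
        price_history.append((day, product, price))


take_snapshot(date(2024, 1, 1))
source_prices["widget"] = 12.0        # the source overwrites the old value
take_snapshot(date(2024, 2, 1))

current = virtual_price("widget")     # only the latest value is visible
january = [price for day, name, price in price_history
           if day == date(2024, 1, 1) and name == "widget"]
```

After the update, virtual_price can no longer answer what the price was in January; only the copied snapshot can.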

Technology

Some data virtualization solutions and providers:

History

Enterprise Information Integration (EII) (first mentioned by Metamatrix) and Federated Database Systems are terms used by some vendors to describe a core element of data virtualization: the ability to create relational joins in a federated view.

Literature

  • Judith R. Davis and Robert Eve: Data Virtualization: Going Beyond Traditional Data Integration to Achieve Business Agility.
  • Rick van der Lans: Data Virtualization for Business Intelligence Systems: Revolutionizing Data Integration for Data Warehouses.
  • Anthony Giordano: Data Integration Blueprint and Modeling: Techniques for a Scalable and Sustainable Architecture.
