Parallel database system

from Wikipedia, the free encyclopedia

A parallel database system is a database system that runs on a parallel computer.

Classification

There are three basic architectures of parallel database systems (Stonebraker classification).

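  • Shared Everything (SE): Main memory and the disk subsystem are shared by all processors.
  • Shared Disks (SD): Only the disk subsystem is shared by the processors. Each processor node has its own main memory.
  • Shared Nothing (SN): No resources are shared. Each processor node has its own main memory and its own disks, and nodes cooperate only by exchanging messages over the interconnect.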
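The three base architectures differ only in which resources the processors share, which can be written down as a small decision function. The following is a minimal sketch; the names `Architecture` and `classify` are illustrative and do not come from the cited literature.

```python
from enum import Enum


class Architecture(Enum):
    """The three base architectures of the classification."""
    SHARED_EVERYTHING = "SE"
    SHARED_DISKS = "SD"
    SHARED_NOTHING = "SN"


def classify(shares_memory: bool, shares_disks: bool) -> Architecture:
    """Map the resources shared by the processors to a base architecture."""
    if shares_memory and shares_disks:
        return Architecture.SHARED_EVERYTHING
    if shares_disks:
        return Architecture.SHARED_DISKS
    if not shares_memory:
        return Architecture.SHARED_NOTHING
    # Shared memory with private disks is not one of the three listed cases.
    raise ValueError("combination not covered by the classification")


print(classify(shares_memory=False, shares_disks=True).value)
```

The fourth combination (shared memory, private disks) is rejected here because the classification above does not name it.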

This classification was later extended to include hierarchical architectures, for example:

  • Clustered Everything (CE): Two-tier, hierarchical architecture consisting of several SE clusters that are connected to one another according to the shared-nothing principle.
  • Clustered Disk (CD): Two-tier, hierarchical architecture consisting of several SD clusters that are connected to one another according to the shared-nothing principle.
  • Clustered Disks Nothing (CDN): Three-tier, hierarchical architecture consisting of a set of SD clusters in the lower tier and a set of SN clusters in the upper tier. The individual SN clusters are connected to one another according to the SN principle.
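The hierarchical architectures can be pictured as trees whose nodes are labelled with the base architecture used at that tier. The sketch below follows the descriptions in the list above; the `Node` class and the builder functions are hypothetical names chosen for illustration, and the cluster counts are arbitrary.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One cluster in the hierarchy, labelled "SE", "SD" or "SN"."""
    kind: str
    children: List["Node"] = field(default_factory=list)

    def tiers(self) -> int:
        """Number of tiers in the hierarchy rooted at this node."""
        return 1 + max((child.tiers() for child in self.children), default=0)


def clustered_everything(n_clusters: int) -> Node:
    # CE: SE clusters connected by the shared-nothing principle.
    return Node("SN", [Node("SE") for _ in range(n_clusters)])


def clustered_disk(n_clusters: int) -> Node:
    # CD: SD clusters connected by the shared-nothing principle.
    return Node("SN", [Node("SD") for _ in range(n_clusters)])


def clustered_disks_nothing(n_sn_clusters: int, n_sd_per_cluster: int) -> Node:
    # CDN: SD clusters at the bottom, grouped into SN clusters,
    # which are again connected by the shared-nothing principle.
    return Node("SN", [
        Node("SN", [Node("SD") for _ in range(n_sd_per_cluster)])
        for _ in range(n_sn_clusters)
    ])


print(clustered_everything(4).tiers(), clustered_disks_nothing(2, 3).tiers())
```

Counting tiers this way reproduces the "two-tier" and "three-tier" labels used in the list.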

Requirements

Typical requirements of parallel database systems include:

  • Short response times with high throughput
  • High availability
  • Good scalability
  • Efficient load balancing
  • Low interprocessor communication overhead
  • Minimal overhead for cache coherence control
  • Efficient synchronization of (global) conflicting accesses
  • High cost efficiency

Optimization possibilities

The most important optimization lever is parallelism in both hardware and software components. Parallel database systems run database processing on parallel computer systems so that the capacity of many processors can be used to increase performance. Parallelism can be applied at several levels, from whole transactions down to individual sub-operations of a query. In addition, query optimization algorithms and load balancing can speed up (parallel) query execution.

An important advantage of parallel database systems is that combining high-performance standard hardware (especially microprocessors) allows efficient data processing and thus high cost efficiency. At the same time, availability can be increased, because the database system can continue to operate after individual computers in the cluster fail. Studies have shown that the SN architecture can provide the best performance among the base architectures. In these investigations, the hierarchical CDN architecture achieved the best performance of the parallel database systems considered.
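Splitting a query into sub-operations can be illustrated with a grouped sum: the rows are partitioned across nodes, each node aggregates its own partition, and the partial results are merged. The following is a minimal sketch under stated assumptions: the `(order_id, region, amount)` schema and all function names are invented for illustration, and worker threads stand in for the processor nodes of a real system.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Row = Tuple[int, str, float]  # (order_id, region, amount): a made-up schema


def hash_partition(rows: List[Row], n_nodes: int) -> List[List[Row]]:
    """Assign each row to a node based on its key (here: order_id)."""
    partitions: List[List[Row]] = [[] for _ in range(n_nodes)]
    for row in rows:
        partitions[row[0] % n_nodes].append(row)
    return partitions


def local_sum(partition: List[Row]) -> Dict[str, float]:
    """Sub-operation run on one node: sum amounts per region locally."""
    totals: Dict[str, float] = {}
    for _, region, amount in partition:
        totals[region] = totals.get(region, 0.0) + amount
    return totals


def parallel_sum_by_region(rows: List[Row], n_nodes: int = 4) -> Dict[str, float]:
    """Run the local sums concurrently, then merge the partial results."""
    partitions = hash_partition(rows, n_nodes)
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(local_sum, partitions))
    merged: Dict[str, float] = {}
    for partial in partials:
        for region, total in partial.items():
            merged[region] = merged.get(region, 0.0) + total
    return merged


rows = [(1, "north", 10.0), (2, "south", 5.0), (3, "north", 2.5), (4, "south", 1.5)]
print(parallel_sum_by_region(rows, n_nodes=2))
```

Threads are used only to keep the example self-contained; in CPython they do not run CPU-bound work truly in parallel, whereas a parallel database system would execute each local sum on a separate processor or node. How evenly `hash_partition` spreads the rows determines how well the load is balanced.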

Literature

  • L. B. Sokolinsky: Survey of Architectures of Parallel Database Systems. In: Programming and Computer Software. Springer Netherlands, vol. 30, November 6, 2004.
  • E. Rahm: Multi-computer database systems: basics of distributed and parallel data processing. Addison-Wesley, Bonn 1994.

References

  1. Thomas Kudraß: Taschenbuch Datenbanken. 2007, p. 394.