MySQL cluster
MySQL Cluster is a storage engine of the free database system MySQL in the currently available version 7.2. It enables the database to be installed on a computer cluster built in a shared nothing architecture . This means that each computer node has its own hard drives and RAM. If the individual computer nodes are equipped with a sufficiently large main memory, then all data can be kept in the main memory (up to version 5.0 this was mandatory, from version 5.1 there are also disk-resident tables).
According to its own information, the MySQL cluster technology offers an availability of 99.9999%. That means an annual downtime of less than six minutes.
The NDB storage engine (Network Database Storage Engine) is an independent component that enables persistent storage of data and is responsible for coordinating all accesses to data nodes in a MySQL cluster. Applications can access the NDB storage engine directly via various NoSQL interfaces or via a MySQL node via SQL.
NDB Cluster was taken over by MySQL AB in 2003 with the acquisition of Alzato (an Ericsson subsidiary). In 2008, MySQL AB was acquired by Sun Microsystems . In 2010, Sun was taken over by Oracle.
Areas of application
MySQL Cluster is often used as a DBMS (Database Management System) in the web environment, where it is important to achieve a lot of read accesses in combination with a high level of reliability. For such requirements, MySQL Cluster has already proven better access times than Oracle, DB2 and MS SQL in tests.
architecture
There are three types of nodes in the MySQL Cluster:
- Data node (Ndbd)
- Data nodes store all data belonging to MySQL Cluster. The data is normally replicated between the data nodes in the cluster to ensure that it is continuously available in the event of one or more nodes failing. Data nodes also manage database transactions . With more than two data nodes in the node, so-called Nodegroups (node group) divided. A node group must consist of at least two data nodes. The data is replicated within a node group. When inserting a new record, the system creates a hash from the primary key (the NDB engine automatically generates a primary key if none is defined). The value of the hash determines in which node group the record is stored. This results in a statistical uniform distribution. This method of distributing data is also known as partitioning .
- Data storage: Up to version 5, all data was kept in main memory and periodically written to disk. Roughly speaking, this meant that the data nodes had to have as much main memory as the size of the database; the supplied script "ndb_size.pl" made it possible to estimate the probable memory requirement for an existing database. From version 6 onwards there is the "storage disk" table type, in which index fields are stored in the main memory and the remaining fields are stored on the disk.
- Roles: Exactly one Ndb node in the cluster is the master . This role is assigned by the arbitrator at startup. The master has the primary information about the state of the cluster. Without a master, the cluster is unable to work. In the event of a failure, another node can take over this role - but only with the consent of the arbitrator or the absolute majority of the Ndb nodes. If parts of the cluster lose their connection but are still able to work ( split brain scenario ), the arbitrator decides who should be the master. By default, the manager is the arbitrator; however, it is possible to configure a hierarchy; an odd number is often chosen in order to maintain majority ability. Management or SQL nodes can serve as arbitrators.
- Management node
- A management node is responsible for system configuration, node administration and recording of activities in the cluster. One or more management nodes can be used at the same time for reasons of availability. This node is only required during operation if another failed node wants to log on to the cluster again. If the management node is also an arbitrator (default), it must also be available if the master fails. Since failures cannot be foreseen, this node should always be running.
- SQL node
- A SQL node corresponds to a MySQL database system that can communicate with data nodes. The SQL nodes can be addressed individually or via load distribution under a collective IP.
The MySQL database system allows the use of database management systems with different concepts: with and without execution of transactions, with and without persistent storage, with and without the use of stored procedures, with synchronous or asynchronous replication, etc.
The rough process of a user request is as follows:
- A request is made to an SQL node.
- The SQL node forwards the queries to the data nodes.
- A data node processes the request and sends the result back to the SQL node.
- The SQL node transfers the result to the requesting object.
At least one process is started on each of the nodes of the MySQL cluster. For SQL nodes the responsible process is called mysqld , for data nodes it is ndbd and for management nodes it is ndb-mgmd . Several MySQL Cluster processes can run simultaneously on computer nodes with several processors. For example, two ndbd processes can run in parallel on a data node with two CPUs. It is also possible to use processes of different MySQL Cluster node types on one computer node with several CPUs. For example, a process of the SQL node (mysqld) and a process of the data node (ndbd) can be started on a computer.
Ports (default settings):
- Sql node
- 3306
- Manager
- 1186
- Ndbd knot
- do not use fixed ports. The ports are dynamically assigned by the manager when the cluster is started and propagated to the servers in the cluster.
Security : The data traffic within the cluster is not cryptographically secured. It is the operator's responsibility to shield the cluster, e.g. B. through firewall rules or your own tunnel network.
Platforms
MySQL supports Windows, Unix / Linux. A Mac version is available for development.
Individual evidence
- ^ MySQL website
- ↑ Computer Week . No. 45 , 2006, p. 24 .
- ↑ MySQL characters are set to Enterprise ( Memento of the original from July 14, 2012 in the Internet Archive ) Info: The archive link was inserted automatically and has not yet been checked. Please check the original and archive link according to the instructions and then remove this notice. , Computer Week 45/2006
- ↑ Larissa Janssen: High-Performance Database Systems: Theory and Practice . Books on Demand GmbH, 2008, ISBN 3-8334-9326-7 , pp. 188-189 .