Graph database

from Wikipedia, the free encyclopedia

A graph database (or graph-oriented database) is a database that uses graphs to represent and store highly interconnected information. Such a graph consists of nodes and edges, the connections between the nodes. The two best-known data models for graph databases are the Resource Description Framework (RDF) and the labeled property graph (LPG).

Graph databases are classed as NoSQL databases. In contrast to relational database systems, they prioritize the relationships between data items, which simplifies the mapping of hierarchical and networked structures. Special query languages such as Gremlin, SPARQL and Cypher make it possible, for example, to query complex patterns, traverse graphs and determine the shortest path between two nodes. A specialized graph database also makes it much easier to identify known graph structures such as cliques or hotspots.

Basics

A simple example of a graph is a set of relationships between people (see also sociogram). The nodes represent people, and each node is assigned the name of the person. The edges represent relationships and are distinguished by a type (knows, loves, hates).

[Figure: example graph of people and their relationships (Graph-database-example1.png)]

The graph above is a directed, labeled multigraph. The edges labeled knows are generally symmetric, whereas this is not necessarily true for the relationships loves and hates.
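As a minimal sketch, the example graph can be written as a list of directed, labeled edges. The edge list follows the example of this article; the helper names `outgoing` and `is_symmetric` are illustrative assumptions and not part of any graph database API.

```python
from collections import defaultdict

# Edges of the example graph as (source, label, target) tuples.
EDGES = [
    ("Alice", "knows", "Bob"),
    ("Bob", "hates", "Alice"),
    ("Bob", "hates", "Dave"),
    ("Carol", "knows", "Alice"),
    ("Carol", "loves", "Dave"),
    ("Dave", "loves", "Carol"),
]


def outgoing(edges):
    """Index the edges by source node: node -> list of (label, target)."""
    index = defaultdict(list)
    for source, label, target in edges:
        index[source].append((label, target))
    return dict(index)


def is_symmetric(edges, label):
    """Return True if every stored edge with this label also exists reversed."""
    pairs = {(s, t) for s, l, t in edges if l == label}
    return all((t, s) in pairs for s, t in pairs)
```

Because the edges are directed, a relationship that is symmetric in meaning (such as knows) is only symmetric in the data if both directions are stored or if queries ignore the direction. In this edge list, `is_symmetric(EDGES, "knows")` is false for exactly that reason.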

Another example is network connections. Each node corresponds to a computer, switch, or router, and each edge corresponds to a connection. Every connection has a bandwidth, which serves as the weight of the edge. Such graphs are called weighted graphs.
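A weighted graph of this kind can be sketched as a mapping from node pairs to weights. The device names and bandwidth values below are invented for illustration, and `path_bandwidth` is a hypothetical helper that returns the bottleneck (the smallest bandwidth) along a path.

```python
# Hypothetical network: names and bandwidths (in Mbit/s) are made up.
LINKS = {
    ("pc1", "switch1"): 1000,
    ("switch1", "router1"): 10000,
    ("router1", "server1"): 100,
}


def link_bandwidth(a, b):
    """Bandwidth of the undirected link between a and b, or None if absent."""
    if (a, b) in LINKS:
        return LINKS[(a, b)]
    return LINKS.get((b, a))


def path_bandwidth(path):
    """Smallest link bandwidth along a path given as a list of node names."""
    weights = [link_bandwidth(a, b) for a, b in zip(path, path[1:])]
    if not weights or any(w is None for w in weights):
        raise ValueError("path is empty or uses a link that does not exist")
    return min(weights)
```

For example, `path_bandwidth(["pc1", "switch1", "router1", "server1"])` is limited by the slowest link on the route.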

Graph models

Labeled property graph

In a labeled property graph, or simply property graph, both nodes and edges can carry properties in the form of key-value pairs (for example weight: 10 kg, color: red, name: Alice). This focus on the properties of nodes and edges distinguishes property graphs from the classic data models of relational databases.
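The idea can be sketched with two small record types. The class names, field names and the `since` property are assumptions chosen for illustration; real property graph databases define their own storage structures.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    labels: set = field(default_factory=set)
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: int
    target: int
    edge_type: str
    properties: dict = field(default_factory=dict)


def find_edges(edges, edge_type, **required):
    """Return edges of the given type whose properties include all required pairs."""
    return [
        e for e in edges
        if e.edge_type == edge_type
        and all(e.properties.get(k) == v for k, v in required.items())
    ]


alice = Node(1, {"Person"}, {"name": "Alice"})
bob = Node(2, {"Person"}, {"name": "Bob"})
# The "since" property is hypothetical; it shows a property stored on an edge.
edges = [Edge(alice.node_id, bob.node_id, "knows", {"since": 2015})]
```

Here the edge itself carries data, so a lookup such as `find_edges(edges, "knows", since=2015)` can filter on edge properties without an extra join table.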

Resource Description Framework

In the Resource Description Framework (RDF), graphs are represented with the help of triples. A triple consists of three elements in the form node-edge-node (subject --predicate-> object), which are defined as resources in the form of a globally unique URI or as an anonymous resource. To manage different graphs within one database, the triples are stored as quads: a quad adds a reference to the associated graph to each triple. The above example can be represented as follows:

 Alice --knows-> Bob
 Bob   --hates-> Alice
 Bob   --hates-> Dave
 Carol --knows-> Alice
 Carol --loves-> Dave
 Dave  --loves-> Carol
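The triples listed above can be sketched in plain Python as tuples of URIs. The namespace `http://example.org/` and the helpers `uri`, `match` and `to_quads` are assumptions made for this illustration; they do not correspond to the API of any particular RDF library.

```python
EX = "http://example.org/"  # hypothetical namespace for the example resources


def uri(name):
    """Build a URI for a local name in the example namespace."""
    return EX + name


TRIPLES = {
    (uri("Alice"), uri("knows"), uri("Bob")),
    (uri("Bob"), uri("hates"), uri("Alice")),
    (uri("Bob"), uri("hates"), uri("Dave")),
    (uri("Carol"), uri("knows"), uri("Alice")),
    (uri("Carol"), uri("loves"), uri("Dave")),
    (uri("Dave"), uri("loves"), uri("Carol")),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def to_quads(triples, graph):
    """Turn triples into quads by adding a reference to the containing graph."""
    return {(s, p, o, graph) for s, p, o in triples}
```

A call such as `match(TRIPLES, p=uri("loves"))` corresponds roughly to a triple pattern with a fixed predicate and variables for subject and object.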

With these simple building blocks (subject --predicate-> object), very complex graphs can be built which, with appropriate modeling, also enable automated inference. Building on RDF, RDF Schema provides a vocabulary for formalizing lightweight ontologies, and the Web Ontology Language (OWL) can additionally be used to describe fully decidable ontologies. The W3C recommends SHACL for ensuring data quality and schema conformance in complex graph structures.

RDF is used in many web technologies, such as RSS and the Semantic Web.

SPARQL is a standardized query language of the Semantic Web with which RDF graphs can be created, modified and queried.

Distinction from other data models

Relational databases

Relational databases manage relations (tables) and tuples (rows). Each row in a table is a record and consists of a series of attribute values (properties), one for each column of the table. The relation schema defines the number and type of attributes for a relation. Relationships between tables are implemented using keys.

SQL provides a uniform query language for relational database management systems. It allows the selection of rows with certain properties.

Graphs can also be represented in relational databases. For the social network example above, a table PERSON is used for the people and a table RELATIONSHIP for the edges. With SQL, all nodes (persons) or edges (relationships) with given properties can be found. Recursive common table expressions can be used to find all indirect acquaintances or to determine a path between two people (SQL:1999, DB2, Oracle 11gR2, PostgreSQL, SQL Server 2008). This allows graphs with unidirectional and bidirectional edges to be searched. If the edge table also contains a weight, the optimal (shortest) path between two nodes can also be determined with an SQL query.
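The following sketch shows the relational approach with a recursive common table expression. It uses SQLite through Python's standard `sqlite3` module purely for convenience; SQLite is not among the systems named above, and the column names (`source_id`, `target_id`, `rel_type`) and the function `indirect_acquaintances` are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE relationship (
    source_id INTEGER NOT NULL REFERENCES person(id),
    target_id INTEGER NOT NULL REFERENCES person(id),
    rel_type  TEXT NOT NULL
);
INSERT INTO person VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol'), (4, 'Dave');
INSERT INTO relationship VALUES
    (1, 2, 'knows'), (2, 1, 'hates'), (2, 4, 'hates'),
    (3, 1, 'knows'), (3, 4, 'loves'), (4, 3, 'loves');
""")

# Follow 'knows' edges transitively. UNION (not UNION ALL) removes duplicates,
# which also stops the recursion when the graph contains cycles.
REACHABLE_SQL = """
WITH RECURSIVE reachable(id) AS (
    SELECT target_id FROM relationship
    WHERE source_id = :start AND rel_type = 'knows'
    UNION
    SELECT r.target_id
    FROM relationship AS r
    JOIN reachable ON r.source_id = reachable.id
    WHERE r.rel_type = 'knows'
)
SELECT p.name FROM reachable JOIN person AS p ON p.id = reachable.id
ORDER BY p.name
"""


def indirect_acquaintances(start_id):
    """Names of all people reachable from start_id via directed 'knows' edges."""
    rows = conn.execute(REACHABLE_SQL, {"start": start_id}).fetchall()
    return [name for (name,) in rows]
```

Starting from Carol (id 3), the query first finds Alice and then, through Alice, Bob. Each recursion step is a join against the edge table, which is the main structural difference to a traversal that follows edges directly.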

In contrast, graph databases select nodes by means of traversal algorithms that are designed to follow edges efficiently. Starting from one or more nodes, all or selected outgoing edges are traversed.
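A traversal of this kind can be sketched as a breadth-first search over an adjacency list. This is a generic textbook algorithm, not the implementation of any specific graph database; the adjacency list is derived from the example graph with edge labels ignored, and `shortest_path` is an illustrative name.

```python
from collections import deque

# Outgoing edges of the example graph, edge labels ignored.
GRAPH = {
    "Alice": ["Bob"],
    "Bob": ["Alice", "Dave"],
    "Carol": ["Alice", "Dave"],
    "Dave": ["Carol"],
}


def shortest_path(graph, start, goal):
    """Breadth-first search along outgoing edges.

    Returns a path with the fewest edges as a list of nodes,
    or None if the goal cannot be reached from the start.
    """
    if start == goal:
        return [start]
    previous = {start: None}  # visited nodes and the node each was reached from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor in previous:
                continue
            previous[neighbor] = node
            if neighbor == goal:
                # Walk back from the goal to the start to rebuild the path.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None
```

Each step only looks at the outgoing edges of the current node, so the work depends on the part of the graph that is actually visited rather than on the total number of stored edges.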

Object-oriented models

With the advent of object-oriented programming languages, object databases became more widely available. Objects from object-oriented languages can thus be stored directly in the database. This approach has advantages over a relational design when complex data objects have to be stored that are difficult to map onto flat relational table structures. Graphs can be mapped in object databases by keeping the outgoing edges of a node as a list of target nodes. With this approach, however, it is not possible to assign properties to the edges themselves.
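The mapping described here can be sketched with ordinary objects that hold references to their target nodes. The class `PersonObject` and the helper `names_known_by` are hypothetical and only illustrate the structure; they do not model a particular object database.

```python
class PersonObject:
    """A stored object whose outgoing edges are plain references to other objects."""

    def __init__(self, name):
        self.name = name
        self.knows = []  # list of target PersonObject instances


def names_known_by(person):
    """Names of the people that the given person points to."""
    return [other.name for other in person.knows]


alice = PersonObject("Alice")
bob = PersonObject("Bob")
carol = PersonObject("Carol")

alice.knows.append(bob)    # Alice --knows-> Bob
carol.knows.append(alice)  # Carol --knows-> Alice
```

Each entry in `knows` is only a reference to the target object, so there is no place to record data about the relationship itself.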

Algorithms

Important algorithms for querying nodes and edges are:

Query languages

There is no single standard for querying all graph databases. This has led to a large number of different query languages and query interfaces. Important examples are:

  • Blueprints - a Java API for property graphs that can be used in conjunction with various graph databases.
  • Cypher - a query language developed by Neo4j.
  • GraphQL - a SQL-like query language
  • Gremlin - an open source graph programming language that can be used with various graph databases (Neo4j, OrientDB, DEX).
  • GReQL - a textual graph query language for property graphs that can compute regular paths by means of path expressions.
  • Pipes - a data flow framework for Java based on process graphs especially for query processing on property graphs.
  • Rexster - an HTTP / REST interface for access to graph databases via the Internet, which is supported by several manufacturers.
  • SPARQL - query language specified by the W3C for RDF data models
