Barabási-Albert model

from Wikipedia, the free encyclopedia
Three graphs with 20 nodes each, created with the Barabási-Albert model. The parameter (number of edges of a newly added node) as indicated and the color scaling corresponds to the degree of each node (scale is identical for each graph).

The Barabási-Albert model ( English Barabási – Albert (BA) model ) describes a stochastic algorithm from the field of graph theory for generating undirected, scale - free networks . The model was formulated by Albert-László Barabási and his doctoral student Réka Albert and its main features are the successive growth of the network, i.e. the addition of new nodes over time and their connection to the existing network. The latter is a random process , but the so-called preferential binding ( English preferential attachment subject). The selection of the neighbors of a new node is more likely to be decided in favor of nodes that already have a high degree. In the networks that are created in this way, there are accordingly some relatively important nodes ( English hubs ), the grades of which are significantly higher than those of the overwhelming majority with comparatively low grades. One then speaks of a scale-free network because the degree of distribution ( English degree distribution ) a power law follows; the character of such a network is therefore independent of its size. It is considered to be the most popular model for generating networks.

concept

Many real networks, such as the Internet , have a degree distribution with heavy margins , which Herbert A. Simon recognized as early as 1955 . As a result, the degree distribution is very heterogeneous and there is a high probability that there are few extremely well networked nodes whose degree is significantly higher than the mean value . This is also the case in scale-free networks , but whether the degree distribution from empirical data is actually subject to exactly a power law is relatively difficult to prove. Nevertheless, scale-free networks are very similar to empirical networks in several respects, which is at least clear from the mechanisms of their formation. So is z. For example, in the classic random graph based on the model by Paul Erdős and Alfréd Rényi (Erdős – Rényi model), the number of nodes is fixed. Such an ER random graph has a finite, firmly defined size and this is not variable over time. Because all possible edges are realized with the same probability, nodes are in principle equivalent and their degree scatters relatively closely with a Poisson distribution around the mean. Real networks, however, are variable over time and clearly show a high degree of variance in their degree distribution. For example, new pages are constantly being added to the World Wide Web , which are very likely to be linked to existing, "important" pages. The importance of a page can be determined based on the number of links it has, i.e. its degree (see PageRank ). Scale-free networks are also found in the Web of Trust of Pretty Good Privacy . Likewise, in a network of scientific publications (nodes) and their referencing (edges), the probability is higher that a new publication will refer to already existing, reputable or highly regarded publications than to unknown ones.

The BA model explicitly formulates these two properties of real networks for the creation of similar random networks. The core elements are therefore:

  1. Growth: The network grows by one node in each time step, which is connected to a fixed number of existing nodes.
  2. Preferred connection: The selection of the newly linked connections depends on the importance or degree of the nodes in question.

Mathematical formulation

Creation of a graph using the Barabási-Albert model. The number of edges added in each step is in this example .

One starts with a network with nodes, whereby the edges can be chosen arbitrarily, as long as each node has at least one neighbor. In each time step, a new node is added that is connected to existing nodes. The probability with which a node is selected for this is proportional to its degree . Formally, this can be written as

,

where is the number of current nodes.

This gives the network size with and the number of edges as , where represents the initial number of edges .

Problems arise due to the fact that it is not defined in the original model how exactly the initial edges are distributed and whether the newly added edges are distributed simultaneously or successively. If the assignment of the new edges is independent, there is the possibility of multiple assigned edges. It also remains unclear whether loops (edges whose two end nodes are one and the same) are allowed or not. A mathematically concrete definition that eliminates this ambiguity was proposed by Bollobás et al. proposed under the name Linearized Chord Diagram (LCD).

properties

Degree dynamics

Since the degree of a node is time-dependent, it is worth considering the dynamics for the BA model. If one assumes approximately that this represents a kind of expected value (in fact it is always a natural number ). The rate at which a node accumulates new edges is thus

.

Simplifying for great times results in integration

,

where represents the time at which the node is added and .

One consequence of this dynamic is that the competition for new edges becomes stronger with increasing network size and thus the rate of accumulation decreases with increasing time. There is also a first-mover advantage , older nodes tend to have more new edges.

Degree distribution

The distribution of degrees of a Barabási-Albert network with 200,000 nodes and . The maximum degree corresponds to .

The degree distribution of a network generated with the BA model roughly follows a power law of the form

.

Approximatively (especially for large ones ) this can be reduced to

.

The LCD method provides an exact solution

.

In both cases, the scale-invariant character of the model becomes recognizable, since the degree distribution is independent of both the network size and the point in time.

Typical parameters

Medium grade

Each node has an average degree (average number of connections to other nodes) of

,

whereby this statement is insignificant for individual, randomly selected nodes, since the variance increases indefinitely for large networks.

diameter

The diameter , i.e. the longest of the shortest paths between all nodes, of a BA random graph is approximate

.

Cluster coefficient

The typical cluster coefficient of a BA random graph is approximately

.

history

The first observations of distributions that follow a power law go back to the economist Herbert A. Simon . His work from 1955 referred, among other things, to the distribution of wealth , but not explicitly to networks. He was able to show that profits from investments are greater, the more originally invested: The principle of the rich get richer thus leads to a distribution of wealth in the form of a power law. Simon called this mechanism the Yule process because George Udny Yule had made similar investigations several years earlier. In the 1970s, the specific cause was raised again by Derek de Solla Price . He took over the developed by Simon mechanisms with few changes, transferred them to networks and called the principle of cumulative advantage ( English cumulative advantage ). His work mainly related to a network of scientific publications and he developed the Price-Modell ( English Price's model ), which can reproduce their actual distribution using relatively few assumptions and simple mechanisms.

Barabási and Albert called the same effect that Simon and Price had already worked out, preferential attachment . Their model was developed independently of the Price model in the 1990s and is similar to it, but still differs in some key features. In both cases, the network grows in the process of creation, but - unlike Price - a fixed number of edges are added to each new node in the BA model. The smallest possible degree therefore corresponds exactly to this number. In addition, edges are viewed as undirected in the BA model. Today, the BA model is the best known model for generating networks.

literature

Individual evidence

  1. André Krischke, Helge Röpcke: Graphs and network theory: Basics - Methods - Applications. Carl Hanser, 2014, ISBN 978-3-446-44184-2 , pp. 174-181.
  2. ^ A b c Mark Newman : Networks. 2nd Edition. Oxford University Press , New York 2010, ISBN 978-0-19-920665-0 .
  3. ^ A b Herbert A. Simon : On a Class of Skew Distribution Functions . In: Biometrika . Volume 42, number 3, 1955, pp. 425-440, doi: 10.2307 / 2333389 .
  4. A. Clauset, CR Shalizi, MEJ Newman: Power Law Distributions in Empirical Data . In: SIAM Review. Volume 51, number 4, 2009, pp. 661-703, doi: 10.1137 / 070710111 .
  5. ^ Paul Erdős , Alfréd Rényi : On random graphs I. In: Publicationes Mathematicae. (Debrecen) 6, 1959, pp. 290-297.
  6. ^ Albert-László Barabási , Eric Bonabeau: Scale-Free Networks. In: Scientific American . Volume 288, Number 5, 2003, pp. 60-69, JSTOR 26060284 .
  7. Oliver Richters, Tiago P. Peixoto: Trust Transitivity in Social Networks. In: PLOS ONE . 2011. doi: 10.1371 / journal.pone.0018384 .
  8. Mingyang Wang, Guang Yu, Daren Yu: Measuring the preferential attachment mechanism in citation networks. In: Physica . A. Volume 387, Number 18, 2008, pp. 4692-4698, doi: 10.1016 / j.physa.2008.03.017 .
  9. S. Speaker: How popular is your paper? An empirical study of the citation distribution. In: The European Physical Journal . B. Volume 4, Number 2, 1998, pp. 131-134, doi: 10.1007 / s100510050359 .
  10. B. Bollobás, O. Riordan, J. Spencer, G. Tusnády: The degree sequence of a scale-free random graph process. In: Random Structures and Algorithms. Volume 18, 2001, pp. 279-290, doi: 10.1002 / rsa.1009 .