Hubs and Authorities

In network theory, hubs and authorities are two classes of prominent nodes that are distinguished on the basis of their links. Put simply, they are nodes that are connected to many other nodes, for example well-known personalities in social networks or link directories on the World Wide Web.

Calculation

Similar to the PageRank algorithm, hubs and authorities provide a way to assess websites automatically on the basis of their links, from which a ranking procedure can be derived. The approach was proposed by Jon Kleinberg in 1999 and is known as hypertext-induced topic selection (HITS).

Each page is rated according to two categories:

Hubs are pages that point to many documents with valuable content.
Authorities are pages whose content is considered particularly good.

The algorithm assumes that good hubs have hyperlinks to many authorities and that authorities can be reached from many hubs.

For the evaluation, each page $i$ ($i = 1, \ldots, n$) from a base set of pages is assigned a hub weight $h_{i}$ and an authority weight $a_{i}$. The base set is generated from the search query: the pages that match the search terms are extended by a certain number of pages that are linked from these pages or that point to them. The weights are then updated as follows until they converge:
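The construction of the base set can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `build_base_set`, the dictionary `links`, and the cap `max_in_per_page` are hypothetical and not taken from the source, which only says that "a certain number" of pages is added.

```python
def build_base_set(root_set, links, max_in_per_page=50):
    """Extend the pages matching a query by their link neighbourhood.

    root_set: page ids that match the search terms (hypothetical input)
    links: dict mapping a page id to the set of page ids it links to
    max_in_per_page: assumed cap on in-linking pages added per root page
    """
    root = set(root_set)
    base = set(root)
    # Add every page that is linked from a page in the root set.
    for page in root:
        base |= links.get(page, set())
    # Add pages that point to a root page, at most max_in_per_page each.
    for target in sorted(root):
        sources = sorted(p for p, outs in links.items() if target in outs)
        base.update(sources[:max_in_per_page])
    return base


# Hypothetical toy web: only page "a" matches the query.
links = {
    "a": {"b", "c"},
    "b": {"c"},
    "d": {"a"},
    "e": {"f"},
}
base = build_base_set({"a"}, links)
print(sorted(base))
```

Here pages "b" and "c" enter the base set because "a" links to them, and "d" enters because it links to "a". Pages "e" and "f" have no connection to the root set and stay out.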

$$h_{i} \leftarrow \delta \sum_{j=1}^{n} A_{ij}\, a_{j}$$

$$a_{i} \leftarrow \lambda \sum_{k=1}^{n} {A^{T}}_{ik}\, h_{k}$$

Here $A$ is the link matrix, in which $A_{ij} = 1$ if page $i$ has a link to page $j$, and $A_{ij} = 0$ if this is not the case. $A^{T}$ is the transpose of $A$, i.e. ${A^{T}}_{ij} = A_{ji}$. The following applies:

The hub value of a page $i$ is the sum of the authority values of all pages that $i$ links to.
The authority value of a page $i$ is the sum of the hub values of all pages that link to $i$.
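
The two update rules can be tried out on a small graph. The following is a minimal sketch, not a reference implementation: the helper names `link_matrix`, `normalize` and `hits` are hypothetical, $\delta$ and $\lambda$ are assumed to be normalizations to Euclidean length 1, and a fixed number of iterations stands in for a proper convergence test.

```python
import math


def link_matrix(n, edges):
    """Return the n x n link matrix A with A[i][j] = 1 if page i links to page j."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] = 1
    return A


def normalize(v):
    """Scale v to Euclidean length 1 (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def hits(A, iterations=50):
    """Run a fixed number of hub/authority updates and return (h, a)."""
    n = len(A)
    h = [1.0] * n
    a = [1.0] * n
    for _ in range(iterations):
        # Hub update: h_i <- sum_j A[i][j] * a_j, then normalize (delta).
        h = normalize([sum(A[i][j] * a[j] for j in range(n)) for i in range(n)])
        # Authority update: a_i <- sum_k A[k][i] * h_k, then normalize (lambda).
        a = normalize([sum(A[k][i] * h[k] for k in range(n)) for i in range(n)])
    return h, a


# Hypothetical toy graph: pages 0 and 1 both link to pages 2 and 3.
A = link_matrix(4, [(0, 2), (0, 3), (1, 2), (1, 3)])
h, a = hits(A)
print([round(x, 3) for x in h])
print([round(x, 3) for x in a])
```

In this toy graph, pages 0 and 1 receive equal, non-zero hub weights and zero authority weight, while pages 2 and 3 receive equal, non-zero authority weights and zero hub weight.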

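Writing the two update rules in matrix form as $h \leftarrow \delta\, A\, a$ and $a \leftarrow \lambda\, A^{T} h$ and substituting each into the other gives the following dependencies: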

$$h \leftarrow \delta\, \lambda\, A A^{T}\, h$$

$$a \leftarrow \delta\, \lambda\, A^{T} A\, a$$

Here, $h$ and $a$ converge to an eigenvector belonging to the largest eigenvalue of $AA^{T}$ and $A^{T}A$, respectively.

$\delta$ and $\lambda$ are usually normalization factors that scale the vectors to unit length (i.e. onto the unit sphere). In addition, $AA^{T}$ and $A^{T}A$ are both symmetric and positive semidefinite. Both matrices can therefore be diagonalized and have an orthonormal basis of eigenvectors with non-negative eigenvalues. Repeated multiplication thus converges to an eigenvector belonging to the largest eigenvalue, provided the starting vector is not orthogonal to that eigenspace.
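
The eigenvector claim can be checked numerically on a small example. The sketch below uses only the Python standard library and a hypothetical three-page link matrix. It iterates $h \leftarrow AA^{T}h$ with normalization to unit length, then measures how far the result is from satisfying $AA^{T}h = \mu h$, where $\mu$ is estimated by the Rayleigh quotient. The names `matmul`, `matvec` and `unit` are illustrative helpers, not part of any library.

```python
import math


def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def matvec(M, v):
    """Multiply matrix M by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def unit(v):
    """Scale v to Euclidean length 1 (assumes v is not the zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


# Hypothetical link matrix: page 0 links to 1 and 2, page 1 links to 2.
A = [[0, 1, 1],
     [0, 0, 1],
     [0, 0, 0]]
At = [list(col) for col in zip(*A)]
M = matmul(A, At)  # A A^T, symmetric and positive semidefinite

# Power iteration on the hub vector.
h = unit([1.0, 1.0, 1.0])
for _ in range(100):
    h = unit(matvec(M, h))

# Rayleigh quotient as eigenvalue estimate, and the eigenvector residual.
Mh = matvec(M, h)
mu = sum(x * y for x, y in zip(h, Mh))
residual = math.sqrt(sum((y - mu * x) ** 2 for x, y in zip(h, Mh)))
print(mu, residual)
```

For this matrix, $AA^{T}$ has the non-zero block $\begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}$, whose largest eigenvalue is $(3 + \sqrt{5})/2$. The estimate `mu` should agree with that value and `residual` should be close to zero.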

See also: scale-free network

Literature

Jon Kleinberg: Authoritative sources in a hyperlinked environment. In: Journal of the ACM. 46, No. 5, 1999, pp. 604-632. doi:10.1145/324133.324140.

Web links

Kleinbergs Hubs & Authorities at drweb.de