URL hijacking

URL hijacking is the takeover of a domain's entry in the index of various search engines by another address. The problem arises from a misunderstanding between a website and a search engine about redirects (especially dynamically generated ones). The consequences for the hijacked page are severe: it no longer appears in the search results and no longer receives any visitors from the search engines concerned.

Technical background

The problem with permanent and temporary redirects

There are various ways on the Internet to forward requests for one address to another address. For example, a request for https://de.wikipedia.org/ is forwarded to https://de.wikipedia.org/wiki/Wikipedia:Hauptseite. Such redirects serve a wide variety of purposes, for example:

Permanent redirection to the correct address of the main page (as in the example above).
Permanent redirection to the correct domain in the case of typing errors (example: googel.de → google.de) or after a domain change or relocation.
Permanent redirection when content has been given a new file name (example: /home.html is from now on /index.html).
Temporary redirection when content is currently located at a different address but will later be available again at the originally requested address, or possibly at a completely different one.

HTTP/1.0 defines two status codes for the two main types of redirect, permanent and temporary: 301 (Moved Permanently) for permanent redirects and 302 (called “Moved Temporarily” in HTTP/1.0 and “Found” in HTTP/1.1) for temporary redirects. A third option is a redirect via a so-called meta refresh, which does not indicate whether the redirect is permanent or temporary.
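A minimal Python sketch of the three cases just described. The function name `redirect_kind` is hypothetical, and a meta refresh is modelled here, by assumption, as a redirect that carries no status code (`None`).

```python
from typing import Optional


def redirect_kind(status: Optional[int]) -> str:
    """Classify a redirect by its HTTP status code.

    None is used (as an assumption of this sketch) for a meta refresh,
    which sends no redirect status code at all.
    """
    if status is None:
        return "unspecified"  # meta refresh: permanence is not stated
    if status == 301:
        return "permanent"    # Moved Permanently
    if status == 302:
        return "temporary"    # Moved Temporarily / Found
    raise ValueError(f"not a redirect status covered here: {status}")


print(redirect_kind(302))  # temporary
```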

Search engines usually adhere exactly to the definitions of the HTTP standard. If address A refers to address B by means of a permanent redirect, the search engine assumes that the content will be found at address B from now on. As a result, address B is included in the search engine's index, while address A is deleted from it (or never included). As a rule, this is also the desired effect. The second variant is the one that matters for URL hijacking. If address A refers to address B by means of a temporary redirect, search engines assume that the content can currently be found at address B but will be found (again) at address A in the future. As a result, address A is included in the search engine's index, while address B is deleted or not added at all. This effect is desired if the content really is located at a different address only temporarily (the last case in the list above), but undesirable if address B is actually the correct one.
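The indexing rule for the two status codes can be sketched as a small function. `indexing_decision` is a hypothetical helper that models only the literal reading of the status code; real search engines weigh many more signals.

```python
from typing import Tuple


def indexing_decision(a: str, b: str, status: int) -> Tuple[str, str]:
    """Return (kept, dropped) when address a redirects to address b.

    Models a search engine that follows the HTTP definitions literally.
    """
    if status == 301:
        # Permanent: the content now lives at B, so A leaves the index.
        return b, a
    if status == 302:
        # Temporary: the content is expected back at A, so B leaves the index.
        return a, b
    raise ValueError(f"unsupported redirect status: {status}")


# A hypothetical directory link that redirects to an unrelated site with 302:
kept, dropped = indexing_decision(
    "http://directory.example/go?id=12345", "http://site.example/", 302
)
print(kept)  # http://directory.example/go?id=12345
```

With 302 the directory's URL is kept and the real site is dropped, which is exactly the undesired case described above.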

URL hijacking technique

Temporary redirects become a problem when this “temporary address change” does not reflect the facts. If address A refers to address B with a temporary redirect even though the two addresses have nothing to do with each other (for example, address A belongs to a web directory and address B is a website registered there), search engines assume that page B will in future be accessible at address A. They do so because they can recognize neither what kind of website address A belongs to (a directory) nor that the wrong type of redirect was used.

Example

The following example explains from a search engine's point of view why a hijacked address is removed from the index.

The search engine indexes website A.
It finds a link to page A1.
It requests A1 and receives the answer that this is a temporary redirect to page B1.
It stores the address of A1, since this is supposedly the correct address.
It removes B1 from its index, since the page is supposedly only temporarily accessible there.
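The steps above can be simulated with a toy index. This is an illustrative sketch only: the URLs, the `crawl_link` function and the dictionary standing in for the web are all hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical toy "web": URL -> (HTTP status, Location header or None)
Web = Dict[str, Tuple[int, Optional[str]]]


def crawl_link(index: Set[str], web: Web, url: str) -> None:
    """Follow one link the way the steps above describe and update the index."""
    status, location = web[url]
    if status == 200:
        index.add(url)
    elif status == 301 and location:
        index.discard(url)
        index.add(location)
    elif status == 302 and location:
        index.add(url)           # A1 is stored as the "correct" address
        index.discard(location)  # B1 is treated as only temporarily there


web: Web = {
    "http://a.example/a1": (302, "http://b.example/b1"),  # link on website A
    "http://b.example/b1": (200, None),                   # the victim page
}
index = {"http://b.example/b1"}                # B1 was already indexed
crawl_link(index, web, "http://a.example/a1")  # A1 is found and requested
print(index)  # {'http://a.example/a1'}
```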

Bug or standards compliance?

That URL hijacking is possible at all is often described as a bug in the search engines. However, RFC 2616, Section 10.3.3, states:

Since the redirection might be altered on occasion, the client SHOULD continue to use the Request-URI for future requests. This response is only cacheable if indicated by a Cache-Control or Expires header field.

In other words: because the redirect may be changed from time to time, the client [here: the search engine] should continue to use the Request-URI [i.e. the original address] for future requests. The response [i.e. the new address] may only be cached if a Cache-Control or Expires header field indicates this.
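To illustrate the caching part of the quoted rule, here is a simplified, hypothetical check of whether a 302 response may be cached. It looks only at `Expires` and a few `Cache-Control` directives and is not a complete implementation of HTTP caching.

```python
from typing import Mapping


def is_302_cacheable(headers: Mapping[str, str]) -> bool:
    """Simplified reading of RFC 2616, 10.3.3: a 302 response is cacheable
    only if a Cache-Control or Expires header field says so."""
    lowered = {name.lower(): value for name, value in headers.items()}
    # Directive names only, e.g. "public, max-age=3600" -> {"public", "max-age"}
    directives = {
        part.strip().split("=", 1)[0].lower()
        for part in lowered.get("cache-control", "").split(",")
        if part.strip()
    }
    if "no-store" in directives:
        return False  # an explicit prohibition wins
    if "expires" in lowered:
        return True
    # Directives that explicitly permit caching (a simplified selection).
    return bool(directives & {"max-age", "s-maxage", "public"})


print(is_302_cacheable({"Location": "http://b.example/"}))  # False
```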

Search engines therefore act in accordance with the standard: they have to assume that the requested address remains valid and that the redirect target is only a temporary location. Nevertheless, this behavior amounts to a security hole in search engines, because it allows webmasters to influence other websites.

Causes

There are various reasons why webmasters redirect to third-party content instead of linking to it directly. The redirecting program (the dereferrer) can take on various tasks, for example counting the clicks on the link in question or hiding security-relevant information such as the session ID of the current website.
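A click-counting dereferrer of the kind described above could look roughly like the following WSGI application. It is a hypothetical sketch: the `url` query parameter, the in-memory counter and the choice of 301 are assumptions, and it does not implement the hiding of session IDs.

```python
from collections import Counter
from typing import Callable, Iterable, List, Tuple
from urllib.parse import parse_qs

# In-memory click counter per target URL (a real dereferrer would persist it).
clicks: Counter = Counter()

StartResponse = Callable[[str, List[Tuple[str, str]]], object]


def dereferrer(environ: dict, start_response: StartResponse) -> Iterable[bytes]:
    """Hypothetical dereferrer: /go?url=<target> counts the click, then redirects."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    target = params.get("url", [""])[0]
    if not target:
        start_response("400 Bad Request", [("Content-Type", "text/plain")])
        return [b"missing url parameter"]
    clicks[target] += 1
    # 301 instead of 302, so search engines index the target, not this URL.
    # A production version should also restrict targets (else: open redirect).
    start_response("301 Moved Permanently", [("Location", target)])
    return [b""]
```

For local experiments it can be served with `wsgiref.simple_server.make_server("localhost", 8000, dereferrer).serve_forever()`.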

Dynamic links are particularly dangerous. They can usually be recognized by a "?" in the URL (e.g. http://www.example.com/?id=12345) and are generated from corresponding databases. The popular scripting language PHP sends a 302 redirect by default when a Location header is set, unless the programmer explicitly specifies a status code. The search engine Yahoo is known to treat a meta refresh as a 301 redirect if the redirect delay is very short; if the delay is longer, it assumes a 302.
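Yahoo's reported handling of meta refresh could be approximated as follows. The threshold `max_permanent_delay` is an assumption of this sketch: the text only says "very short", not which delay Yahoo actually uses.

```python
from html.parser import HTMLParser
from typing import Optional, Tuple


class MetaRefreshParser(HTMLParser):
    """Collects the content attribute of the first <meta http-equiv="refresh">."""

    def __init__(self) -> None:
        super().__init__()
        self.content: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag != "meta" or self.content is not None:
            return
        values = {name: (value or "") for name, value in attrs}
        if values.get("http-equiv", "").lower() == "refresh":
            self.content = values.get("content", "")


def classify_meta_refresh(
    html: str, max_permanent_delay: float = 0.0
) -> Optional[Tuple[float, Optional[str], int]]:
    """Return (delay, target URL, assumed status), or None without a meta refresh.

    Delays up to the hypothetical max_permanent_delay count as 301, longer as 302.
    """
    parser = MetaRefreshParser()
    parser.feed(html)
    parser.close()
    if parser.content is None:
        return None
    delay_text, _, rest = parser.content.partition(";")
    delay = float(delay_text.strip() or 0)
    key, sep, value = rest.partition("=")  # e.g. "url=http://..."
    url = value.strip().strip("'\"") if sep and key.strip().lower() == "url" else None
    status = 301 if delay <= max_permanent_delay else 302
    return delay, url, status
```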

Many webmasters are not aware that they use such redirects themselves and can thereby cause serious damage to other sites. In some cases of hijacking, redirects are misused deliberately, which can cause the original page to lose its rankings or even disappear from the index. This approach is particularly typical of so-called black hat SEO, where the aim is to push one's own website up the search engine ranking. Such hijacking practices count as a criminal offense and can be prosecuted.

Hijacking becomes more likely the higher the PageRank of the linking website is compared to that of the "victim page". A high PageRank makes the linking website appear more trustworthy, so search engines assume that it applies the standards correctly and that the address change really is temporary. Furthermore, the website with the higher PageRank is considered more relevant and therefore remains in the index anyway.

Possible solutions

A site query (site:meine-seite.de) can be used to determine whether a URL has fallen victim to URL hijacking: the results show the hijacking page. Another option is a cache query (cache:http://www.meine-domain.de); here too, the hijacking page is displayed instead of the original domain.

HTTP/1.1 introduced the additional status code 307 (Temporary Redirect). It marks a temporary redirect in which the original address remains valid.

Webmasters who use redirects themselves should always redirect with the status code 301 (Moved Permanently) when the target is external content and a redirect is really necessary at all; this avoids accidental URL hijacking. Redirects generated automatically by a content management system (CMS) should be checked accordingly, using a service such as web sniffer. If a redirect is not absolutely necessary, a normal link is usually the better choice.
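Instead of an external service, a redirect can also be checked with a short script that requests the URL once and reports the status code without following the redirect. This is a sketch under assumptions: it sends a `HEAD` request (some servers answer `HEAD` differently from `GET`), and the helper names are hypothetical.

```python
import http.client
from typing import Optional, Tuple
from urllib.parse import urlsplit


def fetch_redirect(url: str, timeout: float = 10.0) -> Tuple[int, Optional[str]]:
    """Request url once (HEAD) and return (status, Location) without following it."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn.request("HEAD", path)
        response = conn.getresponse()  # http.client never follows redirects
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def is_risky_redirect(status: int) -> bool:
    """Flag temporary redirects (302, 307), which the text advises against."""
    return status in (302, 307)
```

Usage would be, for example, `status, location = fetch_redirect("http://www.example.com/?id=12345")` followed by `is_risky_redirect(status)`.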

Affected webmasters should contact the webmaster of the hijacking site and point out the error to them.

If necessary, the affected webmaster can contact the operator of the search engine and ask for the page to be re-included in the index.

Similar variants

A similar variant of URL hijacking is used mainly by adult websites. As a result, people's names can become associated with pornographic material in the search engine Google. The search results usually show content containing the person's name together with a specific URL. However, clicking the result does not lead to the supposed content about this person; instead, the visitor is usually redirected to overview or category pages of the portals.

References

[1] http://php.net/header
[2] How does the Yahoo! Web Crawler handle redirects?
[3] Thomas Bindl Blog: Hijacking Help
[4] URL hijacking
[5] If your own name suddenly appears on erotic websites (Memento from September 23, 2016, in the Internet Archive). revolvermaenner.com, April 17, 2013.
