Referrer spam

from Wikipedia, the free encyclopedia
[Figure: successful referrer spam appearing in the output of the analysis program Webalizer]

Referrer spam (also known as log file spam) is a special form of search engine spamming. Websites are requested en masse with the spammer's URL as the referrer, so that this URL appears in the referrer statistics of the attacked websites.

Background

Many search engines rank a web page highly if many links point to it. In addition, many websites evaluate referrers, for example to analyze where their users come from, usually by means of log file analysis. If these statistics are published online, which is particularly popular with weblogs (see backlink), spammers have an incentive to place their URLs in the referrer lists, on the assumption that web crawlers read these statistics and that the links are counted toward the ranking in search results.

Damage

This form of spamming harms the website operator in two ways: it falsifies the data on which the log file analysis is based, and it generates additional data traffic. Search engine operators are harmed as well, because their search results are distorted.

Legal consideration

In the case of commercially operated sites, this form of spamming, which can endanger the accessibility of the server, may constitute an interference with the right to an established and operating business. For private pages, a private law claim could in theory be derived from the self-presentation on a website, understood as an expression of the general right of personality. The criminal law aspects are the same as for spam in general. The question of whether referrer spam is advertising at all must be answered in the affirmative, at least where log file analyses are published and improved search engine rankings result, and in some cases beyond that.

Defense mechanisms

Nofollow

A simple, though only partially effective, countermeasure is to add the rel="nofollow" attribute to published referrer links, so that these links are not used in calculating PageRank. It has since been shown, however, that this neither changes the behavior of spammers nor reduces their number.
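
As an illustration of the nofollow approach, the following minimal Python sketch renders a public referrer list in which every link carries the attribute. The function name render_referrer_list and the HTML layout are assumptions made for this example and do not come from any particular statistics tool.

```python
from html import escape


def render_referrer_list(referrers):
    """Return an HTML list in which every referrer link carries rel="nofollow"."""
    items = []
    for url in referrers:
        # Escape the URL so that a crafted referrer cannot inject markup.
        safe = escape(url, quote=True)
        items.append(f'<li><a href="{safe}" rel="nofollow">{safe}</a></li>')
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


if __name__ == "__main__":
    print(render_referrer_list(["http://blog.example/post?a=1&b=2"]))
```

Escaping the referrer matters here because the value is supplied entirely by the requesting client.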

.htaccess

One way to curb referrer spam is a bad word list built with RewriteCond in a .htaccess file, which returns status 403 (Forbidden) if a listed word appears in the referrer.

# Answer with 403 Forbidden if the referrer contains a listed word (case-insensitive)
RewriteEngine on
RewriteCond %{HTTP_REFERER} casino [NC,OR]
RewriteCond %{HTTP_REFERER} poker [NC]
RewriteRule .* - [forbidden,last]

Alternatively, the problem can be contained with the SetEnvIfNoCase directive.

# Mark requests by user agent or referrer, then deny all marked requests
SetEnvIfNoCase User-Agent "IzyNews/1.0" leecher=yes
SetEnvIfNoCase Referer izynews.de leecher=yes
order deny,allow
deny from env=leecher

The drawback is that the bad word list has to be maintained manually. An extended approach is to record the referrers with a server-side scripting language and to evaluate how often each referrer occurs within a certain period. If the number of requests from a particular site exceeds a specified threshold, the referrer is automatically entered in the .htaccess file and the log file is cleaned up by a cron job. The difficulty with this approach is recognizing cases in which increased traffic from a particular site is actually desired. The Apache module mod_evasive takes a similar approach.
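
The threshold idea described above can be sketched roughly as follows in Python. The class name ReferrerRateMonitor, the default limits, and the whitelist for sites whose traffic is wanted are all assumptions chosen for this illustration; writing the generated lines into the .htaccess file and cleaning the logs are left out.

```python
from collections import defaultdict, deque
from urllib.parse import urlsplit


class ReferrerRateMonitor:
    """Count referrer hosts in a sliding time window and flag frequent ones."""

    def __init__(self, max_hits=100, window_seconds=3600, whitelist=()):
        self.max_hits = max_hits            # hypothetical threshold
        self.window = window_seconds        # hypothetical window length
        self.whitelist = set(whitelist)     # hosts whose traffic is desired
        self.hits = defaultdict(deque)      # host -> timestamps inside the window
        self.blocked = set()

    def record(self, referrer, timestamp):
        """Register one request; return True if the referrer host is blocked."""
        host = urlsplit(referrer).hostname
        if not host or host in self.whitelist:
            return False
        times = self.hits[host]
        times.append(timestamp)
        # Discard timestamps that have fallen out of the window.
        while times and times[0] <= timestamp - self.window:
            times.popleft()
        if len(times) > self.max_hits:
            self.blocked.add(host)
        return host in self.blocked

    def rewrite_conditions(self):
        """Build mod_rewrite lines that reject all blocked hosts with 403."""
        hosts = sorted(self.blocked)
        lines = []
        for i, host in enumerate(hosts):
            pattern = host.replace(".", r"\.")
            flags = "[NC,OR]" if i < len(hosts) - 1 else "[NC]"
            lines.append(f"RewriteCond %{{HTTP_REFERER}} {pattern} {flags}")
        if lines:
            lines.append("RewriteRule .* - [F]")
        return lines
```

The whitelist is one possible answer to the problem of desired traffic peaks: hosts listed there are never counted and therefore never blocked.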

NGINX

When using NGINX, it is also possible to control access via the configuration.

server {
   location / {
       # Reject requests whose referrer matches a listed domain or keyword (case-insensitive)
       if ($http_referer ~* "(url1\.tld|url2\.tld|url3\.tld|spamkeyword)") {
           return 403;
       }
   }
}

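The request is thus rejected on the server side. By default it still appears in the access log with the returned status code, so these entries must be ignored in the log analysis or logging must be disabled for them. "url1.tld" stands for a known domain that generates the referrer spam.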

Google Analytics

In some cases, hits are recorded in tracking tools such as Google Analytics even though no crawler, bot, or real user has visited the site. Such hits appear in Google Analytics but leave no entry in the server log. Occasionally, the tracking code of a site is also placed on other websites so that those sites show up in the webmaster's statistics. These spam referrals must be filtered out for a clean statistical analysis. In Google Analytics, the option "Exclude all hits from known bots and spiders" in the settings of the data view must be activated for this purpose.

However, not all bots and spiders are known to Google. Individual referrers can additionally be excluded with a custom filter in the data view that uses a regular expression.

(?:([^. ]+)\.)?(?:([^.]+)\.)?(domain1|domain2|domain3)\.(com?|de|net)
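
Before such a filter is saved, the expression can be tried out locally. The following Python sketch applies the pattern above to a few made-up hostnames; domain1 to domain3 are the placeholders from the expression, and the use of re.search (a match anywhere in the string) is an assumption of this example rather than a statement about how Google Analytics evaluates filters.

```python
import re

# The filter expression from the text, unchanged.
SPAM_HOSTS = re.compile(
    r"(?:([^. ]+)\.)?(?:([^.]+)\.)?(domain1|domain2|domain3)\.(com?|de|net)"
)


def is_spam_referral(hostname):
    """Return True if the hostname contains one of the listed spam domains."""
    return SPAM_HOSTS.search(hostname) is not None


if __name__ == "__main__":
    for host in ("www.domain1.com", "sub.shop.domain3.net", "domain2.co",
                 "example.org", "domain1.org"):
        print(host, is_spam_referral(host))
```

Note that "com?" matches both "com" and "co", so a host such as domain2.co is also covered.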

This can counteract spam in Google Analytics.

Combination of methods

A combination of the filters described above and an adaptation of the .htaccess file can be useful, as this enables spam to be completely excluded in Google Analytics in the long term.

Reporting

Search engine operators usually publish guidelines that name purchased links and other undesirable methods as grounds for exclusion from the index. It can therefore be worthwhile to report the domains from which the spam originates to the search engine operators, with the corresponding log extracts as evidence, because domains can be removed from the index if several reports are received from different sources. The "advertising strategy" is then likely to backfire on the spam bot operators and spammer domains: instead of rising in the rankings, the domains are removed from the result lists.

Further approaches

In addition, there are other approaches that prevent spam with the help of a PHP script built into the corresponding website.
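
The text mentions PHP scripts for this purpose. The same idea of checking the referrer inside the web application can be sketched in Python as a small WSGI middleware; this is a swapped-in illustration, not one of the scripts referred to above, and the class name ReferrerSpamFilter and the word list are assumptions.

```python
import re

# Hypothetical bad word list, mirroring the .htaccess example.
BAD_WORDS = re.compile(r"casino|poker", re.IGNORECASE)


class ReferrerSpamFilter:
    """WSGI middleware that answers 403 when the referrer contains a bad word."""

    def __init__(self, app, pattern=BAD_WORDS):
        self.app = app
        self.pattern = pattern

    def __call__(self, environ, start_response):
        referrer = environ.get("HTTP_REFERER", "")
        if self.pattern.search(referrer):
            body = b"Forbidden"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # No match: hand the request to the wrapped application unchanged.
        return self.app(environ, start_response)
```

An existing WSGI application would be wrapped as ReferrerSpamFilter(application), so that rejected requests never reach the application or its own statistics code.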
