Web analytics

from Wikipedia, the free encyclopedia


Web analytics (also clickstream analysis, data traffic analysis, traffic analysis, web analysis, web controlling, web tracking) is the collection and evaluation of data on the behavior of visitors to websites. An analytics tool, also known as a tracking tool, typically examines where visitors come from, which areas of a website they visit, and how often and for how long individual subpages and categories are viewed. In Germany, the use of such tools is controversial for reasons of data protection.

It is mainly used to optimize a website and to better achieve its objectives (e.g. more frequent visits, more page views, orders, newsletter subscriptions). Web analytics distinguishes fundamentally between evaluation methods for the permanent measurement of site effectiveness and methods for finding weak points in the site and opportunities for improvement (see Methods). In addition to a number of free products, around 150 companies offer solutions for web analytics.

Term

While the term web analytics has largely gained acceptance globally, "web controlling" is often used as a synonym in Germany. The older term “log file analysis” overlaps with web analytics (analysis of web server log files), but it can also refer to the analysis of other log files. Like "web controlling", the designation "page impression" (PI) has established itself in Germany in place of the internationally used "page view" (PV). In either case, what is meant is the retrieval of one page of a website by a human visitor (not a crawler). Several individual page views are combined into one session (visit). A visitor (unique user) can visit a website in several sessions, for example because the site is large and the visitor has only a little time on each occasion.

Aims

Web analytics is used to analyze, optimize and control processes relating to all Internet activities of a company. Web controlling tools make it possible to measure a large number of key figures and evaluations relating to a website and the associated marketing campaigns. Important key figures in e-commerce relate, for example, to:

  • the effectiveness of individual advertising media (e.g. banners , newsletters )
  • the number of visitors to the web shop
  • the percentage of visitors who put something in the shopping cart
  • the percentage of visitors who complete the buying process
  • the average shopping cart value
  • the time until the purchase in the web shop
  • the search terms with which the web shop was found that also led to the purchase

The aim is to evaluate these key figures and the statistics generated from them (statistics on ROI, shopping carts, conversion rates, online sales) and, based on the results, to start new marketing campaigns or adapt existing ones and to optimize the website accordingly.
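
As a rough illustration of how the ratio-based key figures listed above could be derived from raw counts, here is a minimal Python sketch. The class `ShopStats`, the function `shop_kpis` and the sample numbers are hypothetical, and the average shopping cart value is assumed here to mean revenue divided by completed orders.

```python
from dataclasses import dataclass


@dataclass
class ShopStats:
    """Hypothetical raw counts for one reporting period."""
    visitors: int        # visitors to the web shop
    carts_started: int   # visitors who put something in the shopping cart
    orders: int          # visitors who completed the buying process
    revenue: float       # total value of all completed orders


def shop_kpis(stats: ShopStats) -> dict:
    """Derive ratio-based key figures from the raw counts."""
    if stats.visitors == 0:
        raise ValueError("no visitors in this period")
    return {
        "cart_rate": stats.carts_started / stats.visitors,
        "conversion_rate": stats.orders / stats.visitors,
        # Assumption: average cart value = revenue per completed order.
        "average_cart_value": stats.revenue / stats.orders if stats.orders else 0.0,
    }


kpis = shop_kpis(ShopStats(visitors=2000, carts_started=300, orders=50, revenue=4000.0))
print(kpis)
```

Comparing these figures between reporting periods or campaigns is what turns the raw counts into a basis for decisions.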

Data collection procedures

Usually, for web analytics, either the log files of the web server are evaluated or certain tags are used in websites for data acquisition. In addition to these two methods, there are also methods that use web server plugins or network sniffers.

Server-based data: log file analysis

The software used to run websites, a web server such as Apache or MS IIS, usually produces a continuous log of all its activities. Initially, the main purpose of this log was to record and rectify operating errors, but operators quickly discovered that it could also be used to gather information on the popularity of the website, the frequency of page views and the activity of website visitors. Since these logs are created directly by the operator's own software, they give a true representation of server activity. The logs are plain text files that record the activities of the software line by line; they are usually evaluated with software that creates statistics, assigns data and presents it clearly in graphics and tables.
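
A line-by-line evaluation of such a log can be sketched in a few lines of Python. The example assumes entries in the Common Log Format written by Apache; the regular expression, the function `count_page_views` and the sample lines are illustrative, and counting only successful GET requests as page views is a simplifying assumption.

```python
import re
from collections import Counter

# Common Log Format: host ident user [time] "method path protocol" status size
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def count_page_views(lines):
    """Count successful GET requests per path; unparseable lines are skipped."""
    views = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue
        if match["method"] == "GET" and match["status"] == "200":
            views[match["path"]] += 1
    return views


sample = [
    '192.0.2.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.2 - - [10/Oct/2000:13:56:01 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.1 - - [10/Oct/2000:13:57:12 -0700] "GET /shop.html HTTP/1.0" 200 512',
    '192.0.2.3 - - [10/Oct/2000:13:58:40 -0700] "GET /missing.html HTTP/1.0" 404 209',
    'not a log line',
]
views = count_page_views(sample)
print(views.most_common())
```

A real evaluation would additionally filter out crawlers and requests for embedded resources such as images before treating a request as a page view.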

Client-based data: tags and pixels

Another simple method of collecting data has existed since around 1996: invisible mini-images (1-pixel graphics), so-called counting or tracking pixels, are integrated directly into the source code of the web page. Each request for this graphic then stands for exactly one page view. The pixel file does not need to be on the same server as the actual content of the website. Application service providers (ASPs) can take over the collection, storage and analysis of the data. In addition to the 1-pixel images, which are still in use, almost all solutions now also use JavaScript code for data collection. These "JavaScript tags" are likewise integrated into the source code of the page, but they can collect additional information about the requesting client (usually the browser), e.g. the screen resolution of the monitor used, the color depth, or the plug-ins installed in the browser. Newer tools also allow the recording of mouse movements (mouse tracking) or keyboard inputs by website visitors. The emergence of client-based web tracking has been visualized as a network graph, and a six-fold increase was found for the years 2005–2015.
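
The counting-pixel idea can be sketched as a small request handler in Python: every request for the image is recorded as one page view, and a transparent 1x1 GIF is returned. The function `handle_pixel_request`, the `page` query parameter and the host names are hypothetical; a page would embed the pixel with something like `<img src="https://stats.example/pixel.gif?page=/shop/cart" width="1" height="1" alt="">`.

```python
import base64
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlsplit

# A transparent 1x1 GIF, stored base64-encoded.
PIXEL_GIF = base64.b64decode(
    "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"
)


def handle_pixel_request(url, headers, hits):
    """Record one page view in `hits` and return (status, headers, body)."""
    query = parse_qs(urlsplit(url).query)
    hits.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "page": query.get("page", [""])[0],          # hypothetical parameter
        "referrer": headers.get("Referer", ""),
        "user_agent": headers.get("User-Agent", ""),
    })
    response_headers = {
        "Content-Type": "image/gif",
        "Content-Length": str(len(PIXEL_GIF)),
        # Ask browsers and proxies not to cache, so every view triggers a request.
        "Cache-Control": "no-store",
    }
    return 200, response_headers, PIXEL_GIF


hits = []
status, response_headers, body = handle_pixel_request(
    "/pixel.gif?page=%2Fshop%2Fcart",
    {"User-Agent": "ExampleBrowser/1.0", "Referer": "https://shop.example/cart"},
    hits,
)
print(status, hits[0]["page"])
```

The handler is deliberately independent of any web framework; in practice it would be wired into whatever server answers requests for the pixel URL.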

Further procedures

For network protocol analysis (NPA, network sniffer), a special decoder is placed between the operator's own web server and the connection to the Internet. It captures the entire data traffic on this network. With URL rewriting, a proxy is installed between the web server and the Internet; it saves the traffic data in special log files and at the same time writes additional information (session IDs) into the URL. So-called hybrid methods process more than one data source at the same time. The integrated evaluation of tag data and server data in particular is a rich and reliable, but also complex, type of data traffic analysis.
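
The URL-rewriting step, writing a session ID into the URL, might look roughly like the following Python sketch. The parameter name `sid` and the function `add_session_id` are hypothetical; an actual rewriting proxy would apply this to every link in the pages it passes through.

```python
import secrets
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

SESSION_PARAM = "sid"  # hypothetical parameter name


def add_session_id(url, session_id=None):
    """Return `url` with a session ID in its query string (unchanged if present)."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    if any(key == SESSION_PARAM for key, _ in query):
        return url
    if session_id is None:
        session_id = secrets.token_hex(8)  # random 16-character hex ID
    query.append((SESSION_PARAM, session_id))
    return urlunsplit(parts._replace(query=urlencode(query)))


rewritten = add_session_id("https://shop.example/cart?item=42", "abc123")
print(rewritten)
```

Because the ID travels in the URL itself, consecutive requests can be assigned to one session without cookies.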

Cookies

Cookies are usually used to assign a single page view to a session and a session to a possibly returning visitor. This subject is hotly debated; see, for example, the HTTP cookie article. It should be noted that cookies are currently indispensable for a professional data traffic analysis that focuses on the visitor rather than the individual session.
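
How a cookie lets a server recognize a returning visitor can be sketched with Python's standard `http.cookies` module. The cookie name `visitor_id`, the one-year lifetime and the function `identify_visitor` are assumptions made for this illustration.

```python
import uuid
from http.cookies import SimpleCookie

VISITOR_COOKIE = "visitor_id"          # hypothetical cookie name
ONE_YEAR_SECONDS = 60 * 60 * 24 * 365  # assumed lifetime


def identify_visitor(cookie_header):
    """Return (visitor_id, set_cookie_value); the latter is None for known visitors."""
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")
    if VISITOR_COOKIE in cookie:
        return cookie[VISITOR_COOKIE].value, None

    # First visit: issue a random ID and ask the browser to store it.
    visitor_id = uuid.uuid4().hex
    new_cookie = SimpleCookie()
    new_cookie[VISITOR_COOKIE] = visitor_id
    new_cookie[VISITOR_COOKIE]["max-age"] = ONE_YEAR_SECONDS
    new_cookie[VISITOR_COOKIE]["path"] = "/"
    return visitor_id, new_cookie[VISITOR_COOKIE].OutputString()


first_id, set_cookie = identify_visitor(None)             # new visitor
same_id, nothing = identify_visitor(f"visitor_id={first_id}")  # returning visitor
print(set_cookie)
```

If the visitor deletes or blocks the cookie, the next request is treated as a new visitor, which is the source of the uncertainty this method carries.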

Flash cookies

Alternative methods based on Flash objects, so-called Flash cookies (or Local Shared Objects, LSOs for short), seem to exist only in a niche. In contrast to classic cookies, however, they can also be used to recognize visitors who use different browsers.

Canvas fingerprinting

Canvas fingerprinting is more effective than cookies, which any user can suppress manually or through a browser setting. It is a collective term for a number of user tracking techniques for the unique identification of online users.

Tag vs. log file - advantages and disadvantages

Data traffic analyses have to contend with strong distortions in the underlying data. No type of analysis can claim to truthfully map the actual traffic on a website.

| Server-based | Client-based |
| --- | --- |
| Page requests that are served from proxies, caches or similar cannot be registered by the server | Distortions caused by caching can be avoided |
| The usual way of assigning sessions (same IP address for max. 30 min.) is highly error-prone | By using cookies, the session and visitor allocation can be improved to an extent that is acceptable for further processing |
| The usual way of assigning visitors (IP address, possibly user agent) is in no way reliable | Visitors who block JavaScript and/or images in their browser will not be recognized |
| Data is and will remain in the company itself | Cookie blocking and deletion rates create uncertainties |
| The data format is open; data can be evaluated by various analysis tools | The data format is proprietary, i.e. data can hardly be taken from provider A to provider B when switching |
| All spiders, bots, etc. are registered in the log files | Spiders and bots are only partially registered; this reduces the amount of data that arises, but is an obstacle for search engine optimization |
| The error messages from the server are registered immediately | Not all error messages are recognized |
| | JavaScript allows richer data to be collected about clients |
| | Different locations, server clusters, domains and subdomains or the like are not a major problem |

In summary, the client-side method in combination with first-party cookies is the most common today and, pragmatically speaking, the best. The data quality is good enough to support reliable decisions. An equally reliable system based on log files (and cookies) usually costs significantly more and is most likely to be considered where the data contains sensitive information.
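
The server-side session heuristic named in the comparison above (same IP address for at most 30 minutes) can be sketched as follows. The sketch interprets the rule as an inactivity timeout, which is an assumption; the function `count_sessions` and the sample data are hypothetical.

```python
from datetime import datetime, timedelta

# Assumption: a gap of more than 30 minutes between two requests
# from the same IP address starts a new session.
SESSION_TIMEOUT = timedelta(minutes=30)


def count_sessions(entries):
    """entries: iterable of (ip_address, timestamp). Returns {ip: session count}."""
    last_seen = {}
    sessions = {}
    for ip, timestamp in sorted(entries, key=lambda entry: entry[1]):
        previous = last_seen.get(ip)
        if previous is None or timestamp - previous > SESSION_TIMEOUT:
            sessions[ip] = sessions.get(ip, 0) + 1
        last_seen[ip] = timestamp
    return sessions


log_entries = [
    ("192.0.2.1", datetime(2015, 10, 29, 10, 0)),
    ("192.0.2.2", datetime(2015, 10, 29, 10, 5)),
    ("192.0.2.1", datetime(2015, 10, 29, 10, 10)),  # same session as 10:00
    ("192.0.2.1", datetime(2015, 10, 29, 11, 0)),   # 50-minute gap: new session
]
sessions = count_sessions(log_entries)
print(sessions)
```

The weakness of the heuristic is visible in the code: several people behind one shared IP address are merged into one session, while one person whose address changes is split into several.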

Methods

Figure: Click path analysis with originating pages (left), movement arrows whose thickness indicates their volume, and website objects whose area symbolizes the number of visitors and whose length-to-width ratio symbolizes the proportion of further outgoing calls.

In general, two areas of application of web analytics can be distinguished:

1. Regular monitoring of the effectiveness of the website and associated campaigns

By defining key figures (e.g. costs, sales, conversion rate, page views per session, sessions per visitor), the individual data of the analyses can be condensed into meaningful information and made comparable: development of sales over the year, costs per campaign, conversion rate compared to a set goal, etc.

2. Strategies for optimizing the website

  • Path analyses help to find particularly popular and unpopular pages in a website
  • Segmentation helps to find and further differentiate certain visitor groups (e.g. visitors from search engine A compared to visitors from search engine B)
  • Conversion paths (funnels) help to measure and optimize defined, important page sequences in the website
  • Optimization of start or landing pages by quickly trying out small improvements and changes (A/B tests)
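
To make the conversion path (funnel) idea from the list above concrete, the following Python sketch counts how many sessions reach each step of a defined page sequence. The function `funnel_report`, the page paths and the sample sessions are hypothetical; a session is assumed to reach a step only if it visited all earlier steps in order, with other pages allowed in between.

```python
def funnel_report(steps, sessions):
    """Count, for each funnel step, the sessions that reached it in order.

    steps:    ordered list of page paths that define the funnel
    sessions: list of sessions, each a list of page paths in visit order
    """
    reached = [0] * len(steps)
    for pages in sessions:
        position = 0  # index of the next funnel step this session must hit
        for page in pages:
            if position < len(steps) and page == steps[position]:
                reached[position] += 1
                position += 1
    return list(zip(steps, reached))


steps = ["/cart", "/checkout", "/thanks"]
sessions = [
    ["/", "/cart", "/checkout", "/thanks"],  # completes the funnel
    ["/cart", "/checkout"],                  # drops out before the last step
    ["/", "/cart"],                          # drops out after the cart
    ["/", "/about"],                         # never enters the funnel
    ["/checkout"],                           # skipped the first step, not counted
]
report = funnel_report(steps, sessions)
print(report)
```

The drop between consecutive counts shows where in the page sequence visitors are lost, which is where optimization effort is usually directed first.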

Software

Web statistics software evaluates the surfing behavior of website visitors. Page views and unique visits are evaluated in order to analyze the behavior of visitors to websites. Essentially, web statistics are based on an evaluation of the server's log files (see log file analysis), but there are also other techniques (recording by counting pixels or by JavaScript code that sends a counting signal to a counting server).

In addition to statistics, often graphically prepared, over freely selectable evaluation periods, web statistics present further information (visitors' technology, origin, special functions, visitor behavior on the website) that goes beyond the content of the log files and can be found, for example, under the term "web tracking". However, web statistics usually reach their limits when the visitor leaves the Internet and makes contact by telephone. This so-called “media break” has since been overcome by telephone tracking.

Countermeasures

Users of the World Wide Web can at least partially protect themselves against actual or alleged spying through web analytics. In addition to appropriate privacy settings in the browser, browser add-ons such as ad blockers or tracker blockers are particularly common. Another method, which comes at the cost of speed, is the use of alternative proxy networks to obscure the user's own IP address.

Legal admissibility in Germany

Legal regulation

The legal situation regarding the use of analytics tools is currently controversial in Germany. Legal criticism often focuses on the storage of the IP address and the use of cookies. The Federal Data Protection Act only permits the collection and storage of personal data if this is explicitly permitted by a statutory provision or if the user has given clear and prior consent. The regulation in Section 15 of the Telemedia Act (TMG) is relevant here. Accordingly, personal data from visitors to a website may only be collected and used without the user's consent insofar as this is necessary to enable the use of commercial offers on the Internet and to bill for them. The use of this data after the end of the usage process is only permitted if the data is “necessary for billing purposes with the user”. According to Section 13 of the Telemedia Act (TMG), providers of Internet portals must ensure that "the personal data that arise about the access or other use is deleted immediately after it has ended".

Pseudonymous usage profiles

For the purposes of advertising (e.g. billing ad clicks), market research (e.g. recording user interest in order to subsequently optimize websites) or the needs-based design of telemedia (e.g. a user arranging the website of a TV program guide to their taste: preferred genre, arrangement of the channels), service providers may create usage profiles, provided the user does not object (Section 15 (3) TMG). Such a usage profile can, for example, contain information about the time of the page visit and the page visited, but it may not contain any identifying features such as the IP address, only a pseudonym. The profile may not be merged with other data about the bearer of the pseudonym (e.g. in the context of geolocation of the user). So that users can exercise their right of objection, they must be informed of it at the latest at the beginning of the page visit. The data protection supervisory authorities consider a link labelled “Data Protection”, which offers information and the possibility of objecting, to be unobjectionable.

Personal reference from IP addresses

It is controversial whether the IP address of an Internet user, in connection with the time of use, constitutes personal data. In an obiter dictum, the Munich District Court denied that a dynamic IP address stored by the operator of an Internet service relates to a person (judgment of September 30, 2008 - 133 C 5677/08, MMR 2008, 860). Some legal commentators followed this view (Gola / Schomerus, § 3 Rn. 10); they consider IP addresses to be only “relatively” personal, so that content providers may store IP addresses and only their transmission is prohibited.

In contrast, the Berlin-Mitte District Court assumed a personal reference and forbade the operator of an Internet portal to store the IP addresses of its users beyond the duration of the usage process (ruling of March 27, 2007 - 5 C 314/06, DuD 2007, 856-858, confirmed by the Berlin Regional Court, judgment of September 6, 2007 - 23 S 3/07, MMR 2007, 799-800). With the help of other data, such as those stored by Internet access providers, it is possible to determine the Internet connection used and its owner. The Wiesbaden Administrative Court agreed (decision of February 27, 2009 - 6 K 1045/08, MMR 2009, 428–432). The Wuppertal District Court also regards the IP address as personal data. In other European countries, the Swiss Federal Administrative Court, the Swedish Supreme Administrative Court and the French Constitutional Court have affirmed the personal reference of IP addresses with reference to the European data protection directive 95/46 / EC, which also applies in Germany. The Federal Ministry of Justice, the Federal Data Protection Officer, the data protection officers of the federal and state governments and the data protection officers of all EU states are of the same opinion. The German Federal Court of Justice has not yet had to decide on the question, but in a decision from 2009 recognized the “Internet user's right to anonymity”.

Clarification of compliance with data protection required

Anyone who, as a German provider of an Internet service, integrates external services such as web analysis services into their offer is liable for compliance with German data protection law (so-called commissioned data processing). In the case of American companies operating in Germany in particular, the user of an analytics tool should ensure that the foreign company complies with German data protection law, as the USA has no data protection regulations comparable to German law (cf. §§ 11, 4b (2) and 3 BDSG). The storage or transmission of personal data is only permitted with the consent of the Internet user (§§ 4, 4a BDSG). Consent must be given “consciously” (§ 13 II TMG) and must not violate § 307 II BGB. The view is held that it is an essential basic idea of the Telemedia Act to protect the user from a suspicious logging of his usage behavior. Deviating consent clauses are therefore ineffective according to § 307 II BGB.

The Telemedia Act in Germany only allows the processing of personal data in accordance with Section 12 I TMG if the user has given prior consent or if there is a legal authorization. However, when an external tool is used, the full IP address of the site visitor (possibly personal data) is usually transmitted to a third party (the service provider). Unless the user has given prior consent, this is not permitted, as no legal basis that would allow it is apparent.

On November 26 and 27, 2009, the supreme supervisory authorities for data protection in the non-public area (Düsseldorfer Kreis) decided on some principles for handling Google Analytics and other web tracking processes. In the opinion of the supervisory authorities, personal data of a user may only be collected and used without consent insofar as this is necessary to enable the use of telemedia and to bill for it. Because this data can be related to individuals, the analysis of usage behavior using complete IP addresses (including geolocation) is only permitted with conscious, unambiguous consent. If there is no such consent, the IP address should be shortened before any evaluation so that it cannot be linked to a person. As explained in the previous section, whether IP addresses relate to a person has still not been conclusively clarified in law.
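
Shortening an IP address before evaluation can be sketched with Python's standard `ipaddress` module. The prefix lengths used here (keeping the first 24 bits of an IPv4 address and the first 48 bits of an IPv6 address) are assumptions for illustration only; the text above does not specify how far an address must be shortened, and the sketch makes no claim about what is legally sufficient.

```python
import ipaddress


def shorten_ip(address, v4_prefix=24, v6_prefix=48):
    """Zero out the host part of an IP address before it is stored or evaluated.

    The default prefix lengths are illustrative assumptions.
    """
    ip = ipaddress.ip_address(address)
    prefix = v4_prefix if ip.version == 4 else v6_prefix
    # strict=False masks the host bits instead of raising an error.
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


print(shorten_ip("203.0.113.77"))
print(shorten_ip("2001:db8:abcd:12:3456:789a:bcde:f012"))
```

The shortening has to happen before the address is written to any log or passed to a third party; truncating only at reporting time would leave the full address in storage.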

Fines

The data protection supervisory authorities can impose fines on providers who violate the above-mentioned data protection laws.

The Berlin data protection officer has set up a special fines department in his authority in order to be able to impose more sanctions in the future. BILDblog blogger Stefan Niggemeier has already been banned from logging the IP addresses of the users of his website.

Literature

  • Jim Sterne: Web Metrics: Proven Methods for Measuring Web Site Success . Wiley & Sons, 2002, ISBN 0-471-22072-8 . (English)
  • Eric T. Peterson: Web Analytics Demystified . 2004, ISBN 0-9743584-2-8 . (English)
  • Avinash Kaushik: Web Analytics: An Hour a Day . Sybex, 2007, ISBN 978-0-470-13065-0 . (English)
  • Jason Burby, Shane Atchison: Actionable Web Analytics: Using Data to Make Smart Business Decisions . Sybex, 2007, ISBN 978-0-470-12474-1 . (English)
  • Frank Reese: Web Analytics - Turning traffic into sales: The best tools and strategies . Businessvillage Verlag, 2008, ISBN 978-3-938358-71-9 .
  • Marco Hassler: Web Analytics - evaluate metrics, understand visitor behavior, optimize website . Mitp-Verlag, 2008, ISBN 978-3-8266-5931-7 .
  • Avinash Kaushik: Web Analytics 2.0 - The Art of Online Accountability and Science of Customer Centricity. 2009, ISBN 978-0-470-52939-3 . (English)
  • Ralf Haberich: FUTURE DIGITAL BUSINESS - How web analytics and business intelligence influence online marketing and conversion . Mitp-Verlag, 2012, ISBN 978-3-8266-9233-8 .
