Soccer Analytics

The term soccer analytics refers to data analysis in soccer. This includes, in particular, the evaluation of data and statistics for decision-making and for forecasting models. Mainly competitive games are analyzed, but also other games and training sessions. The analysis process is fundamentally based on the KDD (knowledge discovery in databases) process.
With the increasing availability of technical tools, data analysis is becoming more important in professional sport and is already being used by some clubs for decision-making.

Basics of data analysis

Analog data collection

Even before the widespread use of technology, there were first attempts to annotate football matches and evaluate them statistically. In 1950 Charles Reep started to develop his own system for annotating football matches. In total, he recorded over 2,200 games and invested over 80 hours in the analysis of each of them. He arrived at several findings, some of which are still used in football today. The interpretation of his data led him to believe, for example, that it is more effective to play the ball long towards the opponent's goal than to complete many short passes. The tactic he backed up with numbers is still widespread today. Another of his analyses produced a theory that described pressing for the first time, although it did not establish it as a tactic.
Through his work, Reep laid the foundation for later developments in the statistical analysis of football.

Image recognition

Much of the available data is created by video cameras that film the game. The footage is analyzed in real time with the help of software, which yields numerous data points and statistics. In the early days of these systems in the 1990s, manually evaluating the video material for a single game took around 4 hours, because the evaluation was carried out with pen and paper rather than with software. In recent years this process has been optimized to the point where live game statistics can be produced. During the 2010 Champions League final, three employees at Opta Sports were responsible for this and recorded a total of 2,842 events.

Approaches to automating the collection of game data by video camera as completely as possible come from the field of artificial intelligence. One example is a project by the Ruhr University Bochum, which was funded by the Federal Ministry for Economic Affairs and Energy. The implemented concept has relatively low hardware requirements (two full-HD cameras, a powerful computer and a connection to an SQL database), making it easy and inexpensive to install. Once the system has been installed and calibrated, events are generated automatically through video analysis and stored in the database. Two operators are needed only to assign the individual players, since the system can distinguish between the teams but not between individual players.

Sensor data

Another way to obtain specific information, such as a player's speed or distance covered, is the use of sensors. These are attached to the player and allow much more precise measurement than is possible with video cameras. However, the use of this technology has so far mostly not been allowed at official games in professional football. In 2015 FIFA started to approve sensor systems (EPTS, Electronic Performance and Tracking Systems) and in October of that year called on manufacturers to present their systems in order to find a uniform standard.

An example of a system that works with sensors is the "ZXY Sports Tracking System". It has already been approved for use in the Danish first division and in UEFA games. Players wear a sensor belt that measures position data, acceleration and pulse. A compass also makes it possible to determine precisely the direction in which a player is moving. The ZXY system works by radio rather than GPS, which brings both advantages and disadvantages. The biggest disadvantage is the infrastructure requirement: in contrast to GPS systems, dedicated transmission masts have to be installed, which are costly and tie the system to one location. The advantage over GPS systems is the significantly higher accuracy of the position data.
With RedFIR, the Fraunhofer Institute has developed a similar system that also works via radio.
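
As a rough illustration of how distance and speed can be derived from position data of the kind such sensors deliver, the following sketch sums the straight-line steps between consecutive samples. The sample format `(time in seconds, x in metres, y in metres)` and the function name are assumptions made for this example; they do not describe the data format of ZXY, RedFIR or any other real system.

```python
import math

def distance_and_speeds(samples):
    """Return total distance (m) and per-interval speeds (m/s).

    `samples` is a time-ordered list of (t_seconds, x_m, y_m) tuples.
    This format is a hypothetical one chosen for illustration.
    """
    total = 0.0
    speeds = []
    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            raise ValueError("samples must be strictly ordered by time")
        step = math.hypot(x1 - x0, y1 - y0)  # straight-line distance
        total += step
        speeds.append(step / dt)
    return total, speeds

# Made-up positions: the player moves 5 m in the first second, 6 m in the next.
samples = [(0.0, 0.0, 0.0), (1.0, 3.0, 4.0), (2.0, 3.0, 10.0)]
total, speeds = distance_and_speeds(samples)
print(f"distance: {total:.1f} m, top speed: {max(speeds):.1f} m/s")
```

A higher sampling rate follows curved runs more closely, which is one reason why the accuracy of the underlying position data matters.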

Random aspect

Charles Reep recognized as early as the 1950s that many events in football are strongly influenced by chance. Various studies in recent years have confirmed his idea and made it more concrete. A research project by the University of Augsburg showed that 44.4% of all goals come about by chance. For this, more than 2,300 goals were analyzed with regard to various random aspects (e.g. a deflected shot, or a rebound off the post or crossbar).
Scientists from the University of Münster have also found that the outcome of a soccer game depends to a very large extent on chance. According to their results, a soccer game can be compared to rolling a die several times, with a 6 meaning a goal. The number of rolls depends on the current fitness of a team and its strength.
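
The dice comparison can be tried out with a short simulation. This is a minimal sketch of the analogy only, not the researchers' actual model: the numbers of throws (12 and 8) are invented for illustration, and the function names are hypothetical.

```python
import random

def count_sixes(throws, rng):
    """Roll a fair die `throws` times and count the sixes (the goals)."""
    return sum(1 for _ in range(throws) if rng.randint(1, 6) == 6)

def simulate_match(throws_home, throws_away, rng):
    """Return (home_goals, away_goals) for one simulated game."""
    return count_sixes(throws_home, rng), count_sixes(throws_away, rng)

# Assumption: the stronger team gets 12 throws, the weaker one 8.
rng = random.Random(2010)  # fixed seed so the run is repeatable
results = [simulate_match(12, 8, rng) for _ in range(10_000)]

stronger_wins = sum(1 for h, a in results if h > a) / len(results)
weaker_wins = sum(1 for h, a in results if h < a) / len(results)
print(f"stronger team wins {stronger_wins:.0%}, weaker team wins {weaker_wins:.0%}")
```

Even with a clear advantage in throws, the weaker side wins a noticeable share of the simulated games, which is the point of the comparison.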

Football data providers

The internationally known providers that provide data on official games include:

  • Opta
  • Prozone
  • Match Analysis

In addition, deltatre (Germany), Infostrada (Netherlands) and StatDNA (USA) are known as country-specific providers.
Customers include a large number of clubs in well-known European leagues, numerous media companies and associations. Many clubs are customers of several providers in order to build a well-founded database.

The following list shows some of the known providers as well as known customers. (As of December 2015)

Provider / system | Known customers
Opta Sports | BBC, CNN, Eurosport, ESPN, Kicker, Sky, ZDF; 13 Bundesliga clubs, Arsenal FC, Real Madrid; Bundesliga, DFB, Italian national team
deltatre (VIS.TRACK) | ARD, Axel Springer, Sky, Eurosport, ZDF; Arsenal FC, Borussia Dortmund; DFL, FIFA, UEFA, Premier League
Sportradar | Sport1, Kicker, The Guardian, Fox Sports Networks; Premjer-Liha, Premjer-Liga, Football Association of Wales
STATS (Prozone) | Arsenal FC, Bayern Munich, Chelsea FC, Hamburger SV, Manchester United; English national team, DFB, Premier League, Ligue 1
ChyronHego (TRACAB) | Bundesliga, Premier League, La Liga, UEFA, FIFA

According to its own information, Opta generates over 2,000 events per game for 30 competitions and offers less detailed data for over 1,000 competitions worldwide.
In the area of data analysis, SAP is particularly well represented on the German market. With its Sports One software, SAP offers a comprehensive solution for the data-supported management of a club. The focus is primarily on sporting analysis.

Statistics

Classic statistics

When Opta started compiling game statistics in the 1990s, the data it collected was very simple. At that time only passes, shots and goalkeeper saves were recorded. Statistics are increasingly being used, especially in media coverage of soccer matches, usually with increasingly advanced visualizations.
Quantitative statistics are often used to justify or refute tactical measures.

New approaches

xGoals

After goal and shot statistics had dominated media coverage for a long time, a new perspective has emerged in this area. Analysts have increasingly begun not only to record the number of scoring opportunities, but also to look at their quality. This is exactly the idea behind the xGoals statistic (short for "Expected Goals"). In contrast to shot accuracy, xGoals does not take into account the ratio of shots to goals, but primarily considers the position and the type of finish (e.g. a header after a corner kick).

The exact calculation of the xGoals value differs depending on the model, but usually takes the following variables into account:

  • Position of the shot
  • Type of finish (shot / header, type of assist, from open play / after a set piece, penalty, etc.)

For most models, position is the most influential variable. However, the level of detail with which the playing field is divided into sub-areas varies greatly. Some models work with only 6 zones, others with several dozen. In general, the probability of a goal (and thus the xGoals value) is higher for a shot inside the penalty area than for a shot outside it.

An exemplary xGoals calculation looks like this:

  1. Start value of the calculation: −0.28
  2. Subtract 0.83 if it is a header
  3. Subtract 0.65 if it is a shot following a corner
  4. Add 2.54 if it is a penalty
  5. Add 0.71 if it is a counterattack
  6. Add 0.16 if it is a set piece
  7. Subtract the value of the shot position (between 0.0 and 2.99)

The goal probability (i.e. the xGoals value) for a shot from the penalty spot that resulted from a counterattack can be calculated as follows, where 1.06 is the shot-position value implied by the stated result:

$$-0.28 + 0.71 - 1.06 = -0.63$$

This is the logarithm of the odds of a goal, which is why a further calculation step is necessary:

$$e^{-0.63} \approx 0.5326$$

The probability is then calculated from the odds:

$$\frac{0.5326}{1 + 0.5326} \approx 0.3475$$

The probability that such a shot will result in a goal is therefore 34.75%.
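
The seven steps of the exemplary calculation can be sketched in a few lines. The start value and the adjustments are taken from the list above. The function and feature names are hypothetical, and the shot-position value of 1.06 for the penalty spot is an assumption, chosen because it reproduces the stated result of 34.75%.

```python
import math

START_VALUE = -0.28
ADJUSTMENTS = {
    "header": -0.83,
    "corner": -0.65,
    "penalty": 2.54,
    "counterattack": 0.71,
    "set_piece": 0.16,
}

def xgoals(position_value, features=()):
    """Goal probability for one shot under the exemplary model.

    `position_value` is the shot-position term (0.0 to 2.99), which is
    subtracted. `features` lists the adjustments that apply to the shot.
    """
    log_odds = START_VALUE - position_value
    log_odds += sum(ADJUSTMENTS[name] for name in features)
    odds = math.exp(log_odds)   # convert log-odds to odds
    return odds / (1 + odds)    # convert odds to probability

# Shot from the penalty spot after a counterattack.
# 1.06 is an assumed position value, see the note above.
probability = xgoals(1.06, ["counterattack"])
print(f"{probability:.2%}")
```

Summing these values over all shots of a team gives its expected goals for a game under this model.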

Use in professional sports

Club teams

The best-known clubs with data-supported management are the English second-division club Brentford FC and the Danish first-division club FC Midtjylland. Both clubs are largely owned by Matthew Benham.
Benham relies heavily on statistical models and so-called key performance indicators, which are available to the coach during the game. FC Midtjylland became Danish champions just one year after Benham's arrival, for the first time since the club was founded in 1999. Brentford FC also benefited from the data-supported decisions and was promoted to the second division in the 2013/14 season. In the following season the club narrowly missed promotion to the Premier League, losing in the semi-finals of the play-offs.
Midtjylland's CEO estimates the advantage gained from data analysis at 5%, which can be a decisive factor in competitive sports.

Sports betting

Well before the club teams, betting providers began to use data to place profitable bets.
In football, Matthew Benham has made a name for himself in this area. Benham founded the betting company SmartOdds in 2004 with a focus on football betting. All bets are based on complex probability calculations that draw on numerous variables. For English professional football, SmartOdds has developed a model that can calculate predictions for all possible pairings. This model takes almost 200 parameters into account.

Predictive Analytics

[Figure: scatter plot with regression line]

With the extensive amount of data now available, it is possible to make predictions in addition to creating statistics. In practice, various approaches from the area of machine learning are used here. Regression models, which show the relationship between one dependent variable and several independent variables, are particularly widespread. Regression analysis is very popular in this area because it can be carried out easily with a variety of tools and can make relationships visible at a glance through visualizations.
The Poisson distribution also plays an important role because its key property, the modelling of rare events such as goals, suits football well. Research has shown that the number of goals actually scored corresponds closely to the probabilities given by the Poisson distribution.
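
To make the Poisson assumption concrete, the following sketch lists the probability of a team scoring 0 to 5 goals. The mean of 1.5 goals per game is an invented value for illustration, not a figure from the research mentioned above.

```python
import math

def poisson_pmf(k, mean):
    """Probability of exactly k events when `mean` events are expected."""
    return math.exp(-mean) * mean**k / math.factorial(k)

mean_goals = 1.5  # hypothetical average goals per team and game

for goals in range(6):
    print(f"{goals} goals: {poisson_pmf(goals, mean_goals):.1%}")
```

Comparing such probabilities with the observed share of games in which a team scored 0, 1, 2, ... goals is one simple way to check how well the distribution fits.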

The Poisson distribution is also used by betting providers such as SmartOdds as the basis for their calculations. Assuming that the Poisson distribution applies, the (simplified) SmartOdds model gives the probability of an outcome (x goals for the home team, y goals for the away team) as:

$$P(x, y) = \frac{\lambda^{x} e^{-\lambda}}{x!} \cdot \frac{\mu^{y} e^{-\mu}}{y!}$$

Here $\lambda$ and $\mu$ stand for the estimated goals of the home and away team. They are calculated from three kinds of quantity: the mean number of goals scored per game (in the model, based on the professional leagues in England); the home advantage, which is calculated from the ratio of home goals; and the offensive and defensive strengths of each team, where i denotes the home team and j the away team.
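
A minimal sketch of such a scoreline calculation, assuming that the goals of the two teams follow independent Poisson distributions. The estimated goals for the home and away team are passed in directly; how SmartOdds derives them from the mean goals, the home advantage and the team strengths is not reproduced here. The values 1.6 and 1.1 and all function names are invented for illustration.

```python
import math

def poisson_pmf(k, mean):
    """Probability of exactly k goals when `mean` goals are expected."""
    return math.exp(-mean) * mean**k / math.factorial(k)

def score_probability(x, y, home_mean, away_mean):
    """Probability of the result x:y, treating both teams independently."""
    return poisson_pmf(x, home_mean) * poisson_pmf(y, away_mean)

def outcome_probabilities(home_mean, away_mean, max_goals=10):
    """Return (home win, draw, away win) by summing over scorelines.

    Scorelines with more than `max_goals` goals for one team are ignored,
    so the three values add up to very slightly less than 1.
    """
    home = draw = away = 0.0
    for x in range(max_goals + 1):
        for y in range(max_goals + 1):
            p = score_probability(x, y, home_mean, away_mean)
            if x > y:
                home += p
            elif x == y:
                draw += p
            else:
                away += p
    return home, draw, away

# Hypothetical estimated goals for the home and the away team.
home, draw, away = outcome_probabilities(1.6, 1.1)
print(f"home {home:.1%}, draw {draw:.1%}, away {away:.1%}")
```

Treating the two goal counts as independent keeps the sketch short; it is a simplification and not a claim about the full SmartOdds model.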

Public perception and criticism

Because many of the approaches are new, data analysis in football has numerous critics. Alongside these subjective opinions, there are also objectively identifiable problem areas.
The critics come from different areas. Traditionally minded coaches often hold that the emotional side of the game cannot be expressed in data and that intuition plays a crucial role. One example is Tim Sherwood, former coach of Aston Villa and Tottenham Hotspur. He particularly criticizes the development in scouting, where decisions are increasingly based on data. He also strongly criticizes the expected goals model and emphasizes the importance of one's own intuition.

Objective problems in data analysis are mainly psychological. If carried out improperly, data analysis carries the risk that incorrect assumptions are made or that hypotheses are examined in a biased way with the aim of confirming them. In psychology these are called cognitive biases. A well-known example in football is the widespread theory that a team is particularly prone to conceding a goal shortly after scoring one itself. An analysis of over 100 Premier League games has shown, however, that exactly the opposite is the case: teams that have just scored concede the fewest goals shortly afterwards.
Confirmation bias and the hot hand phenomenon are particularly relevant in this context. All of these effects are potential sources of error in data analysis. The underlying purpose of data analysis, however, is precisely to avoid these errors by considering the available data impartially. It is particularly important to pursue the goal of gaining knowledge instead of trying to confirm existing hypotheses.

Literature

  • Chris Anderson, David Sally: The Numbers Game: Why Everything You Know About Football is Wrong. Penguin Books, London 2014, ISBN 978-0241963623.
  • Simon Kuper, Stefan Szymanski: Soccernomics. HarperSport, 2012, ISBN 978-0007586523.
  • Christoph Biermann: The Football Matrix: In search of the perfect game. KiWi-Taschenbuch, 2010, ISBN 978-3462042535.

References

  1. See Anderson, Sally, pp. 13-19.
  2. See Anderson, Sally, p. 10.
  3. Marc Schlipsing, Jan Salmen, Christian Igel: Real-time video analysis in football. In: KI - Künstliche Intelligenz. 27, No. 3, 2013, pp. 235-240.
  4. FIFA Requests Player Tracking Tech Companies To Present Their Wearable Systems For Match Play Consideration, October 14, 2015. Retrieved November 23, 2015.
  5. Håvard D. Johansen, Svein Arne Pettersen, Pål Halvorsen, Dag Johansen: Combining Video and Player Telemetry for Evidence-Based Decisions in Soccer, p. 2f.
  6. RedFIR® - Fraunhofer Institute for Integrated Circuits IIS (Memento from November 15, 2016 in the Internet Archive). Retrieved January 6, 2016.
  7. Chance worked solidly in Germany at the World Cup - Augsburg sports scientists found 41.8% chance goals. Retrieved December 6, 2015.
  8. A. Heuer, C. Müller and O. Rubner: Soccer: is scoring goals a predictable Poissonian process?, March 3, 2014, p. 4. Retrieved December 6, 2015.
  9. See Anderson, Sally, p. 6.
  10. See Anderson, Sally, p. 11.
  11. Opta Sports Customers & Partners (Memento from December 29, 2015 in the Internet Archive). Retrieved January 6, 2016.
  12. deltatre AG - references. Retrieved November 26, 2019.
  13. Sportradar Our Partners. Retrieved January 6, 2016.
  14. Prozone Testimonials (Memento from January 6, 2016 in the Internet Archive). Retrieved January 6, 2016.
  15. ChyronHego Tracab Optical Tracking. Retrieved January 6, 2016.
  16. Covering the world of football, in the greatest detail (Memento from March 4, 2016 in the Internet Archive). Retrieved November 23, 2015.
  17. SAP Unveils SAP Sports One Solution for Soccer. Retrieved December 12, 2015.
  18. Calculating Expected Goals 2.0, May 8, 2014. Retrieved November 23, 2015.
  19. Premier League Projections and New Expected Goals, October 19, 2015. Retrieved November 23, 2015.
  20. You should be eleven files. Retrieved December 12, 2015.
  21. The best professional bettor in the world. Retrieved December 12, 2015.
  22. An introduction to football modeling at Smartodds. Retrieved December 12, 2015.
  23. See Backhaus, Klaus; Erichson, Bernd; Plinke, Wulff; Weiber, Rolf: Multivariate Analysis Methods: An Application-Oriented Introduction. Berlin, Heidelberg: Springer-Verlag, 2013, p. XXII.
  24. Holger Dambeck: Is football a game of chance? In: Spektrum der Wissenschaft. June 7, 2010, pp. 68-70.
  25. Tim Sherwood: It's not seen as sexy to sign players from lower leagues, November 30, 2015. Retrieved December 3, 2015.
  26. See Anderson, Sally, p. 22ff.