Time series analysis

Figure: Example of a time series, a random walk with trend

Time series analysis is the discipline that deals with the inferential statistical analysis of time series and with the prediction of their future development (trends). It is a special form of regression analysis.

The term time series

A time series is a chronologically ordered sequence (though usually not a series in the mathematical sense) of observations of a particular quantity. The individual points in time are combined into a set of observation times, with exactly one observation for each point in time. Time series occur in all areas of science. Typical examples of time series are stock exchange prices, voting-intention surveys, and weather observations.

Time series: more detailed definition, classification and examples

The term time series presupposes that data are not generated continuously but discretely, at finite time intervals. A time series can be obtained from a time-continuous measurement signal (or from the continuous recording of a measurement signal, for example with an analog t-y chart recorder or an analog magnetic tape recorder) by sampling.

The times to which the data points are assigned can be arranged equidistantly, that is, at constant intervals (for example every 5 seconds), with some other regularity (for example every working day), or irregularly. A data point can consist of a single number (scalar values, univariate time series) or of several (a tuple of) numerical values (vector values, multivariate time series). However, all data points must be composed of individual values in the same way. Typical time series arise from the interaction of regular and random causes. The regular causes can vary periodically (seasonally) and/or contain long-term trends. Random influences are often referred to as noise.

A $T$-dimensional vector of random variables $(Y_1, \dots, Y_T)$ with an associated multivariate distribution is given. This can also be understood as a sequence $(Y_t)_{t=1,\dots,T}$ of random variables, or as a stochastic process. A sample of it yields the $T$ real numbers $(y_1, \dots, y_T)$ as one possible result. Even with an infinitely long observation, there would only be a single realization of the stochastic process. However, such a process has not only one realization but in general any number of them, all with the same statistical properties. A time series is defined as one realization of the data-generating process. Instead of describing a stochastic process of dimension $T$ by its $T$-dimensional distribution function, it can be captured by the moments of first and second order, i.e. by

Expected values: $\mu_t = E(Y_t)$
Variances: $\sigma_t^2 = E[(Y_t - \mu_t)^2]$
Covariances: $\gamma(t,s) = \operatorname{Cov}(Y_t, Y_s) = E[(Y_t - \mu_t)(Y_s - \mu_s)]$

One also speaks of autocovariances, since they are covariances of the same process. In the special case of a multidimensional normal distribution of the stochastic process, it is uniquely determined by the moments of first and second order. For statistical inference with time series, assumptions have to be made, since in practice there is usually only one realization of the process generating the time series. The assumption of ergodicity means that sample moments obtained from a finite time series converge, as $T \to \infty$, to the moments of the population.
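Under this ergodicity assumption, the population moments can be estimated from a single observed realization. The following is a minimal sketch in Python (the use of numpy, the function name and the white-noise example series are illustrative assumptions, not part of the article):

```python
import numpy as np

def sample_autocovariance(y, h):
    """Sample autocovariance at lag h from a single realization y,
    using the sample mean as an estimate of the expected value.
    Under ergodicity these estimates converge to the population
    moments as the series length T grows."""
    T = len(y)
    y_bar = y.mean()
    return np.sum((y[: T - h] - y_bar) * (y[h:] - y_bar)) / T

# One realization of a white-noise process (illustrative data).
rng = np.random.default_rng(42)
y = rng.standard_normal(10_000)

print(sample_autocovariance(y, 0))  # approx. Var(Y_t) = 1
print(sample_autocovariance(y, 1))  # approx. Cov(Y_t, Y_{t+1}) = 0
```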

Time series occur in many areas.

A particularly complex (but also rich) data situation exists when one has time-dependent microdata, i.e. data on individual persons or households for different points in time. Here, however, one no longer speaks of time series data but, depending on the time structure, of trend, panel, or event data.

Time series analysis: overview

The goals of time series analysis can be:

  • the shortest possible description of a historical time series
  • the prediction of future time series values (forecasting) on the basis of knowledge of their previous values (e.g. weather forecasting)
  • the detection of changes in time series (e.g. EEG or ECG monitoring in medicine during surgical interventions, or changes in global vegetation phenology due to anthropogenic climate change)
  • the elimination of serial or seasonal dependencies or trends in time series (seasonal adjustment) in order to reliably estimate simple parameters such as mean values

The procedure for time series analysis can be divided into the following working phases:

  • Identification phase: identification of a suitable model for the time series
  • Estimation phase: estimation of suitable parameters for the selected model
  • Diagnostic phase: diagnosis and evaluation of the estimated model
  • Deployment phase: deployment of the model that has been estimated and found suitable (especially for forecasting purposes)

The individual phases differ depending on whether linear models (Box-Jenkins method, component model) or non-linear models are used for the time series analysis. The Box-Jenkins method is discussed below as an example.

Identification phase

The graphical representation of the empirical time series values should come first. This is the simplest and most intuitive method. As part of the graphical analysis, initial conclusions can be drawn about the existence of trends, seasonality, outliers, variance stationarity, and other abnormalities. If a stochastic trend (non-stationarity) is detected, either by graphical analysis or by a statistical test such as the augmented Dickey-Fuller test (ADF test for short), and is later to be corrected by a transformation of the time series (differencing), a variance stabilization (for example a Box-Cox transformation) is recommended. Variance stabilization is important because negative values can appear in the transformed time series after differencing.
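These identification steps can be sketched as follows in Python, assuming the scipy and statsmodels libraries; the positive random-walk series is a hypothetical example:

```python
import numpy as np
from scipy.stats import boxcox
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(0)
# Illustrative data: a random walk with drift, kept strictly
# positive so that the Box-Cox transformation is applicable.
y = np.exp(np.cumsum(0.01 + 0.05 * rng.standard_normal(500)))

# Augmented Dickey-Fuller test: the null hypothesis is a unit root
# (a stochastic trend), so a large p-value suggests non-stationarity.
adf_stat, p_value, *_ = adfuller(y)
print(f"ADF statistic: {adf_stat:.2f}, p-value: {p_value:.3f}")

# Stabilize the variance first (Box-Cox requires positive values),
# then remove the stochastic trend by differencing.
y_bc, lam = boxcox(y)
y_diff = np.diff(y_bc)
print(f"Box-Cox lambda: {lam:.2f}, p-value after differencing: "
      f"{adfuller(y_diff)[1]:.3f}")
```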

Before one can continue, the fundamental question must be clarified of whether the time series should be mapped in a deterministic model (trend model) or a stochastic model. These two alternatives imply different methods of trend adjustment: in the trend model, the adjustment is made by a regression estimate; in the stochastic model, by differencing.
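A minimal sketch of the two alternatives, assuming numpy and a hypothetical series with a linear trend:

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(200)
y = 0.5 * t + 5 * rng.standard_normal(200)  # linear trend plus noise

# Deterministic trend model: fit a regression line in t and
# subtract it (trend adjustment by regression estimation).
slope, intercept = np.polyfit(t, y, deg=1)
detrended = y - (intercept + slope * t)

# Stochastic model: remove the trend by first differencing.
differenced = np.diff(y)

print(detrended.std(), differenced.std())
```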

Estimation phase

In the estimation phase, the model parameters and coefficients are estimated using different techniques. Least-squares estimation is suitable for the trend model; the method of moments, non-linear least-squares estimation, and the maximum likelihood method are suitable for models within the framework of the Box-Jenkins approach.
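As an illustration, maximum likelihood estimation of an ARMA(1,1) model might look as follows with statsmodels (an assumed library choice; the simulated series and its coefficients are illustrative):

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.arima_process import ArmaProcess

# Simulate an ARMA(1,1) process as illustrative data.
ar = np.array([1, -0.6])   # AR polynomial: 1 - 0.6 L
ma = np.array([1, 0.4])    # MA polynomial: 1 + 0.4 L
y = ArmaProcess(ar, ma).generate_sample(nsample=1000)

# Maximum likelihood estimation of the ARMA(1,1) coefficients.
model_fit = ARIMA(y, order=(1, 0, 1)).fit()
print(model_fit.params)    # estimates near 0.6 (AR) and 0.4 (MA)
```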

Diagnostic phase

In the diagnostic phase, the model, or several selected models if applicable, are assessed with regard to their quality. The following procedure is recommended:

Step 1: Check whether the estimated coefficients are significantly different from zero. For individual coefficients, this is done with a t-test; several coefficients are examined jointly with an F-test.

Step 2: If the Box-Jenkins method is used, it must be checked to what extent the empirical autocorrelation coefficients agree with those that should theoretically result from the previously estimated coefficients. In addition, the partial autocorrelation coefficients and the spectrum can be analyzed.

Step 3: Finally, a careful analysis of the residuals takes place. The residuals should no longer have any structure. The centering of the residuals can be checked with a t-test. The constancy of the variance can be assessed visually on the time series graph or by calculating the effect of different λ values in a Box-Cox transformation. To check whether the residuals are autocorrelated, each individual coefficient can be tested for a significant difference from zero, or the first coefficients can be tested jointly for significance. So-called portmanteau tests can be used to clarify the latter. Information criteria, for example, are also useful for this.
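These residual checks can be sketched as follows, assuming scipy and statsmodels; the simulated white noise here merely stands in for the residuals of an actual fitted model:

```python
import numpy as np
from scipy import stats
from statsmodels.stats.diagnostic import acorr_ljungbox

rng = np.random.default_rng(2)
residuals = rng.standard_normal(500)  # stand-in for model residuals

# Centering: t-test of the null hypothesis "mean of residuals = 0".
t_stat, p_mean = stats.ttest_1samp(residuals, popmean=0.0)
print(f"t-test p-value: {p_mean:.3f}")

# Portmanteau (Ljung-Box) test: the first autocorrelation
# coefficients of the residuals are tested jointly for zero.
lb = acorr_ljungbox(residuals, lags=[10])
print(lb)  # a large p-value is consistent with white-noise residuals
```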

Deployment phase

In the deployment phase, a prediction equation must be formulated from the model equation that was established in the identification phase and found to be useful. An optimality criterion must be defined beforehand; for example, the minimum mean squared error (MMSE) can be used.
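With statsmodels (an assumed choice), the point forecast of a fitted model is the conditional expectation of the future values and is thus optimal in the MMSE sense; a sketch with a hypothetical random-walk series:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(3)
y = np.cumsum(rng.standard_normal(300))  # illustrative random walk

# Fit an ARIMA(0,1,1) model and derive the prediction equation.
model_fit = ARIMA(y, order=(0, 1, 1)).fit()

# The point forecast is the conditional expectation of the future
# values, which minimizes the mean squared error (MMSE criterion).
forecast = model_fit.get_forecast(steps=5)
print(forecast.predicted_mean)        # MMSE point forecasts
print(forecast.conf_int(alpha=0.05))  # 95% prediction intervals
```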

Methods of time series analysis

Figure 1: Methods of time series analysis

The progression patterns of time series can be broken down into different components (component decomposition). There are systematic or quasi-systematic components. These include the trend component as the general underlying direction of the time series, the seasonal component as a cyclical movement within a year, the cycle component (also called the business cycle in economic time series) with a period length of more than one year, and a calendar component that is due to calendar irregularities. A residual or irregular component occurs as a further component. This includes outliers and structural breaks, which can be explained by historical events, as well as random fluctuations whose causes cannot be identified in detail.

The components mentioned cannot be observed directly; rather, they are conceptual constructs. This raises the question of how to model these components.

Traditional approaches consider random fluctuations to be structure-neutral and view the systematic components as deterministic functions of time,

$$y_t = f(t) + \varepsilon_t,$$

where $f(t)$ combines the systematic components (for example trend and season) and $\varepsilon_t$ denotes the random fluctuation.
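A classical component decomposition of this deterministic kind can be sketched, for example, with the seasonal_decompose function from statsmodels (an assumed tool; the monthly example series is hypothetical):

```python
import numpy as np
from statsmodels.tsa.seasonal import seasonal_decompose

rng = np.random.default_rng(4)
t = np.arange(120)  # ten "years" of monthly values (illustrative)
y = 0.2 * t + 3 * np.sin(2 * np.pi * t / 12) + rng.standard_normal(120)

# Classical decomposition into trend, seasonal and residual
# components, treating the systematic parts as functions of time.
result = seasonal_decompose(y, model="additive", period=12)
print(result.trend[6:10])     # moving-average trend estimate
print(result.seasonal[:12])   # one full seasonal cycle
```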

In more recent approaches, random fluctuations play a dominant role in the modeling of the systematic component. The time series is modeled by a stochastic process, such as an MA(1) process:

$$y_t = \varepsilon_t + \beta \varepsilon_{t-1}.$$

Here $t$ is the time index and $\varepsilon_t$ is a random variable for which the white noise property can be assumed. Chaos theory (see dimensionality) represents a contrary approach to time series modeling.
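The MA(1) model above can be simulated directly; a brief sketch with numpy, where the coefficient β = 0.8 is an arbitrary illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(5)
beta = 0.8                          # assumed MA(1) coefficient
eps = rng.standard_normal(1001)     # white noise epsilon_t

# MA(1): y_t = eps_t + beta * eps_{t-1}
y = eps[1:] + beta * eps[:-1]

# Theory: autocorrelation beta / (1 + beta^2) at lag 1, zero beyond.
r1 = np.corrcoef(y[:-1], y[1:])[0, 1]
print(r1, beta / (1 + beta**2))     # both approx. 0.49
```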

Some general mathematical tools are available in time series analysis, such as transformations (e.g. the Box-Cox transformation), aggregation, regression, filtering, and moving averages. In the following, it is assumed that the time series can be modeled as a stochastic process; this approach is also known as the Box-Jenkins method. Further special methods and instruments exist for stochastic processes.

Inferential analysis of time series

In inferential statistics, the size of the examined effects is estimated on the basis of samples. In addition to the methods already mentioned, in which the errors of the results are estimated by inferential statistics, complex time series models can be specified and estimated. This is mainly used in econometrics for economic models. The basis is the concept of the stochastic process; the group of ARMA processes deserves particular mention here.

Ordinal time series analysis

Ordinal time series analysis is a relatively new approach for the qualitative investigation of long and complex time series. Instead of the values of a time series, the order relation between the values, i.e. the ups and downs, is described. For this purpose, the time series is transformed into ordinal patterns, and the distribution of these patterns is then statistically analyzed in order to measure the complexity or the information content of the underlying time series. A well-known complexity parameter is the permutation entropy, introduced in 2002 by Bandt and Pompe.
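A minimal sketch of the permutation entropy in Python (the function name, numpy implementation and example series are illustrative; the normalization by log(order!) follows the usual convention):

```python
import math
from collections import Counter

import numpy as np

def permutation_entropy(x, order=3, delay=1):
    """Permutation entropy of a 1-D series (Bandt & Pompe, 2002).

    Each window of `order` values is mapped to the permutation that
    sorts it (its ordinal pattern); the Shannon entropy of the
    pattern distribution is returned, normalized to [0, 1]."""
    x = np.asarray(x, dtype=float)
    n_windows = len(x) - (order - 1) * delay
    patterns = Counter(
        tuple(np.argsort(x[i : i + order * delay : delay]))
        for i in range(n_windows)
    )
    probs = np.array(list(patterns.values()), dtype=float) / n_windows
    entropy = -np.sum(probs * np.log(probs))
    return entropy / math.log(math.factorial(order))

# A noisy series yields entropy near 1, a monotone series exactly 0.
rng = np.random.default_rng(0)
print(permutation_entropy(rng.standard_normal(1000)))  # ~1.0
print(permutation_entropy(np.arange(1000)))            # 0.0
```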

Neural networks and the processing of time series

If one works with artificial neural networks, one can see that the modeling process is very similar to the ARIMA model; usually only the terminology differs. To forecast a time series with a multilayer perceptron, a sliding time window with n values from the past is placed over the time series. The training task consists of inferring the next value from the n values in the input layer. Training takes place on the basis of the known values, so the series is, so to speak, predicted from within. As a rule, however, it is external influences from a (chaotic) dynamical system that affect the course of a time series (the observable values of the dynamical system). To include external influences in the model, additional neurons can be added to the input layer of the multilayer perceptron. These must also be available in the form of a time series.
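Such a sliding-window forecast with a multilayer perceptron can be sketched, for example, with scikit-learn (an assumed library; the window length, network size and noisy sine series are arbitrary illustrative choices):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def make_windows(series, n):
    """Slide a window of length n over the series; each window is an
    input vector, the value right after it is the training target."""
    X = np.array([series[i : i + n] for i in range(len(series) - n)])
    y = series[n:]
    return X, y

# Illustrative data: a noisy sine wave (hypothetical example series).
t = np.arange(500)
series = (np.sin(2 * np.pi * t / 50)
          + 0.1 * np.random.default_rng(1).standard_normal(500))

n = 10  # window length: number of past values in the input layer
X, y = make_windows(series, n)

mlp = MLPRegressor(hidden_layer_sizes=(20,), max_iter=2000,
                   random_state=0)
mlp.fit(X[:-50], y[:-50])            # train on all but the last 50 windows
print(mlp.score(X[-50:], y[-50:]))   # R^2 on the held-out tail
```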


Literature

  • Walter Assenmacher: Introduction to Econometrics. 6th edition. Oldenbourg, Munich 2002, ISBN 3-486-25429-4.
  • Christoph Bandt, Bernd Pompe: Permutation Entropy: A Natural Complexity Measure for Time Series. In: Physical Review Letters. 88, 2002, 174102. doi:10.1103/PhysRevLett.88.174102
  • Walter Enders: Applied Econometric Time Series. Wiley, Hoboken 2003, ISBN 0-471-23065-0.
  • James D. Hamilton: Time Series Analysis. Princeton University Press, Princeton 1994, ISBN 0-691-04289-6.
  • Helmut Lütkepohl: New Introduction to Multiple Time Series Analysis. Springer-Verlag, Berlin 2005, ISBN 978-3-540-40172-8.
  • Klaus Neusser: Time series analysis in economics. 3rd edition. Vieweg+Teubner, Wiesbaden 2011, ISBN 3-8348-1846-1.
  • Horst Rinne, Katja Specht: Time series. Statistical modeling, estimation and forecasting. Vahlen, Munich 2002, ISBN 3-8006-2877-5.
  • Rainer Schlittgen, Bernd Streitberg: Time series analysis. 9th edition. Oldenbourg, Munich 2001, ISBN 3-486-25725-0.
  • Elmar Steurer: Forecast of 15 DGOR time series with neural networks. In: Operations Research Spectrum. 18 (2), pp. 117-125. doi:10.1007/BF01539737
  • Helmut Thome: Time series analysis. An introduction for social scientists and historians. Oldenbourg, Munich 2005, ISBN 3-486-57871-5.
  • Ruey S. Tsay: Analysis of Financial Time Series. Wiley, Hoboken 2005, ISBN 0-471-69074-0.

Software for performing time series analysis

A time series analysis can be carried out with the free software packages R, gretl, OpenNN and RapidMiner. Proprietary solutions include the software packages BOARD, Dataplore, EViews, Limdep, RATS, SPSS, Stata, SAS and WinRATS.


References

  1. Rainer Schlittgen, Bernd Streitberg: Time series analysis. Oldenbourg Verlag, 2001, ISBN 978-3-486-71096-0 (accessed via De Gruyter Online), p. 1.
  2. Dieter Meiller, Christian Schieder: Applied Machine Learning: Predicting behavior of industrial units from climate data. In: Abraham, A. P., Roth, J., Peng, G. C. (eds.): Multi Conference on Computer Science and Information Systems 2018. IADIS Press, pp. 66-72, ISBN 978-989-8533-80-7.