Kernel regression

Kernel regression refers to a family of non-parametric statistical methods in which the dependence of a random variable on initial data is estimated by means of kernel density estimation. In contrast to linear regression, the type of dependence represented by the regression curve is not restricted to a linear form. The advantage is a better fit to the data in the case of non-linear relationships. Depending on whether the initial data are themselves random or not, a distinction is made between random design and fixed design approaches. The basic procedure was proposed independently in 1964 by Geoffrey Watson and Elizbar Nadaraya.

One-dimensional kernel regression

Kernel density estimator

A kernel density estimator with bandwidth $h$ is an estimate of the unknown density function $f$ of a random variable. If $x_1, \ldots, x_n$ is a sample and $K$ a kernel, the kernel density estimate is defined as

$$\hat f_h(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right).$$

As the graphic on the right shows, the choice of bandwidth is decisive for the quality of the approximation.

Typical kernels

Kernels with unbounded support:
  • Gaussian kernel: $K(u) = \frac{1}{\sqrt{2\pi}} e^{-u^2/2}$
  • Cauchy kernel: $K(u) = \frac{1}{\pi (1 + u^2)}$
  • Picard kernel: $K(u) = \frac{1}{2} e^{-|u|}$

Kernels with bounded support:
  • Uniform (rectangular) kernel: $K(u) = \frac{1}{2}\, \mathbf{1}_{\{|u| \le 1\}}$
  • Triangle kernel: $K(u) = (1 - |u|)\, \mathbf{1}_{\{|u| \le 1\}}$
  • Cosine kernel: $K(u) = \frac{\pi}{4} \cos\!\left(\frac{\pi}{2} u\right) \mathbf{1}_{\{|u| \le 1\}}$
  • Epanechnikov kernel (p = 1): $K(u) = \frac{3}{4} (1 - u^2)\, \mathbf{1}_{\{|u| \le 1\}}$
  • Quartic kernel (p = 2): $K(u) = \frac{15}{16} (1 - u^2)^2\, \mathbf{1}_{\{|u| \le 1\}}$
  • Triweight kernel (p = 3): $K(u) = \frac{35}{32} (1 - u^2)^3\, \mathbf{1}_{\{|u| \le 1\}}$
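
As a small illustration of the definition above, the following Python sketch computes a kernel density estimate with the Gaussian and the Epanechnikov kernel from the table; the sample, grid and bandwidth are arbitrary values chosen for the example.

```python
import numpy as np

def gaussian_kernel(u):
    """Gaussian kernel with unbounded support."""
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def epanechnikov_kernel(u):
    """Epanechnikov kernel (p = 1) with support [-1, 1]."""
    return 0.75 * (1 - u**2) * (np.abs(u) <= 1)

def kde(x, sample, h, kernel=gaussian_kernel):
    """Kernel density estimate f_h(x) = 1/(n*h) * sum_i K((x - x_i)/h)."""
    x = np.atleast_1d(x)
    u = (x[:, None] - sample[None, :]) / h      # shape (len(x), n)
    return kernel(u).sum(axis=1) / (len(sample) * h)

# Arbitrary sample and bandwidth, purely for illustration
rng = np.random.default_rng(0)
sample = rng.normal(size=200)
grid = np.linspace(-3, 3, 7)
print(kde(grid, sample, h=0.5))                              # Gaussian kernel
print(kde(grid, sample, h=0.5, kernel=epanechnikov_kernel))  # bounded support
```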



Nadaraya-Watson estimator

Linear regression (black) and Nadaraya-Watson estimators with different bandwidths (red: medium, green: large and blue: small)

The Nadaraya-Watson estimator estimates the unknown regression function from the observation data $(x_1, y_1), \ldots, (x_n, y_n)$ as

$$\hat m_h(x) = \frac{\sum_{i=1}^{n} K_h(x - x_i)\, y_i}{\sum_{j=1}^{n} K_h(x - x_j)}$$

with $K_h(u) = \frac{1}{h} K\!\left(\frac{u}{h}\right)$, a kernel $K$ and a bandwidth $h$. The function $K_h$ assigns a large weight to observations $x_i$ close to $x$ and a small weight to observations far away from $x$. The bandwidth $h$ determines the range over which the observations carry substantial weight.

While the choice of the core can usually be made quite freely, the choice of the bandwidth has a great influence on the smoothness of the estimator. The graph on the right shows that a large bandwidth (green) leads to a smoother estimate than choosing a small bandwidth (blue).
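
A minimal Python sketch of the Nadaraya-Watson estimator, assuming a Gaussian kernel; the noisy sine data and the three bandwidths are illustrative choices that loosely mirror the small/medium/large comparison in the figure.

```python
import numpy as np

def nadaraya_watson(x, x_obs, y_obs, h):
    """Nadaraya-Watson estimate m_h(x) = sum_i K_h(x - x_i) y_i / sum_j K_h(x - x_j)."""
    x = np.atleast_1d(x)
    u = (x[:, None] - x_obs[None, :]) / h
    k = np.exp(-0.5 * u**2) / (np.sqrt(2 * np.pi) * h)   # K_h with Gaussian kernel
    return (k * y_obs).sum(axis=1) / k.sum(axis=1)

# Illustrative data: noisy sine curve
rng = np.random.default_rng(1)
x_obs = np.sort(rng.uniform(0, 10, 100))
y_obs = np.sin(x_obs) + rng.normal(scale=0.3, size=100)
grid = np.linspace(0, 10, 5)
for h in (0.1, 0.5, 2.0):   # small, medium, large bandwidth
    print(h, nadaraya_watson(grid, x_obs, y_obs, h))
```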

Derivation

The idea of the Nadaraya-Watson estimator is based on the fact that the unknown regression function

$$m(x) = \operatorname{E}(Y \mid X = x) = \int y\, f(y \mid x)\, dy = \frac{\int y\, f(x, y)\, dy}{f_X(x)}$$

can be represented, with the help of the conditional expected value, by the joint density $f(x,y)$ and the marginal density $f_X(x)$.

The unknown densities $f(x,y)$ and $f_X(x)$ are estimated using kernel density estimates. A bivariate kernel density estimator with product kernel and bandwidths $h$ and $g$ is used to estimate the joint density from the observations $(x_i, y_i)$:

$$\hat f_{h,g}(x, y) = \frac{1}{n} \sum_{i=1}^{n} K_h(x - x_i)\, K_g(y - y_i).$$

It follows (since the kernel integrates to one and has mean zero) that

$$\int y\, \hat f_{h,g}(x, y)\, dy = \frac{1}{n} \sum_{i=1}^{n} K_h(x - x_i) \int y\, K_g(y - y_i)\, dy = \frac{1}{n} \sum_{i=1}^{n} K_h(x - x_i)\, y_i,$$

and together with the kernel density estimate $\hat f_h(x)$ for $f_X(x)$ this yields the Nadaraya-Watson estimator.

Properties

Weights $W_{hi}(x)$ for different values of $x$ and bandwidths $h$.

1. As in the case of linear regression, the Nadaraya-Watson estimator can be written as a linear combination of the $y_i$ with weight functions $W_{hi}(x)$:

$$\hat m_h(x) = \sum_{i=1}^{n} W_{hi}(x)\, y_i \quad \text{with} \quad W_{hi}(x) = \frac{K_h(x - x_i)}{\sum_{j=1}^{n} K_h(x - x_j)}.$$

The Nadaraya-Watson estimator is thus a (locally) weighted mean of the observed values $y_i$, and it holds that

$$\sum_{i=1}^{n} W_{hi}(x) = 1.$$

The graphic on the right shows the weights $W_{hi}(x)$ for different values of $x$ (blue, green and red). The scatter plot below zero shows the data of the explanatory variable. The larger the bandwidth (solid vs. dashed line), the more observations around $x$ have a non-zero weight. The fewer data are available (right), the more strongly the available observations have to be weighted.
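
The weight representation can be checked directly. The following Python sketch (Gaussian kernel, illustrative data) computes the weights $W_{hi}(x)$ at a few evaluation points and two bandwidths, and verifies that they sum to one, so that the estimate is the locally weighted mean described above.

```python
import numpy as np

def nw_weights(x, x_obs, h):
    """Nadaraya-Watson weights W_hi(x) = K_h(x - x_i) / sum_j K_h(x - x_j)."""
    u = (x - x_obs) / h
    k = np.exp(-0.5 * u**2)          # unnormalised Gaussian kernel; constants cancel
    return k / k.sum()

rng = np.random.default_rng(2)
x_obs = np.sort(rng.uniform(0, 10, 50))
y_obs = np.sin(x_obs) + rng.normal(scale=0.3, size=50)

for x0 in (2.0, 5.0, 8.0):           # different evaluation points
    for h in (0.5, 2.0):             # two bandwidths, as in the figure
        w = nw_weights(x0, x_obs, h)
        m_hat = np.sum(w * y_obs)    # locally weighted mean of the observed y values
        print(x0, h, w.sum(), m_hat) # w.sum() equals 1 up to floating point error
```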

2. The mean squared error is given approximately by

$$\operatorname{MSE}\bigl(\hat m_h(x)\bigr) \approx \frac{c_1}{nh} + c_2\, h^4$$

with $c_1$ and $c_2$ independent of $n$ and $h$. The convergence is therefore slower than with linear regression, i.e. with the same number of observations the predicted value can be estimated more precisely with linear regression than with the Nadaraya-Watson estimator.

The squared bias of the Nadaraya-Watson estimator is

$$\left( h^2\, \frac{\mu_2(K)}{2} \left( m''(x) + 2\, \frac{m'(x)\, f_X'(x)}{f_X(x)} \right) \right)^{2}$$

with $m'$ and $m''$ the first and second derivative of the unknown regression function, $f_X'$ the first derivative of the density $f_X$ and $\mu_2(K) = \int u^2 K(u)\, du$.

The variance of the estimator is

$$\frac{\sigma^2(x)}{nh}\, \frac{\lVert K \rVert_2^2}{f_X(x)}$$

with $\sigma^2(x) = \operatorname{Var}(Y \mid X = x)$ and $\lVert K \rVert_2^2 = \int K(u)^2\, du$.

Bandwidth selection

Resubstitution estimate and leave-one-out cross-validation of the bandwidth for the Nadaraya-Watson estimator in the above example. The "optimal" bandwidth is the one at which the cross-validation curve attains its minimum.

The main problem in kernel regression is the choice of an appropriate bandwidth $h$. The basis is the minimization of the mean squared error

$$\operatorname{MSE}\bigl(\hat m_h(x)\bigr) = \operatorname{E}\!\left[ \bigl( \hat m_h(x) - m(x) \bigr)^2 \right]$$

or of its approximation. However, the approximation contains the second derivative of the unknown regression function as well as the unknown density function and its derivative. Instead, the data-based averaged squared error

$$\operatorname{ASE}(h) = \frac{1}{n} \sum_{i=1}^{n} \bigl( y_i - \hat m_h(x_i) \bigr)^2$$

is minimized. Since the value $y_i$ is itself used in the estimate $\hat m_h(x_i)$, a bandwidth $h \to 0$ leads to $\operatorname{ASE}(h) \to 0$ (resubstitution estimate). Therefore a leave-one-out cross-validation is performed, i.e. all observations except the $i$-th are used to calculate the estimated value $\hat m_{h,-i}(x_i)$. With this, the ASE is calculated for different bandwidths, and the bandwidth that gives a minimal ASE is then used to estimate the unknown regression function.
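
A sketch of this leave-one-out cross-validation in Python, assuming a Gaussian kernel; the candidate bandwidth grid and the simulated data are illustrative choices, not part of the original example.

```python
import numpy as np

def nw_loo_cv(x_obs, y_obs, bandwidths):
    """Return the bandwidth minimising the leave-one-out ASE for Nadaraya-Watson."""
    best_h, best_ase = None, np.inf
    for h in bandwidths:
        u = (x_obs[:, None] - x_obs[None, :]) / h
        k = np.exp(-0.5 * u**2)            # kernel weights; normalisation cancels
        np.fill_diagonal(k, 0.0)           # leave the i-th observation out
        m_loo = (k @ y_obs) / k.sum(axis=1)
        ase = np.mean((y_obs - m_loo) ** 2)
        if ase < best_ase:
            best_h, best_ase = h, ase
    return best_h, best_ase

rng = np.random.default_rng(3)
x_obs = np.sort(rng.uniform(0, 10, 150))
y_obs = np.sin(x_obs) + rng.normal(scale=0.3, size=150)
h_opt, ase = nw_loo_cv(x_obs, y_obs, np.linspace(0.1, 2.0, 20))
print(h_opt, ase)
```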

Confidence bands

After estimating the regression function $\hat m_h(x)$, the question arises as to how far it deviates from the true function $m(x)$. The work of Bickel and Rosenblatt (1973) provides two theorems, one for point-wise confidence bands and one for uniform confidence bands.

In addition to information about the deviation between $\hat m_h(x)$ and $m(x)$, the confidence bands give an indication of whether a candidate parametric regression model, e.g. a linear regression, fits the data. If the estimated course of the regression function of the parametric model lies outside the confidence bands, this is an indication that the parametric regression model does not fit the data. A formal test is possible with the help of bootstrap procedures.

Linear regression (black) and Nadaraya-Watson estimator (red) with optimal bandwidth and point-wise 95% confidence band.

Point-wise confidence bands: Under certain conditions,

$$\sqrt{nh}\, \bigl( \hat m_h(x) - m(x) \bigr) \;\xrightarrow{\;D\;}\; N\!\bigl( b_x,\, v_x^2 \bigr)$$

converges in distribution, with $b_x = h^2 \mu_2(K) \left( \frac{m''(x)}{2} + \frac{m'(x)\, f_X'(x)}{f_X(x)} \right)$, $v_x^2 = \frac{\sigma^2(x)\, \lVert K \rVert_2^2}{f_X(x)}$ and $\mu_2(K) = \int u^2 K(u)\, du$.

If the bandwidth is small enough, the asymptotic bias $b_x$ can be neglected relative to the asymptotic variance $v_x^2$. This allows approximate confidence bands to be calculated as

$$\left[ \hat m_h(x) - z_{1-\frac{\alpha}{2}} \sqrt{\frac{\hat\sigma^2(x)\, \lVert K \rVert_2^2}{nh\, \hat f_h(x)}},\;\; \hat m_h(x) + z_{1-\frac{\alpha}{2}} \sqrt{\frac{\hat\sigma^2(x)\, \lVert K \rVert_2^2}{nh\, \hat f_h(x)}} \right]$$

with $z_{1-\frac{\alpha}{2}}$ the $(1-\frac{\alpha}{2})$ quantile of the standard normal distribution. The unknown density $f_X(x)$ is estimated with a kernel density estimate $\hat f_h(x)$, and $\sigma^2(x)$ with

$$\hat\sigma_h^2(x) = \sum_{i=1}^{n} W_{hi}(x)\, \bigl( y_i - \hat m_h(x) \bigr)^2.$$

The graph on the right shows the Nadaraya-Watson estimator with a 95% confidence band (red lines). The black linear regression line lies clearly outside the confidence band in various areas. This is an indication that a linear regression model is not appropriate here.
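
A rough Python sketch of the approximate point-wise confidence band, assuming a Gaussian kernel (for which $\lVert K \rVert_2^2 = 1/(2\sqrt{\pi})$) and the kernel-weighted residuals as the local variance estimate $\hat\sigma^2(x)$; data and bandwidth are illustrative.

```python
import numpy as np
from scipy.stats import norm

def nw_pointwise_band(grid, x_obs, y_obs, h, alpha=0.05):
    """Nadaraya-Watson estimate with an approximate (1 - alpha) point-wise band."""
    n = len(x_obs)
    u = (grid[:, None] - x_obs[None, :]) / h
    k = np.exp(-0.5 * u**2) / (np.sqrt(2 * np.pi) * h)    # K_h(x - x_i), Gaussian kernel
    f_hat = k.sum(axis=1) / n                             # kernel density estimate of f_X
    w = k / k.sum(axis=1, keepdims=True)                  # NW weights W_hi(x)
    m_hat = w @ y_obs
    resid2 = (y_obs[None, :] - m_hat[:, None]) ** 2
    sigma2_hat = (w * resid2).sum(axis=1)                 # kernel-weighted local variance
    K2 = 1.0 / (2.0 * np.sqrt(np.pi))                     # ||K||_2^2 for the Gaussian kernel
    half = norm.ppf(1 - alpha / 2) * np.sqrt(sigma2_hat * K2 / (n * h * f_hat))
    return m_hat, m_hat - half, m_hat + half

rng = np.random.default_rng(4)
x_obs = np.sort(rng.uniform(0, 10, 200))
y_obs = np.sin(x_obs) + rng.normal(scale=0.3, size=200)
m, lower, upper = nw_pointwise_band(np.linspace(0.5, 9.5, 5), x_obs, y_obs, h=0.6)
print(np.c_[lower, m, upper])
```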

Uniform confidence bands: Under somewhat stronger conditions than before, with $h = n^{-\delta}$, $\delta \in \left(\tfrac{1}{5}, \tfrac{1}{2}\right)$, and for kernels with support in $[-1, 1]$,

$$P\!\left( \sup_{x \in [0,1]} \sqrt{\frac{nh\, \hat f_h(x)}{\hat\sigma^2(x)\, \lVert K \rVert_2^2}}\; \bigl| \hat m_h(x) - m(x) \bigr| \le z_{n,\alpha} \right) \;\longrightarrow\; 1 - \alpha,$$

where the critical value $z_{n,\alpha}$ is obtained from the extreme-value approximation of Bickel and Rosenblatt (1973).

The condition $x \in [0,1]$ is not a restriction, since the data can simply be transformed to the interval $[0,1]$. The confidence band is then calculated on the transformed scale and transformed back to the original data.

Gasser-Müller estimator

In the fixed design case with $x_i = \tfrac{i}{n}$, the density $f_X(x)$ is known and therefore does not have to be estimated. This simplifies both the calculations and the mathematical treatment of the estimator. For this case the Gasser-Müller estimator was defined as

$$\hat m_h^{GM}(x) = \sum_{i=1}^{n} W_i^{GM}(x)\, y_i$$

with

$$W_i^{GM}(x) = \int_{s_{i-1}}^{s_i} K_h(x - u)\, du$$

and $s_i = \frac{x_i + x_{i+1}}{2}$, $s_0 = 0$ and $s_n = 1$.

Properties

1. Like the Nadaraya-Watson estimator, the Gasser-Müller estimator is a linear estimator and the sum of the weight functions is one.

2. For the mean squared error the following holds approximately:

$$\operatorname{MSE}\bigl(\hat m_h^{GM}(x)\bigr) \approx \frac{1}{nh}\, \sigma^2\, \lVert K \rVert_2^2 + \frac{h^4}{4}\, \mu_2(K)^2\, \bigl( m''(x) \bigr)^2.$$
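
A small Python sketch of the Gasser-Müller estimator on a fixed design, assuming a Gaussian kernel so that the integrals of $K_h$ over the intervals $[s_{i-1}, s_i]$ can be written as differences of normal distribution functions; the data are illustrative.

```python
import numpy as np
from scipy.stats import norm

def gasser_mueller(grid, x_obs, y_obs, h):
    """Gasser-Mueller estimate m(x) = sum_i y_i * int_{s_{i-1}}^{s_i} K_h(x - u) du."""
    # Fixed design on [0, 1]: boundaries s_0 = 0, s_i = (x_i + x_{i+1})/2, s_n = 1
    s = np.concatenate(([0.0], (x_obs[:-1] + x_obs[1:]) / 2, [1.0]))
    grid = np.atleast_1d(grid)
    # For a Gaussian kernel the integral of K_h over [s_{i-1}, s_i] is a CDF difference
    upper = norm.cdf((grid[:, None] - s[None, :-1]) / h)
    lower = norm.cdf((grid[:, None] - s[None, 1:]) / h)
    weights = upper - lower           # shape (len(grid), n); rows sum to ~1 inside [0, 1]
    return weights @ y_obs

# Fixed design x_i = i/n on [0, 1] with illustrative noisy observations
n = 100
x_obs = np.arange(1, n + 1) / n
rng = np.random.default_rng(5)
y_obs = np.sin(2 * np.pi * x_obs) + rng.normal(scale=0.2, size=n)
print(gasser_mueller(np.linspace(0.1, 0.9, 5), x_obs, y_obs, h=0.05))
```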

Local polynomial kernel regression

Local approximations for the Nadaraya-Watson estimator (locally constant) and the locally linear estimator at selected data points. The graphic is restricted to the range of small x-values (i.e. the left margin of the data), but the calculations were carried out with all the data.

The Nadaraya-Watson estimator can be written as the solution of the following local minimization problem:

$$\hat m_h(x) = \arg\min_{c} \sum_{i=1}^{n} K_h(x - x_i)\, \bigl( y_i - c \bigr)^2,$$

i.e. for each $x$ a locally constant value $c = \hat m_h(x)$ is determined, which is equal to the value of the Nadaraya-Watson estimator at that point $x$.

Instead of a local constant, a polynomial can also be used:

$$m(t) \approx \sum_{j=0}^{p} \frac{m^{(j)}(x)}{j!}\, (t - x)^j = \sum_{j=0}^{p} \beta_j(x)\, (t - x)^j,$$

i.e. the unknown regression function is approximated locally by a polynomial of degree $p$. The local polynomial kernel regression at each point $x$ then results from

$$\min_{\beta_0, \ldots, \beta_p} \sum_{i=1}^{n} K_h(x - x_i) \left( y_i - \sum_{j=0}^{p} \beta_j\, (x_i - x)^j \right)^{2}, \qquad \hat m_{p,h}(x) = \hat\beta_0(x).$$

The graphic on the right shows the local polynomials used at selected points $x$. The Nadaraya-Watson estimator (red) uses locally constant functions. The locally linear kernel regression (blue) uses locally linear functions at the point $x$. In the graphic the selected positions coincide with data points. The vertical gray lines connect the local polynomials with the associated x-value (data point). The intersection with the red or blue polynomial gives the estimated value at the corresponding point for the Nadaraya-Watson estimator and the locally linear kernel regression, respectively.

Advantages and features

The local polynomial regression offers several advantages over the Nadaraya-Watson estimator:

  • In general, the local constant $\hat m_h(x)$ is influenced by observations both to the left and to the right of the value $x$. At the edges of the data, however, this does not work and boundary effects occur. The local polynomial kernel regression, by contrast, approximates locally with a polynomial and can avoid this problem.
  • To estimate the $\nu$-th derivative $m^{(\nu)}(x)$, one could simply differentiate the Nadaraya-Watson estimator correspondingly often. With local polynomial kernel regression, however, there is a much more elegant way:
$$\hat m_h^{(\nu)}(x) = \nu!\, \hat\beta_\nu(x).$$
Usually $p = \nu + 1$ or $p = \nu + 3$ is used. Odd orders are better than even orders.
  • As in the case of linear regression and the Nadaraya-Watson estimator, the local polynomial kernel regression can also be written as a linear combination of the $y_i$ with weight functions $W_i(x)$:
$$\hat m_{p,h}(x) = \sum_{i=1}^{n} W_i(x)\, y_i.$$

Estimation of the regression parameters

Define the following matrices:

,

and

the regression parameters are estimated as

.

The coefficients needed for the derivative estimates are calculated automatically in the estimation process.

To carry out the estimation in practice, one computes for each $x$ the quantities

$$S_{x,j} = \sum_{i=1}^{n} K_h(x - x_i)\, (x_i - x)^j, \qquad T_{x,j} = \sum_{i=1}^{n} K_h(x - x_i)\, (x_i - x)^j\, y_i,$$

and obtains the estimate from

$$\hat\beta(x) = \begin{pmatrix} S_{x,0} & S_{x,1} & \cdots & S_{x,p} \\ \vdots & \vdots & & \vdots \\ S_{x,p} & S_{x,p+1} & \cdots & S_{x,2p} \end{pmatrix}^{-1} \begin{pmatrix} T_{x,0} \\ \vdots \\ T_{x,p} \end{pmatrix},$$

since the entries of $\mathbf{X}^\top \mathbf{W} \mathbf{X}$ and $\mathbf{X}^\top \mathbf{W} \mathbf{y}$ consist exactly of these sums.
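
The weighted least squares form translates directly into code. The following Python sketch (Gaussian kernel, illustrative data and degree) solves the local system $(\mathbf{X}^\top \mathbf{W} \mathbf{X})\beta = \mathbf{X}^\top \mathbf{W} \mathbf{y}$ at each evaluation point and returns the full coefficient vector, so the derivative estimates $\nu!\, \hat\beta_\nu(x)$ mentioned above are obtained as a by-product.

```python
import numpy as np

def local_polynomial(grid, x_obs, y_obs, h, p=1):
    """Local polynomial kernel regression of degree p with a Gaussian kernel.

    Returns m_hat(x) = beta_0(x) and the full coefficient vectors beta(x),
    from which the nu-th derivative can be estimated as nu! * beta_nu(x).
    """
    grid = np.atleast_1d(grid)
    m_hat = np.empty(len(grid))
    betas = np.empty((len(grid), p + 1))
    for j, x in enumerate(grid):
        d = x_obs - x
        X = np.vander(d, N=p + 1, increasing=True)       # columns 1, (x_i - x), ..., (x_i - x)^p
        w = np.exp(-0.5 * (d / h) ** 2)                   # kernel weights; normalisation cancels
        WX = X * w[:, None]
        beta = np.linalg.solve(X.T @ WX, WX.T @ y_obs)    # solves (X'WX) beta = X'W y
        betas[j] = beta
        m_hat[j] = beta[0]
    return m_hat, betas

rng = np.random.default_rng(6)
x_obs = np.sort(rng.uniform(0, 10, 150))
y_obs = np.sin(x_obs) + rng.normal(scale=0.3, size=150)
grid = np.linspace(0.5, 9.5, 5)
m1, b1 = local_polynomial(grid, x_obs, y_obs, h=0.8, p=1)   # locally linear (p = 1)
print(m1)
print(b1[:, 1])   # estimates of the first derivative m'(x), since 1! * beta_1
```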

Locally linear kernel regression

Different local regression methods: Nadaraya-Watson (red), locally linear (blue), LOWESS (green) and linear regression (black).

One of the best-known locally linear regression methods ($p = 1$) is the locally weighted regression scatter plot smoother, abbreviated LOESS (formerly LOWESS, English for locally weighted scatterplot smoothing). However, LOESS is not a locally linear kernel regression in the strict sense, because

  • the regression weights are estimated robustly, and
  • the bandwidth varies with $x$.

The graphic on the right shows two different methods of kernel regression: locally constant (red, Nadaraya-Watson) and locally linear (blue). The locally linear kernel regression fits the data somewhat better, especially at the edges.

The locally linear kernel regression is given by

$$\hat m_{1,h}(x) = \frac{\sum_{i=1}^{n} K_h(x - x_i)\, \bigl[ S_{x,2} - (x_i - x)\, S_{x,1} \bigr]\, y_i}{\sum_{i=1}^{n} K_h(x - x_i)\, \bigl[ S_{x,2} - (x_i - x)\, S_{x,1} \bigr]}$$

with $S_{x,j} = \sum_{i=1}^{n} K_h(x - x_i)\, (x_i - x)^j$ as above.

The mean squared error of the locally linear regression results, as for the Nadaraya-Watson estimator, as the sum of the variance and the squared bias,

$$\operatorname{MSE}\bigl(\hat m_{1,h}(x)\bigr) \approx \frac{\sigma^2(x)}{nh}\, \frac{\lVert K \rVert_2^2}{f_X(x)} + \left( \frac{h^2 \mu_2(K)}{2}\, m''(x) \right)^{2},$$

with squared bias

$$\left( \frac{h^2 \mu_2(K)}{2}\, m''(x) \right)^{2};$$

the variance is identical to the variance of the Nadaraya-Watson estimator. The simpler form of the bias makes locally linear kernel regression more attractive for practical purposes.
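
A short Python comparison of the locally constant (Nadaraya-Watson) and the locally linear estimator at the boundary of the data, using the closed form above with a Gaussian kernel; the linear-trend data are an illustrative construction chosen to make the boundary bias visible.

```python
import numpy as np

def local_linear(grid, x_obs, y_obs, h):
    """Locally linear kernel regression using the closed-form weights above."""
    grid = np.atleast_1d(grid)
    d = grid[:, None] - x_obs[None, :]                 # x - x_i
    k = np.exp(-0.5 * (d / h) ** 2)                    # Gaussian kernel weights
    s1 = (k * (-d)).sum(axis=1, keepdims=True)         # S_{x,1} = sum K_h (x_i - x)
    s2 = (k * d**2).sum(axis=1, keepdims=True)         # S_{x,2} = sum K_h (x_i - x)^2
    w = k * (s2 - (-d) * s1)                           # K_h [S_{x,2} - (x_i - x) S_{x,1}]
    return (w * y_obs).sum(axis=1) / w.sum(axis=1)

def nadaraya_watson(grid, x_obs, y_obs, h):
    """Locally constant (Nadaraya-Watson) estimate for comparison."""
    grid = np.atleast_1d(grid)
    k = np.exp(-0.5 * ((grid[:, None] - x_obs[None, :]) / h) ** 2)
    return (k * y_obs).sum(axis=1) / k.sum(axis=1)

# Illustrative data with a clear trend at the boundary
rng = np.random.default_rng(7)
x_obs = np.sort(rng.uniform(0, 1, 100))
y_obs = 2 * x_obs + rng.normal(scale=0.1, size=100)
edge = np.array([0.0, 1.0])                            # boundary points
print("true         :", 2 * edge)
print("NW           :", nadaraya_watson(edge, x_obs, y_obs, h=0.15))
print("locally linear:", local_linear(edge, x_obs, y_obs, h=0.15))
```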

References

  1. Elizbar A. Nadaraya: On Estimating Regression. In: Theory of Probability and its Applications. Vol. 9, No. 1, 1964, pp. 141-142, doi:10.1137/1109020.
  2. Geoffrey S. Watson: Smooth Regression Analysis. In: Sankhya: The Indian Journal of Statistics, Series A. Vol. 26, No. 4, December 1964, pp. 359-372.
  3. Bickel, Rosenblatt: On Some Global Measures of the Deviations of Density Function Estimates. In: Annals of Statistics. Vol. 1, 1973, pp. 1071-1095.
  4. Theo Gasser, Hans-Georg Müller: Estimating Regression Functions and Their Derivatives by the Kernel Method. In: Scandinavian Journal of Statistics. Vol. 11, No. 3, 1984, pp. 171-185.
  5. W. S. Cleveland: Robust Locally Weighted Regression and Smoothing Scatterplots. In: Journal of the American Statistical Association. Vol. 74, No. 368, December 1979, pp. 829-836, JSTOR 2286407.

Literature

  • Jianqing Fan, Irene Gijbels: Local Polynomial Modeling and Its Applications. Chapman and Hall/CRC, 1996, ISBN 978-0-412-98321-4.
  • Wolfgang Härdle, Marlene Müller, Stefan Sperlich, Axel Werwatz: Nonparametric and Semiparametric Models. Springer, Berlin/Heidelberg 2004, ISBN 978-3-540-20722-1 (hu-berlin.de).
  • Tristen Hayfield, Jeffrey S. Racine: Nonparametric Econometrics: The np Package. In: Journal of Statistical Software. Vol. 27, No. 5, 2008 (jstatsoft.org).
  • M. P. Wand, M. C. Jones: Kernel Smoothing. Chapman and Hall/CRC, 1994, ISBN 978-0-412-55270-0.