Cox regression

Cox regression, also called the Cox regression model, is a regression method named after David Cox for modeling survival times.

Like all methods of event-time analysis, it estimates the influence of independent variables on the time until an event occurs (the "survival time") or on the hazard rate of that event. As a semi-parametric method, it does not yield a complete prediction model for survival time: it leaves the distribution function of the observed event times unspecified and estimates only the influence of metric or categorical variables on a baseline hazard rate that is assumed to be the same for all cases.

Model

The regression model proposed by Cox is used to examine how hazard rates depend on external influences. The model is based on influence vectors $z_i \in \mathbb{R}^n$ that can be observed for each individual $i$ in the study. The relationship between these influences and the hazard function is then given by

$$\lambda(t \mid z_i) = \lambda_0(t)\, e^{\beta^T z_i}.$$

Here $\lambda_0(t)$ denotes an unknown hazard function which, in the baseline case $z_i = 0$, is the hazard function in the absence of any influences. It is treated as a nuisance parameter. $\beta$ is an unknown parameter, likewise $n$-dimensional. The statistical task is to estimate this parameter.
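As a minimal sketch of the model equation, assuming the usual proportional-hazards form $\lambda(t \mid z) = \lambda_0(t)\, e^{\beta^T z}$, the following Python snippet evaluates the hazard for a given influence vector. The function names, the constant baseline hazard, and the parameter values are hypothetical and chosen only for illustration; in practice $\lambda_0$ is unknown.

```python
import math

def hazard(t, z, beta, baseline_hazard):
    """Hazard lambda(t | z) = lambda_0(t) * exp(beta^T z)."""
    linear_predictor = sum(b * x for b, x in zip(beta, z))
    return baseline_hazard(t) * math.exp(linear_predictor)

def constant_baseline(t):
    # Hypothetical baseline hazard, constant in t (illustration only).
    return 0.1

beta = [0.5, -1.0]  # illustrative parameter vector, not an estimate

# Baseline case z = 0: the hazard reduces to lambda_0(t).
h_ref = hazard(2.0, [0.0, 0.0], beta, constant_baseline)

# Same time point, first influence switched on.
h_exp = hazard(2.0, [1.0, 0.0], beta, constant_baseline)

# The ratio depends only on beta and z, not on t or on lambda_0.
hazard_ratio = h_exp / h_ref
```

The ratio of two hazards at the same time point cancels $\lambda_0(t)$, which is why the baseline hazard can be left unspecified when estimating $\beta$.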

The observations

In the Cox regression model, each observation is a triple $(T_i, \Delta_i, z_i)$, where, as above, $z_i$ denotes the influence vector for individual $i$.

$T_i$ is defined (as is usual when working with censored data) as the minimum of two random variables $X_i$ and $C_i$. If the death of an individual is actually observed, $X_i$ is the time of death of individual $i$. If, on the other hand, the study was merely terminated, $C_i$ is the time of termination. Clearly, conclusions about the form of the hazard function can be drawn only from an observed death. Hence $\Delta_i = \mathbf{1}_{\{X_i \le C_i\}}$ indicates whether death or the end of the study was observed, where $\mathbf{1}$ denotes the indicator function.
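The construction of an observation can be sketched in a few lines of Python, assuming a death time `x`, a censoring (study termination) time `c`, and an influence vector `z`. The function name `observe` and the sample values are hypothetical.

```python
def observe(x, c, z):
    """Return the triple (T, Delta, z) for death time x and censoring time c."""
    t = min(x, c)               # observed time: whichever comes first
    delta = 1 if x <= c else 0  # 1 = death observed, 0 = censored
    return (t, delta, z)

# Death occurs before the study ends: the death time is observed.
obs_death = observe(3.2, 5.0, [1.0])

# Study ends first: only the termination time is observed.
obs_censored = observe(7.5, 5.0, [0.0])
```

In real data only the triple is available; for a censored individual the death time itself is never seen.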

The estimate of β

Because of the structure of $\lambda$, no conclusions about $\beta$ can be drawn from intervals in which no death occurs. After all, the unknown baseline hazard function $\lambda_0$ might vanish on such an interval, so that no deaths could occur there in the first place. One therefore resorts to a trick and considers conditional probabilities.

Since information about $\beta$ can be obtained only when a death has occurred, the following probability can be computed at the time of death of an individual $i$: how likely is it that, of all individuals still alive, it is precisely $i$ who dies now? Formally, this is

$$\frac{\lambda(T_i \mid z_i)}{\sum_{j \in R_i} \lambda(T_i \mid z_j)} = \frac{e^{\beta^T z_i}}{\sum_{j \in R_i} e^{\beta^T z_j}},$$

where $R_i$ denotes the set of individuals who are still alive at the time of death $T_i$.
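The conditional probability can be illustrated with a short Python sketch. It assumes the risk set of individual $i$ consists of all individuals whose observed time is at least $T_i$, and that there are no tied times; the function names and the toy data are hypothetical.

```python
import math

def risk_set(i, times):
    """Indices of individuals still under observation at time times[i]."""
    return [j for j, t in enumerate(times) if t >= times[i]]

def conditional_death_probability(i, times, covariates, beta):
    """exp(beta^T z_i) divided by the sum of exp(beta^T z_j) over the risk set."""
    def relative_risk(z):
        return math.exp(sum(b * x for b, x in zip(beta, z)))
    denominator = sum(relative_risk(covariates[j]) for j in risk_set(i, times))
    return relative_risk(covariates[i]) / denominator

# Toy data: three individuals with one influence variable each.
times = [2.0, 3.0, 5.0]
covariates = [[1.0], [0.0], [1.0]]

# With beta = 0 every individual at risk is equally likely to be the one who dies.
p_null = conditional_death_probability(0, times, covariates, [0.0])
```

The baseline hazard does not appear anywhere in the computation, because it cancels between numerator and denominator.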

To obtain a kind of maximum likelihood estimator for $\beta$, the likelihood function

$$L(\beta) = \prod_{i} \left( \frac{e^{\beta^T z_i}}{\sum_{j \in R_i} e^{\beta^T z_j}} \right)^{\Delta_i}$$

is maximized. Raising the individual conditional probabilities to the power $\Delta_i$ accounts for the fact that only an observed death, and not the observed end of the study, provides information about $\beta$.
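The maximization can be sketched for the simplest case of a single influence variable (scalar $\beta$). The snippet below is one possible approach, not a reference implementation: it maximizes the logarithm of the likelihood with Newton's method, assumes no tied times, and uses hypothetical function names and made-up toy data.

```python
import math

def log_partial_likelihood(beta, times, deltas, z):
    """Sum over observed deaths of beta*z_i - log(sum over risk set of exp(beta*z_j))."""
    total = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, deltas)):
        if d_i == 0:
            continue  # censored observations contribute a factor of 1
        denom = sum(math.exp(beta * z[j]) for j, t_j in enumerate(times) if t_j >= t_i)
        total += beta * z[i] - math.log(denom)
    return total

def score_and_information(beta, times, deltas, z):
    """First derivative and negative second derivative of the log-likelihood."""
    score, info = 0.0, 0.0
    for i, (t_i, d_i) in enumerate(zip(times, deltas)):
        if d_i == 0:
            continue
        risk = [j for j, t_j in enumerate(times) if t_j >= t_i]
        weights = [math.exp(beta * z[j]) for j in risk]
        s0 = sum(weights)
        s1 = sum(w * z[j] for w, j in zip(weights, risk))
        s2 = sum(w * z[j] ** 2 for w, j in zip(weights, risk))
        mean = s1 / s0
        score += z[i] - mean          # observed minus expected influence
        info += s2 / s0 - mean ** 2   # weighted variance within the risk set
    return score, info

def fit_beta(times, deltas, z, steps=25, tol=1e-10):
    """Newton iteration started at beta = 0."""
    beta = 0.0
    for _ in range(steps):
        score, info = score_and_information(beta, times, deltas, z)
        if abs(score) < tol:
            break
        beta += score / info
    return beta

# Made-up toy data: six individuals, the third one censored.
times = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
deltas = [1, 1, 0, 1, 1, 1]
z = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0]

beta_hat = fit_beta(times, deltas, z)
```

For real analyses an established implementation is preferable, since it also handles tied times, several influence variables, and numerical safeguards that this sketch omits.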

Literature