Event time analysis

from Wikipedia, the free encyclopedia

Survival analysis (also duration analysis, event data analysis, analysis of failure times, and event history analysis) is a toolbox of statistical methods in which the time until a certain event ("time to event") is compared between groups in order to estimate the effect of prognostic factors, medical treatment, or harmful influences. The event can be death, but any other endpoint, such as recovery, onset of illness, or occurrence of a complication, is also possible. Examples of such methods are the Kaplan-Meier estimator, Cox regression, and the accelerated failure time model. The hazard rate is a key quantity.

Names for the method

The method has been given different names by different authors. Because it serves different purposes in different fields, several terms remain in use today; they are equivalent and often used interchangeably. The basic procedure is always the same.

  • In medical statistics it is mostly called survival analysis or survival time analysis.
  • In empirical social research, the method is known as duration analysis (also: process data analysis, event analysis), where it deals with changes in a social state (e.g. the duration of a marriage). It provides more precise descriptions of longitudinal data than, for example, a time series or panel data analysis. Using event-oriented data structures, it gives information about the exact time until a change of state.
  • In engineering, the method is also called reliability analysis (reliability theory).
  • In English-language software packages, it is referred to as survival analysis, analysis of failure times, or event history analysis.

Areas of application

The method can be applied whenever there is mortality, i.e. a successive removal of units from statistical observation. The event need not be death; it can also be a mechanical system failure or retirement. The method can likewise be used for positive events, i.e. new events for which there was previously no basis for measurement (birth of the first child, occurrence of the first technical problems or warranty claims).

Typical questions in an event time analysis: What proportion of a population will still be alive after a given time? At what rate will the survivors then die? Which properties or influences increase or decrease the probability of survival?

First, the event time (lifetime) must be defined. For biological systems, life ends with death. Mechanical reliability is more difficult: failures are often not clearly defined and can be partial. Often there is only a gradual failure that cannot easily be assigned a point in time. Similar difficulties arise with other biological events. For example, a heart attack or an organ failure is difficult to pin down in time.

Usually only events are examined that can occur at most once per subject. An extension to recurring events is possible.

Basic quantities and concepts

Survival function

The central function is the survival function (also survivor function), denoted $S$. For technical systems the term reliability function is used, and the function is denoted $R(t)$:

$$S(t) = R(t) = P(T > t)$$

Here $t$ denotes a point in time, $T$ is the lifetime (the time until death or failure of a device), and $P$ denotes probability. The survival function gives the probability that an individual in the population has a lifetime greater than $t$.

Since all individuals of interest are still alive at the start of an analysis ($t = 0$), the probability of surviving this zero point is $S(0) = 1$. If immediate death or failure is possible, this initial value can also be less than $1$. The survival function must be monotonically decreasing: $S(u) \le S(t)$ if $u > t$. If this function is known, the distribution function and the density function are also uniquely determined.

Usually it is assumed that the probability of survival tends to zero as time increases, i.e. $S(t) \to 0$ as $t \to \infty$. If this limit is greater than zero, eternal life is possible. In other words, $S(t)$ and $t$ move in opposite directions: the more time passes, the more likely it is that the event has occurred. The survival function starts at $t = 0$ with the value $S(0) = 1$ and tends towards $0$ over time (although this value is usually not reached, since observation ends at some point). Graphically, one can picture a step function descending from $1$, i.e. monotonically falling, whose steps can differ in height and width. The height of a step reflects the number of individuals dying at that time, and its width the amount of time that passes between events. The wider and flatter the steps, the higher the individuals' probability of survival.
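The staircase picture can be made concrete with the Kaplan-Meier estimator mentioned in the introduction. The following is a minimal Python sketch, assuming right-censored data given as follow-up times plus a flag saying whether the event was observed. The function name `kaplan_meier` and the sample data are illustrative assumptions and do not come from the article.

```python
from collections import Counter

def kaplan_meier(times, observed):
    """Return [(t, S(t))] at each distinct event time (product-limit estimate).

    times    -- follow-up time of each individual
    observed -- True if the event occurred at that time, False if censored
    """
    deaths = Counter(t for t, o in zip(times, observed) if o)
    leaving = Counter(times)  # every individual leaves the risk set at its own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d > 0:
            survival *= 1.0 - d / at_risk  # one downward step of the staircase
            curve.append((t, survival))
        at_risk -= leaving[t]  # events and censored cases both drop out
    return curve

# Hypothetical data: six individuals, two of them censored (False).
times = [2, 3, 3, 5, 7, 8]
observed = [True, True, False, True, False, True]
km_curve = kaplan_meier(times, observed)
for t, s in km_curve:
    print(f"t = {t}: S(t) = {s:.3f}")
```

Each event time produces one step. Censored individuals cause no step of their own, but they shrink the risk set and therefore make later steps higher.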

Event time distribution function and event density function

Related quantities can be derived from the survival function. The event time distribution function, called the failure probability in engineering and abbreviated $F$, is the complement of the survival function:

$$F(t) = P(T \le t) = 1 - S(t)$$

and therefore $S(t) = 1 - F(t)$ also holds. The first derivative of $F$, the event density function or failure density, is denoted $f$:

$$f(t) = \frac{d}{dt} F(t) = -\frac{d}{dt} S(t).$$

The event density function is the rate of the observed event per unit of time.

Hazard function and cumulative hazard function

The failure rate (in survival analysis also called the hazard function and denoted $h(t)$) is defined as the rate at which an event occurs at time $t$, given that it has not yet occurred by time $t$:

$$h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t} = \frac{f(t)}{S(t)} = -\frac{S'(t)}{S(t)}.$$

Force of mortality is a synonym for the hazard function that is used especially in demography.

The failure rate must always be non-negative, and its integral over $[0, \infty)$ must be infinite. The hazard function can increase or decrease; it need not be monotonic or continuous.

Alternatively, the hazard function can be replaced by the cumulative hazard function:

$$H(t) = -\log S(t),$$

so that

$$\frac{d}{dt} H(t) = -\frac{S'(t)}{S(t)} = h(t).$$

$H$ is called the cumulative hazard function because

$$H(t) = \int_0^t h(u)\,du$$

holds. It describes the accumulation of hazard over time.

From $H(t) = -\log S(t)$ it follows that $H(t)$ increases without bound as time increases if $S(t)$ tends to zero. It also follows that $h(t)$ must not fall too quickly, because otherwise the cumulative hazard function converges to a finite value. For example, $e^{-t}$ is not the hazard function of any event time distribution, because its integral converges.
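The requirement that the cumulative hazard must diverge can be checked numerically. The sketch below integrates a hazard with the trapezoidal rule and converts it to a survival probability via S(t) = exp(-H(t)). It compares a constant hazard with the decaying candidate exp(-t), which is taken here as an assumption because its integral converges. All function names are illustrative.

```python
import math

def cumulative_hazard(hazard, t, steps=10_000):
    """Approximate H(t), the integral of hazard(u) from 0 to t (trapezoidal rule)."""
    width = t / steps
    total = 0.5 * (hazard(0.0) + hazard(t))
    for i in range(1, steps):
        total += hazard(i * width)
    return total * width

def survival_from_hazard(hazard, t):
    """S(t) = exp(-H(t))."""
    return math.exp(-cumulative_hazard(hazard, t))

def constant_hazard(t):
    return 0.5  # H(t) = 0.5 * t grows without bound

def decaying_hazard(t):
    return math.exp(-t)  # H(t) = 1 - exp(-t) never exceeds 1

s_constant = survival_from_hazard(constant_hazard, 50.0)
s_decaying = survival_from_hazard(decaying_hazard, 50.0)
print(s_constant)  # practically zero
print(s_decaying)  # stays near exp(-1), so survival never tends to zero
```

With the decaying candidate, a positive fraction of the population would never experience the event, which is why it cannot be the hazard of a proper event time distribution.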

Quantities derived from the survival function

The remaining lifetime at a point in time $t_0$ is the time remaining until death or failure, i.e. $T - t_0$. Future life expectancy is the expected value of the remaining lifetime. The density of the remaining lifetime $t$, given survival to $t_0$, is

$$\frac{f(t + t_0)}{S(t_0)}.$$

The future life expectancy is therefore

$$\frac{1}{S(t_0)} \int_0^\infty t\, f(t + t_0)\,dt$$

or

$$\frac{1}{S(t_0)} \int_{t_0}^\infty S(t)\,dt.$$

For $t_0 = 0$ this reduces to life expectancy at birth.

In reliability analysis, life expectancy is called the mean time to failure, and the future life expectancy is called the mean residual lifetime.

The age at which the proportion of survivors reaches a given value $q$ can be determined from the equation $S(t) = q$; the solution $t$ is the quantile sought. Usually one is interested in the median lifetime, $q = 1/2$, or in other quantiles such as $q = 0.90$ or $q = 0.99$.
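Both derived quantities can be computed numerically from any survival function. The following sketch assumes an exponential survival function with rate 0.1 (an illustrative choice, for which the exact answers are known) and uses a finite upper integration limit in place of infinity. The helper names are hypothetical.

```python
import math

def mean_residual_life(survival, t0, upper=200.0, steps=100_000):
    """(1 / S(t0)) times the integral of S(t) from t0 to `upper` (trapezoidal rule).

    `upper` stands in for infinity and must be large enough
    that S(upper) is negligible.
    """
    width = (upper - t0) / steps
    total = 0.5 * (survival(t0) + survival(upper))
    for i in range(1, steps):
        total += survival(t0 + i * width)
    return total * width / survival(t0)

def survival_quantile(survival, q, lo=0.0, hi=200.0, tol=1e-10):
    """Solve S(t) = q by bisection, assuming S is continuous and decreasing."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if survival(mid) > q:
            lo = mid  # more than a fraction q still alive: the solution lies later
        else:
            hi = mid
    return 0.5 * (lo + hi)

def exp_survival(t, rate=0.1):
    return math.exp(-rate * t)

life_expectancy = mean_residual_life(exp_survival, 0.0)
remaining_at_20 = mean_residual_life(exp_survival, 20.0)
median_lifetime = survival_quantile(exp_survival, 0.5)
print(life_expectancy, remaining_at_20, median_lifetime)
```

For the exponential distribution the remaining life expectancy is the same at every age (here 1/0.1 = 10), and the median is log(2)/0.1; other survival functions can be passed in unchanged.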

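Relationships

The following relations show how each quantity can be expressed in terms of the other three, using the engineering notation $R(t) = S(t)$:

  • Failure probability: $F(t) = 1 - R(t) = \int_0^t f(\tau)\,d\tau = 1 - \exp\left(-\int_0^t h(\tau)\,d\tau\right)$
  • Survival probability: $R(t) = 1 - F(t) = \int_t^\infty f(\tau)\,d\tau = \exp\left(-\int_0^t h(\tau)\,d\tau\right)$
  • Failure density: $f(t) = \frac{d}{dt} F(t) = -\frac{d}{dt} R(t) = h(t) \exp\left(-\int_0^t h(\tau)\,d\tau\right)$
  • Failure rate: $h(t) = \frac{F'(t)}{1 - F(t)} = -\frac{R'(t)}{R(t)} = \frac{f(t)}{\int_t^\infty f(\tau)\,d\tau}$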

Examples of survival functions

For event time models, one first chooses a baseline survival function. It is relatively easy to replace one distribution function with another in order to study the effects; the underlying theory does not change.

Prior knowledge of the specific process plays a major role in choosing the distribution. The choice is roughly analogous to the choice of the link function in generalized linear models. Some commonly used functions are listed below.

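Probability distribution and survival function $S(t)$ (each in one common parametrization):

  • Exponential distribution: $S(t) = \exp(-\lambda t)$ with $\lambda > 0$
  • Weibull distribution: $S(t) = \exp(-\lambda t^{\gamma})$ with $\lambda > 0$, $\gamma > 0$
  • Log-normal distribution: $S(t) = 1 - \Phi\left(\frac{\ln t - \mu}{\sigma}\right)$ with $\sigma > 0$

The function $\Phi$ is the standard normal distribution function, which can be expressed through the error function as $\Phi(x) = \frac{1}{2}\left(1 + \operatorname{erf}\left(x / \sqrt{2}\right)\right)$.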
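The three distributions can be written as small Python functions, which makes it easy to swap one for another. This is a sketch under an assumed parametrization (rate for the exponential, scale and shape for the Weibull, log-scale mean and standard deviation for the log-normal); other texts parametrize these distributions differently.

```python
import math

def exponential_survival(t, rate):
    """S(t) = exp(-rate * t)."""
    return math.exp(-rate * t)

def weibull_survival(t, scale, shape):
    """S(t) = exp(-scale * t**shape); shape = 1 gives the exponential case."""
    return math.exp(-scale * t ** shape)

def standard_normal_cdf(x):
    """Standard normal distribution function written via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def lognormal_survival(t, mu, sigma):
    """S(t) = 1 - Phi((ln t - mu) / sigma), defined for t > 0."""
    return 1.0 - standard_normal_cdf((math.log(t) - mu) / sigma)

# Compare the three models at a few illustrative time points.
for t in (0.5, 1.0, 2.0, 4.0):
    print(
        t,
        round(exponential_survival(t, rate=0.3), 4),
        round(weibull_survival(t, scale=0.3, shape=1.5), 4),
        round(lognormal_survival(t, mu=1.0, sigma=0.5), 4),
    )
```

Because every function has the same shape (time in, survival probability out), code that works with one survival function can be reused with any of the others.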

Estimating the parameters

Event time models can be viewed as ordinary regression models in which the outcome variable is time. Computing the likelihood function is complicated because not all information is available at all times.

If birth and death are both known, the life course is fully determined. If, on the other hand, it is only known that the birth took place before a certain point in time, the record is called left-censored. Likewise, it may only be known that death occurred after a certain date; the record is then right-censored. A life course can also be censored on both the right and the left (interval-censored). If a person who has not reached a certain age is not observed at all, the record is truncated. With a left-censored record, by contrast, we at least know that the individual existed.

There are some standard cases of censored and truncated records. Right-censored records are common: if we look at a group of living subjects, we know that they are alive today, but we do not know the date of their future death. Left-censored data are also common: we may know for any subject that they are alive today, but not their exact date of birth. Truncated data occur in delayed-entry studies. For example, retirees might be observed only from the age of 70; not even the existence of the people who died earlier is known.

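The likelihood function for an event time model with censored data can be defined as follows. By definition, the likelihood function is the joint probability of the data given the model parameters. It is customary to assume that the data are independent given the parameters. The likelihood function is then the product of the probabilities of the individual event times. We divide the data into four categories: uncensored, left-censored, right-censored, and interval-censored. In the formulas they are distinguished by the labels "unc.", "l.c.", "r.c." and "i.c.":

$$L(\theta) = \prod_{i \in \text{unc.}} P(T = T_i \mid \theta) \prod_{i \in \text{l.c.}} P(T < T_i \mid \theta) \prod_{i \in \text{r.c.}} P(T > T_i \mid \theta) \prod_{i \in \text{i.c.}} P(T_{i,l} < T < T_{i,r} \mid \theta)$$

For an uncensored event time with age at death $T_i$ we use

$$P(T = T_i \mid \theta) = f(T_i \mid \theta).$$

For left-censored data we know that death occurred before a time $T_i$:

$$P(T < T_i \mid \theta) = F(T_i \mid \theta) = 1 - S(T_i \mid \theta).$$

For a right-censored individual we know that death occurs after time $T_i$, so

$$P(T > T_i \mid \theta) = 1 - F(T_i \mid \theta) = S(T_i \mid \theta).$$

And for interval-censored events we know that death occurs between $T_{i,l}$ and $T_{i,r}$:

$$P(T_{i,l} < T < T_{i,r} \mid \theta) = S(T_{i,l} \mid \theta) - S(T_{i,r} \mid \theta).$$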
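The four kinds of contribution can be combined into a single log-likelihood. The sketch below assumes an exponential model with survival function S(t) = exp(-rate * t), chosen only because it keeps the formulas short. The function name, argument names, and sample data are hypothetical.

```python
import math

def exp_log_likelihood(rate, uncensored=(), left=(), right=(), interval=()):
    """Log-likelihood of an exponential model with S(t) = exp(-rate * t).

    uncensored -- exact event times, each contributes the density f(t)
    left       -- upper bounds (event before t), each contributes 1 - S(t)
    right      -- lower bounds (event after t), each contributes S(t)
    interval   -- (lower, upper) pairs, each contributes S(lower) - S(upper)
    """
    def survival(t):
        return math.exp(-rate * t)

    log_l = sum(math.log(rate * survival(t)) for t in uncensored)
    log_l += sum(math.log(1.0 - survival(t)) for t in left)
    log_l += sum(math.log(survival(t)) for t in right)
    log_l += sum(math.log(survival(a) - survival(b)) for a, b in interval)
    return log_l

# Hypothetical sample with exact and right-censored times only.
events = [2.0, 3.0, 5.0, 8.0]
censored = [3.0, 7.0]

# In this special case the maximiser has a closed form:
# number of observed events divided by total observed time.
rate_hat = len(events) / (sum(events) + sum(censored))
ll_at_hat = exp_log_likelihood(rate_hat, uncensored=events, right=censored)
print(rate_hat, ll_at_hat)
```

Taking logarithms turns the product over individuals into a sum, which is numerically more stable. With left-censored or interval-censored records the exponential model no longer has a closed-form maximiser, and the log-likelihood would have to be maximised numerically.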

Literature

  • Hans-Peter Blossfeld, Götz Rohwer, Katrin Golsch: Event History Analysis with Stata. Lawrence Erlbaum Associates, Mahwah, NJ 2007.
  • Regina Elandt-Johnson, Norman Johnson: Survival Models and Data Analysis. John Wiley & Sons, New York 1980/1999.
  • Wolfgang Ludwig-Mayerhofer: Statistical modeling of historical data in the analysis of social problems. In: Social Problems. No. 5/6, 1994.
  • Mario Cleves et al.: An Introduction to Survival Analysis Using Stata. 3rd edition. Stata Press, 2010.
  • Jerald F. Lawless: Statistical Models and Methods for Lifetime Data. 2nd edition. John Wiley and Sons, Hoboken 2003.
  • Melinda Mills: Introducing Survival and Event History Analysis. Sage Publications, 2011.
  • Terry Therneau: A Package for Survival Analysis in S. Feb 1999.
  • Arno Meyna, Bernhard Pauli: Reliability Technology. Quantitative assessment procedures. 2nd edition. Hanser, 2010, ISBN 978-3-446-41966-7.

Web links

  • Duration analysis - entry in ILMES (Internet lexicon of methods of empirical social research)
  • A. Ziegler, S. Lange, R. Bender: Survival time analysis: properties and Kaplan-Meier method - Article No. 15 of the statistics series in the DMW. In: DMW - Deutsche Medizinische Wochenschrift. 127, p. T 14, doi:10.1055/s-2002-32819.
