Kaplan-Meier estimator

from Wikipedia, the free encyclopedia

The Kaplan-Meier estimator (also known as the product-limit estimator) is used to estimate the probability that a given event has not yet occurred for a test object within a given time interval. It is a non-parametric estimate of the survival function in time-to-event (survival) analysis. The underlying data may be right-censored. The method was developed in 1958 by Edward L. Kaplan and Paul Meier.

The name product-limit estimator reflects the fact that the estimator can be interpreted as the limit of life-table estimates as the interval lengths approach zero.

Formula

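The Kaplan-Meier estimator for the survival function S(t) (i.e. the probability that the time until the event occurs exceeds t) is given by:

$$\hat S(t) = \prod_{i:\, t_i \le t} \left(1 - \frac{d_i}{n_i}\right)$$

with

t_i: the distinct times at which at least one event occurred
d_i: number of test objects for which the event occurred at time t_i
n_i: number of test objects at risk at time t_i (still under observation and event-free just before t_i)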
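The product formula translates directly into a single pass over the sorted observation times. The following is a minimal sketch. It assumes the data come as two parallel lists of observation times and event indicators (1 = event, 0 = censored). The function name `kaplan_meier` is hypothetical. Subjects censored at an event time are counted as still at risk at that time, a common convention that the text above does not address.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Hypothetical helper: return [(t_i, n_i, d_i, S(t_i))] at each distinct event time.

    times  -- observed time for each subject (event or censoring time)
    events -- 1 if the event occurred at that time, 0 if censored
    Assumption: subjects censored at t_i still count as at risk at t_i.
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)  # d_i per time
    exits = Counter(times)  # subjects leaving the risk set (event or censored) per time

    at_risk = len(times)  # n_i: everyone is at risk before the first time
    surv = 1.0
    steps = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d > 0:
            surv *= 1.0 - d / at_risk  # product-limit factor (1 - d_i / n_i)
            steps.append((t, at_risk, d, surv))
        # Censored-only times contribute a factor of 1, so only n_i changes.
        at_risk -= exits[t]
    return steps
```

Applied to the 15 observations of the example below, the function returns steps at days 12, 29, 31, 61 and 70, with n_i = 14, 12, 11, 6 and 5.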

Example

The following data serve as an example:

Object no.   Time t (days)   Event   At risk n(t)   S(t)
#1               1             0         15         1
#2              12             1         14         0.93
#3              22             0
#4              29             1         12         0.85
#5              31             1         11         0.77
#6              36             0
#7              38             0
#8              50             0
#9              60             0
#10             61             1          6         0.64
#11             70             1          5         0.52
#12             88             0
#13             99             0
#14            110             0
#15            140             0

Event: 1 = event occurred, 0 = censored.

If the table represents the results of a clinical study, it can be read as follows:

Initially there are 15 patients, all of them “at risk”, i.e. the event has not yet occurred for any of them.

Day 1: One patient is lost after only one day in the study, i.e. the patient left the study without the event having occurred by then (e.g. last observed 1 day before the end of the study).

$$\hat S(1) = 1 - \frac{0}{15} = 1$$

Factors caused by censoring are always 1 and are therefore omitted from the following calculations. Because this patient is censored, only 14 patients are now at risk.

Day 12: The event occurs in one patient.

$$\hat S(12) = 1 - \frac{1}{14} = \frac{13}{14} \approx 0.93$$

There are now 13 patients at risk.

Day 22: Another patient is censored, so the estimate does not change:

$$\hat S(22) = \hat S(12) \approx 0.93$$

The number of patients at risk is reduced to 12.

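Day 29: The event occurs in another patient.

$$\hat S(29) = \frac{13}{14} \cdot \left(1 - \frac{1}{12}\right) = \frac{13}{14} \cdot \frac{11}{12} \approx 0.85$$

There are now 11 patients at risk.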

etc.
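The day-by-day walkthrough can be replayed for all 15 rows of the example. This is a minimal sketch. It relies on the fact that the example has no tied times, so each row removes exactly one patient from the risk set. The names `data` and `walk_through` are hypothetical.

```python
# Example data from the table: (time in days, 1 = event, 0 = censored)
data = [(1, 0), (12, 1), (22, 0), (29, 1), (31, 1), (36, 0), (38, 0), (50, 0),
        (60, 0), (61, 1), (70, 1), (88, 0), (99, 0), (110, 0), (140, 0)]

def walk_through(data):
    """Hypothetical helper: yield (day, status, n_at_risk, S) per row, in time order.

    Assumes no tied times (true for this example), so every row
    removes exactly one patient from the risk set.
    """
    n = len(data)  # all 15 patients are at risk at the start
    s = 1.0
    for day, status in sorted(data):
        if status == 1:
            s *= 1 - 1 / n  # one event among n patients at risk
        # A censored row contributes the factor 1 - 0/n = 1, so s is unchanged.
        yield day, status, n, s
        n -= 1  # the patient leaves the risk set either way

for day, status, n, s in walk_through(data):
    kind = "event" if status else "censored"
    print(f"day {day:3d}: {kind:8s} at risk {n:2d}  S = {s:.2f}")
```

The last event row (day 70) gives S ≈ 0.516, which rounds to 0.52. Multiplying the already rounded 0.64 by 4/5 would instead give 0.51.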

The patients observed longest therefore determine the end of the curve. Because fewer patients remain at risk at later times, the estimate of the risk there is also more uncertain, which shows up as a broader confidence interval.

Figure: Plot of the results obtained. Black crosses mark the censoring times; the dashed lines show a confidence interval.

Properties

Variance

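For t in the interval [t_j, t_{j+1}), the variance of the estimator can be estimated by

$$\widehat{\operatorname{Var}}\big(\hat S(t)\big) = \hat S(t)^2 \sum_{i:\, t_i \le t} \frac{d_i}{n_i (n_i - d_i)}.$$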
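The variance formula can be evaluated with a running sum alongside the product. Below is a minimal sketch. The event times, at-risk counts and event counts (t_i, n_i, d_i) are read off the example table, and the names `event_times` and `km_with_greenwood` are hypothetical. The sketch assumes n_i > d_i at every event time, since otherwise the summand is undefined.

```python
import math

# (t_i, n_i, d_i) at the distinct event times of the example table
event_times = [(12, 14, 1), (29, 12, 1), (31, 11, 1), (61, 6, 1), (70, 5, 1)]

def km_with_greenwood(event_times):
    """Hypothetical helper: return [(t_i, S(t_i), Var(S(t_i)))].

    Assumes n_i > d_i at every event time; otherwise the term
    d_i / (n_i (n_i - d_i)) is undefined.
    """
    s, acc, out = 1.0, 0.0, []
    for t, n, d in event_times:
        if n <= d:
            raise ValueError(f"variance term undefined at t={t}: n={n}, d={d}")
        s *= 1 - d / n            # product-limit factor
        acc += d / (n * (n - d))  # running sum of d_i / (n_i (n_i - d_i))
        out.append((t, s, s * s * acc))
    return out

for t, s, var in km_with_greenwood(event_times):
    print(f"t = {t:3d}: S = {s:.3f}, Var = {var:.5f}, SE = {math.sqrt(var):.3f}")
```

As a sanity check, at the first event time the estimate reduces to the binomial variance S(1 - S)/n = (13/14)(1/14)/14 ≈ 0.0047.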

Confidence interval

As usual, the confidence interval can be calculated from the variance or from the standard error:

$$\operatorname{SE}\big(\hat S(t)\big) = \hat S(t) \sqrt{\sum_{i:\, t_i \le t} \frac{d_i}{n_i (n_i - d_i)}}$$

This formula is known as Greenwood's formula.

The 95% confidence interval is thus:

$$\hat S(t) \pm 1.96 \cdot \operatorname{SE}\big(\hat S(t)\big)$$
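The pointwise interval S ± 1.96 · SE can be computed for each event time of the example. This is a minimal sketch. It reuses the (t_i, n_i, d_i) values read off the example table, and the names `event_times` and `km_confidence_intervals` are hypothetical. One assumption goes beyond the formula in the text: the bounds are clipped to [0, 1], because the plain formula can leave that range (here at day 12).

```python
import math

# (t_i, n_i, d_i) at the distinct event times of the example table
event_times = [(12, 14, 1), (29, 12, 1), (31, 11, 1), (61, 6, 1), (70, 5, 1)]

def km_confidence_intervals(event_times, z=1.96):
    """Hypothetical helper: return [(t_i, S, lower, upper)] with S +/- z * SE.

    SE uses the Greenwood sum; assumes n_i > d_i at every event time.
    Clipping the bounds to [0, 1] is an added assumption, not part of the formula.
    """
    s, acc, out = 1.0, 0.0, []
    for t, n, d in event_times:
        s *= 1 - d / n
        acc += d / (n * (n - d))
        se = s * math.sqrt(acc)  # standard error of S(t_i)
        out.append((t, s, max(0.0, s - z * se), min(1.0, s + z * se)))
    return out

for t, s, lo, hi in km_confidence_intervals(event_times):
    print(f"t = {t:3d}: S = {s:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

For the example data the interval widens from about 0.21 at day 12 to about 0.66 at day 70, as fewer patients remain at risk. Clipping is the simplest adjustment. Intervals computed on a transformed scale (e.g. log(-log)) are a common alternative that stays inside [0, 1] by construction.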


References

  1. Edward L. Kaplan, Paul Meier: Nonparametric Estimation from Incomplete Observations. Journal of the American Statistical Association 53 (282) (1958), pp. 457–481. doi:10.1080/01621459.1958.10501452, JSTOR 2281868.