Kaplan-Meier estimator

The Kaplan-Meier estimator (also known as the product-limit estimator) is used to estimate the probability that a certain event has not yet occurred in a study subject by a given time. It is a non-parametric estimate of the survival function in survival (time-to-event) analysis. The underlying data may be right-censored. The method was developed in 1958 by Edward L. Kaplan and Paul Meier.

The name product-limit estimator reflects the fact that the estimator can be interpreted as the limit of life-table estimates as the interval lengths approach zero.

Calculation

The Kaplan-Meier estimator of the survival function $S(t)$ (i.e. the probability that the time until the event occurs exceeds $t$) is given by:

$$\hat{S}(t) = \prod_{i:\, t_{(i)} \le t} \frac{n_i - d_i}{n_i} = \prod_{i:\, t_{(i)} \le t} \left(1 - \frac{d_i}{n_i}\right)$$

with $\hat{S}(0) = 1$, where

$d_i$ = number of subjects in whom the event occurred at time $t_{(i)}$,
$n_i$ = number of subjects at risk at time $t_{(i)}$, i.e. still under observation and event-free just before $t_{(i)}$.

Here $t_{(1)} < t_{(2)} < \dots$ denote the distinct observed event times.
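The product formula translates almost directly into code. The following is a minimal sketch, not a library API: the function names `kaplan_meier` and `survival_at` are hypothetical, and it assumes the usual convention that a subject censored exactly at an event time $t_{(i)}$ still counts as at risk at $t_{(i)}$.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier estimate at each distinct event time.

    times  -- observed time per subject (event time or censoring time)
    events -- 1 if the event occurred at that time, 0 if the subject was censored
    Returns a list of (t_i, n_i, d_i, s_hat) tuples in increasing order of t_i.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    # d_i: number of events at each distinct event time t_(i)
    d = Counter(t for t, e in zip(times, events) if e == 1)
    s_hat = 1.0  # S_hat(0) = 1
    estimate = []
    for t_i in sorted(d):
        # n_i: subjects whose observed time is >= t_(i), i.e. still at risk just
        # before t_(i); a subject censored exactly at t_(i) is counted as at risk
        n_i = sum(1 for u in times if u >= t_i)
        s_hat *= 1 - d[t_i] / n_i  # factor (1 - d_i / n_i)
        estimate.append((t_i, n_i, d[t_i], s_hat))
    return estimate


def survival_at(estimate, t):
    """Evaluate the step function: S_hat(t) = product over all t_(i) <= t."""
    s = 1.0
    for t_i, _, _, s_i in estimate:
        if t_i > t:
            break
        s = s_i
    return s


# Usage on a small made-up data set (not taken from the article)
est = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
for t_i, n_i, d_i, s in est:
    print(t_i, n_i, d_i, round(s, 4))
```

Censoring times never appear as factors here, since they would only contribute $(n_i - 0)/n_i = 1$; they matter only through the shrinking risk set $n_i$.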

Example

The following data serve as an example:

Object no.	Time t (days)	Status (1 = event, 0 = censored)	At risk n(t)	Ŝ(t)
#1	1	0	15	1
#2	12	1	14	0.93
#3	22	0
#4	29	1	12	0.85
#5	31	1	11	0.77
#6	36	0
#7	38	0
#8	50	0
#9	60	0
#10	61	1	6	0.64
#11	70	1	5	0.52
#12	88	0
#13	99	0
#14	110	0
#15	140	0
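The "At risk" and Ŝ(t) columns can be reproduced by walking through the rows in time order, which is how one would fill in the table by hand. This is a minimal sketch: `km_table` is a hypothetical helper, and it assumes there are no tied times (true for this data). Carried at full precision, the last value is 0.5159, which rounds to 0.52; rounding each intermediate value to two decimals first, as in a quick hand calculation, can give 0.51 instead.

```python
# Example data from the table: (object no., time in days, 1 = event / 0 = censored)
data = [
    (1, 1, 0), (2, 12, 1), (3, 22, 0), (4, 29, 1), (5, 31, 1),
    (6, 36, 0), (7, 38, 0), (8, 50, 0), (9, 60, 0), (10, 61, 1),
    (11, 70, 1), (12, 88, 0), (13, 99, 0), (14, 110, 0), (15, 140, 0),
]


def km_table(rows):
    """Return (obj, time, status, n_at_risk, s_hat) for each row in time order.

    n_at_risk is the number of subjects at risk just before the row's time;
    s_hat is reported only on event rows (None for censored rows).
    Assumes no tied times, which holds for this example.
    """
    rows = sorted(rows, key=lambda r: r[1])
    n_at_risk = len(rows)
    s_hat = 1.0
    out = []
    for obj, time, status in rows:
        if status == 1:
            # factor (n_i - d_i) / n_i with d_i = 1
            s_hat *= (n_at_risk - 1) / n_at_risk
            out.append((obj, time, status, n_at_risk, s_hat))
        else:
            # censoring contributes a factor of 1, so S_hat is unchanged
            out.append((obj, time, status, n_at_risk, None))
        n_at_risk -= 1  # the subject leaves the risk set after its time
    return out


for obj, time, status, n, s in km_table(data):
    print(f"#{obj}\t{time}\t{status}\t{n}\t{'' if s is None else f'{s:.2f}'}")
```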

If the table represents the results of a clinical study, it can be read as follows:

Initially there are 15 patients, all of whom are "at risk", i.e. the event has not yet occurred in any of them.

Day 1: One patient is lost after only one day in the study, i.e. he leaves the study without the event having occurred in him by then (e.g. because he was enrolled only one day before the end of the study).

$$\hat{S}(1) = \frac{15-0}{15} = 1$$

Factors at censoring times always equal 1, because no event occurs there, so they are omitted from the following calculations. Because this patient is censored, only 14 patients remain at risk.

Day 12: The event occurs in a patient.

$$\hat{S}(12) = \frac{14-1}{14} = 0.9286$$

There are now 13 patients at risk.

Day 22: Another patient is censored, so $\hat{S}$ does not change:

$$\hat{S}(22) = \hat{S}(12)$$

The number of patients at risk is reduced to 12.

Day 29: The event occurs in another patient.

$$\hat{S}(29) = \frac{12-1}{12} \cdot \frac{14-1}{14} = 0.9167 \cdot 0.9286 = 0.8512$$

There are now 11 patients at risk.

The remaining values in the table are obtained in the same way.
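The step-by-step updates above can be checked with exact rational arithmetic, which avoids the accumulation of rounding errors from carrying four-decimal intermediate values. A minimal sketch, assuming the at-risk counts $n_i$ from the example table; the variable names are illustrative only.

```python
from fractions import Fraction

# (day, n_i, d_i) at the event days of the example, in time order.
# Censoring days (1, 22, 36, 38, 50, 60, ...) contribute a factor of 1 and are skipped.
steps = [(12, 14, 1), (29, 12, 1), (31, 11, 1), (61, 6, 1), (70, 5, 1)]

s_hat = Fraction(1)  # S_hat(0) = 1
values = {}
for day, n_i, d_i in steps:
    # S_hat(t_(i)) = S_hat(t_(i-1)) * (n_i - d_i) / n_i
    s_hat *= Fraction(n_i - d_i, n_i)
    values[day] = s_hat
    print(f"S_hat({day}) = {s_hat} ~ {float(s_hat):.4f}")
```

For day 29 this gives exactly 143/168 (about 0.8512), matching the calculation above; the final value on day 70 is 65/126, about 0.5159.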

Toward the end of the curve, therefore, only the patients with the longest follow-up remain at risk. The smaller number of patients at risk increases the uncertainty of the estimate at later time points (wider confidence interval).

[Figure: Kaplan-Meier curve for the example data. Black crosses mark the censoring times; a confidence interval is shown as dashed lines.]

Properties

Variance

For $t_{(k)} \le t < t_{(k+1)}$, the variance of the estimator can be estimated by

$$\widehat{\operatorname{Var}}\{\hat{S}(t)\} \approx \left[\hat{S}(t)\right]^2 \left\{\sum_{i=1}^{k} \frac{d_i}{n_i(n_i - d_i)}\right\}$$
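The variance estimate can be accumulated alongside the product, since both run over the same event times $t_{(i)} \le t$. A minimal sketch using the example's event times and at-risk counts; `km_greenwood` is a hypothetical helper, and it assumes $n_i > d_i$ at every step (otherwise a term of the sum is undefined), which holds here.

```python
import math

# (t_(i), n_i, d_i) for the event times of the example table
event_steps = [(12, 14, 1), (29, 12, 1), (31, 11, 1), (61, 6, 1), (70, 5, 1)]


def km_greenwood(steps, t):
    """Return (S_hat(t), Var_hat(S_hat(t))) using the variance formula above.

    Only event times t_(i) <= t enter the product and the sum, i.e. k is the
    index of the last event time not exceeding t. Assumes n_i > d_i throughout.
    """
    s_hat = 1.0
    greenwood_sum = 0.0
    for t_i, n_i, d_i in steps:
        if t_i > t:
            break
        s_hat *= 1 - d_i / n_i
        greenwood_sum += d_i / (n_i * (n_i - d_i))
    return s_hat, s_hat ** 2 * greenwood_sum


for t in (12, 29, 31, 61, 70):
    s, var = km_greenwood(event_steps, t)
    print(f"t = {t}: S_hat = {s:.4f}, Var = {var:.5f}, SE = {math.sqrt(var):.4f}")
```

Between two event times the function returns the values of the last event time, mirroring the step shape of $\hat{S}(t)$.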

Confidence interval

As usual, a confidence interval can be calculated from the variance or from the standard error:

$$\widehat{\operatorname{SE}}\{\hat{S}(t)\} \approx \hat{S}(t) \left\{\sum_{i=1}^{k} \frac{d_i}{n_i(n_i - d_i)}\right\}^{1/2}$$

This formula is known as Greenwood's formula.

An approximate 95% confidence interval is thus:

$$\left[\hat{S}(t) - 1.96 \cdot \widehat{\operatorname{SE}}\{\hat{S}(t)\};\ \hat{S}(t) + 1.96 \cdot \widehat{\operatorname{SE}}\{\hat{S}(t)\}\right]$$
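Putting the pieces together, the interval can be computed for the example data at a given time. This is a minimal sketch of the plain normal-approximation interval shown above: `km_confidence_interval` and `EVENT_STEPS` are hypothetical names, the at-risk counts come from the example table, and the sketch does not clip the bounds to [0, 1].

```python
import math

# (t_(i), n_i, d_i) for the event times of the example table
EVENT_STEPS = [(12, 14, 1), (29, 12, 1), (31, 11, 1), (61, 6, 1), (70, 5, 1)]


def km_confidence_interval(steps, t, z=1.96):
    """Pointwise interval S_hat(t) -/+ z * SE with Greenwood's standard error.

    z = 1.96 gives the approximate 95% interval. Bounds are not clipped to [0, 1].
    Assumes n_i > d_i at every event time up to t.
    """
    s_hat, greenwood_sum = 1.0, 0.0
    for t_i, n_i, d_i in steps:
        if t_i > t:
            break
        s_hat *= 1 - d_i / n_i
        greenwood_sum += d_i / (n_i * (n_i - d_i))
    se = s_hat * math.sqrt(greenwood_sum)
    return s_hat - z * se, s_hat + z * se


low, high = km_confidence_interval(EVENT_STEPS, 70)
print(f"95% CI at t = 70: [{low:.3f}; {high:.3f}]")
```

At day 70 only five patients remain at risk, so the interval is wide (roughly 0.19 to 0.84), which illustrates the growing uncertainty toward the end of the curve.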

Literature

A. Ziegler, S. Lange & R. Bender: Survival time analysis: properties and Kaplan-Meier method. Deutsche Medizinische Wochenschrift 132 (S 01) (2007), pp. e36-e38. doi:10.1055/s-2007-959038
Karl Michael Ortmann: Practical Life Insurance Mathematics. Springer Spektrum, Wiesbaden 2016, ISBN 978-3-658-10199-2, pp. 74-77.

References

Edward L. Kaplan & Paul Meier: Nonparametric Estimation from Incomplete Observations. Journal of the American Statistical Association 53 (282) (1958), pp. 457-481. doi:10.1080/01621459.1958.10501452, JSTOR 2281868
