Censored data

from Wikipedia, the free encyclopedia

In statistics, and particularly in medical statistics, censored data are data for which not all values of a statistical variable are known exactly.

History

Daniel Bernoulli addressed the problem of censored data as early as 1766, when he tried to demonstrate the effectiveness of smallpox inoculation.

Types of censored data

Right-censored data

If the event has not been observed by the end of the experiment, the data are called right-censored.

One can define three main types of right-censored data:

Type I: In experiments with a fixed start and end point, all observations are censored at the end of the experiment if the event has not yet occurred for this test object. This means that with type I all censored observations are equal to the length of the experiment.

Type II: In experiments whose end point is determined by reaching a predetermined number of events, one speaks of type II. All test objects for which the event has not yet occurred at that point are censored.

Type III: If the start and end points of the test objects are not specified in the experiment, but are within the period described by the experiment, then one speaks of type III. Observations are censored if one does not know the end point or the event has not yet occurred at the last known time.
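The first two schemes can be illustrated with a minimal sketch. The function names and the list of event times below are hypothetical and chosen only for illustration; each observation is returned as a pair of observed time and a flag that is True when the event itself was seen.

```python
def censor_type_i(event_times, end_time):
    """Type I: the experiment stops at a fixed, predetermined end_time."""
    # An event later than end_time is not seen; the observation is cut at end_time.
    return [(min(t, end_time), t <= end_time) for t in event_times]


def censor_type_ii(event_times, n_events):
    """Type II: the experiment stops as soon as n_events events have occurred."""
    # The stopping time is the n_events-th smallest event time.
    stop = sorted(event_times)[n_events - 1]
    return [(min(t, stop), t <= stop) for t in event_times]


# Hypothetical true event times of five test objects (normally not fully known).
times = [2.0, 5.5, 1.2, 9.0, 7.3]

type_i = censor_type_i(times, end_time=6.0)
type_ii = censor_type_ii(times, n_events=3)
```

With type I, every censored observation equals the length of the experiment; with type II, every censored observation equals the time of the last observed event.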

Left-censored and interval-censored data

  • If the event has already occurred at an unknown point in time in the past, one speaks of left-censored data.
  • If the event occurs unobserved between two points in time a and b, one speaks of interval-censored data.
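One common way to store all of these cases uniformly is as a pair of bounds on the event time. The sketch below assumes that convention; the function name `classify` and the use of infinite bounds for "unknown" are illustrative choices, not a fixed standard.

```python
import math


def classify(lower, upper):
    """Name the kind of observation described by the bounds lower <= T <= upper."""
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    if lower == -math.inf and upper == math.inf:
        return "missing"            # nothing is known about the event time
    if lower == upper:
        return "exact"              # the event time was observed directly
    if lower == -math.inf:
        return "left-censored"      # event happened at some time before `upper`
    if upper == math.inf:
        return "right-censored"     # event had not happened by time `lower`
    return "interval-censored"      # event happened between a = lower and b = upper


kinds = [
    classify(3.0, 3.0),
    classify(-math.inf, 4.0),
    classify(4.0, math.inf),
    classify(2.0, 5.0),
]
```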

Non-informative censoring

Non-informative censoring (also called random censoring) is present when every patient has a censoring time that is statistically independent of their survival time. The observed value is the minimum of the censoring time and the survival time. Patients whose survival time is longer than their censoring time are right-censored.
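The rule "observed value = minimum of censoring and survival time" can be sketched as follows. The distributions and their parameters are arbitrary assumptions made for the simulation; the only essential point is that the censoring times are drawn independently of the survival times.

```python
import random


def observe(survival_times, censoring_times):
    """Return (observed time, event indicator) for each patient."""
    # The indicator is True when the event was seen, False when right-censored.
    return [(min(t, c), t <= c) for t, c in zip(survival_times, censoring_times)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Assumed for illustration: exponential survival times, uniform censoring times.
survival = [rng.expovariate(0.1) for _ in range(5)]
censoring = [rng.uniform(0.0, 20.0) for _ in range(5)]  # drawn independently

data = observe(survival, censoring)
```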

Examples and Applications

A simple example in a questionnaire is the question of age. If the exact age is not queried below or above a certain age, but only "younger than ... years" or "older than ... years", we speak of censored data.

Censoring arises, for example, when the time until a certain event occurs is to be observed (time-to-event data), because the event in question may already have occurred before the start of observation or may not yet have occurred by the end of the experiment.

Handling of censored data

To draw conclusions from a sample that contains censored data, there are essentially two options:

  • Missing values: the censored records are omitted and treated as missing values.
  • Estimation: the data on the event that was not observed are estimated, usually by regression on the observed values.

A special procedure for censored data is the Tobit model.
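As a rough illustration of how a Tobit-type model uses censored records instead of discarding them, the following sketch evaluates the log-likelihood of a linear model with normally distributed errors whose response is left-censored at a known limit. The single regressor, the parameter names `b0`, `b1`, `sigma` and the default limit of 0 are assumptions made for this example; a real analysis would maximize this function with a numerical optimizer or use a statistics package.

```python
import math


def normal_logpdf(z):
    """Log-density of the standard normal distribution."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def normal_cdf(z):
    """Distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tobit_loglik(xs, ys, b0, b1, sigma, limit=0.0):
    """Log-likelihood when values at or below `limit` are recorded as `limit`."""
    total = 0.0
    for x, y in zip(xs, ys):
        mu = b0 + b1 * x  # linear predictor for the latent (uncensored) value
        if y > limit:
            # Fully observed value: contributes the normal density.
            total += normal_logpdf((y - mu) / sigma) - math.log(sigma)
        else:
            # Censored value: contributes the probability of lying at or below the limit.
            total += math.log(normal_cdf((limit - mu) / sigma))
    return total


# Hypothetical data: two responses are censored at the limit 0.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 0.0, 1.1, 2.3]
loglik = tobit_loglik(xs, ys, b0=-1.0, b1=1.0, sigma=1.0)
```

Censored records enter through the distribution function rather than the density, which is what distinguishes this approach from simply omitting them.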

Literature

  • Elisa T. Lee, John Wenyu Wang: Statistical Methods for Survival Data Analysis. 3rd edition, John Wiley & Sons, 2003, ISBN 0-471-36997-7.

References

  1. L. Bradley: Smallpox Inoculation: An Eighteenth Century Mathematical Controversy. Nottingham 1971.