Incidence (epidemiology)

from Wikipedia, the free encyclopedia

In epidemiology and medical statistics, incidence (from the Latin incidere, "to happen, to occur"; a single such event is an incident case) describes the frequency of events, especially newly occurring cases of illness, within a period of time. In the simplest case, the incidence of a disease in a population is given as the number of new cases per 100,000 people in one year. Precisely defined measures of incidence are the cumulative incidence, the incidence density and the incidence rate, which often do not differ much in their numerical value. The incidence of deaths is also called mortality. Alongside prevalence (the proportion of sick people in a population), incidence is a measure of the morbidity in a population. Although the following is described using humans as an example, incidence is also a useful quantity for monitoring animal populations.

Incidence Measures

Cumulative Incidence

The cumulative incidence (often abbreviated CI), also called the incidence proportion, is the proportion of people in a population who develop the disease at least once within a defined period of time $T$. It can also be read at the level of the individual, as the probability that a person from the population under consideration will develop the disease at least once within that period. The cumulative incidence is therefore also called the incidence risk or simply the risk.

$$ CI = \frac{n}{N_0} $$

Here

  • $n$: the number of people (individuals) who become newly ill within the period $T$, and
  • $N_0$: the number of healthy persons at the start of the observation.

The cumulative incidence is a proportion and accordingly has a value between 0 and 1; it has no unit (especially not the unit per year ) and is therefore dimensionless . Specifying a cumulative incidence without specifying a period is pointless, as the cumulative incidence increases over time. The cumulative incidence is close to zero for very short periods of time, regardless of the disease, and tends towards 1 with increasing observation time. If no period is given, a period of one year is usually meant. This time span has the advantage that seasonal fluctuations largely average out.

Union and intersection of cumulative incidences

Cumulative incidences for consecutive periods cannot simply be added to obtain the value for the whole period. Instead, the complementary probabilities (1 - incidence, i.e. the probabilities of getting through each partial period in good health) must be multiplied to obtain the complementary probability of the cumulative incidence over the entire period.

Example:

In a group of 200 smoking men aged 60 to 80 who have not yet had a heart attack, 22 people had their first heart attacks during an observation period of two years (12 people in the first year of observation and 10 people in the second year of observation).

The cumulative incidence of heart attacks in this group is therefore 22/200 = 11% in two years. In the first year it is 12/200 = 6%, in the second year it is 10/188 = 5.3%. The following applies: (1 - 6%) × (1 - 5.3%) = 1 - 11%.
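
The multiplication rule can be checked numerically. The following is a minimal sketch using the figures from the example; the function name `combine_consecutive` is chosen here for illustration only.

```python
def combine_consecutive(period_incidences):
    """Cumulative incidence over consecutive periods.

    Each entry must be the cumulative incidence of one period, measured
    among the people still healthy at the start of that period.
    """
    staying_healthy = 1.0
    for ci in period_incidences:
        staying_healthy *= 1.0 - ci  # probability of getting through this period healthy
    return 1.0 - staying_healthy


first_year = 12 / 200   # 12 first heart attacks among 200 men at risk
second_year = 10 / 188  # 10 first heart attacks among the 188 men still at risk
two_years = combine_consecutive([first_year, second_year])

print(f"two-year cumulative incidence: {two_years:.3f}")
print(f"naive sum of the yearly values: {first_year + second_year:.3f}")
```

The combined value agrees with the directly counted 22/200, whereas the naive sum of the two yearly values is slightly too large.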

In the same way, the cumulative incidence for the occurrence of event A or event B can be calculated if the events are stochastically independent. The cumulative incidence for the occurrence of both event A and event B is equal to the product of the two cumulative incidences, given stochastic independence. If A and B are not stochastically independent, the rules for conditional probabilities apply. For example, the probability of having a heart attack or a stroke is smaller than the product of the two complementary probabilities would suggest, because heart attack and stroke share similar risk factors and therefore strike the same people more often than chance alone would predict.

Determination in studies

Cumulative incidences can be determined in cross-sectional studies by asking study participants whether they have had a specific disease in the past year. The proportion of people who answer "yes" is the cumulative incidence. It should be noted that many diseases occur disproportionately often in the last year of life, which such a question usually no longer captures. The cumulative incidence of fatal diseases would therefore be systematically underestimated (bias) and should not be determined in this way. In addition, the information provided by respondents can deviate systematically from the truth.

Cumulative incidences are therefore best determined in prospective studies (cohort studies). For this purpose, people who have not yet had the disease are recruited as a representative sample of the disease-free population. The disease-free population is also referred to as the population at risk, because only this part of the population is still at risk of newly developing the disease. The study participants are examined at the beginning to confirm that they really do not have the disease under consideration. They are then tested again for the presence of the disease, at least at the end of the study period, or followed until the disease is first detected. For logistical reasons, the study periods of the individual participants usually do not all start on the same date, but in the order in which the participants were recruited.

Handling of censored data

To calculate the cumulative incidence with the above formula, every study participant must be followed over the entire observation period. In practice, however, it cannot be prevented that individual participants die or leave the study for other reasons before they become ill or the study period ends. Such alternative causes of failure are also called competing risks. Loss to follow-up gives rise to censored data. If the lost participants were simply excluded from the calculation, the cumulative incidence would be overestimated, because participants who have already remained healthy for part of the period have a lower risk of falling ill within the remainder of it. If one were to assume instead that all lost participants stay healthy, or that all of them fall ill, the cumulative incidence would be underestimated or overestimated, respectively.

One possible solution to this methodological problem is the Kaplan-Meier estimator. Its use requires that the points in time at which study participants fall ill or leave the study are recorded, and that the probability of dropping out is independent of the probability of falling ill. The Kaplan-Meier estimator estimates the survival function. Survival is to be understood here, as in survival analysis in general, as the non-occurrence of an event. If the observed event is a new illness, the Kaplan-Meier estimator estimates, for each point in time within the study period, the proportion of those who are still healthy. One minus this proportion is the cumulative incidence. Study participants can deliberately be followed for different lengths of time, for example the first recruits the longest, provided that early recruits do not differ systematically from later ones.
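
The procedure can be sketched in a few lines of code. The following is a minimal, illustrative implementation of the product-limit idea, not a replacement for a tested statistics library; the function name and the follow-up data are hypothetical. Participants censored at the same time as a case are treated as still at risk at that time, which is the usual convention.

```python
def kaplan_meier_cumulative_incidence(times, fell_ill):
    """Return [(t, CI(t))] for every time t with at least one new case.

    times[i]    : follow-up time of participant i
    fell_ill[i] : True if the participant fell ill at that time,
                  False if follow-up ended without illness (censored)
    """
    data = sorted(zip(times, fell_ill))
    at_risk = len(data)
    still_healthy = 1.0  # Kaplan-Meier estimate of the "survival" function
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        cases = 0
        leaving = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            cases += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if cases:
            still_healthy *= 1.0 - cases / at_risk
            curve.append((t, 1.0 - still_healthy))
        at_risk -= leaving
    return curve


# Hypothetical follow-up times in years; False marks a censored participant.
times = [1, 2, 2, 3, 4, 5]
fell_ill = [True, False, True, False, True, False]
for t, ci in kaplan_meier_cumulative_incidence(times, fell_ill):
    print(f"t = {t}: cumulative incidence = {ci:.3f}")
```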

Attack rate

The attack rate is, contrary to what its name suggests, not a rate but a cumulative incidence. It describes the proportion of the population that develops the disease in the course of an epidemic. During epidemics, certain populations are exposed to a certain risk of infection for a limited period of time. For example, the attack rate of influenza is the percentage of the population that gets the flu during a flu season.

Incidence density

The incidence density (abbreviated $I$, also called force of morbidity) is a measure of how strongly a disease tends to spread. The definition is based on the so-called time at risk. The time at risk is the time during which an individual from the population is healthy and therefore subject to the risk of the disease. The individual times at risk are added up to the so-called person-time at risk of the population under consideration in the period under consideration. The incidence density therefore places no requirements on the length of time each study participant is observed; one person can be followed for two years just as well as two people for one year. In the case of animals, one would not speak of person-time but of stock time. The incidence density is defined as the number of cases divided by the person-time at risk:

$$ I = \frac{\text{number of cases}}{\text{person-time at risk}} $$

The result of such a calculation is a number between 0 and ∞ per day, per week or per x years; the time unit used is mathematically interchangeable and says nothing about the study design. In contrast to the cumulative incidence, the incidence density is independent of the length of the observation period, provided that the dynamics of the disease do not change. It is a ratio rather than a proportion and therefore cannot be interpreted as a probability.

In contrast to the cumulative incidence, multiple illnesses of the same person within the study period are counted several times. People who were already ill at the beginning of the observation can be included in the study, as they can contribute further person-time at risk, or even new illnesses, after recovery. For illnesses whose first occurrence makes later illnesses of the same person much more likely (e.g. heart attack, stroke) or less likely (immunity following an infectious disease), it is advisable to examine only the first illness; people then drop out of observation when they become ill for the first time, as they can no longer spend any time at risk of a first illness.

Example:

In City X, 29 heart attacks occurred among men aged 40–44 (41,532 person-years) in 1973. The incidence density was I = 29/41532 ≈ 0.00070 / year.
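
As a small sketch of the calculation, the incidence density can be computed from a case count and a list of individual times at risk. The helper name `incidence_density` and the two-person follow-up data are assumptions made for illustration; only the City X figures come from the example.

```python
def incidence_density(cases, risk_times):
    """Number of cases divided by the summed person-time at risk."""
    return cases / sum(risk_times)


# One person followed for two years contributes the same person-time
# as two people followed for one year each (hypothetical follow-up data).
one_person = [2.0]
two_people = [1.0, 1.0]
print(sum(one_person) == sum(two_people))

# City X example: 29 heart attacks in 41,532 person-years.
city_x = incidence_density(29, [41532])
print(f"{city_x * 100_000:.1f} cases per 100,000 person-years")
```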

The number of incident cases divided by the person-time at risk gives an average incidence density, or estimates an incidence density assumed to be constant. If, on the other hand, the incidence density is assumed to vary, instantaneous values can be calculated by differentiating the number of incident cases with respect to the person-time at risk. In survival analysis, the instantaneous incidence density is referred to as the hazard rate.

The reciprocal of the incidence density is the average time at risk. This is at the same time the average time between two illnesses and the expected time until a currently healthy individual becomes ill.

Analogous to the incidence of diseases, an incidence of recoveries can be defined. The recovery density $R$ (also recurrence rate or recurrence density) characterizes the speed with which the sick return to the (healthy) population at risk or die.

The reciprocal of the recovery density gives the average duration of illness $D$:

$$ D = \frac{1}{R} $$

Incidence rate

Outside of clinical studies, only incidence cases can usually be counted. Since these of course depend on the population size, it makes sense to relate them to the population size. It must be taken into account that emigration and immigration, deaths and births occur in the population under consideration within the observation period. Dividing the number of incidence cases by the mean population size in the observation period gives the incidence rate . This is often given per 1,000 or 100,000 people.

The incidence rate is closely related to the incidence density, and some authors equate the two. The reason is that the product of the time-averaged population size and the length of the observation period equals the person-time at risk, if the time spent sick is neglected.

The unadjusted incidence rate of the total population is also called the crude rate or crude incidence. The crude rate can be used to determine whether one area has a higher rate of disease than another. If crude rates differ between two regions, the differences can often be traced back to differences in the structure of the population. For example, high cancer incidence rates can simply reflect an age distribution that is shifted towards older ages. To remove such differences in order to identify modifiable risk factors, subgroups can be compared (e.g. formed according to gender, occupation, place of residence or age). Alternatively, the incident cases can be weighted so that the result is an incidence rate for a theoretical population with a defined structure. With an age-standardized (age-adjusted) incidence rate, for example, a region with many old people can be compared with a region with many young people.
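
The weighting idea can be illustrated with direct age standardization, one common way of carrying it out. The sketch below uses two hypothetical regions with identical age-specific rates but different age structures; all numbers, the two age groups and the standard weights are invented for illustration.

```python
def crude_rate(cases, population):
    """Total cases divided by total population."""
    return sum(cases) / sum(population)


def age_standardized_rate(cases, population, standard_weights):
    """Direct standardization: age-specific rates weighted by the
    age distribution of a chosen standard population."""
    total_weight = sum(standard_weights)
    return sum(
        (c / n) * (w / total_weight)
        for c, n, w in zip(cases, population, standard_weights)
    )


# Age groups: [young, old]; both regions have rates 0.001 and 0.003 per year.
old_region = {"cases": [10, 90], "population": [10_000, 30_000]}
young_region = {"cases": [30, 30], "population": [30_000, 10_000]}
standard = [0.5, 0.5]  # hypothetical standard population: half young, half old

for name, region in [("old", old_region), ("young", young_region)]:
    crude = crude_rate(region["cases"], region["population"])
    adjusted = age_standardized_rate(region["cases"], region["population"], standard)
    print(f"{name} region: crude {crude:.4f}, age-standardized {adjusted:.4f}")
```

In this constructed case the crude rates differ although the age-specific risks are the same, and the standardized rates coincide.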

Other names for the incidence rate, which always refers to a specific period (often one year), are new infection rate, access rate and event rate.

Mortality

Mortality is a special case of incidence in which deaths rather than illnesses are counted as the target events. One can look at all deaths (crude mortality) or only at deaths due to certain diseases; in addition, the population under consideration can be narrowed down. Analogous to the incidence, there is also a cumulative mortality, a mortality density and a mortality rate. Since death is irreversible and can occur at any time, all time spent alive counts as time at risk.

Considerations at constant incidence density

A constant incidence density is assumed throughout this section. The relationships shown can therefore only be transferred to situations in which this assumption is justified. This is usually the case with chronic diseases, but not with seasonal diseases or epidemics.

Relationship between incidence density and prevalence

The incidence density and the prevalence are linked through the average duration of the disease $D$. The relationship derived here holds under the assumption that the population size $N$, the incidence density $I$ and the recovery density $R$ are constant. A steady state then arises in which the number of sick people $N_s$, the number of healthy people $N_h = N - N_s$ and thus also the proportion of sick people in the total population, the prevalence $P = N_s / N$, are constant. In the steady state, averaged over time, the frequency of new illnesses equals the frequency of recoveries (including deaths among the sick):

$$ I \cdot (N - N_s) = R \cdot N_s \qquad (1) $$

Forming the quotient $N_s / (N - N_s)$ and then dividing numerator and denominator by $N$, equation (1) gives

$$ \frac{I}{R} = \frac{N_s}{N - N_s} = \frac{P}{1 - P} \qquad (2) $$

Rearranging equation (2) gives

$$ I \cdot (1 - P) = R \cdot P , $$
and
$$ P = \frac{I}{I + R} \qquad (3) $$

If the conditions for a steady state are met but the current prevalence does not satisfy equation (3), the prevalence moves towards the steady state on its own. If $P > I / (I + R)$, the prevalence decreases; if $P < I / (I + R)$, it increases.

Equation (3) also shows that a disease can achieve a high prevalence through both a high incidence density (e.g. highly contagious germ) and a low recovery density (e.g. chronic diseases ). Rapid healing or rapid death, on the other hand, lead to a low prevalence through a high recovery density. This shows that the seemingly restrictive assumption of a population equilibrium or a steady state in the context of a pandemic containment allows statements to be made about the success of health policy measures.

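Since the average duration of illness $D$ is the reciprocal of the recovery density, equation (2) can also be written in terms of the incidence density $I$ and the prevalence $P$ as

$$ I \cdot D = \frac{P}{1 - P} \qquad (4) $$

If the prevalence is very low (e.g. < 1%), the denominator $1 - P$ in equation (4) is close to 1 and can be neglected. This results in the approximation

$$ P \approx I \cdot D \qquad (5) $$

The product of the incidence density and the average duration of illness can be interpreted as the number of sick people in a population relative to the number of healthy people (e.g. per 100,000).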
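
The steady-state relationship can be explored numerically. The sketch below assumes the balance described in this section, in which new cases arise among the healthy at incidence density I and the sick leave the diseased state at recovery density R = 1/D, so that the prevalence settles at I / (I + R). The parameter values and the simple Euler time-stepping are illustrative assumptions.

```python
def steady_state_prevalence(incidence_density, recovery_density):
    """Prevalence at which new illnesses and recoveries balance."""
    return incidence_density / (incidence_density + recovery_density)


def simulate_prevalence(p0, incidence_density, recovery_density, years, dt=0.01):
    """Follow the prevalence over time with small Euler steps of length dt."""
    p = p0
    for _ in range(int(round(years / dt))):
        new_cases = incidence_density * (1.0 - p)  # arise among the healthy
        recoveries = recovery_density * p          # recoveries or deaths among the sick
        p += (new_cases - recoveries) * dt
    return p


I = 0.02   # hypothetical incidence density per year
D = 2.0    # hypothetical average duration of illness in years
R = 1.0 / D

print(f"steady state:           {steady_state_prevalence(I, R):.4f}")
print(f"approximation I * D:    {I * D:.4f}")
print(f"simulated from P = 0:   {simulate_prevalence(0.0, I, R, years=50):.4f}")
print(f"simulated from P = 0.2: {simulate_prevalence(0.2, I, R, years=50):.4f}")
```

Starting above or below the steady-state value, the simulated prevalence ends up at the same level, and for a low prevalence the product I · D is a close approximation.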

Statistical relationships

Distribution of the number of diseases

The number of illnesses $X$ occurring during a person-time $T$ is Poisson distributed ($X \sim \mathrm{Pois}(\lambda)$) with the distribution parameter $\lambda = I \cdot T$ (event rate). For a Poisson distribution, the expected value (here the expected number of illnesses) and the variance are both equal to $\lambda$.

Distribution of person time until the first illness

The person-time until the first illness, or between two illnesses, is exponentially distributed ($T \sim \mathrm{Exp}(\lambda)$) with the distribution parameter $\lambda = I$. The expected value and the standard deviation are thus both equal to $1 / I$.

Calculation of the cumulative incidence from the incidence density

Integrating the probability density function of the exponential distribution gives its distribution function; for each point in time $T$, this is the probability that an individual has become ill at least once by then. This is precisely the cumulative incidence CI, which can thus be calculated from the incidence density $I$:

$$ CI(T) = 1 - e^{-I \cdot T} \qquad (6) $$

Example:

With an incidence density of 0.008 / year, the probability of developing the disease within 3 years is

$$ CI = 1 - e^{-0.008 \cdot 3} = 1 - e^{-0.024} \approx 0.0237 . $$

Since the exponential function $e^{x}$ can be approximated by $1 + x$ for small $x$ (Taylor series), the formula can be simplified if the product of the incidence density and the observation period, $I \cdot T$, is small. In this case the following important relationship holds approximately:

$$ CI \approx I \cdot T \qquad (7) $$

The numerical values of cumulative incidences (given as a percentage in one year) therefore hardly differ from the incidence densities (given as a number per year) for some diseases.
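
How close the approximation is can be seen by comparing both expressions directly. This is a minimal sketch assuming a constant incidence density, so that the cumulative incidence over a period T is 1 - exp(-I · T); the function name is chosen for illustration.

```python
import math


def cumulative_incidence(incidence_density, period):
    """Cumulative incidence under a constant incidence density."""
    return 1.0 - math.exp(-incidence_density * period)


# Figures from the example: 0.008 per year over 3 years.
exact = cumulative_incidence(0.008, 3)
approx = 0.008 * 3
print(f"exact:         {exact:.5f}")
print(f"approximation: {approx:.5f}")

# The approximation degrades as the product I * T grows.
for product in (0.01, 0.1, 0.5, 1.0):
    print(f"I*T = {product}: exact CI = {cumulative_incidence(product, 1):.4f}")
```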

The product $I \cdot T$ corresponds to the cumulative hazard function from survival analysis. If the incidence density is not constant, the cumulative hazard function can be determined by integrating the incidence density $I(t)$ over time. Equation (6) can thus be generalized to

$$ CI(T) = 1 - \exp\left( -\int_0^T I(t)\, dt \right) \qquad (8) $$
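
For a time-varying incidence density, the integral can be evaluated numerically. The sketch below uses the trapezoidal rule and assumes that the cumulative incidence equals one minus the exponential of the negative integrated incidence density; the function name and the linearly increasing density are hypothetical.

```python
import math


def cumulative_incidence_varying(density, period, steps=10_000):
    """1 - exp(-integral of density(t) over [0, period]), trapezoidal rule."""
    h = period / steps
    integral = 0.5 * (density(0.0) + density(period))
    for k in range(1, steps):
        integral += density(k * h)
    integral *= h
    return 1.0 - math.exp(-integral)


# A constant density reproduces the closed-form result.
constant = cumulative_incidence_varying(lambda t: 0.008, 3)
print(f"constant 0.008/year over 3 years: {constant:.5f}")

# Hypothetical density rising linearly from 0 to 0.006/year over 3 years.
rising = cumulative_incidence_varying(lambda t: 0.002 * t, 3)
print(f"linearly rising density:          {rising:.5f}")
```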
 

Derived measures

Relative risk

The ratio of the cumulative incidence among the exposed to the cumulative incidence among the unexposed is called the relative risk $RR$ (the difference between this ratio and 1 is the relative risk reduction or relative risk increase):

$$ RR = \frac{CI_{\text{exposed}}}{CI_{\text{unexposed}}} $$

Here $CI_{\text{exposed}}$ is the cumulative incidence among those exposed and $CI_{\text{unexposed}}$ the cumulative incidence among those not exposed. If the relative risk is greater than 1, the exposure increases the risk of disease. If the relative risk is less than 1, the exposure reduces the risk of disease. The more the relative risk differs from 1, the stronger the association between disease and exposure and the more plausible a causal effect becomes.

Incidence density ratio

Analogous to the relative risk, the quotient of two incidence densities or incidence rates can be defined as the incidence density ratio or incidence rate ratio, also called the relative rate (rate rather than risk, since the numerator and denominator cannot be interpreted as probabilities):

$$ IDR = \frac{I_{\text{exposed}}}{I_{\text{unexposed}}} $$

Here $I_{\text{exposed}}$ is the incidence density or rate among the exposed and $I_{\text{unexposed}}$ the incidence density or rate among the unexposed. As long as the relationship $CI \approx I \cdot T$ holds (equation (7) under "Calculation of the cumulative incidence from the incidence density"), both measures have approximately the same value, so relative rates can be interpreted like relative risks.

Example:

If there are 10 cases in 2,935 person-years (341 cases per 100,000 PY) among the exposed and 239 cases in 135,130 person-years (177 cases per 100,000 PY) among the unexposed, the comparison gives an incidence density ratio of IDR = 1.926. Since IDR > 1, a harmful effect of the exposure is assumed here.
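
The figures of this example can be reproduced with a few lines of code. The function names are chosen here for illustration, and treating the first group as the exposed one is an assumption consistent with the interpretation given in the example.

```python
def rate_per_100k(cases, person_years):
    """Incidence density expressed per 100,000 person-years."""
    return cases / person_years * 100_000


def incidence_density_ratio(cases_a, person_years_a, cases_b, person_years_b):
    """Incidence density of group A divided by that of group B."""
    return (cases_a / person_years_a) / (cases_b / person_years_b)


rate_a = rate_per_100k(10, 2_935)
rate_b = rate_per_100k(239, 135_130)
ratio = incidence_density_ratio(10, 2_935, 239, 135_130)

print(f"group A: {rate_a:.0f} per 100,000 person-years")
print(f"group B: {rate_b:.0f} per 100,000 person-years")
print(f"incidence density ratio: {ratio:.3f}")
```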

Competing Risks

In medical research, a patient is commonly exposed to several different causes of failure, each of which is referred to as a competing risk. This becomes a problem when the competing risks are correlated with one another. For example, a study aims to determine mortality from myocardial infarction. Mortality from myocardial infarction here means the proportion of study participants who would die from myocardial infarction if there were no other reasons for leaving the study. In reality, some of the study participants die of other cardiovascular diseases. Since these participants would have had an above-average probability of suffering a heart attack later on, the probability of dropping out is not independent of the probability of the event of interest. Because patients with a high risk of myocardial infarction are removed in this way, the Kaplan-Meier estimator would underestimate the cumulative mortality from myocardial infarction.

Such situations are treated in competing risks models. In the basic competing risks model, every individual is initially in the initial state 0. The individual remains in this state until a first event occurs. Usually there is one event of interest, modeled by a transition into state 1, and all other first event types are subsumed into the competing event state 2. For simplicity, the focus here is on two competing event states; the techniques can easily be generalized to more than two competing risks. With competing risks, an individual's movements are tracked over time. The competing risks process $(X_t)_{t \ge 0}$ denotes the state an individual is in at each point in time $t$, with $X_t \in \{0, 1, 2\}$. Each individual starts in the initial state 0: $X_0 = 0$.

An individual remains in state 0 (i.e., $X_t = 0$) as long as neither competing event 1 nor 2 has occurred. The individual moves to state 1 when the event of interest occurs, and likewise to state 2 when the other competing event occurs first. At the time $T$ of the first event, the competing risks process is in either state 1 or state 2 ($X_T \in \{1, 2\}$). The type of the first event is often called the cause of failure.

Cumulative Incidence Function

Instead of the Kaplan-Meier estimator, the so-called cumulative incidence function is used. To this end, one first defines the cause-specific hazard functions, which represent the instantaneous hazard rate for cause $j$ at time $t$ in the presence of all other risks. With $T$ the time of the first event and $X_T$ its type:

$$ h_j(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t,\; X_T = j \mid T \ge t)}{\Delta t} . $$

From the cause-specific hazard function, the cumulative cause-specific hazard function is given by

$$ H_j(t) = \int_0^t h_j(u)\, du . $$

Since the cause-specific hazard functions add up to the all-cause hazard function

$$ h(t) = h_1(t) + h_2(t) , $$

the cumulative all-cause hazard function can be written as $H(t) = H_1(t) + H_2(t)$. Furthermore,

$$ S(t) = P(T > t) = \exp(-H(t)) $$

denotes the survival function in the initial state 0.

The cumulative incidence function for cause $j$ is the expected proportion of individuals who experience that specific competing event by time $t$:

$$ F_j(t) = P(T \le t,\; X_T = j) = \int_0^t S(u-)\, h_j(u)\, du , $$

where $S(u-)$ denotes the value of the survival function immediately before $u$.

Estimation of the cumulative incidence function

Let $n_i$ be the number of individuals alive and uncensored just before time $t_i$, and let $d_{ij}$ be the number of deaths from cause $j$ at time $t_i$. The cause-specific cumulative incidence function can then be consistently estimated by

$$ \hat{F}_j(t) = \sum_{i:\, t_i \le t} \hat{S}(t_{i-1})\, \frac{d_{ij}}{n_i} , $$

where $\hat{S}$ is the estimate of the overall (all-cause) survival function. The ratio $d_{ij} / n_i$ is an estimate of the cause-specific hazard for cause $j$ at time $t_i$. A mathematically convenient property of this estimator of the cumulative incidence function is that $\hat{S}(t) + \sum_j \hat{F}_j(t) = 1$. That is, at any point in time the probabilities of all event types and the probability of no event add up to one.
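
The estimator can be sketched as follows. This is an illustrative implementation under the assumption that the overall survival function is estimated by the Kaplan-Meier method counting events of all causes; the function name, the coding of causes (0 for censored) and the data are hypothetical.

```python
def cumulative_incidence_functions(times, causes, n_causes=2):
    """Nonparametric estimate of the cause-specific cumulative incidence functions.

    times[i]  : observed time of individual i
    causes[i] : 0 if censored, otherwise the cause of failure (1..n_causes)
    Returns (event_times, survival, cif); cif[j - 1][k] is the estimate
    for cause j at event_times[k].
    """
    data = sorted(zip(times, causes))
    at_risk = len(data)
    surv_before = 1.0            # overall survival just before the current time
    cif_now = [0.0] * n_causes
    event_times, survival = [], []
    cif = [[] for _ in range(n_causes)]
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = [0] * n_causes
        leaving = 0
        while i < len(data) and data[i][0] == t:
            if data[i][1] > 0:
                deaths[data[i][1] - 1] += 1
            leaving += 1
            i += 1
        if sum(deaths):
            for j in range(n_causes):
                # survival just before t times the cause-specific hazard estimate
                cif_now[j] += surv_before * deaths[j] / at_risk
            surv_before *= 1.0 - sum(deaths) / at_risk
            event_times.append(t)
            survival.append(surv_before)
            for j in range(n_causes):
                cif[j].append(cif_now[j])
        at_risk -= leaving
    return event_times, survival, cif


# Hypothetical data: cause 1 = event of interest, cause 2 = competing event.
times = [1, 2, 3, 4, 5, 6]
causes = [1, 2, 0, 1, 2, 0]
event_times, survival, cif = cumulative_incidence_functions(times, causes)
for k, t in enumerate(event_times):
    total = survival[k] + cif[0][k] + cif[1][k]
    print(f"t = {t}: S = {survival[k]:.3f}, F1 = {cif[0][k]:.3f}, "
          f"F2 = {cif[1][k]:.3f}, sum = {total:.3f}")
```

The last column illustrates the property that survival and the cumulative incidences of all causes add up to one at every event time.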

Standard error

The following standard errors can be used to carry out significance tests and calculate confidence intervals with large sample sizes :

Standard errors are given, each for the cumulative incidence and for the incidence density, for the following quantities:

  • the cumulative incidence or incidence density itself,
  • the difference between two cumulative incidences or incidence densities, and
  • the natural logarithm of the ratio of two cumulative incidences or incidence densities. To calculate the limits of the 95% confidence interval of the ratio, the point estimate must therefore be divided by, or multiplied by, exp(1.96 × standard error).
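
As an illustration, the sketch below implements commonly used large-sample (Wald-type) standard errors for these quantities. These are standard textbook formulas and are an assumption here; they are not necessarily the exact expressions intended for this section. The function names are hypothetical, and the input figures are taken from the incidence density ratio example above.

```python
import math


def se_cumulative_incidence(cases, n_at_risk):
    """Binomial standard error of a cumulative incidence."""
    ci = cases / n_at_risk
    return math.sqrt(ci * (1.0 - ci) / n_at_risk)


def se_incidence_density(cases, person_time):
    """Poisson standard error of an incidence density."""
    return math.sqrt(cases) / person_time


def se_difference(se_a, se_b):
    """Standard error of a difference between two independent estimates."""
    return math.sqrt(se_a ** 2 + se_b ** 2)


def se_log_density_ratio(cases_a, cases_b):
    """Standard error of the natural log of an incidence density ratio."""
    return math.sqrt(1.0 / cases_a + 1.0 / cases_b)


def ratio_confidence_interval(ratio, se_log_ratio, z=1.96):
    """Divide and multiply the point estimate by exp(z * standard error)."""
    factor = math.exp(z * se_log_ratio)
    return ratio / factor, ratio * factor


# 10 cases in 2,935 person-years versus 239 cases in 135,130 person-years.
idr = (10 / 2_935) / (239 / 135_130)
se_log = se_log_density_ratio(10, 239)
lower, upper = ratio_confidence_interval(idr, se_log)
print(f"IDR = {idr:.3f}, 95% confidence interval {lower:.2f} to {upper:.2f}")
```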

Literature

  • Lothar Kreienbrock, Iris Pigeot, Wolfgang Ahrens: Epidemiological Methods. 5th edition. Springer Spectrum, Berlin / Heidelberg 2012, ISBN 978-3-8274-2333-7, Chapter 2, Epidemiological measures.
  • Kenneth J. Rothman: Epidemiology - An Introduction. 2nd edition. Oxford University Press, 2012, ISBN 978-0-19-975455-7, Chapter 4, Measuring Disease Occurrence and Causal Effects.
  • Lothar Sachs, Jürgen Hedderich: Applied Statistics: Collection of Methods with R. 8th, revised and expanded edition. Springer Spectrum, Berlin / Heidelberg 2018, ISBN 978-3-662-56657-2, pp. 197-201.
