Assessment quality for cardinal insolvency prognoses


While ordinal insolvency forecasts merely rank companies according to their expected probability of default, cardinal insolvency forecasts explicitly assign a probability of default to each company.

Fundamental criteria for evaluating bankruptcy forecasts

Since default probabilities can also be interpreted as a ranking criterion, cardinal insolvency forecasts can be evaluated with regard to all quality criteria that also apply to ordinal insolvency forecasts:

  • Resolution measures the extent to which the realized default rates differ across the individual rating classes. Resolution is minimal when the same realized default rate is observed in every rating class; it is maximal when the realized default rates of the individual rating classes are either 0% or 100%,
  • Discrimination measures the extent to which the forecasts differ between companies that actually defaulted and companies that did not.

In addition, criteria can also be checked for which the ex-ante specification of failure probabilities is absolutely necessary:

  • For groups of forecasts (rating classes), calibration measures how well the forecasted default probabilities match the realized default rates,
  • Systematic bias (unconditional bias): indicates how much the average forecast default probability differs from the realized default rate,
  • Refinement measures how finely differentiated the default forecasts are. Refinement is minimal when an identical default probability is always forecast; it is maximal when only 0% or 100% forecasts are issued (an illustrative sketch of these criteria follows below this list).
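The following minimal Python sketch illustrates these criteria on an assumed toy portfolio (the data, variable names, and the simple proxies used are illustrative only and are not taken from the cited literature):

```python
import numpy as np

# Assumed toy portfolio: forecast default probabilities and realized outcomes
# (1 = defaulted, 0 = did not default); purely illustrative numbers.
pd_prog   = np.array([0.01, 0.01, 0.05, 0.05, 0.05, 0.20, 0.20, 0.20, 0.20, 0.20])
defaulted = np.array([0,    0,    0,    1,    0,    0,    1,    0,    1,    0])

# Systematic (unconditional) bias: average forecast PD minus realized default rate.
bias = pd_prog.mean() - defaulted.mean()

# Calibration and resolution are judged per rating class (here: per distinct forecast value).
for p in np.unique(pd_prog):
    in_class = pd_prog == p
    print(f"class PD={p:.0%}: realized default rate {defaulted[in_class].mean():.0%}")

# Discrimination: do the forecasts differ between defaulted and non-defaulted firms?
discrimination = pd_prog[defaulted == 1].mean() - pd_prog[defaulted == 0].mean()

# Refinement (rough proxy): number of distinct forecast values that are issued.
refinement = np.unique(pd_prog).size

print(f"bias = {bias:+.3f}, discrimination = {discrimination:+.3f}, refinement = {refinement}")
```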

Key figures that are simultaneously determined by all or several of these properties of cardinal insolvency forecasts are referred to below as measures of the precision (accuracy) of a method. Key figures that relate the precision of a forecasting method to the precision of a specific reference method are referred to as measures of relative precision (also skill scores or relative accuracy).

Metrics to measure the calibration of bankruptcy forecasts

Key figures that measure only some of the aspects of cardinal insolvency forecasts listed above, in particular the aspect of calibration, are, for example, the grouped Brier score and the Rommelfanger index.

Grouped Brier score

The grouped Brier score is defined as follows:

Formula 1:

$$BS_{\text{grouped}} = \frac{1}{g} \sum_{i=1}^{g} \left( PD_{i,\text{prog}} - PD_{i,\text{real}} \right)^2$$

with $PD_{i,\text{prog}}$ / $PD_{i,\text{real}}$: forecast / realized default rate for rating class $i$,
$g$: number of rating classes

Note: An obvious alternative to the equal weighting of the rating class-specific squared differences of the forecast and realized default rates when determining the score is to take into account the relative occupancy levels of the individual rating classes:

Formula 1b:

$$BS_{\text{grouped}}^{w} = \sum_{i=1}^{g} a_i \left( PD_{i,\text{prog}} - PD_{i,\text{real}} \right)^2$$

with $a_i$: share of companies in rating class $i$ among all companies
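A minimal Python sketch of the two variants, assuming per-class forecast and realized default rates as inputs (function and variable names are illustrative, not taken from the cited sources):

```python
import numpy as np

def grouped_brier(pd_forecast, pd_realized, weights=None):
    """Grouped Brier score over g rating classes.

    pd_forecast, pd_realized: per-class forecast / realized default rates.
    weights: optional relative occupancy a_i of the classes; if omitted,
             all classes are weighted equally.
    """
    sq_diff = (np.asarray(pd_forecast, float) - np.asarray(pd_realized, float)) ** 2
    if weights is None:
        return float(sq_diff.mean())                              # equal weighting of the g classes
    return float(np.dot(np.asarray(weights, float), sq_diff))    # weighting by class occupancy a_i

# Example with three rating classes (assumed numbers)
print(grouped_brier([0.01, 0.05, 0.20], [0.02, 0.04, 0.25]))
print(grouped_brier([0.01, 0.05, 0.20], [0.02, 0.04, 0.25], weights=[0.6, 0.3, 0.1]))
```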

Despite their similar structure, the grouped Brier score and the Brier score presented below differ fundamentally: in contrast to the Brier score (see below), the grouped Brier score is influenced only by the quality of the calibration of a rating method, but not by the other criteria of cardinal quality measures.

Rommelfanger index

The Rommelfanger index is defined as follows:

Formula 2 :

with for i = 1 ... g -1, or for i = g ,
: relative volume of all credits in the validation / learning sample,
: "Suitable weight"

Note: No statement is made about how the "suitable weights" have to be designed. Further points of criticism of this measure, in addition to its exclusive focus on the aspect of calibration, are its dependence on irrelevant variables (the structure of the learning sample) and the incentives it creates for systematically incorrect forecasts: since deviations in the classes 1 ... g−1 and in the class g are penalized in one direction only, there is an incentive to systematically set all forecasts too high (rating classes 1 ... g−1) or too low (rating class g).

Further key figures for measuring the calibration of bankruptcy forecasts

Further key figures that only check the correctness of the calibration of individual or all rating classes are the test statistics of the binomial test, the χ² test, or the normal distribution test.
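As an illustration of such a class-wise calibration check, a binomial test for a single rating class could look as follows (a sketch only; the obligor numbers are assumed, and the concrete test design in the cited literature may differ, for example regarding default correlation):

```python
from scipy.stats import binomtest

# Assumed example: a rating class with 500 obligors, a forecast PD of 2%,
# and 16 realized defaults over the observation period.
n_obligors, n_defaults, pd_forecast = 500, 16, 0.02

# H0: the realized number of defaults is consistent with the forecast PD
# (independence of the individual defaults is implicitly assumed here).
result = binomtest(n_defaults, n_obligors, pd_forecast, alternative="two-sided")
print(f"realized default rate {n_defaults / n_obligors:.2%}, p-value {result.pvalue:.3f}")
```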

Universal quality measures for cardinal insolvency forecasts

Basic structure of universal quality measures for cardinal insolvency forecasts

The two precision measures for cardinal insolvency forecasts presented below are based on a common basic principle: they compare the individual forecast default probabilities with the realized default outcomes (with $y_i = 1$ if debtor $i$ defaulted and $y_i = 0$ if it did not) and assign different "penalties" to the deviations that arise. In this way they are influenced by all of the fundamental criteria for evaluating default forecasts listed above, and not just by some of them.

In contrast to categorical insolvency forecasting procedures, which only allow the extreme forecasts "default" vs. "non-default", it is initially questionable in the case of stochastic default forecasts (cardinal default forecasts) why deviations between the individual forecasts (default probabilities) and the default realizations should be "penalized" as errors at all. After all, the forecasts can take any value between 0% and 100%, while the default realizations can only take the extreme values 1 ("default") or 0 ("non-default"). Even if the forecast default probabilities are "correct", i.e. correctly calibrated, for example if 5% of all companies default for which the method forecast a default probability of 5%, and 10% of all companies default for which the method forecast a default probability of 10%, etc., the methods are "penalized", i.e. they do not achieve the best possible score. In these cases, however, it is the imperfect discriminatory power of the methods that is "penalized": a method that had forecast an insolvency probability of 1.35% for all German companies in 2003 would have been perfectly calibrated, but would have received a high "penalty" for its non-selective forecasts. A method, on the other hand, that had forecast an insolvency probability of 100% for 1.35% of these companies and a default probability of 0% for the remaining 98.65%, and whose forecasts had always come true, would have received the best possible score.

The conditional information entropy and the Brier score are two common precision measures for evaluating cardinal insolvency forecasts; they differ only in the specific form of their "penalty functions".
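The difference between the two penalty functions can be made explicit in a few lines (a sketch; the function names are illustrative and the exact normalizations used in the literature may differ):

```python
import numpy as np

def log_penalty(pd_prog, defaulted):
    """Logarithmic penalty underlying the conditional information entropy."""
    return -np.log(pd_prog) if defaulted else -np.log(1.0 - pd_prog)

def quadratic_penalty(pd_prog, defaulted):
    """Quadratic penalty underlying the Brier score."""
    return (pd_prog - defaulted) ** 2

# A default that was forecast with a very low PD is penalized far more heavily
# by the logarithmic penalty than by the quadratic one.
for pd_prog in (0.01, 0.10, 0.50, 0.90):
    print(f"PD={pd_prog:.2f}: log={log_penalty(pd_prog, 1):.2f}, "
          f"quad={quadratic_penalty(pd_prog, 1):.2f}")
```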

Conditional information entropy

The conditional information entropy (CIE) is based on a logarithmic "penalty function". Entropy is a concept borrowed from thermodynamics that is intended to measure the degree of disorder of a system. In the context of insolvency forecasts, the conditional information entropy is intended to quantify the degree of uncertainty associated with the default probability distribution of a portfolio of companies determined with a rating model.

Formula 3 :

with $n$: number of debtors
Note: CIE is undefined only in those cases in which a default occurs although it was ruled out with certainty (a forecast default probability of 0% with a realized default) or in which no default occurs although it was predicted with certainty (a forecast default probability of 100% without a realized default).

Formula 4 :

In the case of g discrete rating classes, this results in:

Formula 4b :

with $a_i$: share of companies in rating class $i$ among all companies

Formula 5 :

Formula 6 :

with CIER: conditional information entropy ratio
and the CIE value of a "naive" reference forecast, which always forecasts the average default probability PD
Note: This term is also known as the Kullback-Leibler distance or the wealth growth rate pick-up. The CIER corresponds to McFadden's R², which is commonly used to measure the goodness of fit of logistic regression estimates.
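A minimal Python sketch of CIE and CIER, based on the logarithmic penalty described above (the exact normalization and notation of the original formulas may differ; function and variable names, and the toy data, are illustrative):

```python
import numpy as np

def cie(pd_prog, defaulted):
    """Conditional information entropy: average logarithmic penalty of the forecasts.

    pd_prog: forecast default probabilities, strictly between 0 and 1;
    defaulted: realized outcomes (1 = default, 0 = no default).
    """
    p = np.asarray(pd_prog, float)
    y = np.asarray(defaulted, float)
    return float(np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p))))

def cier(pd_prog, defaulted):
    """CIE ratio relative to a naive forecast that always predicts the
    portfolio's average default rate PD."""
    y = np.asarray(defaulted, float)
    cie_naive = cie(np.full(y.shape, y.mean()), y)
    return (cie_naive - cie(pd_prog, defaulted)) / cie_naive

# Assumed toy portfolio
pd_prog   = [0.02, 0.02, 0.10, 0.10, 0.40]
defaulted = [0,    0,    0,    1,    1]
print(cie(pd_prog, defaulted), cier(pd_prog, defaulted))
```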

Brier score

In contrast to the conditional information entropy (CIE), the Brier score is based on a quadratic function with which deviations of the forecast default probabilities from the default realizations are "penalized". It is defined as follows:

Formula 7:

$$BS = \frac{1}{n} \sum_{i=1}^{n} \left( PD_{i,\text{prog}} - y_i \right)^2$$

with $y_i = 1$ if debtor $i$ defaulted and $y_i = 0$ otherwise

Formula 8 :

In the case of g discrete rating classes, this corresponds to:

Formula 8b :

Formula 9 : with

Formula 10 :

Note: In the notation used in regression analysis, BS_naive corresponds to the total variation of the variable to be explained, i.e. the total sum of squares (TSS), divided by n. Therefore Skill_BS = (TSS − RSS) / TSS (with RSS: residual sum of squares) holds. Thus Skill_BS = r², with r²: coefficient of determination ("regression r²"), where r² = ESS / TSS and ESS = TSS − RSS.
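A minimal Python sketch of the Brier score and the associated skill score relative to a naive forecast of the average default rate (names and toy data are illustrative; with this reference forecast the naive Brier score equals PD · (1 − PD)):

```python
import numpy as np

def brier_score(pd_prog, defaulted):
    """Brier score: mean squared deviation of forecast PDs from the 0/1 outcomes."""
    p = np.asarray(pd_prog, float)
    y = np.asarray(defaulted, float)
    return float(np.mean((p - y) ** 2))

def brier_skill(pd_prog, defaulted):
    """Skill relative to a naive forecast that always predicts the average
    default rate; the naive Brier score then equals PD * (1 - PD)."""
    y = np.asarray(defaulted, float)
    bs_naive = y.mean() * (1 - y.mean())
    return 1.0 - brier_score(pd_prog, defaulted) / bs_naive

pd_prog   = [0.02, 0.02, 0.10, 0.10, 0.40]
defaulted = [0,    0,    0,    1,    1]
print(brier_score(pd_prog, defaulted), brier_skill(pd_prog, defaulted))
```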

The "penalty functions" of the conditional information entropy and the Brier score are arbitrary in the sense that they do not refer to the utility variables that are ultimately of interest to the (possibly differing) users of the forecasting method. However, both measures behave "plausibly", so that at least a correlation with the utility values of the potential users of the forecasts can be assumed: both scores "reward" correctly calibrated and discriminating forecasts, and by transforming the resulting scores, relationships to the other quality criteria for cardinal insolvency forecasts, such as resolution, refinement, and systematic bias, can be established.

Figure: Decomposition of the Brier score into the components variance, calibration and resolution
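The decomposition referred to in the figure corresponds to Murphy's classic decomposition of the Brier score; in the notation used above (a reconstruction consistent with the component names in the text, with $a_i$ the occupancy, $PD_{i,\text{prog}}$ / $PD_{i,\text{real}}$ the forecast and realized default rates of class $i$, and $PD$ the overall default rate; the original figure may use a slightly different notation):

$$BS = \underbrace{PD\,(1-PD)}_{\text{variance}} + \underbrace{\sum_{i=1}^{g} a_i \left(PD_{i,\text{prog}} - PD_{i,\text{real}}\right)^{2}}_{\text{calibration}} - \underbrace{\sum_{i=1}^{g} a_i \left(PD_{i,\text{real}} - PD\right)^{2}}_{\text{resolution}}$$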

Environmental dependence of cardinal quality measures

The decomposition of the Brier score shown in the figure above reveals, however, one problematic property of the Brier score (and of other cardinal quality measures): its dependence on the average default rate of the population. The greater the variance of the environment, PD · (1 − PD), the greater (i.e. worse) the Brier score that a method achieves in the respective environment. In order to avoid this undesirable environmental dependence of cardinal quality measures, the use of skill measures has been proposed, which set the determined quality value in relation to the quality of naive forecasts in the same environment.

This dependence is undesirable because it impairs the comparison of different methods whenever their performance is measured on populations with different average default frequencies. Empirically and in model-based analyses, however, it can be shown that skill scores are also environment-dependent: while the Brier score (for PD_i < 50%) becomes ever "worse" with increasing default probabilities, the associated skill scores, paradoxically, become ever "better". Quality measures for ordinal insolvency forecasts do not have this disadvantage.

The above quality measures are occasionally also used under the assumption of correct calibration, i.e. ex post $PD_{i,\text{prog}} = PD_{i,\text{real}}$ is set for all $i$. Formulas 4b and 8b then simplify to:

Formula 4c :

Formula 8c :

Formula 8d :
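Under this assumption the scores depend only on the realized class default rates. Plausible forms of the simplified class-based expressions, derived here from the substitution $PD_{i,\text{prog}} = PD_{i,\text{real}} = PD_i$ (a reconstruction based on that substitution, not taken verbatim from the original formulas), are:

$$CIE = -\sum_{i=1}^{g} a_i \left[ PD_i \ln PD_i + (1 - PD_i)\ln(1 - PD_i) \right]$$

$$BS = \sum_{i=1}^{g} a_i\, PD_i (1 - PD_i) = PD\,(1-PD) - \sum_{i=1}^{g} a_i \left( PD_i - PD \right)^2$$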

The quality measures obtained in this way are then insensitive to possible miscalibrations (or even to missing calibration, as in the case of ordinal insolvency forecasts); the middle term ("calibration") in the decomposition shown in the figure above is omitted, and they therefore only measure the variance of the environment and the resolution of the forecasts. However, they are unsuitable for cross-portfolio comparisons because they depend on the average default rate. When comparing different forecasting methods on identical portfolios, they are no more informative than the usual quality measures for ordinal insolvency forecasts, such as the area under the ROC curve and the accuracy ratio, but they can be used as an additional criterion, especially in the case of intersecting ROC curves. If, in a direct comparison of two forecasting methods, all indicators point to the superiority of the same method, the decision-maker can be reassured, if he chooses this method, that he has selected the correct one. If the various indicators give contradictory signals, it can at least be assumed that the decision-maker does not make a "big mistake" whichever of the two methods he chooses. Alternatively, he can then also draw on secondary decision criteria such as the costs of producing the forecasts or the transparency and comprehensibility of the forecasting process.


References

  1. This article is based on Bemmann (2005).
  2. See Murphy, Winkler (1992, p. 440) for the formal definitions of the quality criteria presented in this article: resolution, discrimination, calibration, refinement, unconditional bias, accuracy, and skill.
  3. The measures accuracy ratio and area under the ROC curve, treated under quality measures for ordinal insolvency forecasts, are also influenced by the resolution and discrimination of the forecasts.
  4. see Murphy, Winkler (1992, p. 440)
  5. See for example Frerichs, Wahrenburg (2003, p. 16, own notation). In a simulation study, the authors found that the grouped Brier score is not suitable as a validation measure for rating systems, since it is not able to reliably identify "inferior" rating systems.
  6. DVFA (2004, p. 600, own notation)
  7. see DVFA (2004, p. 599)
  8. Studies on the Validation of Internal Rating Systems (PDF). Working Paper No. 14, October 24, 2005, revised version, 05/2005, Basel Committee on Banking Supervision, p. 47 ff.
  9. see also Krämer (2003, p. 396f.)
  10. See Sobehart, Keenan, Stein (2000, p. 14). See Shannon (1948, p. 11f.) for an axiomatic justification of the use of logarithmic "penalty functions"; in the case of corporate insolvencies with only the two outcomes "default" vs. "non-default", however, the application of the last of these axioms is not meaningful. See also Matheson, Winkler (1976), Keenan, Sobehart (1999, p. 9), and Studies on the Validation of Internal Rating Systems (PDF). Working Paper No. 14, October 24, 2005, revised version, 05/2005, Basel Committee on Banking Supervision, p. 44, Formula F 27 (own notation).
  11. see Krämer, Güttler (2003, p. 12)
  12. see Keenan, Sobehart (1999, p. 10)
  13. Sobehart, Keenan, Stein (2000, p. 14): "The CIER compares the amount of 'uncertainty' regarding default in the case where we have no model (a state of more uncertainty about the possible outcomes) to the amount of 'uncertainty' left over after we have introduced a model (presumably, a state of less ignorance)."
  14. In view of the strong variation of corporate default rates over time, forecasting the future PD is by no means trivial. See for example Keenan (1999) or S&P Quarterly Default Update & Rating Transitions. Standard & Poor's, The McGraw-Hill Companies, 10/2004, p. 3.
  15. see Basel Committee (2005, p. 30)
  16. see Cangemi, Servigny, Friedman (2003, p. 40)
  17. see Scheule (2003, p. 51)
  18. For the definition of the Brier score see Brier (1950, p. 1), Murphy, Winkler (1992, p. 439, Formula 7), Krämer, Güttler (2003, p. 11), Frerichs, Wahrenburg (2003, p. 14), Rating Models and Validation (PDF), Series of Guidelines on Credit Risk, January 2, 2016, Austrian National Bank, Vienna 2004, p. 123 ff., and Grunert, Norden, Weber (2005, p. 517).
  19. see Gujarati (1999, p. 170ff.)
  20. This statement is not trivial: for other penalty functions, the expected penalty for extreme forecasts of 0% or 100% can be lower than for a forecast of the true default probability, see Bemmann (2005, Appendix II). See ibid. for a proof of the incentive compatibility of the Brier score and the conditional information entropy. Brier (1950, p. 2) already cites incentive compatibility as an advantage of the Brier score.
  21. Both quality measures achieve their most favorable values if a method always forecasts default probabilities of 0% or 100% and these forecasts always come true.
  22. see Murphy, Winkler (1992)
  23. see Bemmann (2005, Appendix III)
  24. see Krämer (2003, p. 406) or Winkler (1994, p. 1397): "The development of so called 'skill-scores' has been motivated by the desire to produce average scores that reflect the relative ability of forecasters rather than some combination of the forecaster's ability and the situation's difficulty. These skill scores attempt to neutralize the contribution of the situation by comparing a forecaster's average score to the average score that an unsophisticated forecasting scheme would have obtained for the same set of forecasting situations."
  25. see Winkler (1994, pp. 1401f.) and Bemmann (2005, Appendix III)
  26. see Bemmann (2003, Appendix)
  27. see Krämer, Güttler (2003, p. 12)