CHAID

CHAID ( Ch i-square A utomatic I nteraction D etectors ) is an algorithm which is used for decision making. It is used in the construction of decision trees .

The CHAID algorithm was first published in 1964 by JA Sonquist and JN Morgan and is therefore the oldest of the common decision tree algorithms. Anderberg (1973) describes it. JA Hartigan (1975) gives an implementation.

The main difference between CHAID and CART and C4.5 is that the CHAID algorithm stops the tree from growing before the tree has grown too big. The tree is therefore not allowed to grow at will and then pruned again using a pruning method. Another difference is that CHAID works with categorically scaled variables such as color (red, yellow, green) or rating (good, medium, bad) instead of metrically scaled variables such as height in cm.

The CHAID algorithm uses the chi-square independence test to select the attributes . CHAIDs are used when a statement has to be made about the dependency of two variables. For this purpose, a key figure, the chi-square distance, is calculated. The following applies: the larger this key figure, the greater the dependency of the variables under consideration . The variable with the greatest chi-square distance to the target variable is considered as the attribute selection. As with the C4.5 algorithm, more than two branches per node can be made to increase the separation quality. As a result, the trees generated are more compact than the CARTs. The same method is used to find the best subdivisions. Since all possible combinations of characteristics have to be evaluated in these decision trees, runtime problems can arise with large amounts of data. It is therefore an advantage if the numerical variables are converted into variables with categorical characteristics, although this means additional work. For this, the result should be qualitatively better.

literature

Sonquist, JA and Morgan, JN (1964): The Detection of Interaction Effects . Survey Research Center, Institute for Social Research, University of Michigan, Ann Arbor.
Anderberg, MR (1973): Cluster Analysis for Applications . New York - Academic Press.
Hartigan, JA (1975): Clustering Algorithms . New York - Wiley.

CHAID

See also

literature