Iterative Dichotomiser 3

Iterative Dichotomiser 3 (ID3) is an algorithm used in decision making: it generates decision trees from a set of training data. The Australian researcher J. Ross Quinlan first published the algorithm in 1986. ID3 was very influential in its early years and is still used in some products today. It is considered the predecessor of the C4.5 algorithm.

ID3 is used when many different attributes are relevant, the amount of data is large, and a decision tree should therefore be generated without extensive computation. It usually produces simple decision trees; however, there is no guarantee that better trees do not exist.

The basic structure of ID3 is iterative. For each attribute that has not yet been used, the entropy with respect to the training set is calculated. The attribute with the highest information gain (equivalently, the smallest remaining expected entropy) is selected, and a new tree node is generated from it. The process terminates when all training instances have been classified, i.e. when a class is assigned to each leaf node.

Algorithm

If all elements of (data) belong to one class
Then
    // create a leaf //
    Construct a leaf with the class as its label
Else
    // recursively create a subtree //
    Choose the most "informative" feature
    Begin: For_all occurring values of the feature
        Recursively construct a subtree with the corresponding subset as data
    End: For_all
    Construct a tree node labelled with the feature and attach all subtrees created
End Else
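The following is a minimal Python sketch of the procedure above. The data layout (a list of attribute-to-value dictionaries plus a parallel list of class labels), the function names, and the majority-class fallback when no attributes remain are illustrative assumptions rather than part of Quinlan's original formulation; attribute selection by information gain is described in the next section.

from collections import Counter
from math import log2

# Illustrative sketch; names and data layout are assumptions, not Quinlan's notation.

def entropy(labels):
    # Entropy H of a list of class labels.
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, attribute):
    # Gain achieved by splitting (rows, labels) on the given attribute.
    total = len(labels)
    remainder = 0.0
    for value in set(row[attribute] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attribute] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

def id3(rows, labels, attributes):
    # All examples belong to one class: construct a leaf with that class.
    if len(set(labels)) == 1:
        return labels[0]
    # No attributes left: fall back to the majority class (a common convention,
    # not spelled out in the pseudocode above).
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Choose the most "informative" attribute (highest information gain).
    best = max(attributes, key=lambda a: information_gain(rows, labels, a))
    node = {best: {}}
    # Recursively construct one subtree per occurring value of the attribute.
    for value in set(row[best] for row in rows):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        node[best][value] = id3(
            [rows[i] for i in idx],
            [labels[i] for i in idx],
            [a for a in attributes if a != best],
        )
    return node

# Usage on a tiny invented data set:
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rain", "windy": "no"},
    {"outlook": "rain", "windy": "yes"},
]
labels = ["play", "stay", "play", "stay"]
print(id3(rows, labels, ["outlook", "windy"]))
# e.g. {'windy': {'no': 'play', 'yes': 'stay'}}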

Selection of attributes (features)

To form each subtree, the attribute that is most informative in the sense of information theory is selected.

Let $E$ be the set of training examples together with their respective classifications, $A$ the attribute to be examined from the set of available attributes, $\mathrm{values}(A)$ the set of possible values of $A$, and $E_v \subseteq E$ the subset of $E$ on which $A$ takes the value $v$. The information gain achieved by selecting $A$ is then calculated as the difference between the entropy of $E$ and the expected (average) entropy of $E$ once $A$ is fixed:

$$IG(E, A) = H(E) - \sum_{v \in \mathrm{values}(A)} \frac{|E_v|}{|E|}\, H(E_v),$$

where $H(E) = -\sum_{c} p_c \log_2 p_c$ is the entropy of $E$ and $p_c$ is the proportion of examples in $E$ belonging to class $c$.
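As a purely illustrative (invented) example: suppose $E$ contains 10 examples, 6 of one class and 4 of the other, and an attribute $A$ splits $E$ into $E_{v_1}$ (4 examples, all of the first class) and $E_{v_2}$ (6 examples, 2 of the first class and 4 of the second). Then

$$H(E) = -(0.6 \log_2 0.6 + 0.4 \log_2 0.4) \approx 0.971, \qquad H(E_{v_1}) = 0, \qquad H(E_{v_2}) \approx 0.918,$$

$$IG(E, A) \approx 0.971 - \left(\tfrac{4}{10}\cdot 0 + \tfrac{6}{10}\cdot 0.918\right) \approx 0.420.$$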

Finally, an attribute with the greatest gain is chosen from the set of available attributes as the next attribute.

This choice favors attributes with many possible values and thus leads to broad trees. To counteract this, the gain can be normalized by the number of possible values.
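One common way such a normalization is realized (in the successor algorithm C4.5 it appears as the gain ratio) is to divide the gain by the split information of the attribute, which grows with the number of its values; this is only a sketch of the idea, not part of ID3 itself:

$$\mathrm{GainRatio}(E, A) = \frac{IG(E, A)}{-\sum_{v \in \mathrm{values}(A)} \frac{|E_v|}{|E|} \log_2 \frac{|E_v|}{|E|}}.$$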

References

  1. ^ J. Ross Quinlan: Induction of Decision Trees. In: Machine Learning. Vol. 1, No. 1, March 1986, pp. 81–106, doi:10.1007/BF00116251.