Data Mining - Mehmed Kantardzic [110]
6.4 PRUNING DECISION TREES
Discarding one or more subtrees and replacing them with leaves simplifies a decision tree, and that is the main task in decision-tree pruning. In replacing a subtree with a leaf, the algorithm expects to lower the predicted error rate and increase the quality of the classification model. But computing the error rate is not simple. An error rate based only on the training data set does not provide a suitable estimate. One possibility for estimating the predicted error rate is to use a new, additional set of test samples, if they are available, or to use the cross-validation techniques explained in Chapter 4. Cross-validation divides the initially available samples into equal-sized blocks; for each block, the tree is constructed from all samples except those in the block and then tested on the samples in that block. With the available training and testing samples, the basic idea of decision-tree pruning is to remove the parts of the tree (subtrees) that do not contribute to the classification accuracy on unseen testing samples, producing a less complex and therefore more comprehensible tree. There are two ways in which the recursive-partitioning method can be modified:
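The block-based cross-validation scheme described above can be sketched as a simple fold generator; a minimal sketch in Python (the actual tree construction and error measurement on each fold are omitted, and the function name `kfold_blocks` is illustrative, not from the text):

```python
def kfold_blocks(samples, k):
    """Split samples into k roughly equal-sized blocks; for each block,
    yield (training set = all other blocks, test set = this block)."""
    blocks = [samples[i::k] for i in range(k)]
    for i in range(k):
        train = [s for j, b in enumerate(blocks) if j != i for s in b]
        yield train, blocks[i]

# For each (train, test) pair, a tree would be built on `train`
# and its error rate estimated on `test`.
```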
1. Deciding Not to Divide a Set of Samples Any Further under Some Conditions. The stopping criterion is usually based on some statistical tests, such as the χ2 test: If there are no significant differences in classification accuracy before and after division, then represent a current node as a leaf. The decision is made in advance, before splitting, and therefore this approach is called prepruning.
2. Removing Retrospectively Some of the Tree Structure Using Selected Accuracy Criteria. The decision in this process of postpruning is made after the tree has been built.
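The prepruning criterion in item 1 can be illustrated with a small sketch: a χ2 test of independence between a candidate split and the class label, where a non-significant statistic means the node is kept as a leaf. The contingency counts and the significance level below are hypothetical examples, not taken from the text:

```python
def chi2_statistic(table):
    """Chi-square statistic for a contingency table where
    table[i][j] = number of class-j samples routed to branch i."""
    row = [sum(r) for r in table]
    col = [sum(c) for c in zip(*table)]
    total = sum(row)
    stat = 0.0
    for i, r in enumerate(table):
        for j, observed in enumerate(r):
            expected = row[i] * col[j] / total
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical split: two branches, two classes.
counts = [[20, 5], [4, 21]]
stat = chi2_statistic(counts)
CRITICAL_95 = 3.841  # chi-square critical value, 1 degree of freedom, alpha = 0.05
split_is_significant = stat > CRITICAL_95  # if False, keep the node as a leaf
```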
C4.5 follows the postpruning approach, but it uses a specific technique to estimate the predicted error rate. This method is called pessimistic pruning. For every node in a tree, the estimate of the upper confidence limit Ucf is computed using the statistical tables for the binomial distribution (given in most textbooks on statistics). The parameter Ucf is a function of |Ti| and E for a given node. C4.5 uses the default confidence level of 25%, and compares U25%(|Ti|/E) for a given node Ti with the weighted confidence of its leaves, where the weights are the total numbers of cases in the leaves. If the predicted error for the root node of a subtree is less than the weighted sum of U25% for its leaves (the predicted error for the subtree), then the subtree is replaced with its root node, which becomes a new leaf in the pruned tree.
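Instead of reading Ucf from statistical tables, it can be computed numerically: Ucf is the error rate p at which the binomial probability of observing E or fewer errors in |Ti| cases equals the confidence level CF. A minimal sketch of this computation and of the pruning comparison described above (the case/error counts in the final demonstration are hypothetical, not taken from the text):

```python
from math import comb

def binom_tail(n, e, p):
    """P(X <= e) for X ~ Binomial(n, p): probability of at most e errors."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(e + 1))

def upper_limit(n, e, cf=0.25, tol=1e-9):
    """Upper confidence limit U_cf: the p solving P(X <= e | n, p) = cf,
    found by bisection (the tail probability decreases as p grows)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binom_tail(n, e, mid) > cf:
            lo = mid   # tail still above cf, so the limit is larger
        else:
            hi = mid
    return (lo + hi) / 2

def prune_subtree(root, leaves, cf=0.25):
    """C4.5-style comparison: replace the subtree with a leaf when the
    predicted error of the root alone is below the weighted sum over
    the leaves. root and each leaf are (cases, errors) pairs."""
    n_root, e_root = root
    root_error = n_root * upper_limit(n_root, e_root, cf)
    subtree_error = sum(n * upper_limit(n, e, cf) for n, e in leaves)
    return root_error < subtree_error

# Hypothetical subtree: the root would cover 16 cases with 1 error;
# its three leaves cover (6, 0), (9, 0), and (1, 0) cases/errors.
prune_subtree((16, 1), [(6, 0), (9, 0), (1, 0)])  # -> True (prune)
```

As a sanity check, for a leaf with one case and no errors, U25%(1/0) = 1 − 0.25 = 0.75, since (1 − p)^1 = 0.25 exactly at p = 0.75.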
Let us illustrate this procedure with one simple example. A subtree of a decision tree is given in Figure 6.9, where the root node is the test x1 on three possible values {1, 2, 3} of the attribute A. The children of the root node are leaves denoted with their corresponding classes and (|Ti|/E) parameters. The task is to determine whether the subtree can be pruned and replaced with its root node as a new, generalized leaf node.
Figure 6.9. Pruning a subtree by replacing it with one leaf node.
To analyze the possibility of replacing the subtree with a leaf node it is necessary to compute