Data Mining - Mehmed Kantardzic
The pruning decision is based on a predicted error PE for the initial tree and for a replaced node. Using the default confidence of 25%, the upper confidence limits for all nodes are taken from statistical tables: U25%(6,0) = 0.206, U25%(9,0) = 0.143, U25%(1,0) = 0.750, and U25%(16,1) = 0.157. Using these values, the predicted errors for the initial tree and the replaced node are

PE(initial tree) = 6 × 0.206 + 9 × 0.143 + 1 × 0.750 = 3.273
PE(replaced node) = 16 × 0.157 = 2.512

Since the existing subtree has a higher predicted error than the replaced node, it is recommended that the decision tree be pruned and the subtree replaced with the new leaf node.
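The comparison can be checked numerically. A minimal sketch, using the leaf counts and table values quoted above (the variable names are illustrative):

```python
# Pessimistic-error pruning check for the example above.
# Each leaf of the subtree is (number of cases n, upper limit U25%(n, e)).
leaves = [(6, 0.206), (9, 0.143), (1, 0.750)]

# Predicted error of the subtree: sum of n * U over its leaves.
pe_tree = sum(n * u for n, u in leaves)   # 6*0.206 + 9*0.143 + 1*0.750

# Predicted error of the candidate replacement leaf: 16 cases, 1 error.
pe_leaf = 16 * 0.157

print(round(pe_tree, 3), round(pe_leaf, 3))   # prints: 3.273 2.512

# Prune when the subtree's predicted error exceeds the leaf's.
prune = pe_tree > pe_leaf
```

Because 3.273 > 2.512, the subtree is replaced with the single leaf, exactly as the text recommends.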

6.5 C4.5 ALGORITHM: GENERATING DECISION RULES


Even though the pruned trees are more compact than the originals, they can still be very complex. Large decision trees are difficult to understand because each node has a specific context established by the outcomes of tests at antecedent nodes. To make a decision-tree model more readable, a path to each leaf can be transformed into an IF-THEN production rule. The IF part consists of all tests on a path, and the THEN part is a final classification. Rules in this form are called decision rules, and a collection of decision rules for all leaf nodes would classify samples exactly as the tree does. As a consequence of their tree origin, the IF parts of the rules would be mutually exclusive and exhaustive, so the order of the rules would not matter. An example of the transformation of a decision tree into a set of decision rules is given in Figure 6.10, where the two given attributes, A and B, may have two possible values, 1 and 2, and the final classification is into one of two classes.

Figure 6.10. Transformation of a decision tree into decision rules. (a) Decision tree; (b) decision rules.

For our trained decision tree in Figure 6.8, the corresponding decision rules will be

If Attribute1 = A and Attribute2 ≤ 70

Then Classification = CLASS1 (2.0/0);

If Attribute1 = A and Attribute2 > 70

Then Classification = CLASS2 (3.4/0.4);

If Attribute1 = B

Then Classification = CLASS1 (3.2/0);

If Attribute1 = C and Attribute3 = True

Then Classification = CLASS2 (2.4/0);

If Attribute1 = C and Attribute3 = False

Then Classification = CLASS1 (3.0/0).
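The path-to-rule transformation can be sketched as a recursive traversal. A minimal sketch on a hypothetical nested-dict tree whose shape mirrors Figure 6.10 (this is not the book's data structure; node representation and function names are illustrative):

```python
# Convert a decision tree into IF-THEN rules by enumerating root-to-leaf paths.
# Internal nodes are dicts mapping (attribute, value) tests to subtrees;
# leaves are class-label strings.
tree = {
    ("A", 1): {("B", 1): "CLASS1", ("B", 2): "CLASS2"},
    ("A", 2): "CLASS2",
}

def to_rules(node, conditions=()):
    """Return one (conditions, class) rule per leaf."""
    if isinstance(node, str):                # leaf: emit the accumulated path
        return [(conditions, node)]
    rules = []
    for (attr, value), subtree in node.items():
        rules += to_rules(subtree, conditions + ((attr, value),))
    return rules

for conds, cls in to_rules(tree):
    test = " and ".join(f"{a} = {v}" for a, v in conds)
    print(f"If {test} Then Classification = {cls}")
```

Since every leaf yields exactly one rule and each path's tests are accumulated along the way, the resulting rule set classifies samples exactly as the tree does.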

Rewriting the tree as a collection of rules, one for each leaf, does not by itself yield a simplified model. The number of decision rules in the classification model can be extremely large, and pruning rules can improve the readability of the model. In some cases, the antecedents of individual rules contain irrelevant conditions. The rules can be generalized by deleting these superfluous conditions without affecting rule-set accuracy. What are the criteria for deleting rule conditions? Let rule R be

If A then Class C

and a more general rule R’ could be

If A’ then Class C

where A’ is obtained by deleting one condition X from A (A = A’ ∪ X). The evidence for the importance of condition X must be found in the training samples. Each sample in the database that satisfies condition A’ either satisfies or does not satisfy the extended condition A. Also, each of these cases does or does not belong to the designated class C. The results can be organized into a 2 × 2 contingency table:

                                  Class C    Other Classes
    Satisfies condition X           Y1           E1
    Does not satisfy condition X    Y2           E2

There are Y1 + E1 cases covered by the original rule R, of which R misclassifies E1 since they belong to classes other than C. Similarly, Y1 + Y2 + E1 + E2 is the total number of cases covered by rule R’, of which E1 + E2 are errors. The criterion for eliminating condition X from the rule is based on a pessimistic estimate of the accuracy of rules R and R’. The estimated error rate of rule R can be set to Ucf(Y1 + E1, E1), and that of rule R’ to Ucf(Y1 + Y2 + E1 + E2, E1 + E2). If the pessimistic error rate of rule R’ is no greater than that of the original rule R, it makes sense to delete condition X. Of course, more than one condition may have to be deleted when a rule is generalized. Rather than looking at all possible subsets of conditions that could be deleted, the C4.5 system performs
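This criterion can be sketched numerically. Ucf(n, e) is the upper confidence limit of a binomial proportion: the error probability p at which observing at most e errors in n cases has probability CF. A minimal sketch (the bisection solver and helper names are illustrative, not C4.5's actual implementation; for e = 0 the values match the table quoted earlier, while for e > 0 the published C4.5 tables may differ slightly in rounding):

```python
from math import comb

def ucf(n, e, cf=0.25):
    """Upper confidence limit U_CF(n, e): the error probability p at which
    observing at most e errors in n cases has probability cf."""
    def tail(p):   # P(X <= e) for X ~ Binomial(n, p); decreasing in p
        return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(e + 1))
    lo, hi = 0.0, 1.0
    for _ in range(60):                  # bisection on the tail probability
        mid = (lo + hi) / 2
        if tail(mid) > cf:
            lo = mid                     # tail still too large -> raise p
        else:
            hi = mid
    return (lo + hi) / 2

def delete_condition(y1, e1, y2, e2):
    """Delete X when the generalized rule R' (covering Y1+Y2+E1+E2 cases with
    E1+E2 errors) has a pessimistic error no greater than the original R."""
    return ucf(y1 + y2 + e1 + e2, e1 + e2) <= ucf(y1 + e1, e1)
```

For example, ucf(6, 0) evaluates to about 0.206 and ucf(1, 0) to 0.750, matching the table values above; delete_condition applies the comparison directly to the counts in the 2 × 2 table.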
