Data Mining - Mehmed Kantardzic [116]
Finally, introducing new attributes, rather than removing old ones, can avoid the sometimes intensive fragmentation of the n-dimensional space by additional rules. Let us analyze a simple example. A classification problem is described by nine binary inputs {A1, A2, … , A9}, and the output class C is specified by the logical relation

C = (A1 ∨ A2 ∨ A3) ∧ (A4 ∨ A5 ∨ A6) ∧ (A7 ∨ A8 ∨ A9)
The above expression can be rewritten as a disjunction of conjunctive factors:

C = (A1 ∧ A4 ∧ A7) ∨ (A1 ∧ A4 ∧ A8) ∨ … ∨ (A3 ∧ A6 ∧ A9)
and it will have 27 factors, each containing only ∧ operations. Every one of these factors is a region in the 9-D space and corresponds to one rule. Taking into account regions for negative examples, there exist about 50 leaves in the decision tree (and the same number of rules) describing class C. If new attributes are introduced:

B1 = A1 ∨ A2 ∨ A3
B2 = A4 ∨ A5 ∨ A6
B3 = A7 ∨ A8 ∨ A9
the description of class C will be simplified into the logical rule

C = B1 ∧ B2 ∧ B3
It is possible to specify the correct classification using a decision tree with only four leaves. In the new three-dimensional space (B1, B2, B3) there will be only four decision regions. This kind of simplification via constructive induction (development of new attributes in the preprocessing phase) can also be applied in the case of an n-of-m attribute decision. If none of the previous transformations is found appropriate, the only way to deal with the increased fragmentation of an n-dimensional space is to bring more data to bear on the problem.
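The equivalence behind this simplification can be checked exhaustively. The following sketch (an illustration, not part of the original text) assumes the class is defined by C = (A1 ∨ A2 ∨ A3) ∧ (A4 ∨ A5 ∨ A6) ∧ (A7 ∨ A8 ∨ A9) and verifies that the constructed attributes B1, B2, B3 reproduce it over all 2^9 = 512 possible samples:

```python
from itertools import product

# Original description in the 9-D binary space:
# C = (A1 v A2 v A3) ^ (A4 v A5 v A6) ^ (A7 v A8 v A9)
def c_original(a):
    return ((a[0] or a[1] or a[2])
            and (a[3] or a[4] or a[5])
            and (a[6] or a[7] or a[8]))

# Constructive induction: three new attributes replace the nine inputs.
def construct(a):
    b1 = a[0] or a[1] or a[2]
    b2 = a[3] or a[4] or a[5]
    b3 = a[6] or a[7] or a[8]
    return b1, b2, b3

# Simplified description in the new 3-D space (B1, B2, B3).
def c_new(b):
    return b[0] and b[1] and b[2]

# The two descriptions agree on every one of the 512 binary samples.
assert all(c_original(a) == c_new(construct(a))
           for a in product((0, 1), repeat=9))
```

A decision tree built over (B1, B2, B3) needs only four leaves, whereas a tree over the original nine attributes must carve out dozens of regions.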
6.8 REVIEW QUESTIONS AND PROBLEMS
1. Explain the differences between the statistical and logical approaches in the construction of a classification model.
2. What are the new features of the C4.5 algorithm compared with Quinlan's original ID3 algorithm for decision-tree generation?
3. Given a data set X with 3-D categorical samples:
Construct a decision tree using the computation steps given in the C4.5 algorithm.
4. Given a training data set Y:
(a) Find the best threshold (for the maximal gain) for AttributeA.
(b) Find the best threshold (for the maximal gain) for AttributeB.
(c) Find a decision tree for data set Y.
(d) If the testing set is:
What is the percentage of correct classifications using the decision tree developed in (c)?
(e) Derive decision rules from the decision tree.
5. Use the C4.5 algorithm to build a decision tree for classifying the following objects:
6. Given a training data set Y* with missing values:
(a) Apply a modified C4.5 algorithm to construct a decision tree with the (Ti/E) parameters explained in Section 7.3.
(b) Analyze the possibility of pruning the decision tree obtained in (a).
(c) Generate decision rules for the solution in (a). Is it necessary to generate a default rule for this rule-based model?
7. Why is postpruning in C4.5 defined as pessimistic pruning?
8. Suppose that two decision rules are generated with C4.5:
Rule1: (X > 3) ∧ (Y ≥ 2) → Class1 (9.6/0.4)
Rule2: (X > 3) ∧ (Y < 2) → Class2 (2.4/2.0).
Analyze if it is possible to generalize these rules into one using confidence limit U25% for the binomial distribution.
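The U25% confidence limit in question 8 is the upper bound of a binomial confidence interval on a rule's true error rate, given the observed errors and covered cases. As a rough illustration (not C4.5's exact implementation, which uses an analytic approximation and allows fractional counts like 9.6/0.4), the bound can be obtained numerically by solving P(X ≤ e) = CF for the error probability p:

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def upper_error_limit(e, n, cf=0.25):
    """Upper limit U_CF on the true error rate, given e observed errors
    among n covered cases: the p that satisfies P(X <= e) = cf.
    Found by bisection, since the CDF decreases monotonically in p."""
    lo, hi = e / n, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if binom_cdf(e, n, mid) > cf:
            lo = mid   # p too small: observed outcome still too likely
        else:
            hi = mid   # p too large: observed outcome too unlikely
    return (lo + hi) / 2
```

For example, with zero errors in n cases the bound reduces to the closed form 1 − CF^(1/n), which the bisection reproduces; comparing such bounds for the separate rules against the bound for their generalization is the basis of the analysis asked for in question 8.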
9. Discuss the complexity of the algorithm for optimal splitting of numeric attributes into more than two intervals.
10. In real-world data-mining applications, a final model may consist of an extremely large number of decision rules. Discuss the potential actions and analyses you should perform to reduce the complexity of the model.
11. Search the Web to find the basic characteristics of publicly available or commercial software tools for generating decision rules and decision trees. Document the results of your search.
12. Consider a binary classification problem (output attribute value = {Low, High}) with the following set of input attributes and attribute values:
Air Conditioner = {Working, Broken}
Engine = {Good, Bad}
Mileage = {High, Medium, Low}
Rust = {Yes, No}
Suppose a rule-based classifier produces the following rule set:
Mileage = High → Value = Low
Mileage