Online Book Reader


Data Mining - Mehmed Kantardzic [115]

trees. In a multivariate tree, at a decision node, all input dimensions can be used for testing (e.g., w1x1 + w2x2 + w0 > 0, as presented in Fig. 6.12). Such a node defines a hyperplane with an arbitrary orientation. There are $2^d \binom{N}{d}$ possible hyperplanes (for N training samples in d dimensions), so exhaustive search is not practical.

Figure 6.12. Multivariate decision node.

With linear multivariate nodes, we can use hyperplanes for better approximation using fewer nodes. A disadvantage of the technique is that multivariate nodes are more difficult to interpret. Also, more complex nodes require more data. The earliest version of multivariate trees is implemented in the CART algorithm, which fine-tunes the weights wi one by one to decrease impurity. CART also has a preprocessing stage that decreases dimensionality through input subset selection (and therefore reduces node complexity).
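The difference between an axis-aligned (univariate) test and a linear multivariate test can be sketched in a few lines of Python. This is only an illustration: the function names, weights, and sample values are hypothetical, and the hyperplane count assumes the commonly quoted bound of 2^d * C(N, d) for N samples in d dimensions.

```python
from math import comb

def univariate_node(x, j, threshold):
    """Axis-aligned test: inspects a single input dimension j."""
    return x[j] > threshold

def multivariate_node(x, w, w0):
    """Linear multivariate test: w1*x1 + ... + wd*xd + w0 > 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + w0 > 0

def candidate_hyperplanes(n_samples, n_dims):
    """Assumed count 2^d * C(N, d) of distinct splitting hyperplanes."""
    return 2 ** n_dims * comb(n_samples, n_dims)

sample = (0.8, 0.7)                                  # hypothetical 2-D sample
print(univariate_node(sample, 0, 0.5))               # tests x1 only
print(multivariate_node(sample, (1.0, 1.0), -1.0))   # tests x1 + x2 - 1 > 0
print(candidate_hyperplanes(100, 2))                 # 19800
print(candidate_hyperplanes(1000, 10))               # grows far too fast to enumerate
```

Even for modest N and d the count becomes very large, which is consistent with the use of heuristic weight tuning rather than exhaustive search.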

6.7 LIMITATIONS OF DECISION TREES AND DECISION RULES


Decision rule- and decision tree-based models are relatively simple and readable, and their generation is very fast. Unlike many statistical approaches, a logical approach does not depend on assumptions about the distribution of attribute values or the independence of attributes. Also, this method tends to be more robust across tasks than most other statistical methods. But a logical approach also has some disadvantages and limitations, and a data-mining analyst has to be aware of them because the selection of an appropriate methodology is a key step to the success of a data-mining process.

If data samples are represented graphically in an n-dimensional space, where n is the number of attributes, then a logical classifier (decision trees or decision rules) divides the space into regions. Each region is labeled with a corresponding class. An unseen testing sample is then classified by determining the region into which the given point falls. Decision trees are constructed by successive refinement, splitting existing regions into smaller ones that contain highly concentrated points of one class. The number of training cases needed to construct a good classifier is proportional to the number of regions. More complex classifications require more regions, which are described with more rules and a tree of higher complexity. All of this requires additional training samples to obtain a successful classification.
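The region-based view of a logical classifier can be illustrated with a minimal sketch. The three rectangular regions and their class labels below are hypothetical; in practice they would be produced by a tree or rule learner.

```python
def classify(point, regions, default=None):
    """Return the label of the first region that contains the point.

    Each region is (bounds, label), where bounds holds one half-open
    interval [low, high) per attribute.
    """
    for bounds, label in regions:
        if all(low <= value < high for value, (low, high) in zip(point, bounds)):
            return label
    return default

# Hypothetical partition of the unit square into three labeled regions.
regions = [
    (((0.0, 0.5), (0.0, 1.0)), "A"),   # left half
    (((0.5, 1.0), (0.0, 0.5)), "B"),   # lower right quarter
    (((0.5, 1.0), (0.5, 1.0)), "A"),   # upper right quarter
]

print(classify((0.7, 0.2), regions))   # B
print(classify((0.1, 0.9), regions))   # A
```

Each added region is one more rule (or leaf), which is why a finer partition needs more training samples to be estimated reliably.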

A graphical representation of decision rules is given by orthogonal hyperplanes in an n-dimensional space. The regions for classification are hyperrectangles in the same space. If the problem at hand is such that the classification hyperplanes are not orthogonal, but are defined through a linear (or nonlinear) combination of attributes, such as the example in Figure 6.13, then the complexity of a rule-based model increases. A logical approach based on decision rules tries to approximate a non-orthogonal, and sometimes nonlinear, classification with hyperrectangles; the classification becomes extremely complex, with a large number of rules and a still larger error.

Figure 6.13. Approximation of non-orthogonal classification with hyperrectangles.
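The effect can be reproduced with a small experiment. The sketch below assumes a hypothetical oblique boundary x1 + x2 > 1 on the unit square and approximates it with a "staircase" of 2k axis-aligned rectangles (k vertical strips, each split once on x2). The error is estimated on a regular grid; none of these choices come from the text.

```python
def true_class(x1, x2):
    """Hypothetical oblique boundary: a linear combination of both attributes."""
    return x1 + x2 > 1.0

def staircase_class(x1, x2, k):
    """Axis-aligned approximation: k strips on x1, one x2 threshold per strip."""
    strip = min(int(x1 * k), k - 1)
    threshold = 1.0 - (strip + 0.5) / k   # boundary height at the strip midpoint
    return x2 > threshold

def staircase_error(k, grid=200):
    """Fraction of grid points where the staircase disagrees with the true class."""
    wrong = 0
    for a in range(grid):
        for b in range(grid):
            x1, x2 = (a + 0.5) / grid, (b + 0.5) / grid
            if staircase_class(x1, x2, k) != true_class(x1, x2):
                wrong += 1
    return wrong / grid ** 2

for k in (1, 2, 4, 8, 16):
    print(f"{2 * k:3d} rectangles: error rate {staircase_error(k):.3f}")
```

Under these assumptions the error shrinks only slowly as rectangles are added and never reaches zero, while the number of rules keeps growing.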

A possible solution to this problem is an additional iteration of the data-mining process: returning to the beginning of the preprocessing phase, it is necessary to transform the input features into new dimensions that are linear (or nonlinear) combinations of the initial inputs. This transformation is based on some domain heuristics, and it requires additional effort in data preparation; the reward is a simpler classification model with a lower error rate.
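A minimal sketch of this idea follows, assuming a hypothetical class boundary x1 + x2 > 1 and synthetic uniform samples. It compares the best single-threshold rule on an original attribute with the best single-threshold rule on the derived feature z = x1 + x2. The helper name `best_stump_error` is illustrative only.

```python
import random

def best_stump_error(values, labels):
    """Lowest training error of any rule 'value > t' (or its negation)."""
    n = len(values)
    best = 1.0
    for t in sorted(set(values)):
        wrong = sum((v > t) != y for v, y in zip(values, labels))
        best = min(best, wrong / n, (n - wrong) / n)
    return best

random.seed(0)                                   # fixed seed for repeatability
points = [(random.random(), random.random()) for _ in range(200)]

x1 = [p[0] for p in points]
z = [p[0] + p[1] for p in points]                # derived feature: linear combination
labels = [value > 1.0 for value in z]            # hypothetical oblique class boundary

print("best rule on x1 alone:", best_stump_error(x1, labels))
print("best rule on z = x1 + x2:", best_stump_error(z, labels))
```

With the derived feature, one rule separates the classes exactly on this synthetic data, whereas a single rule on an original attribute leaves a substantial error.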

Another type of classification problem for which decision rules are not an appropriate modeling tool has classification criteria of the form: a given class is supported if n out of m conditions are present. To represent this classifier with rules, it would be necessary to define $\binom{m}{n}$ regions for one class alone. Medical diagnostic decisions are a typical example of this kind of classification. If four out of 11 symptoms support diagnosis of a given disease, then the corresponding classifier will generate 330 regions in
