Data Mining - Mehmed Kantardzic
rate of a predictive model is lower than 0.5, the larger the number of predictive models, the lower the error rate of the ensemble. For example, when each predictive model in an ensemble has an error rate of 0.4, the error rates of majority-vote ensembles with n = 5, 15, 25, and 35 models are 0.317, 0.213, 0.153, and 0.114, respectively. However, this decrease in the ensemble's error rate becomes less significant when the number of classifiers is very large, or when the error rate of each classifier becomes relatively small.
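For independent classifiers combined by majority vote (with odd n), the ensemble error rate quoted above is the upper tail of a binomial distribution, and the figures can be checked directly. A minimal sketch (the function name is ours, not the book's):

```python
from math import comb

def ensemble_error(n, p):
    """Majority-vote error rate of n independent classifiers, each with
    individual error rate p: the ensemble errs when more than half of
    the classifiers err, i.e. sum over k > n/2 of C(n,k) p^k (1-p)^(n-k)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

for n in (5, 15, 25, 35):
    print(n, round(ensemble_error(n, 0.4), 3))
```

With p = 0.4 the loop reproduces the values quoted above: 0.317, 0.213, 0.153, and 0.114.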

The basic questions in creating an ensemble learner are as follows: How are base learners generated, and how are their outputs combined? Diverse and independent learners can be generated by

(a) using different learning algorithms for different learning models such as support vector machines, decision trees, and neural networks;

(b) using different hyper-parameters in the same algorithm to tune different models (e.g., different numbers of hidden nodes in artificial neural networks);

(c) using different input representations, such as using different subsets of input features in a data set; or

(d) using different training subsets of input data to generate different models usually using the same learning methodology.
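Approach (d) can be illustrated with bootstrap resampling, the mechanism underlying bagging (discussed below): each training subset is drawn with replacement from the original data, so the subsets overlap but differ. A minimal sketch, with made-up data:

```python
import random

def bootstrap_subsets(data, n_subsets, seed=0):
    """Draw n_subsets training sets, each the size of the original data,
    by sampling with replacement (the resampling step used in bagging)."""
    rng = random.Random(seed)
    return [[rng.choice(data) for _ in range(len(data))]
            for _ in range(n_subsets)]

data = list(range(10))                     # stand-in for 10 training samples
subsets = bootstrap_subsets(data, n_subsets=3)
```

Each subset would then be passed to the same learning algorithm, yielding three different models.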

Stacked generalization (or stacking) is a methodology that belongs to the first group (a). Unlike other well-known techniques, stacking may be (and normally is) used to combine models of different types. One way of combining multiple models introduces the concept of a meta-learner. The learning procedure is as follows:

1. Split the training set into two disjoint sets.

2. Train several base learners on the first part.

3. Test the base learners on the second part.

4. Using the predictions from (3) as the inputs, and the correct responses as the outputs, train a higher level learner.

Note that steps (1) to (3) are the same as in cross-validation, but instead of using a winner-take-all approach, the base learners are combined, possibly nonlinearly. Although stacking is an attractive idea, it is less theoretically analyzed and less widely used than bagging and boosting, the two most recognized ensemble-learning methodologies. A similar situation holds for the second group of methodologies (b): although it is a very simple approach, it is not used or analyzed intensively. Perhaps the main reason is that applying the same methodology with different parameters does not guarantee the independence of the resulting models.
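The four stacking steps can be sketched end-to-end. The base and meta-learners below are hypothetical one-feature threshold classifiers invented for illustration (not from the book), and the base outputs are summarized as an average vote so the same toy learner can serve at the meta-level; a real meta-learner would receive each base prediction as a separate input:

```python
def train_threshold(X, y, feat):
    """Fit a toy one-feature classifier: threshold at the midpoint
    of the two class means along feature `feat`."""
    c0 = [x[feat] for x, t in zip(X, y) if t == 0]
    c1 = [x[feat] for x, t in zip(X, y) if t == 1]
    m0, m1 = sum(c0) / len(c0), sum(c1) / len(c1)
    thr, sign = (m0 + m1) / 2, (1 if m1 > m0 else -1)
    return lambda x: int(sign * (x[feat] - thr) > 0)

def train_stacking(X, y):
    # 1. Split the training set into two disjoint parts.
    half = len(X) // 2
    X1, y1, X2, y2 = X[:half], y[:half], X[half:], y[half:]
    # 2. Train one base learner per input feature on the first part.
    bases = [train_threshold(X1, y1, f) for f in range(len(X[0]))]
    # 3. Run the base learners on the second part; their predictions
    #    (summarized here as an average vote) become the meta-level input.
    meta_X = [[sum(b(x) for b in bases) / len(bases)] for x in X2]
    # 4. Train a higher-level learner on (base predictions, correct labels).
    meta = train_threshold(meta_X, y2, 0)
    return lambda x: meta([sum(b(x) for b in bases) / len(bases)])

# Made-up, linearly separable two-feature data.
X = [(0, 0), (10, 2), (1, 1), (11, 3), (2, 1), (12, 2), (3, 0), (13, 3)]
y = [0, 1, 0, 1, 0, 1, 0, 1]
model = train_stacking(X, y)
```

Here the meta-learner effectively learns a weighted vote over the base outputs; with richer meta-learners the combination can be nonlinear, as noted above.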

Class (c) methodologies are based on manual or automatic feature selection/extraction, which can be used to generate diverse classifiers from different feature sets. For example, subsets related to different sensors, or subsets of features computed with different algorithms, may be used. To form the training data sets, different subsets of input features are chosen, and each training sample restricted to the selected input features becomes an element of a training data set. In Figure 8.4, there are five training samples {S1, S2, S3, S4, S5} with four features {F1, F2, F3, F4}. When training data set 1 is generated, three features {F1, F2, F4} are randomly selected from the input features {F1, F2, F3, F4}, and all training samples with those features form the first training set. A similar process is performed for the other training sets. The main requirement is that the classifiers use different, complementary subsets of features.
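The construction in Figure 8.4 can be imitated as follows: three of the four features are drawn at random, and every sample is projected onto them. The feature values are made up for illustration:

```python
import random

samples = {"S1": [1, 2, 3, 4],        # hypothetical values for F1..F4
           "S2": [5, 6, 7, 8],
           "S3": [9, 10, 11, 12],
           "S4": [13, 14, 15, 16],
           "S5": [17, 18, 19, 20]}
features = ["F1", "F2", "F3", "F4"]

def feature_subset_training_set(samples, features, k, rng):
    """Randomly select k of the input features and project every
    training sample onto the chosen subset."""
    idx = sorted(rng.sample(range(len(features)), k))
    return ([features[i] for i in idx],
            {name: [vals[i] for i in idx] for name, vals in samples.items()})

rng = random.Random(0)
chosen, train_set_1 = feature_subset_training_set(samples, features, k=3, rng=rng)
```

Repeating the call yields the other training sets, each over a (possibly different) feature subset.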

Figure 8.4. Feature selection for ensemble classifiers methodology.

The random subspace method (RSM) is a relatively recent ensemble-learning method based on the theory of stochastic discrimination. Learning machines are trained on randomly chosen subspaces of the original input space, and the outputs of the models are then combined. An illustrative example for the classification of movies is given in Figure 8.5. RSM works well for large feature sets with redundant features. The random forest methodology, which utilizes such an approach, is implemented in many commercial data-mining tools.
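The RSM idea can be sketched as follows: each model is assigned a random k-dimensional subspace of the feature space, and the models' outputs are combined by majority vote. The base learner here is a 1-nearest-neighbor classifier chosen for brevity; it is our illustrative assumption, not part of RSM itself:

```python
import random

def rsm_train(X, y, n_models, k, seed=0):
    """Pick n_models random k-dimensional subspaces of the feature space."""
    rng = random.Random(seed)
    return [rng.sample(range(len(X[0])), k) for _ in range(n_models)]

def rsm_predict(subspaces, X, y, x):
    votes = []
    for feats in subspaces:
        # Base learner: 1-nearest neighbor restricted to the chosen subspace.
        dists = [(sum((x[f] - xi[f]) ** 2 for f in feats), yi)
                 for xi, yi in zip(X, y)]
        votes.append(min(dists)[1])
    # Combine the base outputs by majority vote.
    return max(set(votes), key=votes.count)

# Made-up three-feature data; feature 3 (index 2) is deliberately redundant.
X = [(0, 0, 5), (1, 0, 4), (9, 9, 5), (8, 9, 4)]
y = [0, 0, 1, 1]
subspaces = rsm_train(X, y, n_models=3, k=2)
```

Random forests extend this idea by also resampling the training data and using decision trees as base learners.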

Figure 8.5. RSM approach in an ensemble classifier.
