This idea is supported theoretically by Hansen and colleagues through the following statement: if N classifiers make independent errors and each has an error probability ε < 0.5, then the error of an ensemble E is a monotonically decreasing function of N. Performance deteriorates quickly, however, when the classifiers' errors are dependent.
8.1 ENSEMBLE-LEARNING METHODOLOGIES
The ensemble-learning methodology consists of two sequential phases: (a) the training phase, and (b) the testing phase. In the training phase, the ensemble method generates several different predictive models from training samples, as presented in Figure 8.2a. To predict an unknown value of a test sample, the ensemble method aggregates the outputs of these predictive models (Fig. 8.2b). An integrated predictive model generated by an ensemble approach consists of several predictive models (Predictive model 1, Predictive model 2, …, Predictive model n) and a combining rule, as shown in Figure 8.2b. We will refer to such a predictive model as an ensemble. The field of ensemble learning is still relatively new, and several names are used as synonyms depending on which predictive task is performed, including combination of multiple classifiers, classifier fusion, mixture of experts, and consensus aggregation.
Figure 8.2. Training phase and testing phase for building an ensemble. (a) Training phase; (b) testing phase.
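The combining rule of the testing phase can be illustrated with a short sketch. The code below is an illustration, not code from the book: it assumes a list of already-trained model objects, each exposing a hypothetical predict() method that returns a categorical label for a single sample, and it combines their outputs by simple majority voting.

```python
from collections import Counter

def ensemble_predict(models, test_samples):
    """Testing phase of Figure 8.2b: aggregate the outputs of all
    predictive models with a majority-vote combining rule."""
    predictions = []
    for sample in test_samples:
        votes = [model.predict(sample) for model in models]  # one vote per model
        # The ensemble output is the most frequent categorical value among the votes.
        predictions.append(Counter(votes).most_common(1)[0][0])
    return predictions
```

Majority voting is only one possible combining rule; averaging of class probabilities or weighted voting are common alternatives for the same aggregation step.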
To perform better than a single predictive model, an ensemble should consist of predictive models that are independent of each other, that is, their errors are uncorrelated, and each of them should have an accuracy rate greater than 0.5. The outcomes of the individual predictive models are aggregated to determine the output value for a test sample. Let us analyze the steps of ensemble prediction for a classification task. For example, consider an ensemble of 15 classifiers, each of which classifies test samples into one of two categorical values. The ensemble decides the categorical value based on the dominant (majority) output of the classifiers. If the 15 predictive models are independent of each other and each has the same error rate (ε = 0.3), the ensemble will make a wrong prediction only if more than half of the predictive models misclassify a test sample. Therefore, the error rate of the ensemble is
$$\varepsilon_{\mathrm{ensemble}} \;=\; \sum_{i=8}^{15} \binom{15}{i}\,\varepsilon^{\,i}\,(1-\varepsilon)^{15-i} \;\approx\; 0.05$$

which is considerably lower than the 0.3 error rate of a single classifier. The sum starts at i = 8 because the ensemble errs only when eight or more of the 15 models misclassify a test sample, that is, when seven or fewer models classify the sample correctly.
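As a check of this arithmetic, the following sketch (an illustration, not code from the book) evaluates the binomial sum above for n = 15 independent classifiers with ε = 0.3; the helper works for any odd ensemble size.

```python
from math import comb

def ensemble_error(n, eps):
    """Majority-vote error of n independent classifiers, each with error rate eps:
    the ensemble errs when more than half of its members misclassify the sample."""
    k_min = n // 2 + 1  # smallest number of wrong votes that overturns the majority
    return sum(comb(n, k) * eps**k * (1 - eps)**(n - k) for k in range(k_min, n + 1))

print(round(ensemble_error(15, 0.3), 4))  # ~0.05, versus 0.3 for a single classifier
```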
Figure 8.3a shows the error rates of an ensemble that consists of 15 predictive models (n = 15). The x-axis represents the error rate (ε) of a single classifier. The diagonal line represents the case in which all models in the ensemble are identical. The solid line represents the error rates of an ensemble in which the predictive models are different and independent of each other. An ensemble has a significantly lower error rate than a single predictive model only when the error rate (ε) of the ensemble members is lower than 0.5.
Figure 8.3. Changes in error rates of an ensemble. (a) Identical predictive models versus different predictive models in an ensemble; (b) the different number of predictive models in an ensemble.
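The behavior summarized in Figure 8.3 can be reproduced numerically with the sketch below (again an illustration, not from the book): for each individual error rate ε it tabulates the error of independent-member ensembles of 5, 15, 25, and 35 models, which also makes the crossover at ε = 0.5 visible.

```python
from math import comb

def ensemble_error(n, eps):
    # Majority-vote error of n independent members, each with error rate eps.
    return sum(comb(n, k) * eps**k * (1 - eps)**(n - k)
               for k in range(n // 2 + 1, n + 1))

# Identical models behave like a single classifier (ensemble error = eps);
# independent models follow the curves of Figure 8.3a and 8.3b.
for eps in (0.1, 0.2, 0.3, 0.4, 0.5, 0.6):
    errors = [round(ensemble_error(n, eps), 4) for n in (5, 15, 25, 35)]
    print(f"eps = {eps:.1f}   independent ensembles (n = 5, 15, 25, 35): {errors}")
```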
We can also analyze the effect of the number of predictive models in an ensemble. Figure 8.3b shows error-rate curves for ensembles that consist of 5, 15, 25, and 35 predictive models, respectively. Observe that when an error