
Data Mining - Mehmed Kantardzic [139]


for movie classification.

Methodologies based on different training subsets of input samples (d) are the most popular approaches in ensemble learning, and the corresponding techniques, such as bagging and boosting, are widely applied in different tools. Before explaining these techniques in detail, however, it is necessary to describe one additional and final step in ensemble learning: combining the outcomes of different learners.

8.2 COMBINATION SCHEMES FOR MULTIPLE LEARNERS

Combination schemes include:

1. The global approach is based on learner fusion: all learners produce an output, and these outputs are combined by voting, averaging, or stacking. This corresponds to integration (fusion) functions, where for each pattern all the classifiers contribute to the final decision.

2. The local approach is based on learner selection: one or more learners responsible for generating the output are selected based on their closeness to the sample. A selection function is applied so that, for each pattern, just one classifier, or a subset of classifiers, is responsible for the final decision.

3. Multistage combination uses a serial approach, where the next learner is trained or tested only on the instances for which the previous learners were inaccurate.

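Voting is the simplest way of combining classifiers at the global level; the result is represented as a linear combination of the outputs of n learners:

$$y_i = \sum_{j=1}^{n} w_j d_{ij}, \qquad w_j \ge 0, \quad \sum_{j=1}^{n} w_j = 1$$

where dij is the output of learner j for class i, and wj is the weight of learner j.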

The result of the combination depends on the choice of the weights wj. Alternatives for combining the outputs are the simple sum (equal weights), weighted sum, median, minimum, maximum, and product of the dij. Voting schemes can be seen as approximations under a Bayesian framework, where the weights wj approximate prior model probabilities.
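The combination rules listed above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the book's code: the function name `combine`, the rule names, and the layout of `d` (rows are classes i, columns are learners j, so `d[i, j]` plays the role of dij) are assumptions made here. The simple sum is implemented as a mean (equal weights 1/n), which selects the same class as the plain sum.

```python
import numpy as np

def combine(d, rule="sum", w=None):
    """Combine learner outputs d[i, j] (class i, learner j) into scores y_i.

    Hypothetical helper for illustration; w holds the weights w_j
    (non-negative, summing to 1) and is used only by the "weighted" rule.
    """
    d = np.asarray(d, dtype=float)
    if rule == "sum":          # equal weights w_j = 1/n
        return d.mean(axis=1)
    if rule == "weighted":     # y_i = sum_j w_j * d_ij
        return d @ np.asarray(w, dtype=float)
    if rule == "median":
        return np.median(d, axis=1)
    if rule == "min":
        return d.min(axis=1)
    if rule == "max":
        return d.max(axis=1)
    if rule == "product":
        return d.prod(axis=1)
    raise ValueError(f"unknown rule: {rule}")

# Three learners, two classes (made-up outputs for illustration).
d = np.array([[0.6, 0.3, 0.7],    # class 0: outputs of learners 1..3
              [0.4, 0.7, 0.3]])   # class 1
print(np.argmax(combine(d, "sum")))                         # prints 0
print(np.argmax(combine(d, "weighted", w=[0.2, 0.6, 0.2])))  # prints 1
```

With equal weights class 0 wins (0.53 vs. 0.47), while giving learner 2 the weight 0.6 flips the decision to class 1 (0.44 vs. 0.56), which illustrates how the outcome depends on the wj.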

The rank-level fusion method is applied to classifiers that provide class "scores," or some sort of class probabilities. In general, if Ω = {c1, … , ck} is the set of classes, each of these classifiers can provide an "ordered" (ranked) list of class labels. For example, if the scores of three output classes are 0.10, 0.75, and 0.20, the corresponding ranks of the classes will be 1, 3, and 2, respectively: the highest rank is given to the class with the highest score. As an example, consider N = 3 classifiers and k = 4 classes, Ω = {a, b, c, d}. For a given sample, the ranked outputs of the three classifiers are as follows:

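In this case, the final output class is selected by accumulating the ranks of each class over all classifiers:

$$r(c) = \sum_{j=1}^{N} r_j(c), \qquad c \in \Omega$$

where r_j(c) is the rank that classifier j assigns to class c.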

The winner class is b because it has the maximum overall rank.
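A minimal sketch of rank-level fusion follows, assuming each classifier returns one score per class. The helper names `scores_to_ranks` and `rank_fusion` are hypothetical, and the score values below are invented for illustration; they are not the book's example data. Ties are broken by class order (stable sort), which is a choice of this sketch.

```python
import numpy as np

def scores_to_ranks(scores):
    """Convert one classifier's class scores into ranks:
    the lowest score gets rank 1, the highest gets rank k."""
    scores = np.asarray(scores, dtype=float)
    ranks = np.empty(len(scores), dtype=int)
    order = np.argsort(scores, kind="stable")   # class indices, ascending score
    ranks[order] = np.arange(1, len(scores) + 1)
    return ranks

def rank_fusion(score_matrix):
    """score_matrix[j][i] is the score classifier j assigns to class i.
    Returns the accumulated rank of every class and the winning class index."""
    totals = np.sum([scores_to_ranks(s) for s in score_matrix], axis=0)
    return totals, int(np.argmax(totals))

classes = ["a", "b", "c", "d"]
scores = [[0.05, 0.30, 0.50, 0.15],   # hypothetical scores of classifier 1
          [0.40, 0.35, 0.05, 0.20],   # classifier 2
          [0.25, 0.45, 0.20, 0.10]]   # classifier 3
totals, winner = rank_fusion(scores)
print(dict(zip(classes, totals.tolist())), "->", classes[winner])
# prints {'a': 8, 'b': 10, 'c': 7, 'd': 5} -> b
```

`scores_to_ranks([0.10, 0.75, 0.20])` reproduces the ranks 1, 3, and 2 from the text. Only the ordering of scores matters, so classifiers whose scores are on different scales can still be fused.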

Finally, the Dynamic Classifier Selection (DCS) algorithm, which represents a local approach, consists of the following steps:

1. Find the k nearest training samples to the test input.

2. Look at the accuracies of the base classifiers on these samples.

3. Choose one (or the top N) classifiers that perform best on these samples.

4. Combine the decisions of the selected classifiers.
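The four DCS steps can be sketched as below. Several details are assumptions of this sketch rather than part of the algorithm as stated: Euclidean distance for the neighbourhood, base classifiers represented as plain callables that map an (m, p) array to m labels, majority voting in step 4, and the hypothetical function name `dcs_predict` with parameters `k` and `top_n`.

```python
import numpy as np

def dcs_predict(x, X_train, y_train, classifiers, k=5, top_n=1):
    """Dynamic Classifier Selection for a single test input x (sketch)."""
    x = np.asarray(x, dtype=float)
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    # 1. Find the k nearest training samples (Euclidean distance assumed).
    dist = np.linalg.norm(X_train - x, axis=1)
    nn = np.argsort(dist, kind="stable")[:k]
    # 2. Local accuracy of every base classifier on these neighbours.
    acc = np.array([np.mean(clf(X_train[nn]) == y_train[nn]) for clf in classifiers])
    # 3. Select the top_n classifiers with the highest local accuracy.
    best = np.argsort(-acc, kind="stable")[:top_n]
    # 4. Combine their decisions on x by majority vote.
    votes = [classifiers[j](x[None, :])[0] for j in best]
    labels, counts = np.unique(votes, return_counts=True)
    return labels[np.argmax(counts)]

# Two deliberately "regional" classifiers on 1-D toy data.
X_train = np.array([[0.], [1.], [2.], [10.], [11.], [12.]])
y_train = np.array([0, 0, 0, 1, 1, 1])
always0 = lambda X: np.zeros(len(X), dtype=int)
always1 = lambda X: np.ones(len(X), dtype=int)
print(dcs_predict([1.5], X_train, y_train, [always0, always1], k=3))   # prints 0
print(dcs_predict([11.5], X_train, y_train, [always0, always1], k=3))  # prints 1
```

Neither toy classifier is useful globally, but each is perfectly accurate in its own region, so selecting by local accuracy picks the right one for each test input.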

8.3 BAGGING AND BOOSTING

Bagging and boosting are well-known procedures with a solid theoretical background. They belong to class (d) of ensemble methodologies, and they are essentially based on resampling the training data set.

Bagging, a name derived from bootstrap aggregation, was the first effective method of ensemble learning and is one of the simplest. It was originally designed for classification and is usually applied to decision tree models, but it can be used with any type of model for classification or regression. The method creates multiple versions of the training set using the bootstrap, that is, sampling with replacement. Each of these data sets is used to train a different model. The outputs of the models are combined by averaging (in the case of regression) or voting (in the case of classification) to create a single output.
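A minimal bagging sketch for binary classification, assuming 0/1 labels. The base learner here is a decision stump (a one-level decision tree) chosen only to keep the example self-contained; `fit_stump`, `bagging_fit`, `bagging_predict`, and the default of 25 models are hypothetical choices of this sketch, not prescribed by the method.

```python
import numpy as np

def fit_stump(X, y):
    """Illustrative base learner: the single-feature threshold rule
    with the lowest training error (labels assumed to be 0/1)."""
    if np.all(y == y[0]):                  # bootstrap sample holds one class only
        label = int(y[0])
        return lambda Z: np.full(len(Z), label)
    best = (np.inf, 0, 0.0, 1)             # (error, feature, threshold, direction)
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f]):
            for sign in (1, -1):
                pred = (sign * (X[:, f] - t) > 0).astype(int)
                err = np.mean(pred != y)
                if err < best[0]:
                    best = (err, f, t, sign)
    _, f, t, sign = best
    return lambda Z: (sign * (Z[:, f] - t) > 0).astype(int)

def bagging_fit(X, y, fit, n_models=25, seed=0):
    """Train n_models learners, each on its own bootstrap sample."""
    rng = np.random.default_rng(seed)
    m = len(X)
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, m, size=m)   # m draws with replacement
        models.append(fit(X[idx], y[idx]))
    return models

def bagging_predict(models, Z):
    """Combine the models' 0/1 predictions by majority vote."""
    votes = np.stack([model(Z) for model in models])   # (n_models, n_samples)
    return (votes.mean(axis=0) > 0.5).astype(int)

X = np.array([[0.], [1.], [2.], [3.], [10.], [11.], [12.], [13.]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
models = bagging_fit(X, y, fit_stump)
print(bagging_predict(models, np.array([[-5.], [20.]])))
```

An odd number of models avoids ties in the binary vote. For regression, `bagging_predict` would instead return `votes.mean(axis=0)`, i.e. the average of the models' outputs.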

In the bagging methodology a training data set for a predictive model consists of samples taken with replacement from an initial set of samples according to a sampling distribution. The sampling distribution determines how likely it is that a sample will be selected. For
