Online Book Reader

Home Category

Data Mining - Mehmed Kantardzic [142]

By Root 895 0

(b) Bagging gives varying weights to training instances. Boosting gives equal weight to all training instances.

(c) Bagging does not take the performance of previously built models into account when building a new model. With boosting each new model is built based upon the results of previous models.

(d) With boosting, each model has an equal weight in the classification of new instances. With bagging, individual models are given varying weights.

8.6 REFERENCES FOR FURTHER STUDY


Brown, G., Ensemble Learning, in Encyclopedia of Machine Learning, C. Sammut and Webb G. I., eds., Springer Press, New York, 2010.

“Ensemble learning refers to the procedures employed to train multiple learning machines and combine their outputs, treating them as a committee of decision makers.” The principle is that the committee decision, with individual predictions combined appropriately, should have better overall accuracy, on average, than any individual committee member. Numerous empirical and theoretical studies have demonstrated that ensemble models very often attain higher accuracy than single models. Ensemble methods constitute some of the most robust and accurate learning algorithms of the past decade. A multitude of heuristics has been developed for randomizing the ensemble parameters to generate diverse models. It is arguable that this line of investigation is rather oversubscribed nowadays, and the more interesting research is now in methods for nonstandard data.

Kuncheva, L. I., Combining Pattern Classifiers: Methods and Algorithms, Wiley Press, Hoboken, NJ, 2004.

Covering pattern classification methods, Combining Classifiers: Methods and Algorithms focuses on the important and widely studied issue of combining several classifiers together in order to achieve an improved recognition performance. It is one of the first books to provide unified, coherent, and expansive coverage of the topic and as such will be welcomed by those involved in the area. With case studies that bring the text alive and demonstrate “real-world” applications it is destined to become an essential reading.

Dietterich, T. G., Ensemble Methods in Machine Learning, in Lecture Notes in Computer Science on Multiple Classifier Systems, Vol. 1857, Springer, Berlin, 2000.

Ensemble methods are learning algorithms that construct a set of classifiers and then classify new data points by taking a (weighted) vote of their predictions. The original ensemble method is Bayesian averaging, but more recent algorithms include error-correcting output coding, bagging, and boosting. This paper reviews these methods and explains why ensembles can often perform better than any single classifier. Some previous studies comparing ensemble methods are reviewed, and some new experiments are presented to uncover the reasons that Adaboost does not overfit rapidly.

9

CLUSTER ANALYSIS

Chapter Objectives

Distinguish between different representations of clusters and different measures of similarities.

Compare the basic characteristics of agglomerative- and partitional-clustering algorithms.

Implement agglomerative algorithms using single-link or complete-link measures of similarity.

Derive the K-means method for partitional clustering and analysis of its complexity.

Explain the implementation of incremental-clustering algorithms and its advantages and disadvantages.

Introduce concepts of density clustering, and algorithms Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and Balanced and Iterative Reducing and Clustering Using Hierarchies (BIRCH).

Discuss why validation of clustering results is a difficult problem.

Cluster analysis is a set of methodologies for automatic classification of samples into a number of groups using a measure of association so that the samples in one group are similar and samples belonging to different groups are not similar. The input for a system of cluster analysis is a set of samples and a measure of similarity (or dissimilarity) between two samples. The output from cluster analysis is a number of groups (clusters)

Return Main Page Previous Page Next Page

®Online Book Reader