Data Mining - Mehmed Kantardzic

Chapter 4 covers statistical learning theory and the different types of learning methods and learning tasks that may be derived from it. Problems of evaluating and deploying the developed models are also discussed in this chapter.

Chapters 5 to 11 give an overview of common classes of data-mining techniques. Predictive methods are described in Chapters 5 to 8, while descriptive data mining is covered in Chapters 9 to 11. Selected statistical inference methods are presented in Chapter 5, including the Bayesian classifier, predictive and logistic regression, analysis of variance (ANOVA), and log-linear models. Chapter 6 summarizes the basic characteristics of the C4.5 algorithm as a representative of logic-based techniques for classification problems. The basic characteristics of the Classification and Regression Trees (CART) approach are also introduced and compared with the C4.5 methodology. Chapter 7 discusses the basic components of artificial neural networks and introduces two classes, multilayer perceptrons and competitive networks, as illustrative representatives of neural-network technology. Practical applications of data-mining technology show that using several models in predictive data mining improves the quality of results. This approach is called ensemble learning, and its basic principles are given in Chapter 8.

Chapter 9 explains the complexity of clustering problems and introduces agglomerative, partitional, and incremental clustering techniques. Different aspects of local modeling in large data sets are addressed in Chapter 10, which also presents common techniques of association-rule mining. Web mining and text mining are becoming central topics for many researchers, and the new algorithms resulting from this work are summarized in Chapter 11. A number of new topics and trends in data mining have gained emphasis over the last 7 years. Some of them, such as graph mining and temporal, spatial, and distributed data mining, are covered in Chapter 12. This chapter also introduces important legal restrictions and guidelines, as well as security and privacy aspects of data-mining applications. Most of the techniques explained in Chapters 13 and 14, on genetic algorithms and fuzzy systems, are not directly applicable to mining large data sets. However, recent advances in the field show that these soft-computing technologies are becoming more important for representing and computing data when they are combined with other techniques. Finally, Chapter 15 recognizes the importance of data-mining visualization techniques, especially those for representing high-dimensional samples.

It is our hope that we have succeeded in producing an informative and readable text supplemented with relevant examples and illustrations. Every chapter in the book has a set of review problems and a reading list. The author is preparing a solutions manual for instructors who might use the book for undergraduate or graduate classes. For an in-depth understanding of the topics covered in this book, we refer the reader to the fairly comprehensive list of references given at the end of each chapter. Although most of these references are from journals, magazines, and conference and workshop proceedings, it is clear that as data mining becomes a more mature field, many more books covering different aspects of data mining and knowledge discovery are becoming available. Finally, the book has two appendices with useful background information for practical applications of data-mining technology. Appendix A provides an overview of the most influential journals, conferences, forums, and blogs, as well as a list of commercially and publicly available data-mining tools, while Appendix B presents a number of commercially successful data-mining applications.

The reader should have some knowledge of the basic concepts and terminology associated with data structures and databases. In addition, some background in elementary statistics and machine learning may also be useful, but it
