Data Mining - Mehmed Kantardzic [88]
Berthold, M., D. J. Hand, eds., Intelligent Data Analysis—An Introduction, Springer, Berlin, Germany, 1999.
The book is a detailed, introductory presentation of the key classes of intelligent data-analysis methods, including all common data-mining techniques. The first half of the book is devoted to classical statistical issues, ranging from basic concepts of probability and inference to advanced multivariate analyses and Bayesian methods. The second part covers theoretical explanations of data-mining techniques that have their roots in disciplines other than statistics. Numerous illustrations and examples enhance the reader's understanding of the theory and practical evaluation of data-mining techniques.
Cherkassky, V., F. Mulier, Learning from Data: Concepts, Theory and Methods, 2nd edition, John Wiley, New York, 2007.
The book provides a unified treatment of the principles and methods for learning dependencies from data. It establishes a general conceptual framework in which various learning methods from statistics, machine learning, and other disciplines can be applied, showing that a few fundamental principles underlie most new methods being proposed today. An additional strength of this primarily theoretical book is the large number of case studies and examples that simplify concepts in statistical learning theory (SLT) and make them understandable.
Engel, A., C. Van den Broeck, Statistical Mechanics of Learning, Cambridge University Press, Cambridge, UK, 2001.
The subject of this book is the contribution made to machine learning over the last decade by researchers applying the techniques of statistical mechanics. The authors provide a coherent account of various important concepts and techniques that are currently found only scattered in papers. They include many examples and exercises, making this a book that can be used in courses, for self-teaching, or as a handy reference.
Haykin, S., Neural Networks: A Comprehensive Foundation, Prentice Hall, Upper Saddle River, NJ, 1999.
The book provides a comprehensive foundation for the study of artificial neural networks, recognizing the multidisciplinary nature of the subject. The introductory part explains the basic principles of statistical learning theory and the concept of the Vapnik-Chervonenkis (VC) dimension. The main part of the book classifies and explains artificial neural networks as learning machines with and without a teacher. The material presented in the book is supported by a large number of examples, problems, and computer-oriented experiments.
5
STATISTICAL METHODS
Chapter Objectives
Explain methods of statistical inference commonly used in data-mining applications.
Identify different statistical parameters for assessing differences in data sets.
Describe the components and the basic principles of the Naïve Bayesian classifier and the logistic regression method.
Introduce log-linear models using correspondence analysis of contingency tables.
Discuss the concepts of analysis of variance (ANOVA) and linear discriminant analysis (LDA) of multidimensional samples.
Statistics is the science of collecting and organizing data and drawing conclusions from data sets. The organization and description of the general characteristics of data sets is the subject area of descriptive statistics. How to draw conclusions from data is the subject of statistical inference. In