Data Mining - Mehmed Kantardzic [158]
Han, J., M. Kamber, Data Mining: Concepts and Techniques, 2nd edition, Morgan Kaufmann, San Francisco, CA, 2006.
This book gives a sound understanding of data-mining principles. It is oriented primarily toward database practitioners and professionals, with an emphasis on OLAP and data warehousing. In-depth analysis of association rules and clustering algorithms is an additional strength of the book. All algorithms are presented in easily understood pseudo-code, and they are suitable for use in real-world, large-scale data-mining projects, including advanced applications such as Web mining and text mining.
Hand, D., H. Mannila, P. Smyth, Principles of Data Mining, MIT Press, Cambridge, MA, 2001.
The book consists of three sections. The first, foundations, provides a tutorial overview of the principles underlying data-mining algorithms and their applications. The second section, data-mining algorithms, shows how algorithms are constructed to solve specific problems in a principled manner. The third section shows how all of the preceding analyses fit together when applied to real-world data-mining problems.
Jain, A. K., M. N. Murty, P. J. Flynn, Data Clustering: A Review, ACM Computing Surveys, Vol. 31, No. 3, September 1999, pp. 264–323.
Although there are several excellent books on clustering algorithms, this review paper gives the reader enough detail about state-of-the-art techniques in data clustering, with an emphasis on problems involving large data sets. The paper presents a taxonomy of clustering techniques and identifies crosscutting themes, recent advances, and some important applications. For readers interested in the practical implementation of some clustering methods, the paper offers useful advice and a broad spectrum of references.
Miyamoto, S., Fuzzy Sets in Information Retrieval and Cluster Analysis, Kluwer Academic Publishers, Dordrecht, The Netherlands, 1990.
This book offers an in-depth presentation and analysis of some clustering algorithms and reviews the possibilities of combining these techniques with fuzzy representation of data. Information retrieval, which, with the development of advanced Web-mining techniques, is becoming more important in the data-mining community, is also explained in the book.
10
ASSOCIATION RULES
Chapter Objectives
Explain the local modeling character of association-rule techniques.
Analyze the basic characteristics of large transactional databases.
Describe the Apriori algorithm and explain all its phases through illustrative examples.
Compare the frequent pattern (FP) growth method with the Apriori algorithm.
Outline the solution for association-rule generation from frequent itemsets.
Explain the discovery of multidimensional associations.
Introduce the extension of FP growth methodology for classification problems.
When we talk about machine-learning methods applied to data mining, we classify them as either parametric or nonparametric methods. In parametric methods used for density estimation, classification, or regression, we assume that a final model is valid over the entire input space. In regression, for example, when we derive a linear model, we apply it to all future inputs. In classification, we assume that all samples (training samples as well as new, testing samples) are drawn from the same density distribution. Models in these cases are global models valid for the entire n-dimensional space of samples. The advantage