Data Mining_ Concepts and Techniques - Jiawei Han [201]
Rule ConstraintAntimonotonicMonotonicSuccinct
(a) v ∈ S no yes yes
(b) S ⊆ V yes no yes
(c) min(S) ≤ v no yes yes
(d) range(S) ≤ v yes no no
(e) variance(S) ≤ v convertible convertible no
7.7 The price of each item in a store is non-negative. The store manager is only interested in rules of certain forms, using the constraints given in (a)–(b). For each of the following cases, identify the kinds of constraints they represent and briefly discuss how to mine such association rules using constraint-based pattern mining.
(a) Containing at least one Blu-ray DVD movie.
(b) Containing items with a sum of the prices that is less than $150.
(c) Containing one free item and other items with a sum of the prices that is at least $200.
(d) Where the average price of all the items is between $100 and $500.
7.8 Section 7.4.1 introduced a core Pattern-Fusion method for mining high-dimensional data. Explain why a long pattern, if one exists in the data set, is likely to be discovered by this method.
7.9 Section 7.5.1 defined a pattern distance measure between closed patterns P1 and P2 as
where T(P1) and T(P2) are the supporting transaction sets of P1 and P2, respectively. Is this a valid distance metric? Show the derivation to support your answer.
7.10 Association rule mining often generates a large number of rules, many of which may be similar, thus not containing much novel information. Design an efficient algorithm that compresses a large set of patterns into a small compact set. Discuss whether your mining method is robust under different pattern similarity definitions.
7.11 Frequent pattern mining may generate many superfluous patterns. Therefore, it is important to develop methods that mine compressed patterns. Suppose a user would like to obtain only k patterns (where k is a small integer). Outline an efficient method that generates the k most representative patterns, where more distinct patterns are preferred over very similar patterns. Illustrate the effectiveness of your method using a small data set.
7.12 It is interesting to generate semantic annotations for mined patterns. Section 7.6.1 presented a pattern annotation method. Alternative methods are possible, such as by utilizing type information. In the DBLP data set, for example, authors, conferences, terms, and papers form multi-typed data. Develop a method for automated semantic pattern annotation that makes good use of typed information.
7.9. Bibliographic Notes
This chapter described various ways in which the basic techniques of frequent itemset mining (presented in Chapter 6) have been extended. One line of extension is mining multilevel and multidimensional association rules. Multilevel association mining was studied in Srikant and Agrawal [SA95] and Han and Fu [HF95]. In Srikant and Agrawal [SA95], such mining was studied in the context of generalized association rules, and an R-interest measure was proposed for removing redundant rules. Mining multidimensional association rules using static discretization of quantitative attributes and data cubes was studied by Kamber, Han, and Chiang [KHC97].
Another line of extension is to mine patterns on numeric attributes. Srikant and Agrawal [SA96] proposed a nongrid-based technique for mining quantitative association rules, which uses a measure of partial completeness. Mining quantitative association rules based on rule clustering was proposed by Lent, Swami, and Widom [LSW97]. Techniques for mining quantitative rules based on x-monotone and rectilinear regions were presented by Fukuda, Morimoto, Morishita, and Tokuyama [FMMT96] and Yoda, Fukuda, Morimoto, et al. [YFM+97]. Mining (distance-based) association rules over interval data was proposed by Miller and Yang [MY97]. Aumann and Lindell [AL99] studied the mining of quantitative association rules based on a statistical theory to present only those rules that deviate substantially from normal data.
Mining rare patterns by pushing group-based constraints was proposed by Wang, He, and Han [WHH00]. Mining negative association