Data Mining_ Concepts and Techniques - Jiawei Han [200]
■ Semantic annotations can be generated to help users understand the meaning of the frequent patterns found, such as for textual terms like “{frequent, pattern}.” These are dictionary-like annotations, providing semantic information relating to the term. This information consists of context indicators (e.g., terms indicating the context of that pattern), the most representative data transactions (e.g., fragments or sentences containing the term), and the most semantically similar patterns (e.g., “{maximal, pattern}” is semantically similar to “{frequent, pattern}”). The annotations provide a view of each pattern's context from different angles, which aids in understanding the pattern.
■ Frequent pattern mining has many diverse applications, ranging from pattern-based data cleaning to pattern-based classification, clustering, and outlier or exception analysis. These methods are discussed in the subsequent chapters in this book.
7.8. Exercises
7.1 Propose and outline a level-shared mining approach to mining multilevel association rules in which each item is encoded by its level position. Design it so that an initial scan of the database collects the count for each item at each concept level, identifying frequent and subfrequent items. Comment on the processing cost of mining multilevel associations with this method in comparison to mining single-level associations.
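As a starting point for the initial scan described in Exercise 7.1, the following is a minimal sketch, not a full solution. It assumes each item is encoded as a digit string whose first k digits identify its ancestor at concept level k (e.g., "112" belongs to level-1 category "1" and level-2 category "11"). The function names, the encoding, and the second ("passage") threshold used to define subfrequent items are illustrative assumptions.

```python
from collections import Counter

def count_by_level(transactions, depth):
    """Single scan: for each concept level, count the transactions that
    contain each item generalized to that level (assumed prefix encoding)."""
    counts = [Counter() for _ in range(depth)]
    for t in transactions:
        for level in range(1, depth + 1):
            # A set ensures a transaction contributes at most once per prefix.
            for prefix in {item[:level] for item in t}:
                counts[level - 1][prefix] += 1
    return counts

def split_by_threshold(level_counts, min_sup, passage_sup):
    """Assumed definition: frequent items reach min_sup; subfrequent items
    fall short of min_sup but still reach the lower passage_sup."""
    frequent = {i for i, c in level_counts.items() if c >= min_sup}
    subfrequent = {i for i, c in level_counts.items()
                   if passage_sup <= c < min_sup}
    return frequent, subfrequent

# Toy transactions with three-level item codes (made-up data).
transactions = [["111", "211"], ["112", "221"], ["111", "121"], ["211"]]
counts = count_by_level(transactions, depth=3)
frequent, subfrequent = split_by_threshold(counts[1], min_sup=2, passage_sup=1)
```

Because every level is counted in the same pass, the extra cost relative to single-level counting in this sketch is one prefix computation per item per level; the cost of the later candidate-generation passes is what the exercise asks you to compare.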
7.2 Suppose, as manager of a chain of stores, you would like to use sales transactional data to analyze the effectiveness of your store's advertisements. In particular, you would like to study how specific factors influence the effectiveness of advertisements that announce a particular category of items on sale. The factors to study are the region in which customers live and the day-of-the-week and time-of-the-day of the ads. Discuss how to design an efficient method to mine the transaction data sets and explain how multidimensional and multilevel mining methods can help you derive a good solution.
7.3 Quantitative association rules may disclose exceptional behaviors within a data set, where “exceptional” can be defined based on statistical theory. For example, Section 7.2.3 shows the association rule
sex = female ⇒ mean_wage = $7.90/hr (overall_mean_wage = $9.02/hr),
which suggests an exceptional pattern. The rule states that the average wage for females is only $7.90 per hour, which is a significantly lower wage than the overall average of $9.02 per hour. Discuss how such quantitative rules can be discovered systematically and efficiently in large data sets with quantitative attributes.
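One possible reading of "exceptional" in Exercise 7.3 is that a subgroup's mean differs significantly from the mean of the rest of the data. The sketch below illustrates that reading with a two-sample Z-test on a single categorical attribute; it is not the efficient, systematic method the exercise asks for. The function name, the record layout, the critical value 1.96, and the toy wage records are all assumptions made for illustration.

```python
import math
from statistics import mean, variance

def exceptional_groups(rows, cat_key, num_key, z_crit=1.96, min_size=2):
    """Flag values of cat_key whose mean of num_key differs significantly
    (two-sample Z-test) from the mean of all remaining rows."""
    groups = {}
    for r in rows:
        groups.setdefault(r[cat_key], []).append(r[num_key])
    overall = mean(r[num_key] for r in rows)
    flagged = []
    for value, xs in groups.items():
        rest = [r[num_key] for r in rows if r[cat_key] != value]
        if len(xs) < min_size or len(rest) < min_size:
            continue  # too few rows to estimate a variance
        se = math.sqrt(variance(xs) / len(xs) + variance(rest) / len(rest))
        if se == 0:
            continue  # no spread in either group, so the test is undefined
        z = (mean(xs) - mean(rest)) / se
        if abs(z) > z_crit:
            flagged.append((value, mean(xs), overall, z))
    return flagged

# Made-up toy records, not taken from the book.
rows = (
    [{"sex": "female", "wage": w} for w in (7.5, 8.0, 8.2, 7.9)]
    + [{"sex": "male", "wage": w} for w in (10.0, 10.2, 9.8, 10.1)]
)
rules = exceptional_groups(rows, "sex", "wage")
```

Each flagged tuple corresponds to a rule of the form "attribute = value ⇒ subgroup mean (overall mean)". Scaling this to many attributes and attribute combinations is where pruning and efficient aggregation become necessary.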
7.4 In multidimensional data analysis, it is interesting to extract pairs of similar cell characteristics associated with substantial changes in measure in a data cube, where cells are considered similar if they are related by roll-up (i.e., ancestors), drill-down (i.e., descendants), or 1-D mutation (i.e., siblings) operations. Such an analysis is called cube gradient analysis.
Suppose the measure of the cube is average. A user poses a set of probe cells and would like to find their corresponding sets of gradient cells, each of which satisfies a certain gradient threshold. For example, find the set of corresponding gradient cells that have an average sale price greater than 20% of that of the given probe cells. Develop an algorithm that mines the set of constrained gradient cells efficiently in a large data cube.
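To make the definitions in Exercise 7.4 concrete, here is a naive brute-force sketch that materializes the full cube and tests every cell against a probe cell. It only illustrates the ancestor, descendant, and sibling relations and the gradient test; it is not the efficient algorithm the exercise asks for. The "*" marker for an aggregated dimension, the function names, the ratio form of the gradient threshold (1.2 here), and the toy sales rows are assumptions.

```python
from itertools import product

ALL = "*"  # assumed marker for a rolled-up (aggregated) dimension

def relation(probe, cell):
    """Classify cell relative to probe, or return None if unrelated."""
    diff = [(p, c) for p, c in zip(probe, cell) if p != c]
    if not diff:
        return None  # the probe cell itself
    if all(c == ALL for _, c in diff):
        return "ancestor"    # reached from probe by roll-up
    if all(p == ALL for p, _ in diff):
        return "descendant"  # reached from probe by drill-down
    if len(diff) == 1:
        return "sibling"     # 1-D mutation: one dimension value changed
    return None

def cube_averages(rows):
    """Compute the average measure for every cell of the full cube."""
    sums = {}
    for dims, value in rows:
        for mask in product([False, True], repeat=len(dims)):
            cell = tuple(ALL if m else d for d, m in zip(dims, mask))
            s, n = sums.get(cell, (0.0, 0))
            sums[cell] = (s + value, n + 1)
    return {cell: s / n for cell, (s, n) in sums.items()}

def gradient_cells(averages, probe, min_ratio):
    """Related cells whose average exceeds min_ratio times the probe's."""
    base = averages[probe]
    result = {}
    for cell, avg in averages.items():
        rel = relation(probe, cell)
        if rel is not None and avg / base > min_ratio:
            result[cell] = rel
    return result

# Made-up toy rows: (city, product) dimensions with a sale price measure.
rows = [(("Vancouver", "TV"), 500.0), (("Vancouver", "PC"), 1000.0),
        (("Toronto", "TV"), 650.0), (("Toronto", "PC"), 900.0)]
found = gradient_cells(cube_averages(rows), ("Vancouver", "TV"), 1.2)
```

Materializing every cell is exponential in the number of dimensions, which is why an efficient solution needs to push the probe and gradient constraints into the cube computation rather than filter afterward. Note also that average is not a monotonic measure, so simple Apriori-style pruning on the average does not apply directly.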
7.5 Section 7.2.4 presented various ways of defining negatively correlated patterns. Consider Definition 7.3: “Suppose that itemsets X and Y are both frequent, that is, sup(X) ≥ min_sup and sup(Y) ≥ min_sup, where min_sup is the minimum support threshold. If (P(X|Y) + P(Y|X))/2 < ϵ, where ϵ is a negative pattern threshold, then pattern X ∪ Y is a negatively correlated pattern.” Design an efficient pattern growth algorithm for mining the set of negatively correlated patterns.
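Before designing the pattern growth algorithm for Exercise 7.5, it can help to have a direct check of the definition to test candidate outputs against. The sketch below assumes Definition 7.3 compares the average of the two conditional probabilities, (P(X|Y) + P(Y|X))/2, against ϵ, with both itemsets required to be frequent. The function names and the toy transactions are illustrative assumptions, and the check scans the data for each call, so it is a reference test rather than a mining algorithm.

```python
def support(transactions, itemset):
    """Relative support: fraction of transactions containing itemset."""
    items = set(itemset)
    return sum(items <= set(t) for t in transactions) / len(transactions)

def is_negatively_correlated(transactions, x, y, min_sup, eps):
    """Assumed reading of Definition 7.3: X and Y are both frequent and
    the average of P(X|Y) and P(Y|X) is below the threshold eps."""
    sup_x = support(transactions, x)
    sup_y = support(transactions, y)
    if sup_x < min_sup or sup_y < min_sup:
        return False  # the definition applies only to frequent itemsets
    sup_xy = support(transactions, set(x) | set(y))
    # P(X|Y) = sup(X ∪ Y) / sup(Y) and P(Y|X) = sup(X ∪ Y) / sup(X)
    avg_conditional = (sup_xy / sup_y + sup_xy / sup_x) / 2
    return avg_conditional < eps

# Made-up toy data: "a" and "b" are each common but rarely co-occur.
transactions = [["a"]] * 4 + [["b"]] * 4 + [["a", "b"], ["c"]]
flag = is_negatively_correlated(transactions, ["a"], ["b"],
                                min_sup=0.3, eps=0.25)
```

In the toy data, sup({a}) = sup({b}) = 0.5 and sup({a, b}) = 0.1, so both conditional probabilities are 0.2 and the pair falls below ϵ = 0.25. Because the measure depends only on the supports of X, Y, and X ∪ Y, it is unaffected by the number of transactions that contain neither itemset.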
7.6 Prove that each entry in the following table correctly characterizes its corresponding rule constraint