Data Mining: Concepts and Techniques - Jiawei Han [178]
Notice that the Apriori property may not always hold uniformly across all of the items when mining under reduced support and group-based support. However, efficient methods can be developed based on an extension of this property. The details are left as an exercise for interested readers.
A serious side effect of mining multilevel association rules is that it generates many redundant rules across multiple abstraction levels, owing to the “ancestor” relationships among items. For example, consider the following rules, where “laptop computer” is an ancestor of “Dell laptop computer” based on the concept hierarchy of Figure 7.2, and where X is a variable representing customers who purchased items in AllElectronics transactions.
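$$\mathit{buys}(X, \text{“laptop computer”}) \Rightarrow \mathit{buys}(X, \text{“HP printer”}) \quad [\mathit{support} = 8\%,\ \mathit{confidence} = 70\%] \tag{7.4}$$
$$\mathit{buys}(X, \text{“Dell laptop computer”}) \Rightarrow \mathit{buys}(X, \text{“HP printer”}) \quad [\mathit{support} = 2\%,\ \mathit{confidence} = 72\%] \tag{7.5}$$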
“If Rules (7.4) and (7.5) are both mined, then how useful is Rule (7.5)? Does it really provide any novel information?” If the latter, less general rule does not provide new information, then it should be removed. Let's look at how this may be determined. A rule R1 is an ancestor of a rule R2 if R1 can be obtained by replacing the items in R2 with their ancestors in a concept hierarchy. For example, Rule (7.4) is an ancestor of Rule (7.5) because “laptop computer” is an ancestor of “Dell laptop computer.” Based on this definition, a rule can be considered redundant if its support and confidence are close to their “expected” values, where the expected values are derived from an ancestor of the rule.
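The ancestor-rule definition can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the book's algorithm: the concept hierarchy is a hypothetical child-to-parent dictionary, a rule is a pair of position-aligned item tuples (antecedent, consequent), and the consequent item "HP printer" is an illustrative assumption. R1 counts as an ancestor of R2 when every item of R1 equals, or is an ancestor of, the corresponding item of R2, and the two rules differ.

```python
from typing import Dict, List, Tuple

# Hypothetical fragment of a concept hierarchy: child item -> parent item.
PARENT: Dict[str, str] = {
    "Dell laptop computer": "laptop computer",
    "laptop computer": "computer",
    "HP printer": "printer",
}

# A rule is (antecedent items, consequent items), with positions aligned.
Rule = Tuple[Tuple[str, ...], Tuple[str, ...]]


def ancestors(item: str, parent: Dict[str, str]) -> List[str]:
    """Return all proper ancestors of `item`, nearest first."""
    result = []
    while item in parent:
        item = parent[item]
        result.append(item)
    return result


def is_ancestor_or_self(a: str, b: str, parent: Dict[str, str]) -> bool:
    """True if item `a` is item `b` itself or one of b's ancestors."""
    return a == b or a in ancestors(b, parent)


def is_ancestor_rule(r1: Rule, r2: Rule, parent: Dict[str, str]) -> bool:
    """True if r1 is obtained from r2 by replacing at least one item with an ancestor."""
    (a1, c1), (a2, c2) = r1, r2
    if len(a1) != len(a2) or len(c1) != len(c2):
        return False
    pairs = list(zip(a1, a2)) + list(zip(c1, c2))
    return r1 != r2 and all(is_ancestor_or_self(x, y, parent) for x, y in pairs)


r74 = (("laptop computer",), ("HP printer",))       # shape of Rule (7.4)
r75 = (("Dell laptop computer",), ("HP printer",))  # shape of Rule (7.5)
print(is_ancestor_rule(r74, r75, PARENT))  # True
print(is_ancestor_rule(r75, r74, PARENT))  # False
```

Aligning items by position keeps the sketch simple; a fuller implementation working on unordered itemsets would need to search for a matching between items.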
Checking redundancy among multilevel association rules
Suppose that Rule (7.4) has a 70% confidence and 8% support, and that about one-quarter of all “laptop computer” sales are for “Dell laptop computers.” We may expect Rule (7.5) to have a confidence of around 70% (since all data samples of “Dell laptop computer” are also samples of “laptop computer”) and a support of around 2% (i.e., 8% × 1/4 = 2%). If this is indeed the case, then Rule (7.5) is not interesting because it offers no additional information and is less general than Rule (7.4).
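The arithmetic in this example can be sketched as a small redundancy test. It is a minimal illustration, not the book's procedure: the function names, and especially the relative tolerance `rel_tol=0.1` that decides what counts as "close", are hypothetical choices, since the text fixes no threshold. Following the reasoning above, the descendant rule's expected support is the ancestor's support scaled by the share of ancestor-item sales that are descendant-item sales, and its expected confidence equals the ancestor's confidence.

```python
import math
from typing import Tuple


def expected_values(anc_support: float, anc_confidence: float,
                    share: float) -> Tuple[float, float]:
    """Expected (support, confidence) of a descendant rule.

    `share` is the fraction of the ancestor item's occurrences that are
    occurrences of the descendant item (e.g., 1/4 of laptop sales are Dell).
    """
    return anc_support * share, anc_confidence


def is_redundant(support: float, confidence: float,
                 anc_support: float, anc_confidence: float,
                 share: float, rel_tol: float = 0.1) -> bool:
    """Flag a rule whose observed values are close to those expected from its ancestor.

    `rel_tol` is a hypothetical closeness threshold, not a value from the text.
    """
    exp_sup, exp_conf = expected_values(anc_support, anc_confidence, share)
    return (math.isclose(support, exp_sup, rel_tol=rel_tol)
            and math.isclose(confidence, exp_conf, rel_tol=rel_tol))


# Ancestor: 8% support, 70% confidence; one-quarter share for the descendant item.
print(expected_values(0.08, 0.70, 0.25))
# Illustrative observed values for the descendant rule: 2% support, 72% confidence.
print(is_redundant(0.02, 0.72, 0.08, 0.70, 0.25))  # True
```

A relative tolerance scales with the magnitude of each value, so a 2% support and a 70% confidence are judged on comparable terms; an absolute tolerance would treat small supports too leniently.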
7.2.2. Mining Multidimensional Associations
So far, we have studied association rules that imply a single predicate, that is, the predicate buys. For instance, in mining our AllElectronics database, we may discover the Boolean association rule
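$$\mathit{buys}(X, \text{“digital camera”}) \Rightarrow \mathit{buys}(X, \text{“HP printer”}) \tag{7.6}$$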
Following the terminology used in multidimensional databases, we refer to each distinct predicate in a rule as a dimension. Hence, we can refer to Rule (7.6) as a single-dimensional or intradimensional association rule because it contains a single distinct predicate (namely, buys) with multiple occurrences (i.e., the predicate occurs more than once within the rule). Such rules are commonly mined from transactional data.
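Since a dimension is simply a distinct predicate, classifying a rule reduces to counting the distinct predicate names it uses. The sketch below assumes a hypothetical representation in which each literal is a `(predicate, value)` pair; the example literals are illustrative. A rule with one distinct predicate is single-dimensional, and a rule with two or more is treated as multidimensional.

```python
from typing import List, Set, Tuple

# A literal is (predicate, value), e.g. ("buys", "laptop") for buys(X, "laptop").
Literal = Tuple[str, str]


def distinct_predicates(antecedent: List[Literal],
                        consequent: List[Literal]) -> Set[str]:
    """Collect the distinct predicate names (dimensions) used anywhere in a rule."""
    return {pred for pred, _ in antecedent + consequent}


def classify_rule(antecedent: List[Literal], consequent: List[Literal]) -> str:
    """Label a rule by the number of distinct predicates it contains."""
    n = len(distinct_predicates(antecedent, consequent))
    return "single-dimensional" if n == 1 else "multidimensional"


# One predicate (buys) occurring twice: intradimensional.
print(classify_rule([("buys", "digital camera")], [("buys", "HP printer")]))
# prints: single-dimensional

# Three distinct predicates: age, occupation, buys.
print(classify_rule([("age", "20...29"), ("occupation", "student")],
                    [("buys", "laptop")]))
# prints: multidimensional
```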
Beyond transactional data, sales and related information are often linked with relational data or integrated into a data warehouse. Such data stores are multidimensional in nature. For instance, in addition to keeping track of the items purchased in sales transactions, a relational database may record other attributes associated with the items and/or transactions, such as the item description or the branch location of the sale. Additional relational information regarding the customers who purchased the items (e.g., customer age, occupation, credit rating, income, and address) may also be stored. By treating each database attribute or warehouse dimension as a predicate, we can therefore mine association rules containing multiple predicates, such as
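$$\mathit{age}(X, \text{“20...29”}) \wedge \mathit{occupation}(X, \text{“student”}) \Rightarrow \mathit{buys}(X, \text{“laptop”}) \tag{7.7}$$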
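Treating each attribute as a predicate means a relational record can be flattened into a set of `(attribute, value)` pairs, after which support and confidence are counted exactly as for itemsets. The sketch below is a minimal illustration under assumptions: the record layout, the use of a list for a multivalued attribute such as `buys`, and the four sample customers are all hypothetical, not data from the text.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple, Union

# Hypothetical record layout: attribute -> value, or a list of values for a
# multivalued attribute such as the items a customer bought.
Record = Dict[str, Union[str, List[str]]]
Predicate = Tuple[str, str]  # (attribute, value), read as attribute(X, value)


def record_to_predicates(record: Record) -> FrozenSet[Predicate]:
    """Flatten one record into the set of predicates it satisfies."""
    preds = set()
    for attr, value in record.items():
        values = value if isinstance(value, list) else [value]
        for v in values:
            preds.add((attr, v))
    return frozenset(preds)


def support(preds: Iterable[Predicate], records: List[Record]) -> float:
    """Fraction of records that satisfy every predicate in `preds`."""
    target = frozenset(preds)
    hits = sum(1 for r in records if target <= record_to_predicates(r))
    return hits / len(records)


def confidence(antecedent: List[Predicate], consequent: List[Predicate],
               records: List[Record]) -> float:
    """support(antecedent + consequent) / support(antecedent); 0.0 if undefined."""
    denom = support(antecedent, records)
    return support(antecedent + consequent, records) / denom if denom else 0.0


# Hypothetical customer records.
records: List[Record] = [
    {"age": "20...29", "occupation": "student", "buys": ["laptop", "printer"]},
    {"age": "20...29", "occupation": "student", "buys": ["laptop"]},
    {"age": "20...29", "occupation": "student", "buys": ["printer"]},
    {"age": "30...39", "occupation": "engineer", "buys": ["laptop"]},
]
ante = [("age", "20...29"), ("occupation", "student")]
cons = [("buys", "laptop")]
print(support(ante + cons, records))                # 0.5
print(round(confidence(ante, cons, records), 3))    # 0.667
```

Flattening records this way lets an itemset miner such as Apriori run unchanged on relational data, at the cost of treating each attribute-value pair as an opaque item.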
Association rules that involve two or more dimensions or predicates can be referred to as multidimensional association rules. Rule