Data Mining: Concepts and Techniques - Jiawei Han [184]

finding all of the association rules reflecting these relationships, you are interested only in determining which pairs of customer traits promote the sale of office software. A metarule can be used to specify the form of the rules you are interested in finding. An example of such a metarule is

$$P_1(X, Y) \wedge P_2(X, W) \Rightarrow \mathit{buys}(X, \text{“office software”}) \tag{7.11}$$

where P1 and P2 are predicate variables that are instantiated to attributes from the given database during the mining process, X is a variable representing a customer, and Y and W take on values of the attributes assigned to P1 and P2, respectively. Typically, a user will specify a list of attributes to be considered for instantiation with P1 and P2. Otherwise, a default set may be used.

In general, a metarule forms a hypothesis regarding the relationships that the user is interested in probing or confirming. The data mining system can then search for rules that match the given metarule. For instance, Rule (7.12) matches or complies with Metarule (7.11):

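$$\mathit{age}(X, \text{“30..39”}) \wedge \mathit{income}(X, \text{“41K..60K”}) \Rightarrow \mathit{buys}(X, \text{“office software”}) \tag{7.12}$$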
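As a minimal sketch of what "matches or complies with" could mean in practice, the check below tests whether a candidate rule fits a metarule with two predicate variables in the antecedent and a fixed `buys(X, "office software")` consequent. The function name `matches_metarule`, the tuple encoding of predicates, the attribute list, and the sample attribute values are all assumptions made for illustration. The requirement that P1 and P2 bind to distinct attributes is also an assumption.

```python
def matches_metarule(antecedent, consequent, allowed_attributes,
                     n_antecedent=2,
                     target=("buys", "office software")):
    """Check a rule against the template P1(X, Y) ^ P2(X, W) => buys(X, "office software").

    antecedent: list of (attribute, value) pairs, one per predicate.
    consequent: a single (attribute, value) pair.
    allowed_attributes: attributes the user permits for instantiating P1 and P2.
    """
    names = [name for name, _ in antecedent]
    return (
        len(antecedent) == n_antecedent                  # one predicate per variable
        and len(set(names)) == len(names)                # assumed: distinct attributes
        and all(n in allowed_attributes for n in names)  # only user-listed attributes
        and consequent == target                         # consequent is fixed by the metarule
    )


# Illustrative values only; they are not taken from any real database.
allowed = {"age", "income", "occupation"}
rule_antecedent = [("age", "30..39"), ("income", "41K..60K")]
print(matches_metarule(rule_antecedent, ("buys", "office software"), allowed))
```

A rule with a different consequent, or with only one antecedent predicate, would be rejected by the same check.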


“How can metarules be used to guide the mining process?” Let's examine this problem closely. Suppose that we wish to mine interdimensional association rules such as in Example 7.7. A metarule is a rule template of the form

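$$P_1 \wedge P_2 \wedge \cdots \wedge P_l \Rightarrow Q_1 \wedge Q_2 \wedge \cdots \wedge Q_r \tag{7.13}$$

where $P_i$ ($i = 1, \ldots, l$) and $Q_j$ ($j = 1, \ldots, r$) are either instantiated predicates or predicate variables. Let the number of predicates in the metarule be $p = l + r$. To find interdimensional association rules satisfying the template,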

■ We need to find all frequent p-predicate sets, Lp.

■ We must also have the support or count of the l-predicate subsets of Lp to compute the confidence of rules derived from Lp.

This is a typical case of mining multidimensional association rules. By extending such methods using the constraint-pushing techniques described in the following section, we can derive efficient methods for metarule-guided mining.
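The two requirements listed above can be sketched in a few lines. The example below is a brute-force illustration, not the constraint-pushing approach the text refers to: it counts every l-predicate antecedent together with a fixed consequent (giving the p-predicate set counts) and keeps the l-predicate counts so that confidence can be computed. The function name `mine_with_metarule`, the row format, and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def mine_with_metarule(rows, attributes, target, min_support, min_confidence, l=2):
    """rows: list of dicts mapping attribute name -> value.
    attributes: attributes allowed to instantiate the antecedent predicates.
    target: (attribute, value) pair fixed as the rule consequent.
    """
    target_attr, target_value = target
    antecedent_counts = Counter()  # counts of l-predicate sets
    full_counts = Counter()        # counts of p-predicate sets (antecedent plus target)
    for row in rows:
        for attrs in combinations(sorted(attributes), l):
            antecedent = tuple((a, row[a]) for a in attrs)
            antecedent_counts[antecedent] += 1
            if row.get(target_attr) == target_value:
                full_counts[antecedent] += 1

    rules = []
    for antecedent, count in full_counts.items():
        support = count / len(rows)
        confidence = count / antecedent_counts[antecedent]
        if support >= min_support and confidence >= min_confidence:
            rules.append((antecedent, target, support, confidence))
    return rules


# Toy customer table, invented for illustration.
rows = [
    {"age": "30..39", "income": "41K..60K", "buys": "office software"},
    {"age": "30..39", "income": "41K..60K", "buys": "office software"},
    {"age": "30..39", "income": "41K..60K", "buys": "printer"},
    {"age": "20..29", "income": "21K..40K", "buys": "office software"},
]
for rule in mine_with_metarule(rows, ["age", "income"],
                               ("buys", "office software"),
                               min_support=0.5, min_confidence=0.6):
    print(rule)
```

Counting the antecedent sets alongside the full sets is what makes the confidence ratio available without a second pass over the data.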

7.3.2. Constraint-Based Pattern Generation: Pruning Pattern Space and Pruning Data Space

Rule constraints specify expected set/subset relationships of the variables in the mined rules, the instantiation of variables with constants, constraints on aggregate functions, and other forms of constraints. Users typically employ their knowledge of the application or data to specify rule constraints for the mining task. These rule constraints may be used together with, or as an alternative to, metarule-guided mining. In this section, we examine how rule constraints can be used to make the mining process more efficient. Let's study an example where rule constraints are used to mine hybrid-dimensional association rules.

Constraints for mining association rules

Suppose that AllElectronics has a sales multidimensional database with the following interrelated relations:

■ item(item_ID, item_name, description, category, price)

■ sales(transaction_ID, day, month, year, store_ID, city)

■ trans_item(item_ID, transaction_ID)

Here, the item table contains attributes item_ID, item_name, description, category, and price; the sales table contains attributes transaction_ID, day, month, year, store_ID, and city; and the two tables are linked via the foreign key attributes, item_ID and transaction_ID, in the table trans_item.

Suppose our association mining query is “Find the patterns or rules about the sales of which cheap items (where the sum of the prices is less than $10) may promote (i.e., appear in the same transaction) the sales of which expensive items (where the minimum price is $50), shown in the sales in Chicago in 2010.”

This query contains the following four constraints: (1) $\mathit{sum}(I.\mathit{price}) < \$10$, where I represents the item_ID of a cheap item; (2) $\mathit{min}(J.\mathit{price}) \geq \$50$, where J represents the item_ID of an expensive item; (3) $T.\mathit{city} = \text{Chicago}$; and (4) $T.\mathit{year} = 2010$, where T represents a transaction_ID. For conciseness, we do not show the mining query explicitly here; however, the constraints' context is clear from the mining query semantics.
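To make the four constraints concrete, here is a small sketch that expresses them as plain Python predicates. It assumes the reading of the query in which the cheap items' prices sum to less than 10 and every expensive item costs at least 50. The function names, the dictionary layout, and the toy prices and transactions are hypothetical and serve only to illustrate the idea.

```python
def select_transactions(sales, city="Chicago", year=2010):
    """Data constraints: keep only transaction IDs from the given city and year."""
    return {tid for tid, rec in sales.items()
            if rec["city"] == city and rec["year"] == year}


def satisfies_rule_constraints(cheap_items, expensive_items, price,
                               max_cheap_sum=10, min_expensive_price=50):
    """Rule constraints: sum of cheap-item prices below the cap, and the
    cheapest 'expensive' item at or above the floor."""
    return (
        bool(cheap_items) and bool(expensive_items)
        and sum(price[i] for i in cheap_items) < max_cheap_sum
        and min(price[j] for j in expensive_items) >= min_expensive_price
    )


# Toy data, invented for illustration.
price = {"pen": 2, "notebook": 5, "cable": 8, "monitor": 150, "printer": 90}
sales = {
    1: {"city": "Chicago", "year": 2010},
    2: {"city": "Chicago", "year": 2009},
    3: {"city": "Boston", "year": 2010},
}

print(select_transactions(sales))
print(satisfies_rule_constraints({"pen", "notebook"}, {"monitor"}, price))
print(satisfies_rule_constraints({"notebook", "cable"}, {"monitor"}, price))
```

In this sketch the city and year checks shrink the data to be scanned, while the price checks restrict which candidate item sets are worth considering.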


Dimension/level constraints and interestingness constraints can be applied after mining to filter out discovered rules, although it is generally more efficient and less expensive to use them during mining to help prune the search
