“How can we use rule constraints to prune the search space? More specifically, what kind of rule constraints can be ‘pushed’ deep into the mining process and still ensure the completeness of the answer returned for a mining query?”
In general, an efficient frequent pattern mining processor can prune its search space during mining in two major ways: pruning the pattern search space and pruning the data search space. The former checks candidate patterns and decides whether a pattern can be pruned: applying the Apriori property, it prunes a pattern when no superpattern of it can satisfy the constraints in the remaining mining process. The latter checks the data set to determine whether a particular piece of data can still contribute to the generation of satisfiable patterns (with respect to a particular pattern) in the remaining mining process; if not, that piece of data is pruned from further exploration. A constraint that may facilitate pattern space pruning is called a pattern pruning constraint, whereas one that can be used for data space pruning is called a data pruning constraint.
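To make the two pruning modes concrete, here is a minimal Python sketch of data space pruning; the item names, prices, and transactions are invented for illustration. Under a price-budget constraint such as “sum(S.price) ≤ $100” (revisited below) with nonnegative prices, an item that by itself costs more than $100 can never occur in a satisfiable itemset, so it can be removed from every transaction before mining begins.

```python
# Invented toy data: a data pruning constraint removes items that can
# never contribute to a pattern satisfying sum(S.price) <= $100.
PRICE = {"pen": 5, "notebook": 12, "camera": 350, "laptop": 900, "eraser": 1}
BUDGET = 100

transactions = [
    {"pen", "camera", "notebook"},
    {"laptop", "eraser"},
    {"pen", "eraser", "notebook"},
]

# With nonnegative prices, any item costing more than the budget cannot
# appear in a satisfying itemset, so it is pruned from the data itself.
pruned = [{i for i in t if PRICE[i] <= BUDGET} for t in transactions]
print(pruned)  # "camera" and "laptop" can no longer generate candidates
```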
Pruning Pattern Space with Pattern Pruning Constraints
Based on how a constraint may interact with the pattern mining process, there are five categories of pattern mining constraints: (1) antimonotonic, (2) monotonic, (3) succinct, (4) convertible, and (5) inconvertible. For each category, we use an example to illustrate its characteristics and explain how constraints of that kind can be used in the mining process.
The first category of constraints is antimonotonic. Consider the rule constraint “sum(S.price) ≤ $100” of Example 7.8. Suppose we are using the Apriori framework, which explores itemsets of size k at the kth iteration. If the price summation of the items in a candidate itemset exceeds $100, this itemset can be pruned from the search space, since adding more items to the set (assuming each price is nonnegative) can never make it cheaper, and thus no superset will ever satisfy the constraint. In other words, if an itemset does not satisfy this rule constraint, none of its supersets can satisfy the constraint. If a rule constraint obeys this property, it is antimonotonic. Pruning by antimonotonic constraints can be applied at each iteration of Apriori-style algorithms to improve the efficiency of the overall mining process while guaranteeing completeness of the data mining task.
The Apriori property, which states that all nonempty subsets of a frequent itemset must also be frequent, is antimonotonic. If a given itemset does not satisfy minimum support, none of its supersets can. This property is used at each iteration of the Apriori algorithm to reduce the number of candidate itemsets examined, thereby reducing the search space for association rules.
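The sketch below shows where such antimonotonic checks sit in an Apriori-style loop; the transactions, prices, and thresholds are invented, and the candidate generation is deliberately simplified. A candidate that fails either the minimum support test or the $100 price budget is dropped immediately, so none of its supersets is ever generated.

```python
from itertools import combinations

# Invented toy data: prices, transactions, and thresholds are illustrative.
PRICE = {"a": 20, "b": 40, "c": 70, "d": 10}
transactions = [{"a", "b"}, {"a", "b", "d"}, {"b", "c"}, {"a", "d"}]
MIN_SUP, BUDGET = 2, 100

def support(itemset):
    return sum(itemset <= t for t in transactions)

def survives(itemset):
    # Both tests are antimonotonic: if either fails here, every superset
    # of `itemset` would also fail, so pruning preserves completeness.
    return (support(itemset) >= MIN_SUP
            and sum(PRICE[i] for i in itemset) <= BUDGET)

level = [s for s in (frozenset([i]) for i in PRICE) if survives(s)]
result = list(level)
while level:
    # Join surviving k-itemsets into (k+1)-candidates (simplified join).
    candidates = {a | b for a, b in combinations(level, 2)
                  if len(a | b) == len(a) + 1}
    level = [c for c in candidates if survives(c)]
    result.extend(level)
print(result)  # only itemsets that are frequent AND within the budget
```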
Other examples of antimonotonic constraints include “min(S.price) ≥ $500,” “count(S) ≤ 10,” and so on. Any itemset that violates either of these constraints can be discarded, since adding more items to such an itemset can never satisfy the constraint. Note that a constraint such as “avg(S.price) ≤ $10” is not antimonotonic. For a given itemset that does not satisfy this constraint, a superset created by adding some (cheap) items may result in satisfying the constraint. Hence, pushing this constraint inside the mining process will not guarantee completeness of the data mining task. A list of SQL primitives–based constraints is given in the first column of Table 7.2. The antimonotonicity of each constraint is indicated in the second column. To simplify our discussion, only existence operators (e.g., =, ∈, but not ≠, ∉) and comparison (or containment) operators with equality (e.g., ≤, ⊆) are given.
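Before turning to the table, a quick check (with hypothetical prices) shows the failure of antimonotonicity for “avg(S.price) ≤ $10”: an itemset that violates the constraint can gain a cheap item and thereby come to satisfy it.

```python
# Hypothetical prices: avg(S.price) <= $10 is not antimonotonic, because
# a violating itemset can be "repaired" by adding a cheap item.
prices = {"toy": 12, "candy": 2}

def avg_price(S):
    return sum(prices[i] for i in S) / len(S)

print(avg_price({"toy"}))           # 12.0 -> violates avg(S.price) <= 10
print(avg_price({"toy", "candy"}))  # 7.0  -> the superset satisfies it
```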
Table 7.2 Characterization of Commonly Used SQL-Based Pattern Pruning Constraints
Constraint    Antimonotonic  Monotonic  Succinct
v ∈ S         no             yes        yes
S ⊇ V         no             yes        yes
S ⊆ V         yes            no         yes
min(S) ≤ v    no             yes        yes
min(S) ≥ v    yes            no         yes
max(S) ≤ v    yes            no         yes
max(S) ≥ v    no             yes        yes
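The antimonotonic and monotonic columns above can also be verified mechanically. The following brute-force helper, our own construction rather than anything from the text, enumerates all nonempty subsets of a tiny numeric universe and checks, for a few sample rows, that failure propagates upward to supersets (antimonotonic) or that satisfaction does (monotonic).

```python
from itertools import combinations

U = [1, 3, 5, 8]     # a tiny universe of numeric items (illustrative)
V, v = {3, 5}, 5

def nonempty_subsets(xs):
    return [frozenset(c) for r in range(1, len(xs) + 1)
            for c in combinations(xs, r)]

def classify(pred):
    """Brute-force (anti)monotonicity over all nonempty S ⊂ T ⊆ U."""
    sets = nonempty_subsets(U)
    # Antimonotonic: a satisfying superset implies a satisfying subset.
    anti = all(pred(t) <= pred(s) for s in sets for t in sets if s < t)
    # Monotonic: a satisfying subset implies a satisfying superset.
    mono = all(pred(s) <= pred(t) for s in sets for t in sets if s < t)
    return anti, mono

checks = {
    "min(S) >= v": lambda S: min(S) >= v,
    "max(S) >= v": lambda S: max(S) >= v,
    "S <= V":      lambda S: S <= V,
}
for name, pred in checks.items():
    print(name, classify(pred))  # matches the yes/no flags in Table 7.2
```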