Data Mining_ Concepts and Techniques - Jiawei Han [186]
count(S) ≤ v yes no weakly
count(S) ≥ v no yes weakly
sum(S) ≤ v (∀a ∈ S, a ≥ 0) yes no no
sum(S) ≥ v (∀a ∈ S, a ≥ 0) no yes no
range(S) ≤ v yes no no
range(S) ≥ v no yes no
avg(S) θ v, θ ∈ {≤, ≥} convertible convertible no
support(S) ≥ ξ yes no no
support(S) ≤ ξ no yes no
all_confidence(S) ≥ ξ yes no no
all_confidence(S) ≤ ξ no yes no
The second category of constraints is monotonic. If the rule constraint in Example 7.8 were “,” the constraint-based processing method would be quite different. If an itemset I satisfies the constraint, that is, the sum of the prices in the set is no less than $100, further addition of more items to I will increase cost and will always satisfy the constraint. Therefore, further testing of this constraint on itemset I becomes redundant. In other words, if an itemset satisfies this rule constraint, so do all of its supersets. If a rule constraint obeys this property, it is monotonic. Similar rule monotonic constraints include “,” “,” and so on. The monotonicity of the list of SQL primitives–based constraints is indicated in the third column of Table 7.2.
The third category is succinct constraints. For this constraints category, we can enumerate all and only those sets that are guaranteed to satisfy the constraint. That is, if a rule constraint is succinct, we can directly generate precisely the sets that satisfy it, even before support counting begins. This avoids the substantial overhead of the generate-and-test paradigm. In other words, such constraints are precounting prunable. For example, the constraint “” in Example 7.8 is succinct because we can explicitly and precisely generate all the itemsets that satisfy the constraint.
Specifically, such a set must consist of a nonempty set of items that have a price no less than $50. It is of the form S, where S ≠ ∅ is a subset of the set of all items with prices no less than $50. Because there is a precise “formula” for generating all the sets satisfying a succinct constraint, there is no need to iteratively check the rule constraint during the mining process. The succinctness of the list of SQL primitives–based constraints is indicated in the fourth column of Table 7.2. 2
2For constraint (and similarly for ), we can have a member generation function based on a cardinality constraint (i.e., ). Member generation in this manner is of a different flavor and thus is called weakly succinct.
The fourth category is convertible constraints. Some constraints belong to none of the previous three categories. However, if the items in the itemset are arranged in a particular order, the constraint may become monotonic or antimonotonic with regard to the frequent itemset mining process. For example, the constraint “” is neither antimonotonic nor monotonic. However, if items in a transaction are added to an itemset in price-ascending order, the constraint becomes antimonotonic, because if an itemset I violates the constraint (i.e., with an average price greater than $10), then further addition of more expensive items into the itemset will never make it satisfy the constraint. Similarly, if items in a transaction are added to an itemset in price-descending order, it becomes monotonic, because if the itemset satisfies the constraint (i.e., with an average price no greater than $10), then adding cheaper items into the current itemset will still make the average price no greater than $10. Aside from “” and “,” given in Table 7.2, there are many other convertible constraints such as “” “,” and so on.
Note that the previous discussion does not imply that every constraint is convertible. For example, “,” where θ ∈ {≤, ≥} and each element in S could be of any real value, is not convertible. Therefore, there is yet a fifth category of constraints, called inconvertible constraints. The good news is that although there still exist some tough constraints that are not convertible, most simple SQL expressions with built-in SQL aggregates belong to one of the first four categories to which efficient constraint mining methods can be applied.
Pruning