Online Book Reader

Home Category

Data Mining_ Concepts and Techniques - Jiawei Han [187]

By Root 1450 0
Data Space with Data Pruning Constraints

The second way of search space pruning in constraint-based frequent pattern mining is pruning data space. This strategy prunes pieces of data if they will not contribute to the subsequent generation of satisfiable patterns in the mining process. We consider two properties: data succinctness and data antimonotonicity.

Constraints are data-succinct if they can be used at the beginning of a pattern mining process to prune the data subsets that cannot satisfy the constraints. For example, if a mining query requires that the mined pattern must contain digital camera, then any transaction that does not contain digital camera can be pruned at the beginning of the mining process, which effectively reduces the data set to be examined.

Interestingly, many constraints are data-antimonotonic in the sense that during the mining process, if a data entry cannot satisfy a data-antimonotonic constraint based on the current pattern, then it can be pruned. We prune it because it will not be able to contribute to the generation of any superpattern of the current pattern in the remaining mining process.

Data antimonotonicity

A mining query requires that C1 : sum(I.price) ≥ $100, that is, the sum of the prices of the items in the mined pattern must be no less than $100. Suppose that the current frequent itemset, S, does not satisfy constraint C1 (say, because the sum of the prices of the items in S is $50). If the remaining frequent items in a transaction Ti are such that, say, {i2.price = $5, i5.price = $10, i8.price = $20}, then Ti will not be able to make S satisfy the constraint. Thus, Ti cannot contribute to the patterns to be mined from S, and thus can be pruned.

Note that such pruning cannot be done at the beginning of the mining because at that time, we do not know yet if the total sum of the prices of all the items in Ti will be over $100 (e.g., we may have i3.price = $80). However, during the iterative mining process, we may find some items (e.g., i3) that are not frequent with S in the transaction data set, and thus they would be pruned. Therefore, such checking and pruning should be enforced at each iteration to reduce the data search space.


Notice that constraint C1 is a monotonic constraint with respect to pattern space pruning. As we have seen, this constraint has very limited power for reducing the search space in pattern pruning. However, the same constraint can be used for effective reduction of the data search space.

For an antimonotonic constraint, such as , we can prune both pattern and data search spaces at the same time. Based on our study of pattern pruning, we already know that the current itemset can be pruned if the sum of the prices in it is over $100 (since its further expansion can never satisfy C2). At the same time, we can also prune any remaining items in a transaction Ti that cannot make the constraint C2 valid. For example, if the sum of the prices of items in the current itemset S is $90, any patterns over $10 in the remaining frequent items in Ti can be pruned. If none of the remaining items in Ti can make the constraint valid, the entire transaction Ti should be pruned.

Consider pattern constraints that are neither antimonotonic nor monotonic such as “.” These can be data-antimonotonic because if the remaining items in a transaction Ti cannot make the constraint valid, then Ti can be pruned as well. Therefore, data-antimonotonic constraints can be quite useful for constraint-based data space pruning.

Notice that search space pruning by data antimonotonicity is confined only to a pattern growth–based mining algorithm because the pruning of a data entry is determined based on whether it can contribute to a specific pattern. Data antimonotonicity cannot be used for pruning the data space if the Apriori algorithm is used because the data are associated with all of the currently active patterns. At any iteration, there are usually many active patterns. A data entry that cannot contribute to the formation of the superpatterns of a given pattern may still be

Return Main Page Previous Page Next Page

®Online Book Reader