Online Book Reader

Home Category

Data Mining_ Concepts and Techniques - Jiawei Han [182]

By Root 1465 0
a strongly negatively correlated pattern.

This definition can easily be extended for patterns containing k-itemsets for k > 2.

A problem with the definition, however, is that it is not null-invariant. That is, its value can be misleadingly influenced by null transactions, where a null-transaction is a transaction that does not contain any of the itemsets being examined (Section 6.3.3). This is illustrated in Example 7.4.

Null-transaction problem with Definition 7.1

If there are a lot of null-transactions in the data set, then the number of null-transactions rather than the patterns observed may strongly influence a measure's assessment as to whether a pattern is negatively correlated. For example, suppose a sewing store sells needle packages A and B. The store sold 100 packages each of A and B, but only one transaction contains both A and B. Intuitively, A is negatively correlated with B since the purchase of one does not seem to encourage the purchase of the other.

Let's see how the above Definition 7.1 handles this scenario. If there are 200 transactions, we have and . Thus, , and so Definition 7.1 indicates that A and B are strongly negatively correlated. What if, instead of only 200 transactions in the database, there are 106? In this case, there are many null-transactions, that is, many contain neither A nor B. How does the definition hold up? It computes and . Thus, , which contradicts the earlier finding even though the number of occurrences of A and B has not changed. The measure in Definition 7.1 is not null-invariant, where null-invariance is essential for quality interestingness measures as discussed in Section 6.3.3.


Definition 7.2

If X and Y are strongly negatively correlated, then

Is this measure null-invariant?

Null-transaction problem with Definition 7.2

Given our needle package example, when there are in total 200 transactions in the database, we have

which, according to Definition 7.2, indicates that A and B are strongly negatively correlated. What if there are 106 transactions in the database? The measure would compute

This time, the measure indicates that A and B are positively correlated, hence, a contradiction. The measure is not null-invariant.


As a third alternative, consider Definition 7.3, which is based on the Kulczynski measure (i.e., the average of conditional probabilities). It follows the spirit of interestingness measures introduced in Section 6.3.3.

Definition 7.3

Suppose that itemsets X and Y are both frequent, that is, and , where min_sup is the minimum support threshold. If , where ϵ is a negative pattern threshold, then pattern is a negatively correlated pattern.

Negatively correlated patterns using Definition 7.3, based on the Kulczynski measure

Let's reexamine our needle package example. Let min_sup be 0.01% and ϵ = 0.02. When there are 200 transactions in the database, we have and ; thus A and B are negatively correlated. Does this still hold true if we have many more transactions? When there are 106 transactions in the database, the measure computes and , again indicating that A and B are negatively correlated. This matches our intuition. The measure does not have the null-invariance problem of the first two definitions considered.

Let's examine another case: Suppose that among 100,000 transactions, the store sold 1000 needle packages of A but only 10 packages of B; however, every time package B is sold, package A is also sold (i.e., they appear in the same transaction). In this case, the measure computes , which indicates that A and B are positively correlated instead of negatively correlated. This also matches our intuition.


With this new definition of negative correlation, efficient methods can easily be derived for mining negative patterns in large databases. This is left as an exercise for interested readers.

7.3. Constraint-Based Frequent Pattern Mining


A data mining process may uncover thousands of rules from a given data set, most of which end up being unrelated or uninteresting to users. Often, users have a good sense of which “direction

Return Main Page Previous Page Next Page

®Online Book Reader