Online Book Reader


Data Mining: Concepts and Techniques - Jiawei Han

patterns. Other pattern mining themes, including mining sequential and structured patterns and mining patterns from spatiotemporal, multimedia, and stream data, are considered more advanced topics and are not covered in this book. Notice that pattern mining is a more general term than frequent pattern mining since the former covers rare and negative patterns as well. However, when there is no ambiguity, the two terms are used interchangeably.

7.1. Pattern Mining: A Road Map


Chapter 6 introduced the basic concepts, techniques, and applications of frequent pattern mining using market basket analysis as an example. Many other kinds of data, user requests, and applications have led to the development of numerous, diverse methods for mining patterns, associations, and correlation relationships. Given the rich literature in this area, it is important to lay out a clear road map to help us get an organized picture of the field and to select the best methods for pattern mining applications.

Figure 7.1 outlines a general road map on pattern mining research. Most studies mainly address three pattern mining aspects: the kinds of patterns mined, mining methodologies, and applications. Some studies, however, integrate multiple aspects; for example, different applications may need to mine different patterns, which naturally leads to the development of new mining methodologies.

Figure 7.1 A general road map on pattern mining research.

Based on pattern diversity, pattern mining can be classified using the following criteria:

■ Basic patterns: As discussed in Chapter 6, a frequent pattern may have several alternative forms, including a simple frequent pattern, a closed pattern, or a max-pattern. To review, a frequent pattern is a pattern (or itemset) that satisfies a minimum support threshold. A pattern p is a closed pattern if there is no proper superpattern p′ with the same support as p. Pattern p is a max-pattern if p is frequent and there exists no frequent superpattern of p. Frequent patterns can also be mapped into association rules, or into other kinds of rules based on interestingness measures. Sometimes we may also be interested in infrequent or rare patterns (i.e., patterns that occur rarely but are of critical importance) or negative patterns (i.e., patterns that reveal a negative correlation between items).
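To make the three basic pattern forms concrete, the following minimal sketch enumerates frequent itemsets by brute force over a toy transaction list and then picks out the closed and max-patterns among them. The function names, the toy data, and the use of an absolute support count for min_sup are illustrative assumptions, and brute-force enumeration is chosen only for clarity; it is not one of the scalable algorithms of Chapter 6. Because any superpattern with the same support as a frequent pattern is itself frequent, checking only frequent superpatterns is enough to identify closed frequent patterns.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_sup):
    """Brute-force sketch (hypothetical helper): return {itemset: support_count}
    for every itemset whose absolute support count is at least min_sup."""
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        found_any = False
        for cand in combinations(items, k):
            s = frozenset(cand)
            count = sum(1 for t in transactions if s <= t)  # transactions containing s
            if count >= min_sup:
                result[s] = count
                found_any = True
        if not found_any:
            # No frequent k-itemset means no frequent (k+1)-itemset either.
            break
    return result

def closed_and_max(freq):
    """Split frequent patterns into closed ones (no proper superpattern with the
    same support) and max-patterns (no frequent proper superpattern at all)."""
    closed, maximal = set(), set()
    for p, sup in freq.items():
        supersets = [q for q in freq if p < q]  # frequent proper superpatterns of p
        if all(freq[q] != sup for q in supersets):
            closed.add(p)
        if not supersets:
            maximal.add(p)
    return closed, maximal

# Hypothetical toy data: four transactions over items a, b, c.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a"}]
freq = frequent_itemsets(transactions, min_sup=2)
closed, maximal = closed_and_max(freq)
```

On these four toy transactions with min_sup = 2, the frequent patterns are {a}, {b}, {c}, {a, b}, and {a, c}. Of these, {a}, {a, b}, and {a, c} are closed, while only {a, b} and {a, c} are max-patterns; {b} is not closed because its superpattern {a, b} has the same support.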

■ Based on the abstraction levels involved in a pattern: Patterns or association rules may have items or concepts residing at high, low, or multiple abstraction levels. For example, suppose that a set of association rules mined includes the following rules, where X is a variable representing a customer:

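$$\mathit{buys}(X, \text{"computer"}) \Rightarrow \mathit{buys}(X, \text{"printer"}) \tag{7.1}$$

$$\mathit{buys}(X, \text{"laptop computer"}) \Rightarrow \mathit{buys}(X, \text{"color laser printer"}) \tag{7.2}$$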

In Rules (7.1) and (7.2), the items bought are referenced at different abstraction levels (e.g., “computer” is a higher-level abstraction of “laptop computer,” and “color laser printer” is a lower-level abstraction of “printer”). We refer to the rule set mined as consisting of multilevel association rules. If, instead, the rules within a given set do not reference items or attributes at different abstraction levels, then the set contains single-level association rules.
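One simple way to decide whether a mined rule set is multilevel is to look up the abstraction level of every item it references in a concept hierarchy. The sketch below assumes a hypothetical hierarchy stored as a child-to-parent map and treats an item's depth in that map as its abstraction level; real concept hierarchies may call for a more careful notion of level. The example rules are illustrative, built from the item names mentioned above.

```python
# Hypothetical concept hierarchy: child -> parent (None marks a top-level concept).
HIERARCHY = {
    "computer": None,
    "printer": None,
    "laptop computer": "computer",
    "desktop computer": "computer",
    "color laser printer": "printer",
}

def level(item, hierarchy=HIERARCHY):
    """Depth of an item in the hierarchy: 1 for top-level concepts."""
    depth = 1
    while hierarchy[item] is not None:
        item = hierarchy[item]
        depth += 1
    return depth

def is_multilevel(rules, hierarchy=HIERARCHY):
    """Each rule is (antecedent_items, consequent_items). The rule set is treated
    as multilevel if its items sit at more than one abstraction level."""
    levels = {level(i, hierarchy) for lhs, rhs in rules for i in (*lhs, *rhs)}
    return len(levels) > 1

# Illustrative rules over the items named in the text.
rules = [(["computer"], ["printer"]),
         (["laptop computer"], ["color laser printer"])]
```

Here the two rules together mix level-1 and level-2 items, so is_multilevel(rules) returns True, whereas either rule on its own forms a single-level set.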

■ Based on the number of dimensions involved in the rule or pattern: If the items or attributes in an association rule or pattern reference only one dimension, it is a single-dimensional association rule/pattern. For example, Rules (7.1) and (7.2) are single-dimensional association rules because they each refer to only one dimension, buys.¹

¹Following the terminology used in multidimensional databases, we refer to each distinct predicate in a rule as a dimension.

If a rule/pattern references two or more dimensions, such as age, income, and buys, then it is a multidimensional association rule/pattern. The following is an example of a multidimensional rule:

(7.3)
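Following the convention that each distinct predicate in a rule is a dimension, the number of dimensions a rule involves can be counted directly. In this minimal sketch a rule is a hypothetical (antecedent, consequent) pair of (predicate, value) atoms; the attribute values in the multidimensional example are made up for illustration and are not taken from Rule (7.3).

```python
def dimensions(rule):
    """rule = (antecedent, consequent), each a list of (predicate, value) atoms.
    Every distinct predicate counts as one dimension."""
    antecedent, consequent = rule
    return {pred for pred, _ in (*antecedent, *consequent)}

def classify(rule):
    """Label a rule by how many distinct predicates (dimensions) it references."""
    return "multidimensional" if len(dimensions(rule)) >= 2 else "single-dimensional"

# A rule that uses only the buys predicate.
single = ([("buys", "laptop computer")], [("buys", "color laser printer")])
# Hypothetical rule over age, income, and buys (values are illustrative only).
multi = ([("age", "20-29"), ("income", "high")], [("buys", "laptop computer")])
```

The first rule references only buys and is classified as single-dimensional; the second references age, income, and buys and is classified as multidimensional.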

■ Based on the types of values handled in the rule or pattern: If a rule involves associations between the presence or absence of items, it is a Boolean association rule. For example, Rules (7.1) and (7.2) are Boolean association rules obtained from market basket analysis.
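Because a Boolean association rule concerns only the presence or absence of items, each transaction can be encoded as a 0/1 vector, and the rule's support and confidence, as introduced in Chapter 6, follow from simple counts over those vectors. The item universe, transactions, and function names below are hypothetical and chosen only to illustrate the encoding.

```python
# Hypothetical item universe; position j of a vector records item ITEMS[j].
ITEMS = ["computer", "printer", "software"]

def to_boolean(transaction, items=ITEMS):
    """Encode a transaction as a presence (1) / absence (0) vector."""
    return [1 if item in transaction else 0 for item in items]

def rule_stats(vectors, antecedent, consequent, items=ITEMS):
    """Support and confidence of the Boolean rule antecedent => consequent."""
    idx_a = [items.index(i) for i in antecedent]
    idx_ab = idx_a + [items.index(i) for i in consequent]
    n_a = sum(all(v[j] for j in idx_a) for v in vectors)    # vectors containing A
    n_ab = sum(all(v[j] for j in idx_ab) for v in vectors)  # vectors containing A and B
    support = n_ab / len(vectors)
    confidence = n_ab / n_a if n_a else 0.0
    return support, confidence

transactions = [{"computer", "printer"}, {"computer"},
                {"computer", "printer", "software"}, {"software"}]
vectors = [to_boolean(t) for t in transactions]
support, confidence = rule_stats(vectors, ["computer"], ["printer"])
```

With these four toy transactions, computer ⇒ printer holds in two of the four vectors (support 0.5) and in two of the three vectors that contain computer (confidence 2/3).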

If a rule describes
