Data Mining: Concepts and Techniques - Jiawei Han
associations between quantitative items or attributes, then it is a quantitative association rule. In these rules, quantitative values for items or attributes are partitioned into intervals. Rule (7.3) can also be considered a quantitative association rule where the quantitative attributes age and income have been discretized.

■ Based on the constraints or criteria used to mine selective patterns: The patterns or rules to be discovered can be constraint-based (i.e., satisfying a set of user-defined constraints), approximate, compressed, near-match (i.e., those that tally the support count of the near or almost matching itemsets), top-k (i.e., the k most frequent itemsets for a user-specified value, k), redundancy-aware top-k (i.e., the top-k patterns with similar or redundant patterns excluded), and so on.
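Two of the ideas above, discretizing quantitative attributes into intervals and reporting only the top-k most frequent itemsets, can be illustrated together. The following is a minimal sketch, not the book's algorithm: it enumerates itemsets by brute force, which is only practical for tiny inputs. The function names, the cut points, and the sample records are all hypothetical and chosen purely for illustration.

```python
from bisect import bisect_right
from collections import Counter
from itertools import combinations


def to_interval_item(name, value, edges):
    """Map a numeric value to an interval item such as 'age=[30,40)'.

    `edges` is a sorted list of cut points (hypothetical values).
    Intervals are half-open: a value equal to a cut point falls in
    the interval that starts at that cut point.
    """
    i = bisect_right(edges, value)
    lo = "-inf" if i == 0 else str(edges[i - 1])
    hi = "+inf" if i == len(edges) else str(edges[i])
    return f"{name}=[{lo},{hi})"


def top_k_itemsets(transactions, k, max_size=2):
    """Return the k most frequent itemsets of size <= max_size.

    Brute-force counting; ties are broken by itemset size and then
    alphabetically so that the result is deterministic.
    """
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    ranked = sorted(counts.items(),
                    key=lambda kv: (-kv[1], len(kv[0]), kv[0]))
    return ranked[:k]


# Hypothetical cut points and records, for illustration only.
AGE_EDGES = [30, 40]
INCOME_EDGES = [40000, 60000]
records = [
    {"age": 34, "income": 45000},
    {"age": 38, "income": 52000},
    {"age": 25, "income": 30000},
    {"age": 31, "income": 75000},
]

# Each record becomes a transaction of interval items.
transactions = [
    [to_interval_item("age", r["age"], AGE_EDGES),
     to_interval_item("income", r["income"], INCOME_EDGES)]
    for r in records
]

top3 = top_k_itemsets(transactions, k=3)
for itemset, support in top3:
    print(itemset, support)
```

With these sample records, the interval item for ages 30 to 40 is the most frequent, and the pair combining that age interval with the middle income interval also appears among the top three. A redundancy-aware variant would additionally drop patterns that are near-duplicates of ones already reported; that step is not shown here.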

Alternatively, pattern mining can be classified with respect to the kinds of data and applications involved, using the following criteria:

■ Based on kinds of data and features to be mined: Given relational and data warehouse data, most people are interested in itemsets. Thus, frequent pattern mining in this context is essentially frequent itemset mining, that is, to mine frequent sets of items. However, in many other applications, patterns may involve sequences and structures. For example, by studying the order in which items are frequently purchased, we may find that customers tend to first buy a PC, followed by a digital camera, and then a memory card. This leads to sequential patterns, that is, frequent subsequences (which are often separated by some other events) in a sequence of ordered events.

We may also mine structural patterns, that is, frequent substructures, in a structured data set. Note that structure is a general concept that covers many different kinds of structural forms such as directed graphs, undirected graphs, lattices, trees, sequences, sets, single items, or combinations of such structures. Single items are the simplest form of structure. Each element of a general pattern may contain a subsequence, a subtree, a subgraph, and so on, and such containment relationships can be defined recursively. Therefore, structural pattern mining can be considered the most general form of frequent pattern mining.

■ Based on application domain-specific semantics: Both data and applications can be very diverse, and therefore the patterns to be mined can differ largely based on their domain-specific semantics. Various kinds of application data include spatial data, temporal data, spatiotemporal data, multimedia data (e.g., image, audio, and video data), text data, time-series data, DNA and biological sequences, software programs, chemical compound structures, web structures, sensor networks, social and information networks, biological networks, data streams, and so on. This diversity can lead to dramatically different pattern mining methodologies.

■ Based on data analysis usages: Frequent pattern mining often serves as an intermediate step for improved data understanding and more powerful data analysis. For example, it can be used as a feature extraction step for classification, which is often referred to as pattern-based classification. Similarly, pattern-based clustering has shown its strength at clustering high-dimensional data. For improved data understanding, patterns can be used for semantic annotation or contextual analysis. Pattern analysis can also be used in recommender systems, which recommend information items (e.g., books, movies, web pages) that are likely to be of interest to the user based on similar users' patterns. Different analysis tasks may require mining rather different kinds of patterns as well.
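The notion of a sequential pattern described in the list above (a PC, then a digital camera, then a memory card, possibly separated by other events) can be made concrete by counting how many sequences contain a given subsequence in order. This is a minimal sketch for checking the support of one candidate pattern, not a full sequential pattern mining algorithm; the function names and the purchase histories are hypothetical.

```python
def contains_subsequence(sequence, pattern):
    """True if `pattern` occurs in `sequence` in order, gaps allowed."""
    remaining = iter(sequence)
    # `item in remaining` advances the iterator until it finds `item`,
    # so each pattern element must appear after the previous match.
    return all(item in remaining for item in pattern)


def sequence_support(sequences, pattern):
    """Count the sequences that contain `pattern` as a subsequence."""
    return sum(contains_subsequence(s, pattern) for s in sequences)


# Hypothetical purchase histories, each ordered by time.
purchase_histories = [
    ["pc", "printer", "digital_camera", "memory_card"],  # match, with a gap
    ["digital_camera", "pc", "memory_card"],             # wrong order
    ["pc", "digital_camera", "memory_card"],             # exact match
]

pattern = ["pc", "digital_camera", "memory_card"]
support = sequence_support(purchase_histories, pattern)
print(support)
```

Here the first history still counts as a match even though a printer purchase intervenes, while the second does not, because the camera was bought before the PC. A pattern is frequent when this count meets a user-specified minimum support.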

The next several sections present advanced methods and extensions of pattern mining, as well as their application. Section 7.2 discusses methods for mining multilevel patterns, multidimensional patterns, patterns and rules with continuous attributes, rare patterns, and negative patterns. Constraint-based pattern mining is studied in Section 7.3. Section 7.4 explains how to mine high-dimensional
