Online Book Reader


Data Mining_ Concepts and Techniques - Jiawei Han [176]

and colossal patterns. The mining of compressed and approximate patterns is detailed in Section 7.5. Section 7.6 discusses the exploration and applications of pattern mining. More advanced topics regarding mining sequential and structural patterns, and pattern mining in complex and diverse kinds of data are briefly introduced in Chapter 13.

7.2. Pattern Mining in Multilevel, Multidimensional Space


This section focuses on methods for mining in multilevel, multidimensional space. In particular, you will learn about mining multilevel associations (Section 7.2.1), multidimensional associations (Section 7.2.2), quantitative association rules (Section 7.2.3), and rare patterns and negative patterns (Section 7.2.4). Multilevel associations involve concepts at different abstraction levels. Multidimensional associations involve more than one dimension or predicate (e.g., rules that relate what a customer buys to his or her age). Quantitative association rules involve numeric attributes that have an implicit ordering among values (e.g., age). Rare patterns are patterns that suggest interesting although rare item combinations. Negative patterns show negative correlations between items.

7.2.1. Mining Multilevel Associations

For many applications, strong associations discovered at high abstraction levels, though with high support, could be commonsense knowledge. We may want to drill down to find novel patterns at more detailed levels. On the other hand, there could be too many scattered patterns at low or primitive abstraction levels, some of which are just trivial specializations of patterns at higher levels. Therefore, it is interesting to examine how to develop effective methods for mining patterns at multiple abstraction levels, with sufficient flexibility for easy traversal among different abstraction spaces.

Mining multilevel association rules

Suppose we are given the task-relevant set of transactional data in Table 7.1 for sales in an AllElectronics store, showing the items purchased for each transaction. The concept hierarchy for the items is shown in Figure 7.2. A concept hierarchy defines a sequence of mappings from a set of low-level concepts to a higher-level, more general concept set. Data can be generalized by replacing low-level concepts within the data by their corresponding higher-level concepts, or ancestors, from a concept hierarchy.

Table 7.1 Task-Relevant Data, D

TID    Items Purchased
T100   Apple 17″ MacBook Pro Notebook, HP Photosmart Pro b9180
T200   Microsoft Office Professional 2010, Microsoft Wireless Optical Mouse 5000
T300   Logitech VX Nano Cordless Laser Mouse, Fellowes GEL Wrist Rest
T400   Dell Studio XPS 16 Notebook, Canon PowerShot SD1400
T500   Lenovo ThinkPad X200 Tablet PC, Symantec Norton Antivirus 2010
…      …

Figure 7.2 Concept hierarchy for AllElectronics computer items.

Figure 7.2's concept hierarchy has five levels, respectively referred to as levels 0 through 4, starting with level 0 at the root node for all (the most general abstraction level). Here, level 1 includes computer, software, printer and camera, and computer accessory; level 2 includes laptop computer, desktop computer, office software, antivirus software, etc.; and level 3 includes Dell desktop computer, …, Microsoft office software, etc. Level 4 is the most specific abstraction level of this hierarchy. It consists of the raw data values.
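The generalization step described above, replacing a low-level concept by its ancestor at a chosen level, can be sketched in a few lines of Python. This is only an illustrative sketch: the child-to-parent map below is a small hypothetical fragment of a hierarchy like Figure 7.2 (the node "Dell laptop computer" and the function names are assumptions, not taken from the figure), and the sample transaction is made up for the example rather than copied from Table 7.1.

```python
# Hypothetical fragment of a concept hierarchy, stored as child -> parent.
# Level 0 is the root "all"; level 4 holds the raw item names.
PARENT = {
    "computer": "all",                                   # level 1
    "software": "all",                                   # level 1
    "laptop computer": "computer",                       # level 2
    "office software": "software",                       # level 2
    "Dell laptop computer": "laptop computer",           # level 3 (assumed node)
    "Microsoft office software": "office software",      # level 3
    "Dell Studio XPS 16 Notebook": "Dell laptop computer",              # level 4
    "Microsoft Office Professional 2010": "Microsoft office software",  # level 4
}


def path_to_root(concept):
    """Return [concept, parent, grandparent, ..., "all"]."""
    path = [concept]
    while path[-1] != "all":
        path.append(PARENT[path[-1]])
    return path


def level(concept):
    """Abstraction level of a concept: 0 for "all", larger is more specific."""
    return len(path_to_root(concept)) - 1


def generalize(item, target_level):
    """Replace item by its ancestor at target_level (0 = most general)."""
    path = path_to_root(item)
    depth = len(path) - 1
    if not 0 <= target_level <= depth:
        raise ValueError(f"{item!r} has no ancestor at level {target_level}")
    return path[depth - target_level]


def generalize_transaction(items, target_level):
    """Generalize every item; duplicates collapse because the result is a set."""
    return {generalize(item, target_level) for item in items}


# Hypothetical transaction used only for illustration.
transaction = ["Dell Studio XPS 16 Notebook", "Microsoft Office Professional 2010"]
print(generalize_transaction(transaction, 2))
print(generalize_transaction(transaction, 1))
```

Storing only child-to-parent links keeps the sketch short; an ancestor at any level is found by walking up to the root. Returning a set reflects that two different raw items may map to the same higher-level concept.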

Concept hierarchies for nominal attributes are often implicit within the database schema, in which case they may be automatically generated using methods such as those described in Chapter 3. For our example, the concept hierarchy of Figure 7.2 was generated from data on product specifications. Concept hierarchies for numeric attributes can be generated using discretization techniques, many of which were introduced in Chapter 3. Alternatively, concept hierarchies may be specified by users familiar with the data, such as store managers in the case of our example.
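To make the numeric case concrete, the following sketch builds a two-level hierarchy for an age attribute using equal-width binning, one simple discretization technique. It is an assumption-laden illustration: the age values, the number of bins, and the function names are all invented for the example, and the text does not say which discretization method would be used in practice.

```python
def equal_width_edges(values, k):
    """Split the range [min, max] of values into k intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(k + 1)]


def bin_label(value, edges):
    """Return the interval containing value; only the last interval is closed on the right."""
    last = len(edges) - 2
    for i in range(len(edges) - 1):
        left, right = edges[i], edges[i + 1]
        if left <= value < right or (i == last and value == right):
            closing = "]" if i == last else ")"
            return f"[{left:g}, {right:g}{closing}"
    raise ValueError(f"{value} lies outside the binned range")


# Made-up ages, used only for illustration.
ages = [20, 25, 33, 41, 47, 52, 60]

fine_edges = equal_width_edges(ages, 4)  # lower level: four narrow intervals
coarse_edges = fine_edges[::2]           # higher level: merge adjacent pairs

for age in (33, 60):
    print(age, "->", bin_label(age, fine_edges), "->", bin_label(age, coarse_edges))
```

Merging adjacent fine intervals to form the coarse level guarantees that every fine interval has exactly one parent, which is what makes the result a hierarchy rather than two unrelated partitions.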

The items in Table 7.1 are at the lowest level of Figure 7.2's concept hierarchy. It is difficult to find interesting
