Online Book Reader


Data Mining: Concepts and Techniques - Jiawei Han

are computed and associated with every cell, for all aggregation levels. They are as follows:

■ SelfExp: This indicates the degree of surprise of the cell value, relative to other cells at the same aggregation level.

■ InExp: This indicates the degree of surprise somewhere beneath the cell, if we were to drill down from it.

■ PathExp: This indicates the degree of surprise for each drill-down path from the cell.
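To make the three indicators concrete, here is one simplified formulation. It is an assumption for illustration, not the exact statistical model behind the method described here. Take a two-dimensional item × month table with values v_ij. Estimate each cell's expected value from additive row and column effects, score surprise as a standardized residual, and aggregate the scores of lower-level cells by taking their maximum:

```latex
\hat{v}_{ij} = \bar{v}_{i\cdot} + \bar{v}_{\cdot j} - \bar{v}_{\cdot\cdot},
\qquad r_{ij} = v_{ij} - \hat{v}_{ij},
\qquad \mathrm{SelfExp}(i, j) = \frac{\lvert r_{ij} \rvert}{\sigma_r}
\\[6pt]
\mathrm{InExp}(c) = \max_{c' \in \mathrm{desc}(c)} \mathrm{SelfExp}(c'),
\qquad \mathrm{PathExp}(c, d) = \max_{c' \in \mathrm{desc}_d(c)} \mathrm{SelfExp}(c')
```

Here σ_r is the standard deviation of all residuals, desc(c) is the set of cells beneath cell c, and desc_d(c) is the set of cells reached by drilling down from c along dimension d. Because the aggregation is a maximum, a single strongly surprising cell below is enough to flag the cell above it.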

The use of these measures for discovery-driven exploration of data cubes is illustrated in Example 5.21.

Example 5.21 Discovery-driven exploration of a data cube

Suppose that you want to analyze the monthly sales at AllElectronics as a percentage difference from the previous month. The dimensions involved are item, time, and region. You begin by studying the data aggregated over all items and sales regions for each month, as shown in Figure 5.16.
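The measure used in this example can be sketched in a few lines of Python. The records, their layout (item, region, month, amount), and the function names are hypothetical. The sketch first aggregates over all items and regions, as in Figure 5.16, and then expresses each month's total as a percentage difference from the previous month.

```python
from collections import defaultdict

# Hypothetical records (item, region, month number, sales amount); not data from the text.
RECORDS = [
    ("printer", "north", 1, 120.0), ("desktop", "south", 1, 80.0),
    ("printer", "north", 2, 130.0), ("desktop", "south", 2, 80.0),
    ("printer", "north", 3, 110.0), ("desktop", "south", 3, 79.0),
]


def monthly_totals(records):
    """Aggregate sales over all items and regions, keyed by month number."""
    totals = defaultdict(float)
    for _item, _region, month, amount in records:
        totals[month] += amount
    return dict(sorted(totals.items()))


def pct_change_from_previous(totals):
    """Percentage difference of each month's total from the previous month's total.

    The first month has no predecessor and is omitted; a zero previous total
    would raise ZeroDivisionError.
    """
    months = list(totals)
    return {
        curr: round(100.0 * (totals[curr] - totals[prev]) / totals[prev], 1)
        for prev, curr in zip(months, months[1:])
    }


print(pct_change_from_previous(monthly_totals(RECORDS)))  # {2: 5.0, 3: -10.0}
```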

Figure 5.16 Change in sales over time.

To view the exception indicators, you click an on-screen button marked “highlight exceptions.” This translates the SelfExp and InExp values into visual cues, displayed with each cell. Each cell's background color is based on its SelfExp value. In addition, a box is drawn around each cell, where the thickness and color of the box are functions of its InExp value. Thick boxes indicate high InExp values. In both cases, the darker the color, the greater the degree of exception. For example, the dark, thick boxes for sales during July, August, and September signal the user to explore the lower-level aggregations of these cells by drilling down.
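The text does not say how SelfExp and InExp values are turned into shades and box widths. A minimal sketch, assuming hypothetical score thresholds of 1, 2, and 3 and a box that grows one pixel wider per level, might look like this:

```python
# Hypothetical thresholds: the text does not specify how scores map to shades.
THRESHOLDS = (1.0, 2.0, 3.0)


def shade_level(score, thresholds=THRESHOLDS):
    """Map an exception score to a shade: 0 (no highlight) up to len(thresholds) (darkest)."""
    return sum(score >= t for t in thresholds)


def visual_cues(self_exp, in_exp, thresholds=THRESHOLDS):
    """Display hints for one cell: background from SelfExp, surrounding box from InExp."""
    box_level = shade_level(in_exp, thresholds)
    return {
        "background_shade": shade_level(self_exp, thresholds),
        "box_shade": box_level,
        "box_thickness_px": 1 + box_level,  # higher InExp -> thicker box (hypothetical widths)
    }


# A cell that is unremarkable itself but hides a strong exception beneath it.
print(visual_cues(self_exp=0.4, in_exp=3.2))
# {'background_shade': 0, 'box_shade': 3, 'box_thickness_px': 4}
```

Keeping the two cues separate mirrors the distinction in the text: the background reports surprise at the cell itself, while the box hints at surprise somewhere below it.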

Drill-downs can be executed along the aggregated item or region dimensions. “Which path has more exceptions?” you wonder. To find this out, you select a cell of interest and trigger a path exception module that colors each dimension based on the cell's PathExp value for the drill-down path along that dimension. This value reflects the path's degree of surprise. Suppose that the path along item contains more exceptions.
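Choosing a drill-down path can be sketched as follows. The sketch assumes that PathExp for a dimension is the largest SelfExp among the cells reached by drilling down along it; the scores and function names are hypothetical.

```python
# Hypothetical SelfExp scores of the cells reached by drilling down from one
# selected cell along each dimension.
CHILD_SCORES = {
    "item": [0.3, 2.8, 1.1, 0.6],
    "region": [0.5, 0.9, 1.2, 0.7],
}


def path_exp(child_scores):
    """PathExp per dimension, assumed here to be the max SelfExp along that path."""
    return {dim: max(scores) for dim, scores in child_scores.items() if scores}


def best_drill_path(child_scores):
    """Dimension whose drill-down path is the most surprising."""
    scores = path_exp(child_scores)
    return max(scores, key=scores.get)


print(path_exp(CHILD_SCORES))         # {'item': 2.8, 'region': 1.2}
print(best_drill_path(CHILD_SCORES))  # item
```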

A drill-down along item results in the cube slice of Figure 5.17, showing the sales over time for each item. At this point, you are presented with many different sales values to analyze. When you click the “highlight exceptions” button, the visual cues are displayed, bringing focus to the exceptions. Consider the sales difference of 41% for “Sony b/w printers” in September. This cell has a dark background, indicating a high SelfExp value, meaning that the cell is an exception. Consider now the sales difference of −15% for “Sony b/w printers” in November and of −11% in December. The −11% value for December is marked as an exception, while the −15% value is not, even though −15% is a bigger deviation than −11%. This is because the exception indicators consider all the dimensions to which a cell belongs. Notice that the December sales of most of the other items have a large positive value, while the November sales do not. Therefore, by considering the cell's position in the cube, the sales difference for “Sony b/w printers” in December is exceptional, while the November sales difference of this item is not.
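The effect of a cell's position can be reproduced with a simplified additive model (an assumption, not the method's actual statistical model). Only the −15% and −11% values for “Sony b/w printers” come from the text; the October value and the three other items are hypothetical. Most items drop in November and rise sharply in December, so the model expects a December gain for this item as well, and −11% leaves a larger residual than −15% does in November.

```python
from statistics import mean, pstdev

# Percentage differences from the previous month. Only the Sony Nov/Dec values
# come from the text; everything else is hypothetical.
SALES_CHANGE = {
    "Sony b/w printers": {"Oct": 0.0, "Nov": -15.0, "Dec": -11.0},
    "item B": {"Oct": 0.0, "Nov": -14.0, "Dec": 20.0},
    "item C": {"Oct": 2.0, "Nov": -12.0, "Dec": 18.0},
    "item D": {"Oct": -2.0, "Nov": -16.0, "Dec": 22.0},
}


def self_exp(table):
    """SelfExp-style score for every cell of a 2-D table {row: {col: value}}.

    Simplified model (an assumption): expected value = row mean + column mean
    - grand mean, and the score is |residual| / (std. dev. of all residuals).
    Assumes the residuals are not all zero.
    """
    rows = list(table)
    cols = list(table[rows[0]])
    row_mean = {r: mean([table[r][c] for c in cols]) for r in rows}
    col_mean = {c: mean([table[r][c] for r in rows]) for c in cols}
    grand = mean(list(row_mean.values()))  # equals the overall mean for a full table
    resid = {
        (r, c): table[r][c] - (row_mean[r] + col_mean[c] - grand)
        for r in rows
        for c in cols
    }
    sigma = pstdev(list(resid.values()))
    return {cell: abs(value) / sigma for cell, value in resid.items()}


scores = self_exp(SALES_CHANGE)
for month in ("Nov", "Dec"):
    print(month, round(scores[("Sony b/w printers", month)], 2))
```

With these numbers the December residual is about 15 percentage points versus about 7 for November, so the December cell scores higher even though the November drop is larger.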

Figure 5.17 Change in sales for each item-time combination.

The InExp values can be used to indicate exceptions at lower levels that are not visible at the current level. Consider the cells for “IBM desktop computers” in July and September. These both have a dark, thick box around them, indicating high InExp values. You may decide to further explore the sales of “IBM desktop computers” by drilling down along region. The resulting sales difference by region is shown in Figure 5.18, where the “highlight exceptions” option has been invoked. The visual cues make it easy to notice an exception for the sales of “IBM desktop computers” in the southern region, where such sales decreased by 39% and 34% in July and September, respectively. These detailed exceptions were far from obvious when we were viewing the data as an item-time group-by, aggregated over region in Figure 5.17. Thus, the InExp value is useful for searching for exceptions at lower-level cells of the cube.
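Locating the exception behind a high InExp value can be sketched as a search over the detailed cells beneath an aggregated cell, assuming InExp is the maximum SelfExp found there. The scores below are hypothetical (only the observation that the southern region stands out comes from the text), and for simplicity only the most detailed cells are searched, not intermediate aggregates.

```python
DIMS = ("item", "month", "region")

# Hypothetical SelfExp scores of detailed (item, month, region) cells.
LEAF_SCORES = {
    ("IBM desktop computers", "Jul", "North"): 0.4,
    ("IBM desktop computers", "Jul", "South"): 2.9,
    ("IBM desktop computers", "Jul", "East"): 0.6,
    ("IBM desktop computers", "Jul", "West"): 0.3,
    ("IBM desktop computers", "Aug", "North"): 0.5,
    ("IBM desktop computers", "Aug", "South"): 0.7,
    ("IBM desktop computers", "Aug", "East"): 0.2,
    ("IBM desktop computers", "Aug", "West"): 0.4,
}


def in_exp(cell, leaf_scores):
    """Return (InExp, coordinates of the most surprising detailed cell beneath `cell`).

    `cell` fixes some dimensions, e.g. {"item": ..., "month": ...}; dimensions
    it omits are treated as aggregated. InExp is assumed to be the max SelfExp.
    """
    beneath = {
        coords: score
        for coords, score in leaf_scores.items()
        if all(coords[DIMS.index(dim)] == value for dim, value in cell.items())
    }
    if not beneath:
        return 0.0, None
    worst = max(beneath, key=beneath.get)
    return beneath[worst], worst


print(in_exp({"item": "IBM desktop computers", "month": "Jul"}, LEAF_SCORES))
# (2.9, ('IBM desktop computers', 'Jul', 'South'))
```

Returning the coordinates along with the score tells the analyst not only that a drill-down is worthwhile but also which lower-level cell to look at first.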

Figure 5.18 Change in sales for “IBM desktop computers” by region.

