Online Book Reader

Home Category

Choose a category
All
Classic-Fiction

Data Mining_ Concepts and Techniques - Jiawei Han [148]

By Root 1504 0

data cubes. It facilitates systematic and focused knowledge discovery in large structured and semi-structured data sets. It will continue to endow analysts with tremendous flexibility and power at multidimensional and multigranularity exploratory analysis. This is a vast open area for researchers to build powerful and sophisticated data mining mechanisms.

■ Techniques for processing advanced queries have been proposed that take advantage of cube technology. These include sampling cubes for multidimensional analysis on sampling data, and ranking cubes for efficient top-k (ranking) query processing in large relational data sets.

■ This chapter highlighted three approaches to multidimensional data analysis with data cubes. Prediction cubes compute prediction models in multidimensional cube space. They help users identify interesting data subsets at varying degrees of granularity for effective prediction. Multifeature cubes compute complex queries involving multiple dependent aggregates at multiple granularities. Exception-based, discovery-driven exploration of cube space displays visual cues to indicate discovered data exceptions at all aggregation levels, thereby guiding the user in the data analysis process.

5.6. Exercises

5.1 Assume that a 10-D base cuboid contains only three base cells: (1) (a1, d2, d3, d4, …, d9, d10), (2) (d1, b2, d3, d4, …, d9, d10), and (3) (d1, d2, c3, d4, …, d9, d10), where a1 ≠ d1, b2 ≠ d2, and c3 ≠ d3. The measure of the cube is count().

(a) How many nonempty cuboids will a full data cube contain?

(b) How many nonempty aggregate (i.e., nonbase) cells will a full cube contain?

(d) A cell, c, is a closed cell if there exists no cell, d, such that d is a specialization of cell c (i.e., d is obtained by replacing a ∗ in c by a non-∗ value) and d has the same measure value as c. A closed cube is a data cube consisting of only closed cells. How many closed cells are in the full cube?

5.2 There are several typical cube computation methods, such as MultiWay[ZDN97], BUC[BR99] and Star-Cubing[XHLW03]. Briefly describe these three methods (i.e., use one or two lines to outline the key points), and compare their feasibility and performance under the following conditions:

(a) Computing a dense full cube of low dimensionality (e.g., less than eight dimensions).

(b) Computing an iceberg cube of around 10 dimensions with a highly skewed data distribution.

5.3 Suppose a data cube, C, has D dimensions, and the base cuboid contains k distinct tuples.

(a) Present a formula to calculate the minimum number of cells that the cube, C, may contain.

(b) Present a formula to calculate the maximum number of cells that C may contain.

(d) Answer parts (a) and (b) as if only closed cells are considered (with the minimum count threshold, v).

5.4 Suppose that a base cuboid has three dimensions, A, B, C, with the following number of cells: |A| = 1,000,000, |B| = 100, and |C| = 1000. Suppose that each dimension is evenly partitioned into 10 portions for chunking.

(a) Assuming each dimension has only one level, draw the complete lattice of the cube.

(b) If each cube cell stores one measure with four bytes, what is the total size of the computed cube if the cube is dense?

(c) State the order for computing the chunks in the cube that requires the least amount of space, and compute the total amount of main memory space required for computing the 2-D planes.

5.5 Often, the aggregate count value of many cells in a large data cuboid is zero, resulting in a huge, yet sparse, multidimensional matrix.

(a) Design an implementation method that can elegantly overcome this sparse matrix problem. Note that you need to explain your data structures in detail and discuss the space needed, as well as how to retrieve data

Online Book Reader

Data Mining_ Concepts and Techniques - Jiawei Han [148]

®Online Book Reader