Online Book Reader


Data Mining: Concepts and Techniques, Jiawei Han

model, let's start by looking at a simple 2-D data cube that is, in fact, a table or spreadsheet for sales data from AllElectronics. In particular, we will look at the AllElectronics sales data for items sold per quarter in the city of Vancouver. These data are shown in Table 4.2. In this 2-D representation, the sales for Vancouver are shown with respect to the time dimension (organized in quarters) and the item dimension (organized according to the types of items sold). The fact or measure displayed is dollars_sold (in thousands).

Table 4.2 2-D View of Sales Data for AllElectronics According to time and item

Note: The sales are from branches located in the city of Vancouver. The measure displayed is dollars_sold (in thousands).

location = “Vancouver”

                  item (type)
time (quarter)    home entertainment   computer   phone   security
Q1                605                  825        14      400
Q2                680                  952        31      512
Q3                812                  1023       30      501
Q4                927                  1038       38      580
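To make the 2-D view concrete, the following minimal sketch stores Table 4.2 as a mapping from (time, item) coordinates to dollars_sold. It assumes a plain in-memory Python dictionary as the storage; the names QUARTERS, ITEMS, ROWS, vancouver, and cell are illustrative and are not part of the AllElectronics example.

```python
# Table 4.2 (location = "Vancouver") as a 2-D cube. All names are illustrative.
QUARTERS = ["Q1", "Q2", "Q3", "Q4"]                               # time dimension
ITEMS = ["home entertainment", "computer", "phone", "security"]  # item dimension

# One row per quarter; columns follow the order of ITEMS.
ROWS = [
    [605, 825, 14, 400],
    [680, 952, 31, 512],
    [812, 1023, 30, 501],
    [927, 1038, 38, 580],
]

# Each cell is addressed by one value per dimension: (time, item) -> dollars_sold.
vancouver = {
    (q, item): value
    for q, row in zip(QUARTERS, ROWS)
    for item, value in zip(ITEMS, row)
}

def cell(cube: dict, quarter: str, item: str) -> int:
    """Return dollars_sold (in thousands) stored at one (time, item) cell."""
    return cube[(quarter, item)]

print(cell(vancouver, "Q3", "computer"))  # 1023
```

Reading one row of Table 4.2 corresponds to fixing the time coordinate and letting item vary.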

Now, suppose that we would like to view the sales data with a third dimension. For instance, suppose we would like to view the data according to time and item, as well as location, for the cities Chicago, New York, Toronto, and Vancouver. These 3-D data are shown in Table 4.3. The 3-D data in the table are represented as a series of 2-D tables. Conceptually, we may also represent the same data in the form of a 3-D data cube, as in Figure 4.3.

Table 4.3 3-D View of Sales Data for AllElectronics According to time, item, and location

Note: The measure displayed is dollars_sold (in thousands).

      location = “Chicago”           location = “New York”          location = “Toronto”           location = “Vancouver”
      item                           item                           item                           item
time  home ent.  comp.  phone  sec.  home ent.  comp.  phone  sec.  home ent.  comp.  phone  sec.  home ent.  comp.  phone  sec.
Q1    854        882    89     623   1087       968    38     872   818        746    43     591   605        825    14     400
Q2    943        890    64     698   1130       1024   41     925   894        769    52     682   680        952    31     512
Q3    1032       924    59     789   1034       1048   45     1002  940        795    58     728   812        1023   30     501
Q4    1129       992    63     870   1142       1091   54     984   978        864    59     784   927        1038   38     580

Figure 4.3 A 3-D data cube representation of the data in Table 4.3, according to time, item, and location. The measure displayed is dollars_sold (in thousands).
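The series-of-2-D-tables view in Table 4.3 can be sketched the same way: stacking one 2-D table per location along a third coordinate yields the 3-D cube of Figure 4.3. This is a minimal illustration assuming Python dictionaries as storage; TABLES and cube3d are hypothetical names.

```python
# Table 4.3 as a series of 2-D tables, one per location (illustrative layout).
# Each inner list is one quarter's row; columns follow ITEMS.
QUARTERS = ["Q1", "Q2", "Q3", "Q4"]
ITEMS = ["home entertainment", "computer", "phone", "security"]

TABLES = {
    "Chicago": [[854, 882, 89, 623], [943, 890, 64, 698],
                [1032, 924, 59, 789], [1129, 992, 63, 870]],
    "New York": [[1087, 968, 38, 872], [1130, 1024, 41, 925],
                 [1034, 1048, 45, 1002], [1142, 1091, 54, 984]],
    "Toronto": [[818, 746, 43, 591], [894, 769, 52, 682],
                [940, 795, 58, 728], [978, 864, 59, 784]],
    "Vancouver": [[605, 825, 14, 400], [680, 952, 31, 512],
                  [812, 1023, 30, 501], [927, 1038, 38, 580]],
}

# Stacking the 2-D tables along location gives a 3-D cube whose cells are
# addressed by (time, item, location) and hold dollars_sold (in thousands).
cube3d = {
    (q, item, loc): value
    for loc, rows in TABLES.items()
    for q, row in zip(QUARTERS, rows)
    for item, value in zip(ITEMS, row)
}

print(len(cube3d))                             # 64 = 4 quarters x 4 items x 4 locations
print(cube3d[("Q2", "computer", "New York")])  # 1024
```

Fixing location = "Vancouver" in cube3d recovers exactly the cells of Table 4.2.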

Suppose that we would now like to view our sales data with an additional fourth dimension such as supplier. Viewing things in 4-D becomes tricky. However, we can think of a 4-D cube as being a series of 3-D cubes, as shown in Figure 4.4. If we continue in this way, we may display any n-dimensional data as a series of (n − 1)-dimensional “cubes.” The data cube is a metaphor for multidimensional data storage. The actual physical storage of such data may differ from its logical representation. The important thing to remember is that data cubes are n-dimensional and do not confine data to 3-D.
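The idea of viewing an n-D cube as a series of (n − 1)-D cubes can be sketched generically. The hypothetical helper split_along below works for any n by removing one coordinate from each cell key and using it to group cells; the dictionary layout is only one possible physical representation of the logical cube. Because the supplier values of Figure 4.4 are not listed, the demonstration uses a small subset of Table 4.3 (computer and phone sales in Q1 and Q2 for Toronto and Vancouver).

```python
from typing import Any, Dict, Tuple

# A cube is modeled here (an assumption) as a mapping from coordinate tuples to a measure.
Cube = Dict[Tuple[Any, ...], int]

def split_along(cube: Cube, axis: int) -> Dict[Any, Cube]:
    """Hypothetical helper: view an n-D cube as a series of (n-1)-D cubes,
    one per distinct value of the coordinate at position `axis`."""
    series: Dict[Any, Cube] = {}
    for coords, value in cube.items():
        rest = coords[:axis] + coords[axis + 1:]  # drop the split coordinate
        series.setdefault(coords[axis], {})[rest] = value
    return series

# Subset of Table 4.3 keyed by (time, item, location), dollars_sold in thousands.
subset: Cube = {
    ("Q1", "computer", "Toronto"): 746, ("Q1", "phone", "Toronto"): 43,
    ("Q2", "computer", "Toronto"): 769, ("Q2", "phone", "Toronto"): 52,
    ("Q1", "computer", "Vancouver"): 825, ("Q1", "phone", "Vancouver"): 14,
    ("Q2", "computer", "Vancouver"): 952, ("Q2", "phone", "Vancouver"): 31,
}

# 3-D -> series of 2-D cubes keyed by (time, item), one per location.
by_location = split_along(subset, axis=2)
print(sorted(by_location))                           # ['Toronto', 'Vancouver']
print(by_location["Vancouver"][("Q2", "computer")])  # 952

# Applying the same step again: a 2-D cube -> series of 1-D cubes keyed by (item,).
by_quarter = split_along(by_location["Vancouver"], axis=0)
print(by_quarter["Q1"])  # {('computer',): 825, ('phone',): 14}
```

The same function would split a 4-D cube over (time, item, location, supplier) into a series of 3-D cubes, one per supplier, without any change.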

Figure 4.4 A 4-D data cube representation of sales data, according to time, item, location, and supplier. The measure displayed is dollars_sold (in thousands). For improved readability, only some of the cube values are shown.

Table 4.2 and Table 4.3 show the data at different degrees of summarization. In the data warehousing research literature, a data cube like those shown in Figure 4.3 and Figure 4.4 is often referred to as a cuboid. Given a set of dimensions, we can generate a cuboid for each of the possible subsets of the given dimensions. The result would form a lattice of cuboids, each showing the data at a different level of summarization, or group-by. The lattice of cuboids is then referred to as a data cube. Figure 4.5 shows a lattice of cuboids forming a data cube for the dimensions time, item, location, and supplier.

Figure 4.5 Lattice of cuboids, making up a 4-D data cube for time, item, location, and supplier. Each cuboid represents a different degree of summarization.
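The lattice can be enumerated by taking every subset of the dimensions, and each cuboid can be computed as a group-by that sums the measure over the dimensions it omits. The sketch below assumes Python, dictionary storage, and SUM as the aggregate; lattice, group_by, and base are illustrative names. Since the supplier-level values behind Figure 4.5 are not given, the aggregation demo treats Table 4.2 as a 2-D base cuboid for time and item.

```python
from itertools import combinations
from typing import Dict, List, Tuple

def lattice(dims: List[str]) -> List[Tuple[str, ...]]:
    """Every cuboid (subset of dims), from the base cuboid down to the 0-D cuboid."""
    return [combo
            for k in range(len(dims), -1, -1)
            for combo in combinations(dims, k)]

def group_by(cells: Dict[Tuple[str, ...], int], dims: List[str],
             keep: Tuple[str, ...]) -> Dict[Tuple[str, ...], int]:
    """One cuboid: SUM the measure over every dimension not listed in `keep`."""
    positions = [dims.index(d) for d in keep]
    out: Dict[Tuple[str, ...], int] = {}
    for coords, value in cells.items():
        key = tuple(coords[i] for i in positions)
        out[key] = out.get(key, 0) + value
    return out

# The four dimensions of Figure 4.5 give 2**4 = 16 cuboids.
print(len(lattice(["time", "item", "location", "supplier"])))  # 16

# Illustrative 2-D base cuboid: Table 4.2 (Vancouver), keyed by (time, item).
QUARTERS = ["Q1", "Q2", "Q3", "Q4"]
ITEMS = ["home entertainment", "computer", "phone", "security"]
ROWS = [[605, 825, 14, 400], [680, 952, 31, 512],
        [812, 1023, 30, 501], [927, 1038, 38, 580]]
base_dims = ["time", "item"]
base = {(q, it): v for q, row in zip(QUARTERS, ROWS) for it, v in zip(ITEMS, row)}

# Skip the base cuboid itself and print the more summarized ones.
for cuboid in lattice(base_dims)[1:]:
    print(cuboid, group_by(base, base_dims, cuboid))
# ('time',) {('Q1',): 1844, ('Q2',): 2175, ('Q3',): 2366, ('Q4',): 2583}
# ('item',) {('home entertainment',): 3024, ('computer',): 3838, ('phone',): 113, ('security',): 1993}
# () {(): 8968}
```

With n dimensions (and no concept hierarchies), this enumeration yields 2^n cuboids. Computing each one directly from the base cuboid, as done here, is the simplest strategy but not necessarily the most efficient one.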

The cuboid that holds the lowest level of summarization is called the base cuboid. For example, the 4-D cuboid in Figure 4.4 is the base cuboid for the given time, item, location, and supplier dimensions. Figure 4.3 is a 3-D (nonbase) cuboid for time, item, and location, summarized for all suppliers. The 0-D cuboid, which holds
