Data Mining_ Concepts and Techniques - Jiawei Han [103]
Figure 4.15 Indexing OLAP data using bitmap indices.
Bitmap indexing is advantageous compared to hash and tree indices. It is especially useful for low-cardinality domains because comparison, join, and aggregation operations are then reduced to bit arithmetic, which substantially reduces the processing time. Bitmap indexing leads to significant reductions in space and input/output (I/O) since a string of characters can be represented by a single bit. For higher-cardinality domains, the method can be adapted using compression techniques.
The join indexing method gained popularity from its use in relational database query processing. Traditional indexing maps the value in a given column to a list of rows having that value. In contrast, join indexing registers the joinable rows of two relations from a relational database. For example, if two relations R(RID, A) and S(B, SID) join on the attributes A and B, then the join index record contains the pair (RID, SID), where RID and SID are record identifiers from the R and S relations, respectively. Hence, the join index records can identify joinable tuples without performing costly join operations. Join indexing is especially useful for maintaining the relationship between a foreign key2 and its matching primary keys, from the joinable relation.
2A set of attributes in a relation schema that forms a primary key for another relation schema is called a foreign key.
The star schema model of data warehouses makes join indexing attractive for cross-table search, because the linkage between a fact table and its corresponding dimension tables comprises the fact table's foreign key and the dimension table's primary key. Join indexing maintains relationships between attribute values of a dimension (e.g., within a dimension table) and the corresponding rows in the fact table. Join indices may span multiple dimensions to form composite join indices. We can use join indices to identify subcubes that are of interest.
Join indexing
In Example 3.4, we defined a star schema for AllElectronics of the form “sales_star [time, item, branch, location ]: dollars_sold = sum (sales_in_dollars).” An example of a join index relationship between the sales fact table and the location and item dimension tables is shown in Figure 4.16. For example, the “Main Street” value in the location dimension table joins with tuples T57, T238, and T884 of the sales fact table. Similarly, the “Sony-TV” value in the item dimension table joins with tuples T57 and T459 of the sales fact table. The corresponding join index tables are shown in Figure 4.17.
Figure 4.16 Linkages between a sales fact table and location and item dimension tables.
Figure 4.17 Join index tables based on the linkages between the sales fact table and the location and item dimension tables shown in Figure 4.16.
Suppose that there are 360 time values, 100 items, 50 branches, 30 locations, and 10 million sales tuples in the sales_star data cube. If the sales fact table has recorded sales for only 30 items, the remaining 70 items will obviously not participate in joins. If join indices are not used, additional I/Os have to be performed to bring the joining portions of the fact table and the dimension tables together.
To further speed up query processing, the join indexing and the bitmap indexing methods can be integrated to form bitmapped join indices.
4.4.3. Efficient Processing of OLAP Queries
The purpose of materializing cuboids and constructing OLAP index structures is to speed up query processing in data cubes. Given materialized views, query processing should proceed as follows:
1. Determine which operations should be performed on the available cuboids: This involves transforming any selection, projection, roll-up (group-by), and drill-down operations specified in the query into corresponding SQL and/or OLAP operations. For example, slicing