Online Book Reader

Home Category

Choose a category
All
Classic-Fiction

Data Mining - Mehmed Kantardzic [167]

By Root 837 0

dimensional part (a1, a2, … , an) and itemset part (items-t). It is commonsense to divide the mining process into two steps: first mine patterns about dimensional information and then find frequent itemsets from the projected sub-database, or vice versa. Without any preferences in the methodology we will illustrate the first approach using the multidimensional database DB in Table 10.4.

TABLE 10.4. Multidimensional-Transactional Database DB

One can first find the frequent multidimensional-value combinations and then find the corresponding frequent itemsets of a database. Suppose that the threshold value for our database DB in Table 10.4 is set to 2. Then, the combination of attribute values that occurs two or more than two times is frequent, and it is called a multidimensional pattern or MD pattern. For mining MD patterns, a modified Bottom Up Computation (BUC) algorithm can be used (it is an efficient “iceberg cube” computing algorithm). The basic steps of the BUC algorithm are as follows:

1. First, sort all tuples in the database in alphabetical order of values in the first dimension (A1), because the values for A1 are categorical. The only MD pattern found for this dimension is (a, *, *) because only the value a occurs two times; the other values b and c occur only once and they are not part of the MD patterns. Value * for the other two dimensions shows that they are not relevant in this first step, and they could have any combination of allowed values.

Select tuples in a database with found M pattern (or patterns). In our database, these are the samples with ID values 01 and 03. Sort the reduced database again with respect to the second dimension (A2), where the values are 1 and 2. Since no pattern occurs twice, there are no MD patterns for exact A1 and A2 values. Therefore, one can ignore the second dimension A2 (this dimension does not reduce the database further). All selected tuples are used in the next phase.

Selected tuples in the database are sorted in alphabetical order of values for the third dimension (in our example A3 with categorical values). A subgroup (a, *, m) is contained in two tuples and it is an MD pattern. Since there are no more dimensions in our example, the search continues with the second step.

2. Repeat the processes in step 1; only start not with the first but with the second dimension (first dimension is not analyzed at all in this iteration). In the following iterations, reduce the search process further for one additional dimension at the beginning. Continue with other dimensions.

In our example in the second iteration, starting with attribute A2, MD pattern (*, 2, *) will be found. Including dimension A3, there are no additional MD patterns. The third and last iteration in our example starts with the A3 dimension and the corresponding pattern is (*, *, m).

In summary, the modified BUC algorithm defines a set of MD patterns with the corresponding projections of a database. The processing tree for our example of database DB is shown in Figure 10.8. Similar trees will be generated for a larger number of dimensions.

Figure 10.8. A processing tree using the BUC algorithm for the database in Table 10.4.

When all MD patterns are found, the next step in the analysis of multidimensional-transactional database is the mining of frequent itemsets in the MD-projected database for each MD pattern. An alternative approach is based on finding frequent itemsets first and then the corresponding MD patterns.

10.8 REVIEW QUESTIONS AND PROBLEMS

1. What is the essential difference between association rules and decision rules (described in Chapter 6)?

2. What are the typical industries in which market-basket analysis plays an important role in the strategic decision-making processes?

3. What are the common values for support and confidence parameters in the Apriori algorithm? Explain using the retail industry as an example.

4. Why is the process of discovering association rules relatively simple compared with generating large itemsets in transactional databases?

5. Given a simple transactional

Online Book Reader

Data Mining - Mehmed Kantardzic [167]

®Online Book Reader