Online Book Reader

Home Category

Data Mining_ Concepts and Techniques - Jiawei Han [161]

By Root 1449 0
support count), and a branch is created for each transaction. For example, the scan of the first transaction, “T100: I1, I2, I5,” which contains three items (I2, I1, I5 in L order), leads to the construction of the first branch of the tree with three nodes, 〈I2: 1〉, 〈I1: 1〉, and 〈I5: 1〉, where I2 is linked as a child to the root, I1 is linked to I2, and I5 is linked to I1. The second transaction, T200, contains the items I2 and I4 in L order, which would result in a branch where I2 is linked to the root and I4 is linked to I2. However, this branch would share a common prefix, I2, with the existing path for T100. Therefore, we instead increment the count of the I2 node by 1, and create a new node, 〈I4: 1〉, which is linked as a child to 〈I2: 2〉. In general, when considering the branch to be added for a transaction, the count of each node along a common prefix is incremented by 1, and nodes for the items following the prefix are created and linked accordingly.

To facilitate tree traversal, an item header table is built so that each item points to its occurrences in the tree via a chain of node-links. The tree obtained after scanning all the transactions is shown in Figure 6.7 with the associated node-links. In this way, the problem of mining frequent patterns in databases is transformed into that of mining the FP-tree.

Figure 6.7 An FP-tree registers compressed, frequent pattern information.

The FP-tree is mined as follows. Start from each frequent length-1 pattern (as an initial suffix pattern), construct its conditional pattern base (a “sub-database,” which consists of the set of prefix paths in the FP-tree co-occurring with the suffix pattern), then construct its (conditional) FP-tree, and perform mining recursively on the tree. The pattern growth is achieved by the concatenation of the suffix pattern with the frequent patterns generated from a conditional FP-tree.

Mining of the FP-tree is summarized in Table 6.2 and detailed as follows. We first consider I5, which is the last item in L, rather than the first. The reason for starting at the end of the list will become apparent as we explain the FP-tree mining process. I5 occurs in two FP-tree branches of Figure 6.7. (The occurrences of I5 can easily be found by following its chain of node-links.) The paths formed by these branches are 〈I2, I1, I5: 1〉 and 〈I2, I1, I3, I5: 1〉. Therefore, considering I5 as a suffix, its corresponding two prefix paths are 〈I2, I1: 1〉 and 〈I2, I1, I3: 1〉, which form its conditional pattern base. Using this conditional pattern base as a transaction database, we build an I5-conditional FP-tree, which contains only a single path, 〈I2: 2, I1: 2〉; I3 is not included because its support count of 1 is less than the minimum support count. The single path generates all the combinations of frequent patterns: {I2, I5: 2}, {I1, I5: 2}, {I2, I1, I5: 2}.

Table 6.2 Mining the FP-Tree by Creating Conditional (Sub-)Pattern Bases

ItemConditional Pattern BaseConditional FP-treeFrequent Patterns Generated

I5 {{I2, I1: 1}, {I2, I1, I3: 1}} 〈I2: 2, I1: 2〉 {I2, I5: 2}, {I1, I5: 2}, {I2, I1, I5: 2}

I4 {{I2, I1: 1}, {I2: 1}} 〈I2: 2〉 {I2, I4: 2}

I3 {{I2, I1: 2}, {I2: 2}, {I1: 2}} 〈I2: 4, I1: 2〉, 〈I1: 2〉 {I2, I3: 4}, {I1, I3: 4}, {I2, I1, I3: 2}

I1 {{I2: 4}} 〈I2: 4〉 {I2, I1: 4}

For I4, its two prefix paths form the conditional pattern base, {{I2 I1: 1}, {I2: 1}}, which generates a single-node conditional FP-tree, 〈I2: 2〉, and derives one frequent pattern, {I2, I4: 2}.

Similar to the preceding analysis, I3's conditional pattern base is {{I2, I1: 2}, {I2: 2}, {I1: 2}}. Its conditional FP-tree has two branches, 〈I2: 4, I1: 2〉 and 〈I1: 2〉, as shown in Figure 6.8, which generates the set of patterns {{I2, I3: 4}, {I1, I3: 4}, {I2, I1, I3: 2}}. Finally, I1's conditional pattern base is {{I2: 4}}, with an FP-tree that contains only one node, 〈I2: 4〉, which generates one frequent pattern, {I2, I1: 4}. This mining process is summarized in Figure 6.9.

Figure 6.8 The conditional FP-tree associated with the conditional node I3.

Figure 6.9 FP-growth algorithm

Return Main Page Previous Page Next Page

®Online Book Reader