Data Mining - Mehmed Kantardzic [191]
Figure 12.10. Preparatory phase for link-betweenness computation. (a) Initial graph; (b) transformed graph with the root in node A.
The result of the completed forward phase is given in Figure 12.11a. Each node is labeled with the number of shortest paths from the root node A. For example, the node J has three shortest paths, two of them through the node H (ADHJ and AEHJ) and one through the node G (ADGJ).
Figure 12.11. Computation of the partial link-betweenness measure. (a) Forward phase; (b) backward phase.
The backward phase starts from the bottom of the layered graph structure, in our example, from node K. If there are multiple paths from the given node up, count betweenness measure of each link fractionally. The proportion is determined by the number of shortest paths to these nodes on the previous layer. What is the amount we are splitting between these links? The amount is defined as 1 + sum of all betweenness measures entering into the node from below. For example, from node K there are two paths toward nodes I and J, and because both nodes have the same number of shortest paths (3), the amount we are splitting is 1 + 0 = 1, and the partial betweenness measure for links IK and JK is 0.5. In a similar way, we may compute betweennes measures for node G. Total betweenness value for splitting is 1 + 0.5 + 0.5 = 2. There is only one node up; it is D, and the link-betweenness measure for GD is 2.
When we compute betweenness measures for all links in the graph, the procedure should be repeated for the other nodes in the graph, such as the root nodes, until each node of the network is explored. Finally, all partial link scores determined for different graphs should be added to determine the final link-betweenness score.
Graph-mining applications are far more challenging to implement because of the additional constraints that arise from the structural nature of the underlying graph. The problem of frequent pattern mining has been widely studied in the context of mining transactional data. Recently, the techniques for frequent-pattern mining have also been extended to the case of graph data. This algorithm attempts to find interesting or commonly occurring subgraphs in a set of graphs. Discovery of these patterns may be the sole purpose of the systems, or the discovered patterns may be used for graph classification or graph summarization. The main difference in the case of graphs is that the process of determining support is quite different. The problem can be defined in different ways, depending upon the application domain. In the first case, we have a group of graphs, and we wish to determine all the patterns that support a fraction of the corresponding graphs. In the second case, we have a single large graph, and we wish to determine all the patterns that are supported at least a certain number of times in this large graph. In both cases, we need to account for the isomorphism issue in determining whether one graph is supported by another or not. However, the problem of defining the support is much more challenging if overlaps are allowed between different embeddings. Frequently occurring subgraphs in a large graph or a set of graphs could represent important motifs in the real-world data.
Apriori-style algorithms can be extended to the case of discovering frequent subgraphs in graph data, by using a similar level-wise strategy of generating (k + 1) candidates from k-patterns. Various measures to mine substructure frequencies in graphs