Online Book Reader

Home Category

Data Mining - Mehmed Kantardzic [190]

By Root 842 0
centrality measure is closeness, which is the mean geodesic (i.e., shortest path) distance between a vertex and all other vertices reachable from it. Examples of computations for both measures are given in Figure 12.8. Closeness can be regarded as a measure of how long it will take for information to spread from a given node to others in the graph. Nodes that have a low distance to all other nodes in the graph have high closeness centrality.

Figure 12.8. Degree and closeness parameters of the graph.

Another important class of centrality measures is the class of betweenness measures. Betweenness is a measure of the extent to which a vertex lies on the paths between others. The simplest and most widely used betweenness measure is shortest path betweenness, or simply betweenness. The betweenness of a node i is defined to be the fraction of shortest paths between any pair of vertices in a graph that passes through node i. This is, in some sense, a measure of the influence a node has over the spread of connections through the network. Nodes with high betweenness values occur on a larger number of shortest paths and are presumably more important than nodes with low betweenness. The parameter is costly to compute especially when the graphs are complex with a large number of nodes and links. Currently, the fastest known algorithms require O(n3) time complexity and O(n2) space complexity, where n is the number of nodes in the graph.

Illustrative examples of centrality measures and their interpretations are given in Figure 12.9. Node X has importance because it bridges the structural hole between the two clusters of interconnected nodes. It has the highest betweenness measure compared with all the other nodes in the graph. Such nodes get lots of brokerage opportunities and can control the flow in the paths between subgraphs. On the other hand, node Y is in the middle of a dense web of nodes that provides easy, short path access to neighboring nodes; thus, Y also has a good central position in the subgraph. This characteristic of the node Y is described with the highest degree measure.

Figure 12.9. Different types of a node’s importance in a graph.

The vertex-betweenness index reflects the amount of control exerted by a given vertex over the interactions between the other vertices in the graph. The other approach to measure betweenness is to concentrate on links in the graph instead of nodes. Edge-betweenness centrality is related to the frequency of an edge placed on the shortest paths between all pairs of vertices. The betweenness centrality of an edge in a network is given by the sum of the edge-betweenness values for all pairs of nodes in the graph going through the given edge. The edges with highest betweenness values are most likely to lie between subgraphs, rather than inside a subgraph. Consequently, successively removing edges with the highest edge-betweenness will eventually isolate subgraphs consisting of nodes that share connections only with other nodes in the same subgraph. This gives the edge-betweenness index a central role in graph-clustering algorithms where it is necessary to separate a large graph into smaller highly connected subgraphs. Edge- (also called link-) betweenness centrality is traditionally determined in two steps:

1. Compute the length and number of shortest paths between all pairs of nodes through the link.

2. Sum all link dependencies.

The overall betweenness centrality of a link v is obtained by summing up its partial betweenness values for this link, calculated using the graph transformation on breadth first strategy from each node. For example, at the beginning it is given a graph in Figure 12.10a and it is necessary to find link betweenness measures for all links in the graph. In the first step, we build a “modified” graph that starts with node A, and specifies all links in the graph, layer by layer: first neighbors, second neighbors, and so on. The resulting graph is given in Figure 12.10b. This graph is the starting point for partial betweenness computation. The total betweeness

Return Main Page Previous Page Next Page

®Online Book Reader