Online Book Reader

Home Category

Choose a category
All
Classic-Fiction

Data Mining_ Concepts and Techniques - Jiawei Han [318]

By Root 1685 0

a message can be passed from Ada to Bob and vice versa. However, this information is not useful for AllElectronics' customer relationship management because the company typically does not want to send a specific message from one customer to another. Therefore, geodesic distance does not suit the application.

“What does similarity mean in a social network?” We consider two ways to define similarity:

■ Two customers are considered similar to one another if they have similar neighbors in the social network. This heuristic is intuitive because, in practice, two people receiving recommendations from a good number of common friends often make similar decisions. This kind of similarity is based on the local structure (i.e., the neighborhoods ) of the vertices, and thus is called structural context–based similarity.

■ Suppose AllElectronics sends promotional information to both Ada and Bob in the social network. Ada and Bob may randomly forward such information to their friends (or neighbors ) in the network. The closeness between Ada and Bob can then be measured by the likelihood that other customers simultaneously receive the promotional information that was originally sent to Ada and Bob. This kind of similarity is based on the random walk reachability over the network, and thus is referred to as similarity based on random walk.

Let's have a closer look at what is meant by similarity based on structural context, and similarity based on random walk.

The intuition behind similarity based on structural context is that two vertices in a graph are similar if they are connected to similar vertices. To measure such similarity, we need to define the notion of individual neighborhood. In a directed graph , where V is the set of vertices and is the set of edges, for a vertex , the individual in-neighborhood of v is defined as

(11.29)

Symmetrically, we define the individual out-neighborhood of v as

(11.30)

Following the intuition illustrated in Example 11.20, we define SimRank, a structural-context similarity, with a value that is between 0 and 1 for any pair of vertices. For any vertex, , the similarity between the vertex and itself is because the neighborhoods are identical. For vertices such that , we can define

(11.31)

where C is a constant between 0 and 1. A vertex may not have any in-neighbors. Thus, we define Eq. (11.31) to be 0 when either I(u) or I(v) is ∅. Parameter C specifies the rate of decay as similarity is propagated across edges.

“How can we compute SimRank?” A straightforward method iteratively evaluates Eq. (11.31) until a fixed point is reached. Let be the SimRank score calculated at the i th round. To begin, we set

(11.32)

We use Eq. (11.31) to compute from si as

(11.33)

It can be shown that . Additional methods for approximating SimRank are given in the bibliographic notes (Section 11.7).

Now, let's consider similarity based on random walk. A directed graph is strongly connected if, for any two nodes u and v, there is a path from u to v and another path from v to u. In a strongly connected graph, , for any two vertices, , we can define the expected distance from u to v as

(11.34)

where is a path starting from u and ending at v that may contain cycles but does not reach v until the end. For a traveling tour, , its length is . The probability of the tour is defined as

(11.35)

To measure the probability that a vertex w receives a message that originated simultaneously from u and v, we extend the expected distance to the notion of expected meeting distance, that is,

(11.36)

where is a pair of tours and of the same length. Using a constant C between 0 and 1, we define the expected meeting probability as

(11.37)

which is a similarity measure based on random walk. Here, the parameter C specifies the probability of continuing the walk at each step of the trajectory.

It has been shown that for any two vertices, u and v. That is, SimRank is based on both structural context and random walk.

11.3.3. Graph Clustering Methods

Let's consider how to conduct clustering on a graph.

Online Book Reader

Data Mining_ Concepts and Techniques - Jiawei Han [318]

®Online Book Reader