Data Mining - Mehmed Kantardzic [193]
Figure 12.14. Graph clustering using betweenness measure. (a) Initial graph; (b) subgraphs after elimination of links with maximum betweenness.
The second case of cluster analysis assumes multiple graphs, each of which may possibly be of modest size. These large number of graphs need to be clustered based on their underlying structural behavior. The problem is challenging because of the need to match the structures of the underlying graphs, and these structures are used for clustering purposes. The main idea is that we wish to cluster graphs as objects, and the distance between graphs is defined based on a structural similarity function such as the edit distance. This clustering approach makes it an ideal technique for applications in areas such as scientific-data exploration, information retrieval, computational biology, Web-log analysis, forensics analysis, and blog analysis.
Link analysis is an important field that has received a lot of attention recently when advances in information technology enabled mining of extremely large networks. The basic data structure is still a graph, only the emphasis in analysis is on links and their characteristics: labeled or unlabeled, directed or undirected. There is an inherent ambiguity with respect to the term “link” that occurs in many circumstances, but especially in discussions with people whose background and research interests are in the database community. In the database community, especially the subcommunity that uses the well-known entity-relationship (ER) model, a “link” is a connection between two records in two different tables. This usage of the term “link” in the database community differs from that in the intelligence community and in the artificial intelligence (AI) research community. Their interpretation of a “link” typically refers to some real world connection between two entities. Probably the most famous example of exploiting link structure in the graph is the use of links to improve information retrieval results. Both, the well-known PageRank measure and hubs, and authority scores are based on the link structure of the Web. Link analysis techniques are used in law enforcement, intelligence analysis, fraud detection, and related domains. It is sometimes described using the metaphor of “connecting the dots” because link diagrams show the connections between people, places, events, and things, and represent invaluable tools in these domains.
12.2 TEMPORAL DATA MINING
Time is one of the essential natures of data. Many real-life data describe the property or status of some object at a particular time instance. Today time-series data are being generated at an unprecedented speed from almost every application domain, for example, daily fluctuations of stock market, traces of dynamic processes and scientific experiments, medical and biological experimental observations,