Data Mining - Mehmed Kantardzic [219]
3. For the graph given in Problem 2, find the partial betweenness centrality using the modified graph, starting with node 5.
4. Give real-world examples for traditional analyses of temporal data (i.e., trends, cycles, seasonal patterns, outliers).
5. Given the temporal sequence S = {1 2 3 2 4 6 7 5 3 1 0 2}:
(a) find the PAA for four sections of the sequence;
(b) determine the SAX values for the solution in (a) if (1) α = 3, (2) α = 4;
(c) find the PAA for three sections of the sequence; and
(d) determine the SAX values for the solution in (c) if (1) α = 3, (2) α = 4.
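The PAA and SAX steps in Problem 5 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the chapter's definitive procedure: it follows the common SAX convention of z-normalizing the sequence, averaging equal-length sections, and mapping each average to a letter using breakpoints that split the standard normal distribution into α equiprobable regions. The function names `znorm`, `paa`, and `sax` are illustrative, and the chapter's own convention may differ (for example, by computing PAA on the raw values).

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def znorm(seq):
    """Rescale a sequence to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(seq), pstdev(seq)
    if sigma == 0:
        raise ValueError("constant sequence cannot be z-normalized")
    return [(x - mu) / sigma for x in seq]


def paa(seq, n_sections):
    """Piecewise aggregate approximation: the mean of each equal-length section.

    Assumption: len(seq) is an exact multiple of n_sections.
    """
    if len(seq) % n_sections != 0:
        raise ValueError("sequence length must be divisible by n_sections")
    width = len(seq) // n_sections
    return [mean(seq[i * width:(i + 1) * width]) for i in range(n_sections)]


def sax(paa_values, alphabet_size):
    """Map PAA values to letters using equiprobable standard-normal breakpoints."""
    breakpoints = [NormalDist().inv_cdf(k / alphabet_size)
                   for k in range(1, alphabet_size)]
    letters = "abcdefghijklmnopqrstuvwxyz"
    # The letter index is the number of breakpoints at or below the value.
    return "".join(letters[bisect_right(breakpoints, v)] for v in paa_values)


if __name__ == "__main__":
    S = [1, 2, 3, 2, 4, 6, 7, 5, 3, 1, 0, 2]
    for n_sections in (4, 3):
        print(n_sections, "sections, raw PAA:", paa(S, n_sections))
        for alpha in (3, 4):
            word = sax(paa(znorm(S), n_sections), alpha)
            print("  alpha =", alpha, "SAX word:", word)
```

For α = 3 the breakpoints are approximately ±0.43, and for α = 4 they are approximately −0.67, 0, and 0.67, matching the values usually tabulated for SAX.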
6. Given the sequence S = {A B C B A A B A B C B A B A B B C B A C C}:
(a) Find the longest subsequence with frequency ≥ 3.
(b) Construct a finite-state automaton (FSA) for the subsequence found in (a).
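One way to explore part (a) of Problem 6 is a brute-force search, sketched below in Python. The sketch makes two assumptions that the problem statement does not fix: a "subsequence" is taken to be a contiguous run of symbols, and overlapping occurrences are each counted. The function name `longest_frequent_substrings` is illustrative only.

```python
from collections import Counter


def longest_frequent_substrings(seq, min_freq):
    """Return all longest contiguous substrings occurring at least min_freq times.

    Assumptions: occurrences may overlap, and seq is a string of
    single-character symbols. Lengths are tried from longest to shortest,
    so the first length with any hit is the answer.
    """
    n = len(seq)
    for length in range(n, 0, -1):
        counts = Counter(seq[i:i + length] for i in range(n - length + 1))
        hits = sorted(p for p, c in counts.items() if c >= min_freq)
        if hits:
            return hits
    return []


if __name__ == "__main__":
    S = "".join("A B C B A A B A B C B A B A B B C B A C C".split())
    print(longest_frequent_substrings(S, 3))
```

The search is cubic in the sequence length in the worst case, which is acceptable for a 21-symbol sequence but not for long sequences.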
7. Find the normalized contiguity matrix for the following table of U.S. cities:
Minneapolis    Chicago       New York
Nashville      Louisville    Charlotte
Assume that only cities that are neighbors in the table (vertically or horizontally) are close.
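The construction in Problem 7 can be sketched in Python as below. The sketch assumes that "normalized" means row-normalized, so that each row of the binary contiguity matrix is divided by its number of neighbors and sums to 1; the chapter may define normalization differently. The function names `grid_contiguity` and `row_normalize` are illustrative.

```python
def grid_contiguity(grid):
    """Binary contiguity matrix for a rectangular table of names.

    Two cells are neighbors only if they are adjacent vertically or
    horizontally. Cells are indexed row by row.
    """
    rows, cols = len(grid), len(grid[0])
    names = [name for row in grid for name in row]
    n = len(names)
    W = [[0.0] * n for _ in range(n)]
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    W[i][rr * cols + cc] = 1.0
    return names, W


def row_normalize(W):
    """Divide each row by its sum so that every non-empty row sums to 1."""
    normalized = []
    for row in W:
        total = sum(row)
        normalized.append([x / total if total else 0.0 for x in row])
    return normalized


if __name__ == "__main__":
    table = [["Minneapolis", "Chicago", "New York"],
             ["Nashville", "Louisville", "Charlotte"]]
    names, W = grid_contiguity(table)
    for name, row in zip(names, row_normalize(W)):
        print(f"{name:12s}", [round(x, 2) for x in row])
```

Corner cities have two neighbors and the middle cities have three, so the nonzero entries in a row are either 1/2 or 1/3 under this assumption.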
8. For the BN in Figure 12.38, determine:
(a) P(C, R, W)
(b) P(C, S, W)
9. Review the latest articles on privacy-preserving data mining that are available on the Internet. Discuss the trends in the field.
10. What are the largest sources of unintended personal data on the Internet? How can we increase Web users' awareness of their personal data that are available on the Web for a variety of data-mining activities?
11. Discuss an implementation of transparency and accountability mechanisms in a data-mining process. Illustrate your ideas with examples of real-world data-mining applications.
12. Give examples of data-mining applications where you would use a DDM approach. Explain the reasons.
12.8 REFERENCES FOR FURTHER STUDY
Aggarwal C. C., P. S. Yu, Privacy-Preserving Data Mining: Models and Algorithms, Springer, Boston, 2008.
The book proposes a number of techniques to perform the data-mining tasks in a privacy-preserving way. These techniques generally fall into the following categories: data modification techniques, cryptographic methods and protocols for data sharing, statistical techniques for disclosure and inference control, query auditing methods, and randomization and perturbation-based techniques. This edited volume contains surveys by distinguished researchers in the privacy field. Each survey includes the key research content as well as future research directions. Privacy-Preserving Data Mining: Models and Algorithms is designed for researchers, professors, and advanced-level students in computer science, and is also suitable for industry practitioners.
Chakrabarti D., C. Faloutsos, Graph Mining: Laws, Generators, and Algorithms, ACM Computing Surveys, Vol. 38, March 2006, pp. 1–69.
How does the Web look? How could we tell an abnormal social network from a normal one? These and similar questions are important in many fields where the data can intuitively be cast as a graph; examples range from computer networks to sociology to biology and many more. Indeed, any M:N relation in database terminology can be represented as a graph. A lot of these questions boil down to the following: “How can we generate synthetic but realistic graphs?” To answer this, we must first understand what patterns are common in real-world graphs and can thus be considered a mark of normality/realism. This survey gives an overview of the incredible variety of work that has been done on these problems. One of our main contributions is the integration of points of view from physics, mathematics, sociology, and computer science. Further, we briefly describe recent advances on some related and interesting graph problems.
Da Silva J. C., et al., Distributed Data Mining and Agents, Engineering Applications of Artificial Intelligence, Vol. 18, No. 7, October 2005, pp. 791–807.
Multi-Agent Systems (MAS) offer an architecture for distributed problem solving. DDM algorithms focus on one class of such distributed problem solving tasks—analysis and modeling of distributed data. This paper offers a perspective on DDM algorithms