Data Mining_ Concepts and Techniques - Jiawei Han [377]
■ Visual and audio data mining: Visual and audio data mining is an effective way to integrate with humans' visual and audio systems and discover knowledge from huge amounts of data. A systematic development of such techniques will facilitate the promotion of human participation for effective and efficient data analysis.
■ Distributed data mining and real-time data stream mining: Traditional data mining methods, designed to work at a centralized location, do not work well in many of the distributed computing environments present today (e.g., the Internet, intranets, local area networks, high-speed wireless networks, sensor networks, and cloud computing). Advances in distributed data mining methods are expected. Moreover, many applications involving stream data (e.g., e-commerce, Web mining, stock analysis, intrusion detection, mobile data mining, and data mining for counterterrorism) require dynamic data mining models to be built in real time. Additional research is needed in this direction.
■ Privacy protection and information security in data mining: An abundance of personal or confidential information available in electronic forms, coupled with increasingly powerful data mining tools, poses a threat to data privacy and security. Growing interest in data mining for counterterrorism also adds to the concern. Further development of privacy-preserving data mining methods is foreseen. The collaboration of technologists, social scientists, law experts, governments, and companies is needed to produce a rigorous privacy and security protection mechanism for data publishing and data mining.
With confidence, we look forward to the next generation of data mining technology and the further benefits that it will bring.
13.6. Summary
■ Mining complex data types poses challenging issues, for which there are many dedicated lines of research and development. This chapter presents a high-level overview of mining complex data types, which includes mining sequence data such as time series, symbolic sequences, and biological sequences; mining graphs and networks; and mining other kinds of data, including spatiotemporal and cyber-physical system data, multimedia, text and Web data, and data streams.
■ Several well-established statistical methods have been proposed for data analysis such as regression, generalized linear models, analysis of variance, mixed-effect models, factor analysis, discriminant analysis, survival analysis, and quality control. Full coverage of statistical data analysis methods is beyond the scope of this book. Interested readers are referred to the statistical literature cited in the bibliographic notes (Section 13.8).
■ Researchers have been striving to build theoretical foundations for data mining. Several interesting proposals have appeared, based on data reduction, data compression, probability and statistics theory, microeconomic theory, and pattern discovery–based inductive databases.
■ Visual data mining integrates data mining and data visualization to discover implicit and useful knowledge from large data sets. Visual data mining includes data visualization, data mining result visualization, data mining process visualization, and interactive visual data mining. Audio data mining uses audio signals to indicate data patterns or features of data mining results.
■ Many customized data mining tools have been developed for domain-specific applications, including finance, the retail and telecommunication industries, science and engineering, intrusion detection and prevention, and recommender systems. Such application domain-based studies integrate domain-specific knowledge with data analysis techniques and provide mission-specific data mining solutions.
■ Ubiquitous data mining is the constant presence of data mining in many aspects of our daily lives. It can influence how we shop, work, search for information, and use a computer, as