Online Book Reader

Data Mining - Mehmed Kantardzic [18]

data mining and analysis.

Figure 1.6. Effort in data-mining process.

Technical literature reports only on successful data-mining applications. To increase our understanding of data-mining techniques and their limitations, it is crucial to analyze not only successful but also unsuccessful applications. Failures or dead ends also provide valuable input for data-mining research and applications. We have to underscore the intensive conflicts that have arisen between practitioners of “digital discovery” and classical, experience-driven human analysts objecting to these intrusions into their hallowed turf. One good case study is that of U.S. economist Orley Ashenfelter, who used data-mining techniques to analyze the quality of French Bordeaux wines. Specifically, he sought to relate auction prices to certain local annual weather conditions, in particular, rainfall and summer temperatures. His finding was that hot and dry years produced the wines most valued by buyers. Ashenfelter’s work and analytical methodology resulted in a deluge of hostile invective from established wine-tasting experts and writers. There was a fear of losing a lucrative monopoly, and the reality that a better informed market is more difficult to manipulate on pricing. Another interesting study is that of U.S. baseball analyst William James, who applied analytical methods to predict which of the players would be most successful in the game, challenging the traditional approach. James’s statistically driven approach to correlating early performance to mature performance in players resulted very quickly in a barrage of criticism and rejection of the approach.

There have been numerous claims that data-mining techniques have been used successfully in counter-terrorism intelligence analyses, but little evidence has surfaced to support these claims. The idea is that by analyzing the characteristics and profiles of known terrorists, it should be feasible to predict who in a sample population might also be a terrorist. This is a good example of the potential pitfalls in applying such analytical techniques to practical problems: this type of profiling generates hypotheses, for which there may be good substantiation, but a hypothesis is not yet a finding. The risk is that overly zealous law enforcement personnel, however well motivated, overreact to an individual who, despite matching the profile, is not a terrorist. There is enough evidence in the media, albeit sensationalized, to suggest this is a real risk. Only careful investigation can establish whether the possibility is a probability. The degree to which a data-mining process supports the business goals or scientific objectives of data exploration is much more important than the algorithms and data-mining tools it uses.
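The profiling pitfall described above can be made concrete with a small base-rate calculation. The sketch below is only an illustration: the function name `flagged_precision` and every number in it (population size, prevalence, sensitivity, false-positive rate) are hypothetical assumptions, not figures from the text or from any real screening program. It shows how, when the target group is extremely rare, even a profile that seems accurate flags mostly people who do not belong to it.

```python
def flagged_precision(population, prevalence, sensitivity, false_positive_rate):
    """Return (true_positives, false_positives, precision) for a screening profile.

    All inputs are hypothetical illustration values:
      prevalence          - fraction of the population that is a true target
      sensitivity         - fraction of true targets the profile flags
      false_positive_rate - fraction of non-targets the profile flags anyway
    """
    actual_positives = population * prevalence
    actual_negatives = population - actual_positives
    true_positives = actual_positives * sensitivity
    false_positives = actual_negatives * false_positive_rate
    flagged = true_positives + false_positives
    # Precision: of everyone flagged, what fraction is a true target?
    precision = true_positives / flagged if flagged else 0.0
    return true_positives, false_positives, precision


# Hypothetical scenario: 1,000,000 people, 1 in 100,000 is a true target,
# and the profile is "99% accurate" in both directions.
tp, fp, precision = flagged_precision(
    population=1_000_000,
    prevalence=1e-5,
    sensitivity=0.99,
    false_positive_rate=0.01,
)
print(f"true positives:  {tp:.1f}")
print(f"false positives: {fp:.1f}")
print(f"precision:       {precision:.5f}")
```

Under these assumed numbers, roughly one flagged individual in a thousand is an actual target. Each flag is therefore a hypothesis that needs careful investigation, not a conclusion.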

1.7 ORGANIZATION OF THIS BOOK


After introducing the basic concepts of data mining in Chapter 1, the rest of the book follows the basic phases of a data-mining process. Chapters 2 and 3 explain the common characteristics of large, raw data sets and the typical techniques of data preprocessing. The text emphasizes the importance and influence of these initial phases on the final success and quality of data-mining results. Chapter 2 provides basic techniques for transforming raw data, including data sets with missing values and with time-dependent attributes. Outlier analysis, a set of important techniques for preprocessing messy data, is also explained in this chapter. Chapter 3 deals with the reduction of large data sets and introduces efficient methods for reducing features, values, and cases. Once the data set is preprocessed and prepared for mining, a wide spectrum of data-mining techniques is available, and the selection of a technique or techniques depends on the type of application and the data characteristics. In Chapter 4, before introducing particular data-mining methods, we present the general theoretical background and formalizations applicable to all mining techniques. The essentials of the theory can be summarized with the question: How can one learn from data? The emphasis in
