
Data Mining - Mehmed Kantardzic

of analysis is a simple rule-based approach, where the rule is simply the frequency of faults in specific components.

B.6 PITFALLS OF DATA MINING


Despite the above and many other success stories, often presented by vendors and consultants to show the benefits that data mining provides, this technology has several pitfalls. When used improperly, data mining can generate lots of “garbage.” As one professor from MIT pointed out: “Given enough time, enough attempts, and enough imagination, almost any conclusion can be teased out of any set of data.” David J. Leinweber, managing director of First Quadrant Corp. in Pasadena, California, gives an example of the pitfalls of data mining. Working with a United Nations data set, he found that historically, butter production in Bangladesh is the single best predictor of the Standard & Poor’s 500-stock index. This example is similar to another absurd correlation that is heard yearly around Super Bowl time: a win by the NFC team implies a rise in stock prices. Peter Coy, Business Week’s associate economics editor, warns of four pitfalls in data mining:

1. It is tempting to develop a theory to fit an oddity found in the data.

2. One can find evidence to support any preconception by letting the computer churn long enough.

3. A finding makes more sense if there is a plausible theory for it. But a beguiling story can disguise weaknesses in the data.

4. The more factors or features in a data set the computer considers, the more likely the program will find a relationship, valid or not.
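The fourth pitfall can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions, not a method from the text: the target and every candidate feature are independent random noise, so any correlation found is spurious by construction. The function names (`pearson`, `best_spurious_correlation`), the sample size of 20, and the fixed seed are hypothetical choices made for the example.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def best_spurious_correlation(n_features, n_samples=20, seed=0):
    """Strongest |correlation| between a noise target and n_features noise features.

    Assumption: all values are independent Gaussian noise, so no feature
    has any real relationship with the target.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_samples)]
    best = 0.0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_samples)]
        best = max(best, abs(pearson(feature, target)))
    return best


if __name__ == "__main__":
    # The more features searched, the stronger the best "relationship" looks.
    for k in (1, 10, 100, 1000):
        print(k, round(best_spurious_correlation(k), 2))
```

Because the same seed is reused, each larger run searches a superset of the features seen by the smaller runs, so the best correlation found can only stay the same or grow as more features are considered. With only 20 observations, a search over a thousand noise features will typically turn up a correlation that looks impressive even though nothing real is there.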

It is crucial to realize that data mining can involve a great deal of planning and preparation. Having a large amount of data is, by itself, no guarantee of the success of a data-mining project. In the words of one senior product manager from Oracle: “Be prepared to generate a lot of garbage until you hit something that is actionable and meaningful for your business.”

This appendix is certainly not an exhaustive list of all data-mining activities, but it does provide examples of how data-mining technology is employed today. We expect that new generations of data-mining tools and methodologies will increase and extend the spectrum of application domains.

Note

1 A gigajoule (GJ) is a metric unit used for measuring energy use. For example, 1 GJ is equivalent to the amount of energy available from any one of: 277.8 kWh of electricity, 26.1 m³ of natural gas, or 25.8 L of heating oil.

