Data Mining - Mehmed Kantardzic
= Low → Value = High

Air Conditioner = Working and Engine = Good → Value = High

Air Conditioner = Working and Engine = Bad → Value = Low

Air Conditioner = Broken → Value = Low

(a) Are the rules mutually exclusive? Explain your answer.

(b) Is the rule set exhaustive (covering each possible case)? Explain your answer.

(c) Is ordering needed for this set of rules? Explain your answer.

(d) Do you need a default class for the rule set? Explain your answer.
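Questions (a) and (b) can be checked mechanically by enumerating every attribute combination and counting how many rules fire. The sketch below does this for the three fully stated rules only (the first rule's condition is truncated above, so it is omitted), assuming the two attributes Air Conditioner ∈ {Working, Broken} and Engine ∈ {Good, Bad}:

```python
from itertools import product

# The three fully stated rules, each as (condition, predicted class).
rules = [
    (lambda c: c["ac"] == "Working" and c["engine"] == "Good", "High"),
    (lambda c: c["ac"] == "Working" and c["engine"] == "Bad", "Low"),
    (lambda c: c["ac"] == "Broken", "Low"),
]

# Every possible case over the two attributes.
cases = [{"ac": ac, "engine": eng}
         for ac, eng in product(["Working", "Broken"], ["Good", "Bad"])]

# Count how many rules fire for each case.
matches = [sum(cond(c) for cond, _ in rules) for c in cases]

mutually_exclusive = all(m <= 1 for m in matches)  # no case triggers two rules
exhaustive = all(m >= 1 for m in matches)          # every case triggers a rule
print(mutually_exclusive, exhaustive)              # prints: True True
```

For these three rules every case fires exactly one rule, so they are both mutually exclusive and exhaustive; with a complete rule set the same enumeration also answers whether ordering or a default class is needed.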

13. Of the following algorithms:

C4.5

K-Nearest Neighbor

Naïve Bayes

Linear Regression

(a) Which are fast in training but slow in classification?

(b) Which one produces classification rules?

(c) Which one requires discretization of continuous attributes before application?

(d) Which model is the most complex?
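As a hint for part (a), the sketch below shows a minimal 1-nearest-neighbor classifier (a toy illustration, not a library implementation): "training" such a lazy learner merely stores the examples, while every classification must scan all stored points.

```python
# Minimal 1-NN sketch: training is O(1) per example (just store it),
# but each classification is O(n) over the stored data.
def knn_train(examples):
    return list(examples)          # no model is built; data is memorized

def knn_classify(model, x):
    def sq_dist(a, b):             # squared Euclidean distance
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    _, label = min(model, key=lambda ex: sq_dist(ex[0], x))
    return label

model = knn_train([((0, 0), "A"), ((5, 5), "B")])
print(knn_classify(model, (1, 1)))  # prints: A (nearest stored example)
```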

14.

(a) How much information is involved in choosing one of eight items, assuming that they have an equal frequency?

(b) One of 16 items?
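For equally likely items, the information needed to identify one of n items is log2(n) bits, which the following two-line check computes:

```python
import math

# Information (in bits) to select one of n equally likely items.
def bits(n):
    return math.log2(n)

print(bits(8))   # prints: 3.0
print(bits(16))  # prints: 4.0
```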

15. The following data set will be used to learn a decision tree for predicting whether a mushroom is edible or not based on its shape, color, and odor.

(a) What is entropy H(Edible|Odor = 1 or Odor = 3)?

(b) Which attribute would the C4.5 algorithm choose to use for the root of the tree?

(c) Draw the full decision tree that would be learned for this data (no pruning).

(d) Suppose we have a validation set as follows. What will be the training set error and validation set error of the tree? Express your answer as the number of examples that would be misclassified.
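Since the mushroom table itself is not reproduced here, the conditional entropy in part (a) cannot be computed directly, but the calculation reduces to Shannon entropy over the class counts of the selected rows. A generic helper, with a hypothetical 3-vs-1 count purely for illustration:

```python
import math

def entropy(counts):
    """Shannon entropy (in bits) of a list of class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total)
                for c in counts if c > 0)

# Hypothetical example: 3 edible and 1 non-edible mushroom among the
# rows with Odor = 1 or Odor = 3 (not the actual data from the table).
print(round(entropy([3, 1]), 3))  # prints: 0.811
```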

6.9 REFERENCES FOR FURTHER STUDY


Dzeroski, S., N. Lavrac, eds., Relational Data Mining, Springer-Verlag, Berlin, Germany, 2001.

Relational data mining has its roots in inductive logic programming, an area in the intersection of machine learning and programming languages. The book provides a thorough overview of different techniques and strategies used in knowledge discovery from multi-relational data. The chapters describe a broad selection of practical, inductive-logic programming approaches to relational data mining, and give a good overview of several interesting applications.

Kralj Novak, P., N. Lavrac, G. I. Webb, Supervised Descriptive Rule Discovery: A Unifying Survey of Contrast Set, Emerging Pattern and Subgroup Mining, Journal of Machine Learning Research, Vol. 10, 2009, pp. 377–403.

This paper gives a survey of contrast set mining (CSM), emerging pattern mining (EPM), and subgroup discovery (SD) in a unifying framework named supervised descriptive rule discovery. While all these research areas aim at discovering patterns in the form of rules induced from labeled data, they use different terminology and task definitions, claim to have different goals, claim to use different rule learning heuristics, and use different means for selecting subsets of induced patterns. This paper contributes a novel understanding of these subareas of data mining by presenting a unified terminology, by explaining the apparent differences between the learning tasks as variants of a unique supervised descriptive rule discovery task and by exploring the apparent differences between the approaches.

Mitchell, T., Machine Learning, McGraw Hill, New York, NY, 1997.

This is one of the most comprehensive books on machine learning. Systematic explanations of all methods and a large number of examples for all topics are the main strengths of the book. Inductive machine-learning techniques are only a part of the book, but for a deeper understanding of current data-mining technology and newly developed techniques, it is very useful to get a global overview of all approaches in machine learning.

Quinlan, J. R., C4.5: Programs for Machine Learning, Morgan Kaufmann, San Mateo, CA, 1992.

The book outlines the C4.5 algorithm step by step, with detailed explanations and many illustrative examples. The second part of the book is taken up by the source listing of the C program that makes up the C4.5 system. The explanations in the book are intended to give a broad-brush view of C4.5 inductive learning with many small heuristics, leaving the detailed discussion to the
