Data Mining - Mehmed Kantardzic [87]

phase, in real-world applications. The system developed for Banmedica was measured after analysis in terms of fraudulent cases found and the amount of money saved. If these numbers had not been in favor of the system, it would have been rolled back. In the case of the REMIND system, the results of the system-wide search had to be manually analyzed for accuracy. It was not enough that the rules were good; the actual patients found also needed to be reviewed.

4.10 REVIEW QUESTIONS AND PROBLEMS

1. Explain the differences between the basic types of inferences: induction, deduction, and transduction.

2. Why do we use the observational approach in most data-mining tasks?

3. Discuss situations in which we would use the interpolated functions given in Figure 4.3b,c,d as “the best” data-mining model.

4. Which of the functions have linear parameters and which have nonlinear? Explain why.

(a) y = a x^5 + b

(b) y = a/x

(c) y = a e^x

(d) y = e^{a x}

5. Explain the difference between interpolation of loss function for classification problems and for regression problems.

6. Is it possible that empirical risk becomes higher than expected risk? Explain.

7. Why is it so difficult to estimate the VC dimension for real-world data-mining applications?

8. What will be the practical benefit of determining the VC dimension in real-world data-mining applications?

9. Classify the common learning tasks explained in Section 4.4 as supervised or unsupervised learning tasks. Explain your classification.

10. Analyze the differences between validation and verification of inductive-based models.

11. In which situations would you recommend the leave-one-out method for validation of data-mining results?

12. Develop a program for generating “fake” data sets using the bootstrap method.

13. Develop a program for plotting an ROC curve based on a table of FAR–FRR results.

14. Develop an algorithm for computing the area below the ROC curve (which is a very important parameter in the evaluation of inductive-learning results for classification problems).

15. The testing data set (inputs: A, B, and C, output: Class) is given together with testing results of the classification (predicted output). Find and plot two points on the ROC curve for the threshold values of 0.5 and 0.8.

16. Machine-learning techniques differ from statistical techniques in that machine-learning methods

(a) typically assume an underlying distribution for the data,

(b) are better able to deal with missing and noisy data,

(c) are not able to explain their behavior, and

(d) have trouble with large-sized data sets.

17. Explain the difference between sensitivity and specificity.

18. When do you need to use a separate validation set, in addition to train and test sets?

19. In this question we will consider learning problems where each instance x is some integer in the set X = {1, 2, … , 127}, and where each hypothesis h ∈ H is an interval of the form a ≤ x ≤ b, where a and b can be any integers between 1 and 127 (inclusive), so long as a ≤ b. A hypothesis a ≤ x ≤ b labels instance x positive if x falls into the interval defined by a and b, and labels the instance negative otherwise. Assume throughout this question that the teacher is only interested in teaching concepts that can be represented by some hypothesis in H.

(a) How many distinct hypotheses are there in H?

(b) Suppose the teacher is trying to teach the specific target concept 32 ≤ x ≤ 84. What is the minimum number of training examples the teacher must present to guarantee that any consistent learner will learn this concept exactly?

20. Is it true that the SVM learning algorithm is guaranteed to find the globally optimal hypothesis with respect to its objective function? Discuss your answer.
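
As a possible starting point for Problems 12 through 14, the following Python sketch shows one way to draw a bootstrap sample, turn a FAR–FRR table into ROC points, and approximate the area under the resulting curve with the trapezoidal rule. It is a minimal illustration and not a reference solution. The function names are hypothetical, the ROC curve is assumed to plot 1 − FRR against FAR, and the end points (0, 0) and (1, 1) are added by assumption so that the curve spans the whole FAR range.

```python
import random


def bootstrap_sample(data, rng):
    """Draw len(data) items from data uniformly at random, with replacement."""
    n = len(data)
    return [data[rng.randrange(n)] for _ in range(n)]


def roc_points(far_frr_table):
    """Convert (FAR, FRR) pairs into ROC points (FAR, 1 - FRR), sorted by FAR."""
    points = {(far, 1.0 - frr) for far, frr in far_frr_table}
    # Assumption: anchor the curve at the two trivial classifiers.
    points.update({(0.0, 0.0), (1.0, 1.0)})
    return sorted(points)


def area_under_curve(points):
    """Trapezoidal-rule area under the piecewise-linear curve through points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

Calling bootstrap_sample repeatedly with the same random.Random instance yields a series of different "fake" data sets of the original size. The list returned by roc_points can be passed directly to a plotting routine or to area_under_curve.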

4.11 REFERENCES FOR FURTHER STUDY


Alpaydin, E., Introduction to Machine Learning, 2nd edition, The MIT Press, Cambridge, MA, 2010.

The goal of machine learning is to program computers to use example data or past experience to solve a given problem. Many successful applications of machine learning exist already, including systems that
