9.1 The following table consists of training data from an employee database. The data have been generalized; for example, "31…35" for age represents the age range 31 to 35. For a given row, count is the number of data tuples having the values for department, status, age, and salary shown in that row.

department   status   age      salary     count
sales        senior   31…35    46K…50K    30
sales        junior   26…30    26K…30K    40
sales        junior   31…35    31K…35K    40
systems      junior   21…25    46K…50K    20
systems      senior   31…35    66K…70K    5
systems      junior   26…30    46K…50K    3
systems      senior   41…45    66K…70K    3
marketing    senior   36…40    46K…50K    10
marketing    junior   31…35    41K…45K    4
secretary    senior   46…50    36K…40K    4
secretary    junior   26…30    26K…30K    6
Let status be the class-label attribute.
(a) Design a multilayer feed-forward neural network for the given data. Label the nodes in the input and output layers.
(b) Using the multilayer feed-forward neural network obtained in (a), show the weight values after one iteration of the backpropagation algorithm, given the training instance “(sales, senior, 31…35, 46K…50K)”. Indicate your initial weight values and biases and the learning rate used.
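As a worked reference for part (b), the following is a minimal sketch of one backpropagation iteration on a tiny network. The input encoding, the topology (three inputs, two hidden units, one output), the sigmoid activations, the random initial weights, and the learning rate of 0.9 are all illustrative assumptions, not a prescribed answer.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical encoding of (sales, 31…35, 46K…50K) as a 3-unit input;
# the target status "senior" is coded as 1.
x = np.array([1.0, 0.0, 1.0])
t = 1.0

rng = np.random.default_rng(0)
W1 = rng.uniform(-0.5, 0.5, size=(2, 3))   # input -> hidden weights
b1 = rng.uniform(-0.5, 0.5, size=2)        # hidden biases
W2 = rng.uniform(-0.5, 0.5, size=2)        # hidden -> output weights
b2 = rng.uniform(-0.5, 0.5)                # output bias
eta = 0.9                                  # learning rate (assumed)

# Forward pass.
h = sigmoid(W1 @ x + b1)
o = sigmoid(W2 @ h + b2)

# Backward pass: Err_j = O_j(1 - O_j)(T_j - O_j) at the output unit,
# Err_j = O_j(1 - O_j) * sum_k Err_k * w_jk at the hidden units.
err_o = o * (1.0 - o) * (t - o)
err_h = h * (1.0 - h) * (W2 * err_o)

# Updates: w_ij += eta * Err_j * O_i, and bias_j += eta * Err_j.
W2 = W2 + eta * err_o * h
b2 = b2 + eta * err_o
W1 = W1 + eta * np.outer(err_h, x)
b1 = b1 + eta * err_h

print("W1 after one iteration:\n", W1)
print("W2 after one iteration:", W2)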
9.2 The support vector machine is a highly accurate classification method. However, SVM classifiers suffer from slow processing when training with a large set of data tuples. Discuss how to overcome this difficulty and develop a scalable SVM algorithm for efficient SVM classification in large data sets.
9.3 Compare and contrast associative classification and discriminative frequent pattern–based classification. Why is classification based on frequent patterns able to achieve higher classification accuracy in many cases than a classic decision tree method?
9.4 Compare the advantages and disadvantages of eager classification (e.g., decision tree, Bayesian, neural network) versus lazy classification (e.g., k-nearest neighbor, case-based reasoning).
9.5 Write an algorithm for k-nearest-neighbor classification given k, the number of nearest neighbors, and n, the number of attributes describing each tuple.
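A compact sketch of the classifier such an algorithm describes is given below. It assumes numeric attributes, Euclidean distance, and simple majority voting; tie-breaking and attribute normalization are omitted for brevity, and all names are illustrative.

import math
from collections import Counter

def knn_classify(training, query, k):
    # training: list of (attribute_tuple, class_label); query: attribute tuple.
    # Compute the distance from the query to every stored tuple.
    dists = [(math.dist(attrs, query), label) for attrs, label in training]
    # Keep the k closest tuples and take a majority vote on their labels.
    neighbors = sorted(dists, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Tiny usage example with two classes in a 2-D attribute space.
data = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"),
        ((5.0, 5.0), "B"), ((4.8, 5.2), "B")]
print(knn_classify(data, (1.1, 1.0), k=3))   # -> "A"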
9.6 Briefly describe the classification processes using (a) genetic algorithms, (b) rough sets, and (c) fuzzy sets.
9.7 Example 9.3 showed a use of error-correcting codes for a multiclass classification problem having four classes.
(a) Suppose that, given an unknown tuple to label, the seven trained binary classifiers collectively output the codeword 0101110, which does not match a codeword for any of the four classes. Using error correction, what class label should be assigned to the tuple? (A decoding sketch follows part (b).)
(b) Explain why using a 4-bit vector for the codewords is insufficient for error correction.
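For concreteness, the decoding step in part (a) can be sketched as follows: the tuple is assigned the class whose codeword is nearest in Hamming distance to the classifiers' output. The four 7-bit codewords below are illustrative placeholders; substitute the actual codewords of Example 9.3.

codewords = {"C1": "1111111", "C2": "0000111",
             "C3": "0011001", "C4": "0101010"}

def hamming(a, b):
    # Number of bit positions in which two codewords differ.
    return sum(x != y for x, y in zip(a, b))

output = "0101110"   # collective output of the seven binary classifiers
distances = {c: hamming(w, output) for c, w in codewords.items()}
# Assign the class whose codeword is closest to the output.
print(min(distances, key=distances.get), distances)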
9.8 Semi-supervised classification, active learning, and transfer learning are useful for situations in which unlabeled data are abundant.
(a) Describe semi-supervised classification, active learning, and transfer learning. Elaborate on applications for which each is useful, as well as the challenges these approaches pose for classification. (A self-training sketch follows part (d).)
(b) Research and describe an approach to semi-supervised classification other than self-training and cotraining.
(c) Research and describe an approach to active learning other than pool-based learning.
(d) Research and describe an alternative approach to instance-based transfer learning.
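To make part (a) concrete, below is a minimal self-training loop, the simplest semi-supervised strategy and the baseline that part (b) asks you to go beyond. It assumes a scikit-learn-style classifier exposing fit, predict_proba, and a classes_ attribute; the confidence threshold of 0.95 and the round limit are arbitrary assumptions.

import numpy as np

def self_train(clf, X_lab, y_lab, X_unlab, threshold=0.95, max_rounds=10):
    # Repeatedly fit on the labeled set, then move confidently predicted
    # unlabeled tuples (with their pseudo-labels) into the labeled set.
    X_lab, y_lab = np.asarray(X_lab), np.asarray(y_lab)
    X_unlab = np.asarray(X_unlab)
    for _ in range(max_rounds):
        clf.fit(X_lab, y_lab)
        if len(X_unlab) == 0:
            break
        proba = clf.predict_proba(X_unlab)
        confident = proba.max(axis=1) >= threshold
        if not confident.any():
            break                     # nothing left to pseudo-label
        pseudo = clf.classes_[proba.argmax(axis=1)]
        X_lab = np.concatenate([X_lab, X_unlab[confident]])
        y_lab = np.concatenate([y_lab, pseudo[confident]])
        X_unlab = X_unlab[~confident]
    return clf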
9.10 Bibliographic Notes
For an introduction to Bayesian belief networks, see Darwiche [Dar10] and Heckerman [Hec96]. For a thorough presentation of probabilistic networks, see Pearl [Pea88] and Koller and Friedman [KF09]. Solutions for learning the belief network structure from training data given observable variables are proposed in Cooper and Herskovits [CH92]; Buntine [Bun94]; and Heckerman, Geiger, and Chickering [HGC95]. Algorithms for inference on belief networks can be found in Russell and Norvig [RN95] and Jensen [Jen96]. The gradient descent method for training Bayesian belief networks, described in Section 9.1.2, is given in Russell, Binder, Koller, and Kanazawa [RBKK95]. The example given in Figure 9.1 is adapted from Russell et al. [RBKK95].
Alternative strategies for learning belief networks with hidden variables include application of Dempster, Laird, and Rubin's [DLR77] EM (Expectation Maximization) algorithm (Lauritzen [Lau95]) and methods based on the minimum description length principle (Lam [Lam98]). Cooper [Coo90] showed that