Data Mining_ Concepts and Techniques - Jiawei Han [253]
So far, we have described linear and nonlinear SVMs for binary (i.e., two-class) classification. SVM classifiers can be combined for the multiclass case. See Section 9.7.1 for some strategies, such as training one classifier per class and the use of error-correcting codes.
A major research goal regarding SVMs is to improve the speed in training and testing so that SVMs may become a more feasible option for very large data sets (e.g., millions of support vectors). Other issues include determining the best kernel for a given data set and finding more efficient methods for the multiclass case.
9.4. Classification Using Frequent Patterns
Frequent patterns show interesting relationships between attribute–value pairs that occur frequently in a given data set. For example, we may find that the attribute–value pairs and occur in 20% of data tuples describing AllElectronics customers who buy a computer. We can think of each attribute–value pair as an item, so the search for these frequent patterns is known as frequent pattern mining or frequent itemset mining. In Chapter 6 and Chapter 7, we saw how association rules are derived from frequent patterns, where the associations are commonly used to analyze the purchasing patterns of customers in a store. Such analysis is useful in many decision-making processes such as product placement, catalog design, and cross-marketing.
In this section, we examine how frequent patterns can be used for classification. Section 9.4.1 explores associative classification, where association rules are generated from frequent patterns and used for classification. The general idea is that we can search for strong associations between frequent patterns (conjunctions of attribute–value pairs) and class labels. Section 9.4.2 explores discriminative frequent pattern–based classification, where frequent patterns serve as combined features, which are considered in addition to single features when building a classification model. Because frequent patterns explore highly confident associations among multiple attributes, frequent pattern–based classification may overcome some constraints introduced by decision tree induction, which considers only one attribute at a time. Studies have shown many frequent pattern–based classification methods to have greater accuracy and scalability than some traditional classification methods such as C4.5.
9.4.1. Associative Classification
In this section, you will learn about associative classification. The methods discussed are CBA, CMAR, and CPAR.
Before we begin, however, let's look at association rule mining in general. Association rules are mined in a two-step process consisting of frequent itemset mining followed by rule generation. The first step searches for patterns of attribute–value pairs that occur repeatedly in a data set, where each attribute–value pair is considered an item. The resulting attribute–value pairs form frequent itemsets (also referred to as frequent patterns). The second step analyzes the frequent itemsets to generate association rules. All association rules must satisfy certain criteria regarding their “accuracy” (or confidence) and the proportion of the data set that they actually represent (referred to as support). For example, the following is an association rule mined from a data set, D, shown with its confidence and support:
(9.21)
where ∧ represents a logical “AND.” We will say more about confidence and support later.
More formally, let D be a data set of tuples. Each tuple in D is described by n attributes, , and a class label attribute, . All continuous attributes are discretized and treated as categorical (or nominal) attributes. An item, p, is an attribute–value pair of the form (), where Ai is an attribute taking a value, v. A data tuple satisfies an item, , if and only if , where xi is the value of the i th attribute of X. Association rules can have any number of items in the rule antecedent (left side) and any number of