…we count the number of customers who are students and for which buys_computer = yes (which contributes to $P(\mathbf{X} \mid buys\_computer = yes)$) and the number of customers who are students and for which buys_computer = no (which contributes to $P(\mathbf{X} \mid buys\_computer = no)$).

But what if, say, there are no training tuples representing students for the class buys_computer = no, resulting in $P(student = yes \mid buys\_computer = no) = 0$? In other words, what happens if we should end up with a probability value of zero for some $P(x_k \mid C_i)$? Plugging this zero value into Eq. (8.12) would return a zero probability for $P(\mathbf{X} \mid C_i)$, even though, without the zero probability, we may have ended up with a high probability suggesting that $\mathbf{X}$ belonged to class $C_i$! A zero probability cancels the effects of all the other (posteriori) probabilities (on $C_i$) involved in the product.

There is a simple trick to avoid this problem. We can assume that our training database, D, is so large that adding one to each count that we need would only make a negligible difference in the estimated probability value, yet would conveniently avoid the case of probability values of zero. This technique for probability estimation is known as the Laplacian correction or Laplace estimator, named after Pierre Laplace, a French mathematician who lived from 1749 to 1827. If we have, say, q counts to each of which we add one, then we must remember to add q to the corresponding denominator used in the probability calculation. We illustrate this technique in Example 8.5.
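As a minimal sketch of this correction (the function name and list representation are illustrative, not from the text), adding 1 to each of the q counts and q to the denominator can be written as:

def laplace_corrected_probs(counts):
    """Estimate probabilities from raw counts with the Laplacian correction.

    counts: a list of q raw tuple counts, one per attribute value.
    Adding 1 to each count and q to the denominator guarantees that
    no estimate is zero, while changing large counts only negligibly.
    """
    q = len(counts)                  # number of counts, each incremented by 1
    total = sum(counts) + q          # denominator grows by q to compensate
    return [(count + 1) / total for count in counts]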

Example 8.5. Using the Laplacian correction to avoid computing probability values of zero

Suppose that for the class buys_computer = yes in some training database, D, containing 1000 tuples, we have 0 tuples with income = low, 990 tuples with income = medium, and 10 tuples with income = high. The probabilities of these events, without the Laplacian correction, are 0, 0.990 (from 990/1000), and 0.010 (from 10/1000), respectively. Using the Laplacian correction for the three quantities, we pretend that we have 1 more tuple for each income value. In this way, we instead obtain the following probabilities (rounded to three decimal places):

$$\frac{1}{1003} = 0.001, \qquad \frac{991}{1003} = 0.988, \qquad \frac{11}{1003} = 0.011,$$

respectively. The “corrected” probability estimates are close to their “uncorrected” counterparts, yet the zero probability value is avoided.
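These figures are easy to verify; the following is a standalone snippet in which the dict representation is just for illustration:

# Example 8.5: 1000 tuples of class buys_computer = yes, q = 3 income values
counts = {"low": 0, "medium": 990, "high": 10}
total = sum(counts.values()) + len(counts)            # 1000 + 3 = 1003
probs = {value: (c + 1) / total for value, c in counts.items()}
print({value: round(p, 3) for value, p in probs.items()})
# {'low': 0.001, 'medium': 0.988, 'high': 0.011}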

8.4. Rule-Based Classification


In this section, we look at rule-based classifiers, where the learned model is represented as a set of IF-THEN rules. We first examine how such rules are used for classification (Section 8.4.1). We then study ways in which they can be generated, either from a decision tree (Section 8.4.2) or directly from the training data using a sequential covering algorithm (Section 8.4.3).

8.4.1. Using IF-THEN Rules for Classification

Rules are a good way of representing information or bits of knowledge. A rule-based classifier uses a set of IF-THEN rules for classification. An IF-THEN rule is an expression of the form

IF condition THEN conclusion.

An example is rule R1:

R1: IF age = youth AND student = yes THEN buys_computer = yes.

The “IF” part (or left side) of a rule is known as the rule antecedent or precondition. The “THEN” part (or right side) is the rule consequent. In the rule antecedent, the condition consists of one or more attribute tests (e.g., age = youth and student = yes) that are logically ANDed. The rule's consequent contains a class prediction (in this case, we are predicting whether a customer will buy a computer). R1 can also be written as

R1: (age = youth) ∧ (student = yes) ⇒ (buys_computer = yes).

If the condition (i.e., all the attribute tests) in a rule antecedent holds true for a given tuple, we say that the rule antecedent is satisfied (or simply, that the rule is satisfied) and that the rule covers the tuple.

A rule R can be assessed by its coverage and accuracy. Given a tuple, X, from a class-labeled data set, D, let $n_{covers}$ be the number of tuples covered by R; $n_{correct}$ be the number of tuples correctly classified by R; and $|D|$ be the number of tuples in D. We can define the coverage and accuracy of R as

$$\text{coverage}(R) = \frac{n_{covers}}{|D|}\tag{8.16}$$

$$\text{accuracy}(R) = \frac{n_{correct}}{n_{covers}}\tag{8.17}$$

That is, a rule's coverage is the percentage of tuples that are covered by the rule (i.e., whose attribute values hold true for the rule's antecedent). For a rule's accuracy, we look at the tuples that it covers and see what percentage of them the rule can correctly classify.
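The following sketch computes both measures for a rule over a toy data set, assuming tuples are represented as dicts and an antecedent as a mapping of attribute tests; the helper names and the four-tuple data set are illustrative, not from the text:

def covers(antecedent, x):
    """A rule covers tuple x when every attribute test in the
    antecedent holds true for x."""
    return all(x.get(attr) == value for attr, value in antecedent.items())

def coverage_and_accuracy(antecedent, consequent, D, class_attr="class"):
    """Eqs. (8.16) and (8.17): coverage = n_covers / |D|,
    accuracy = n_correct / n_covers."""
    covered = [x for x in D if covers(antecedent, x)]
    n_covers = len(covered)
    n_correct = sum(1 for x in covered if x[class_attr] == consequent)
    return n_covers / len(D), (n_correct / n_covers) if n_covers else 0.0

# Rule R1: IF age = youth AND student = yes THEN buys_computer = yes
D = [
    {"age": "youth",  "student": "yes", "class": "yes"},
    {"age": "youth",  "student": "yes", "class": "no"},
    {"age": "youth",  "student": "no",  "class": "no"},
    {"age": "senior", "student": "yes", "class": "yes"},
]
print(coverage_and_accuracy({"age": "youth", "student": "yes"}, "yes", D))
# (0.5, 0.5): R1 covers 2 of the 4 tuples and correctly classifies 1 of those 2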
