8.3.1. Bayes' Theorem

Bayes' theorem is named after Thomas Bayes, a nonconformist English clergyman who did early work in probability and decision theory during the 18th century. Let X be a data tuple. In Bayesian terms, X is considered “evidence.” As usual, it is described by measurements made on a set of n attributes. Let H be some hypothesis, such as that the data tuple X belongs to a specified class C. For classification problems, we want to determine $P(H|X)$, the probability that the hypothesis H holds given the “evidence” or observed data tuple X. In other words, we are looking for the probability that tuple X belongs to class C, given that we know the attribute description of X.

$P(H|X)$ is the posterior probability, or a posteriori probability, of H conditioned on X. For example, suppose our world of data tuples is confined to customers described by the attributes age and income, respectively, and that X is a 35-year-old customer with an income of $40,000. Suppose that H is the hypothesis that our customer will buy a computer. Then $P(H|X)$ reflects the probability that customer X will buy a computer given that we know the customer's age and income.

In contrast, $P(H)$ is the prior probability, or a priori probability, of H. For our example, this is the probability that any given customer will buy a computer, regardless of age, income, or any other information, for that matter. The posterior probability, $P(H|X)$, is based on more information (e.g., customer information) than the prior probability, $P(H)$, which is independent of X.

Similarly, $P(X|H)$ is the posterior probability of X conditioned on H. That is, it is the probability that a customer, X, is 35 years old and earns $40,000, given that we know the customer will buy a computer.

$P(X)$ is the prior probability of X. Using our example, it is the probability that a person from our set of customers is 35 years old and earns $40,000.

“How are these probabilities estimated?” $P(H)$, $P(X|H)$, and $P(X)$ may be estimated from the given data, as we shall see next. Bayes’ theorem is useful in that it provides a way of calculating the posterior probability, $P(H|X)$, from $P(H)$, $P(X|H)$, and $P(X)$. Bayes’ theorem is

$$P(H|X) = \frac{P(X|H)\,P(H)}{P(X)} \tag{8.10}$$
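For intuition, here is a worked instance of Eq. (8.10) with purely illustrative numbers that are not from the text: suppose $P(H) = 0.5$ (half of all customers buy a computer), $P(X|H) = 0.2$ (20% of buyers are 35 years old and earn $40,000), and $P(X) = 0.25$ (a quarter of all customers fit that description). Then

$$P(H|X) = \frac{0.2 \times 0.5}{0.25} = 0.4,$$

that is, a 40% chance that this particular customer will buy a computer.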

In the next section, we look at how Bayes’ theorem is used in the naïve Bayesian classifier.

8.3.2. Naïve Bayesian Classification

The naïve Bayesian classifier, or simple Bayesian classifier, works as follows:

1. Let D be a training set of tuples and their associated class labels. As usual, each tuple is represented by an n-dimensional attribute vector, $X = (x_1, x_2, \ldots, x_n)$, depicting n measurements made on the tuple from n attributes, respectively, $A_1, A_2, \ldots, A_n$.

2. Suppose that there are m classes, $C_1, C_2, \ldots, C_m$. Given a tuple, X, the classifier will predict that X belongs to the class having the highest posterior probability, conditioned on X. That is, the naïve Bayesian classifier predicts that tuple X belongs to the class $C_i$ if and only if

$$P(C_i|X) > P(C_j|X) \quad \text{for } 1 \leq j \leq m,\ j \neq i.$$

Thus, we maximize $P(C_i|X)$. The class $C_i$ for which $P(C_i|X)$ is maximized is called the maximum posteriori hypothesis. By Bayes’ theorem (Eq. 8.10),

$$P(C_i|X) = \frac{P(X|C_i)\,P(C_i)}{P(X)} \tag{8.11}$$

3. As $P(X)$ is constant for all classes, only $P(X|C_i)P(C_i)$ needs to be maximized. If the class prior probabilities are not known, then it is commonly assumed that the classes are equally likely, that is, $P(C_1) = P(C_2) = \cdots = P(C_m)$, and we would therefore maximize $P(X|C_i)$. Otherwise, we maximize $P(X|C_i)P(C_i)$. Note that the class prior probabilities may be estimated by $P(C_i) = |C_{i,D}|/|D|$, where $|C_{i,D}|$ is the number of training tuples of class $C_i$ in D.

4. Given data sets with many attributes, it would be extremely computationally expensive to compute $P(X|C_i)$. To reduce computation in evaluating $P(X|C_i)$, the naïve assumption of class-conditional independence is made. This presumes that the attributes’ values are conditionally independent of one another, given the class label of the tuple (i.e., that there are no dependence relationships among the attributes). Thus,

$$P(X|C_i) = \prod_{k=1}^{n} P(x_k|C_i) = P(x_1|C_i) \times P(x_2|C_i) \times \cdots \times P(x_n|C_i). \tag{8.12}$$

We can easily estimate the probabilities $P(x_1|C_i), P(x_2|C_i), \ldots, P(x_n|C_i)$ from the training tuples. Recall that here $x_k$ refers to the value of attribute $A_k$ for tuple X. For each attribute, we look at whether the attribute is categorical or continuous-valued. For instance, to compute $P(X|C_i)$, we consider the following:

(a) If $A_k$ is categorical, then $P(x_k|C_i)$ is the number of tuples of class $C_i$ in D having the value $x_k$ for $A_k$, divided by $|C_{i,D}|$, the number of tuples of class $C_i$ in D. (A minimal sketch of this procedure for categorical attributes appears below.)
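To make steps 1 through 4 concrete, here is a minimal sketch in Python, assuming categorical attributes only; the toy data set, the attribute layout, and the helper names train_naive_bayes and classify are hypothetical illustrations, not part of the text:

# A minimal naive Bayesian classifier for categorical attributes,
# following steps 1 through 4 above. The toy data, attribute layout,
# and helper names are illustrative assumptions, not from the text.
from collections import Counter, defaultdict

def train_naive_bayes(tuples, labels):
    """Estimate P(Ci) and the per-attribute counts needed for P(xk|Ci)."""
    class_counts = Counter(labels)                        # |C_i,D| for each class
    priors = {c: class_counts[c] / len(labels) for c in class_counts}
    # cond[c][k][v] = number of class-c tuples with value v for attribute k
    cond = defaultdict(lambda: defaultdict(Counter))
    for x, c in zip(tuples, labels):
        for k, v in enumerate(x):
            cond[c][k][v] += 1
    return priors, cond, class_counts

def classify(x, priors, cond, class_counts):
    """Predict argmax_i P(Ci) * prod_k P(xk|Ci), as in Eq. (8.11)/(8.12)."""
    best_class, best_score = None, -1.0
    for c, prior in priors.items():
        score = prior
        for k, v in enumerate(x):
            # Step 4(a): relative frequency of value v within class c
            score *= cond[c][k][v] / class_counts[c]
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Hypothetical usage with two categorical attributes (age_group, income_level):
data = [("youth", "high"), ("youth", "low"),
        ("middle", "high"), ("senior", "medium")]
labels = ["no", "yes", "yes", "yes"]
priors, cond, class_counts = train_naive_bayes(data, labels)
print(classify(("youth", "high"), priors, cond, class_counts))   # prints "no"

Note that if some value $x_k$ never co-occurs with class $C_i$ in the training data, the product in Eq. (8.12) collapses to zero for that class; in practice a smoothing technique such as the Laplacian correction is applied.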
