until the clustering cannot be improved, that is, until the clustering converges or the change is sufficiently small (less than a preset threshold). Each iteration also consists of two steps:

■ The expectation step assigns objects to clusters according to the current fuzzy clustering or parameters of probabilistic clusters.

■ The maximization step finds the new clustering or parameters that minimize the SSE in fuzzy clustering (Eq. 11.4) or maximize the expected likelihood in probabilistic model-based clustering.

Fuzzy clustering using the EM algorithm

Consider the six points in Figure 11.2, where the coordinates of the points are also shown. Let's compute two fuzzy clusters using the EM algorithm.

Figure 11.2 Data set for fuzzy clustering.

We randomly select two points, say c1 = a and c2 = b, as the initial centers of the two clusters. The first iteration conducts the expectation step and the maximization step as follows.

In the E-step, for each point we calculate its membership degree in each cluster. For any point, o, we assign o to c1 and c2 with membership weights

$$w(o, c_1) = \frac{\operatorname{dist}(o, c_2)^2}{\operatorname{dist}(o, c_1)^2 + \operatorname{dist}(o, c_2)^2}, \qquad w(o, c_2) = \frac{\operatorname{dist}(o, c_1)^2}{\operatorname{dist}(o, c_1)^2 + \operatorname{dist}(o, c_2)^2},$$

respectively, where $\operatorname{dist}(\cdot,\cdot)$ is the Euclidean distance. The rationale is that, if o is close to c1 and $\operatorname{dist}(o, c_1)$ is small, the membership degree of o with respect to c1 should be high. We also normalize the membership degrees so that the sum of the degrees for an object is equal to 1.

For point a, we have $w(a, c_1) = 1$ and $w(a, c_2) = 0$, because $\operatorname{dist}(a, c_1) = 0$. That is, a exclusively belongs to c1. Symmetrically, for point b we have $w(b, c_1) = 0$ and $w(b, c_2) = 1$. For point c, the weights follow in the same way from c's distances to the two centers. The degrees of membership of all the points are shown in the partition matrix in Table 11.3.
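To make the E-step concrete, here is a minimal Python sketch of the membership-weight computation above. The coordinates below are hypothetical stand-ins for the points of Figure 11.2, whose values are not reproduced in this excerpt; the function name is likewise an illustrative choice.

```python
import numpy as np

def membership_weights(o, c1, c2):
    """E-step weights: membership in each cluster is inversely proportional
    to the squared Euclidean distance to its center, normalized to sum to 1."""
    d1 = np.sum((o - c1) ** 2)  # squared distance to center c1
    d2 = np.sum((o - c2) ** 2)  # squared distance to center c2
    if d1 == 0:                 # o coincides with c1: exclusive membership
        return 1.0, 0.0
    if d2 == 0:                 # o coincides with c2
        return 0.0, 1.0
    total = d1 + d2
    return d2 / total, d1 / total  # w(o, c1), w(o, c2)

# Hypothetical coordinates standing in for the six points of Figure 11.2.
points = {name: np.array(xy, dtype=float)
          for name, xy in zip("abcdef",
                              [(3, 3), (4, 10), (9, 6), (14, 8), (18, 11), (21, 7)])}
c1, c2 = points["a"], points["b"]   # initial centers c1 = a, c2 = b
for name, o in points.items():
    print(name, membership_weights(o, c1, c2))
```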

Table 11.3 Intermediate Results from the First Three Iterations of Example 11.7's EM Algorithm (one row per iteration, giving the partition matrix computed in the E-step and the cluster centers recomputed in the M-step)

In the M-step, we recalculate the centroids according to the partition matrix, minimizing the SSE given in Eq. (11.4). The new centroid should be adjusted to

$$c_j = \frac{\sum_{\text{each point } o} w_{o,c_j}^{2}\, o}{\sum_{\text{each point } o} w_{o,c_j}^{2}}, \tag{11.12}$$

where $1 \le j \le k$.

In this example, the new centers c1 and c2 are obtained by substituting the membership weights from the partition matrix into Eq. (11.12); the resulting centers are listed in the M-step column of Table 11.3.

We repeat the iterations, where each iteration contains an E-step and an M-step. Table 11.3 shows the results from the first three iterations. The algorithm stops when the cluster centers converge or the change is small enough.
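Putting the two steps together, a minimal Python sketch of the whole loop might look like the following. It uses fuzzifier p = 2 and two random initial centers, as in the example; this is an illustrative sketch under those assumptions, not the book's reference implementation.

```python
import numpy as np

def fuzzy_em(X, k=2, tol=1e-6, max_iter=100, seed=0):
    """Minimal EM-style fuzzy clustering sketch (fuzzifier p = 2):
    alternate the membership E-step with the Eq. (11.12) centroid M-step."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # random initial centers
    for _ in range(max_iter):
        # E-step: membership of each point in each cluster, proportional to
        # the inverse squared distance and normalized to sum to 1 per point.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)  # n x k
        d2 = np.maximum(d2, 1e-12)          # guard: point coincides with a center
        W = (1.0 / d2) / (1.0 / d2).sum(axis=1, keepdims=True)
        # M-step (Eq. 11.12): each centroid is the w^2-weighted mean of the points.
        W2 = W ** 2
        new_centers = (W2[:, :, None] * X[:, None, :]).sum(axis=0) / W2.sum(axis=0)[:, None]
        if np.abs(new_centers - centers).max() < tol:  # stop when centers stabilize
            return new_centers, W
        centers = new_centers
    return centers, W
```

For k = 2, the normalized inverse-squared-distance weights computed in the E-step reduce exactly to the two-cluster formula given earlier.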

“How can we apply the EM algorithm to compute probabilistic model-based clustering?” Let's use a univariate Gaussian mixture model (Example 11.6) to illustrate.

Using the EM algorithm for mixture models

Given a set of objects, $O = \{o_1, \ldots, o_n\}$, we want to mine a set of parameters, $\Theta = \{\theta_1, \ldots, \theta_k\}$, such that $P(O \mid \Theta)$ in Eq. (11.11) is maximized, where $\theta_j = (\mu_j, \sigma_j)$ are the mean and standard deviation, respectively, of the $j$th univariate Gaussian distribution ($1 \le j \le k$).

We can apply the EM algorithm. We assign random values to parameters Θ as the initial values. We then iteratively conduct the E-step and the M-step as follows until the parameters converge or the change is sufficiently small.

In the E-step, for each object, $o_i$ ($1 \le i \le n$), we calculate the probability that $o_i$ belongs to each distribution, that is,

$$P(\Theta_j \mid o_i, \Theta) = \frac{P(o_i \mid \theta_j)}{\sum_{l=1}^{k} P(o_i \mid \theta_l)}. \tag{11.13}$$

In the M-step, we adjust the parameters $\Theta$ so that the expected likelihood in Eq. (11.11) is maximized. This can be achieved by setting

$$\mu_j = \frac{\sum_{i=1}^{n} o_i\, P(\Theta_j \mid o_i, \Theta)}{\sum_{i=1}^{n} P(\Theta_j \mid o_i, \Theta)} \tag{11.14}$$

and

$$\sigma_j = \sqrt{\frac{\sum_{i=1}^{n} P(\Theta_j \mid o_i, \Theta)\,(o_i - \mu_j)^2}{\sum_{i=1}^{n} P(\Theta_j \mid o_i, \Theta)}}. \tag{11.15}$$
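To make Eqs. (11.13) through (11.15) concrete, here is a minimal Python sketch of one E-step/M-step cycle for a univariate mixture. It assumes the simplified model above, in which every component is equally likely; the function names and the initialization scheme are illustrative choices, not from the text.

```python
import math
import random

def gaussian_pdf(x, mu, sigma):
    """Density of a univariate Gaussian N(mu, sigma^2) at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def em_mixture(data, k=2, iters=100, seed=0):
    """EM for a univariate Gaussian mixture with equal component weights,
    following Eqs. (11.13)-(11.15)."""
    rng = random.Random(seed)
    mus = rng.sample(data, k)              # initialize means from the data
    sigmas = [1.0] * k                     # arbitrary initial spread
    for _ in range(iters):
        # E-step (Eq. 11.13): posterior probability that o_i came from component j.
        resp = []
        for x in data:
            dens = [gaussian_pdf(x, mus[j], sigmas[j]) for j in range(k)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step (Eqs. 11.14 and 11.15): responsibility-weighted mean and std dev.
        for j in range(k):
            w = [resp[i][j] for i in range(len(data))]
            wsum = sum(w)
            mus[j] = sum(wi * xi for wi, xi in zip(w, data)) / wsum
            var = sum(wi * (xi - mus[j]) ** 2 for wi, xi in zip(w, data)) / wsum
            sigmas[j] = math.sqrt(max(var, 1e-12))  # guard against collapse
    return mus, sigmas
```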


In many applications, probabilistic model-based clustering has been shown to be effective because it is more general than partitioning methods and fuzzy clustering methods. A distinct advantage is that appropriate statistical models can be used to capture latent clusters. The EM algorithm is commonly used to handle many learning problems in data mining and statistics due to its simplicity. Note that, in general, the EM algorithm may not converge to the optimal solution. It may instead converge to a local maximum. Many heuristics have been explored to avoid this. For example, we could run the EM process multiple times using different random initial values. Furthermore, the EM algorithm can be very costly if the number of distributions is large or the data set contains very few observed data points.
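The multiple-restart heuristic mentioned above can be sketched as follows, reusing the em_mixture and gaussian_pdf helpers from the previous example; the log-likelihood here is the log of Eq. (11.11) under the same equal-weight assumption.

```python
import math

def log_likelihood(data, mus, sigmas):
    """Log of the mixture likelihood (Eq. 11.11) with equal component weights."""
    k = len(mus)
    total = 0.0
    for x in data:
        mix = sum(gaussian_pdf(x, mus[j], sigmas[j]) for j in range(k)) / k
        total += math.log(mix)
    return total

def best_of_restarts(data, k=2, restarts=10):
    """Run em_mixture (defined above) from several random seeds and keep
    the fit with the highest log-likelihood, i.e., the best local maximum found."""
    best_ll, best_params = float("-inf"), None
    for seed in range(restarts):
        mus, sigmas = em_mixture(data, k=k, seed=seed)
        ll = log_likelihood(data, mus, sigmas)
        if ll > best_ll:
            best_ll, best_params = ll, (mus, sigmas)
    return best_params, best_ll
```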

11.2. Clustering High-Dimensional Data


The clustering methods we have studied so far work well when the dimensionality is not high, that is, when the data have fewer than 10 attributes. There are, however, important applications of high
