Data Mining - Mehmed Kantardzic [245]
R1: IF x is small, THEN y is above average.
R2: IF x is below average, THEN y is above average.
R3: IF x is above average, THEN y is high.
R4: IF x is high, THEN y is above average.
Figure 14.15. Granulation of a two-dimensional I/O space.
Figure 14.16. Selection of characteristic points in a granulated space.
Figure 14.17. Graphical representation of generated fuzzy rules and the resulting crisp approximation.
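To make the fixed-grid model concrete, here is a minimal Python sketch that evaluates rules R1 to R4 and produces a crisp output. The grid is an assumption, not the one behind the figures: four evenly spaced triangular fuzzy values on a normalized range [0, 1] for both x and y, with each triangle's feet at the neighboring peaks. The crisp output uses center-average defuzzification (each rule's firing strength weights the peak of its consequent), which may differ from the defuzzification used for Figure 14.17. The names `tri`, `membership`, and `crisp` are hypothetical.

```python
# Hypothetical fixed granulation: four evenly spaced triangular fuzzy values
# on a normalized range [0, 1], used for both x and y (an assumption).
LABELS = ["small", "below average", "above average", "high"]
CENTER = {label: i / (len(LABELS) - 1) for i, label in enumerate(LABELS)}
HALF_WIDTH = 1 / (len(LABELS) - 1)  # triangle feet sit at the neighboring peaks

# Rules R1-R4 as (fuzzy value of x, fuzzy value of y).
RULES = [
    ("small", "above average"),
    ("below average", "above average"),
    ("above average", "high"),
    ("high", "above average"),
]


def tri(v, a, b, c):
    """Triangular membership with left foot a, peak b, and right foot c."""
    if v <= a or v >= c:
        return 0.0
    if v <= b:
        return (v - a) / (b - a)
    return (c - v) / (c - b)


def membership(v, label):
    """Membership of v in the named fuzzy value of the grid."""
    peak = CENTER[label]
    return tri(v, peak - HALF_WIDTH, peak, peak + HALF_WIDTH)


def crisp(x):
    """Center-average defuzzification of the rule base for input x."""
    weights = [membership(x, antecedent) for antecedent, _ in RULES]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("x lies outside the granulated range")
    return sum(w * CENTER[consequent] for w, (_, consequent) in zip(weights, RULES)) / total


if __name__ == "__main__":
    for x in (0.0, 0.25, 0.5, 2 / 3, 1.0):
        print(f"x = {x:.3f} -> y = {crisp(x):.3f}")
```

Because neighboring triangles overlap, the crisp output interpolates linearly between the consequent peaks of adjacent rules; between two rule peaks the model cannot produce anything outside that interpolation.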
Note how the generated model misses the extremes that lie far from the existing rule centers. This happens because only one pattern per rule is used to determine that rule's outcome. Even a combined approach would still depend heavily on the predefined granulation: if the function to be modeled has high variance inside the region of one rule, the resulting fuzzy rule model will fail to capture it.
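The effect of using a single pattern per rule can be reproduced with a small Python sketch. It assumes a hypothetical four-value triangular grid on [0, 1] and assumes that each rule's characteristic point is the sample whose x lies closest to the peak of the rule's input fuzzy value; the rule's consequent is the output fuzzy value in which that one sample's y has the highest membership. The sampled function (a narrow peak at x = 0.5, midway between two rule centers) and all names here are illustrative, not taken from the book's example.

```python
import math

# Hypothetical fixed grid: N_VALUES triangular fuzzy values on [0, 1] for x and y.
N_VALUES = 4
GRID = [i / (N_VALUES - 1) for i in range(N_VALUES)]
GRID_HALF_WIDTH = 1 / (N_VALUES - 1)


def mu(v, k):
    """Membership of v in the k-th triangular fuzzy value of the grid."""
    a, b, c = GRID[k] - GRID_HALF_WIDTH, GRID[k], GRID[k] + GRID_HALF_WIDTH
    if v <= a or v >= c:
        return 0.0
    return (v - a) / (b - a) if v <= b else (c - v) / (c - b)


def generate_rules(data):
    """One rule per input fuzzy value, decided by a single characteristic point."""
    rules = {}
    for k, peak in enumerate(GRID):
        # Assumption: the characteristic point is the sample closest to the peak.
        _, y = min(data, key=lambda p: abs(p[0] - peak))
        # Consequent: output fuzzy value with the highest membership for that y.
        rules[k] = max(range(N_VALUES), key=lambda j: mu(y, j))
    return rules


def model_output(x, rules):
    """Center-average crisp output of the generated rule base."""
    weights = [mu(x, k) for k in range(N_VALUES)]
    return sum(w * GRID[rules[k]] for k, w in enumerate(weights)) / sum(weights)


# Hypothetical samples of a function with a narrow peak at x = 0.5.
PEAK_DATA = [(i / 20, math.exp(-(((i / 20) - 0.5) / 0.05) ** 2)) for i in range(21)]
PEAK_RULES = generate_rules(PEAK_DATA)

if __name__ == "__main__":
    worst = max(PEAK_DATA, key=lambda p: abs(p[1] - model_output(p[0], PEAK_RULES)))
    print("consequent index per rule:", PEAK_RULES)
    print("largest error at x =", worst[0], "true y =", worst[1],
          "model y =", model_output(worst[0], PEAK_RULES))
```

With these samples every characteristic point has y close to 0, so all four rules conclude the lowest output value, and the model outputs 0 at x = 0.5 although the sampled function reaches 1 there.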
For practical applications, however, it is clear that a predefined, fixed grid yields a fuzzy model that either fits the underlying function poorly or, when the granulation is made fine, consists of a large number of rules. Therefore, new approaches have been introduced that automatically determine the granulation of both input and output variables from a given data set. We will explain the basic steps of one of these algorithms using the same data set as in the previous example, together with a graphical representation of the applied procedures.
1. Initially, only one MF is used to model each of the input variables as well as the output variable, resulting in one large rule covering the entire feature space. New MFs are then introduced at points of maximum error, that is, where the distance between a data point and the obtained crisp approximation is largest. Figure 14.18 illustrates this first step: the crisp approximation is drawn as a thick line, and the selected point of maximum error is marked with a triangle.
2. For the selected point of maximum error, new triangular fuzzy values are introduced for both the input and the output variables. The processes of granulation, determination of fuzzy rules in the form of space regions, and crisp approximation are then repeated for the space with these additional input and output fuzzy values; in the second step this means two fuzzy values for each of the input and output variables. The final results of the second step for our example are presented in Figure 14.19.
3. Step 2 is repeated until a maximum number of divisions (fuzzy values) is reached or the approximation error falls below a certain threshold value. Figures 14.20 and 14.21 show two additional iterations of the algorithm on the same data set. Here granulation was stopped after a maximum of four MFs had been generated for each variable. This algorithm clearly models extremes much better than the previous one with a fixed granulation. At the same time, it has a strong tendency to favor extremes and to concentrate on outliers. The final set of fuzzy rules, using dynamically created fuzzy values Ax to Dx and Ay to Dy for the input and output variables, is
R1: IF x is Ax, THEN y is Ay.
R2: IF x is Bx, THEN y is By.
R3: IF x is Cx, THEN y is Cy.
R4: IF x is Dx, THEN y is Dy.
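The three steps above can be sketched in Python under several assumptions that the text does not fix. The initial single MF for each variable is placed at the data mean, so the first crisp approximation is the constant mean of y. Each new input MF and the new output MF created at the point of maximum error are paired into one rule, mirroring the pairing Ax to Ay, Bx to By, and so on. Triangular MFs have their feet at neighboring peaks, with shoulders at the ends, and the crisp output uses center-average defuzzification. If the maximum-error point falls exactly on an existing input peak, the sketch moves that rule's output peak instead of adding an MF. The parameter names `max_mfs`, `threshold`, and `max_iter`, the helper names, and the sample data are hypothetical; `max_mfs=4` matches the example in the text.

```python
def memberships(x, centers):
    """Memberships of x in triangular fuzzy values peaking at the sorted `centers`.

    Each triangle's feet sit at the neighboring peaks; the two outermost values
    are shoulders (membership 1 beyond the outer peaks), so memberships sum to 1.
    """
    m = len(centers)
    mu = [0.0] * m
    if m == 1 or x <= centers[0]:
        mu[0] = 1.0
    elif x >= centers[-1]:
        mu[-1] = 1.0
    else:
        for i in range(m - 1):
            lo, hi = centers[i], centers[i + 1]
            if lo <= x <= hi:
                t = (x - lo) / (hi - lo)
                mu[i], mu[i + 1] = 1.0 - t, t
                break
    return mu


def approximate(x, rules):
    """Center-average crisp output of rules [(x_peak, y_peak), ...] sorted by x_peak."""
    mu = memberships(x, [cx for cx, _ in rules])
    return sum(w * cy for w, (_, cy) in zip(mu, rules)) / sum(mu)


def granulate(data, max_mfs=4, threshold=0.05, max_iter=100):
    """Grow the rule base at points of maximum error until a stop condition holds."""
    xs = [x for x, _ in data]
    ys = [y for _, y in data]
    # Step 1 (assumption): one MF per variable, peaked at the data mean,
    # i.e. one large rule whose crisp output is the constant mean of y.
    rules = [(sum(xs) / len(xs), sum(ys) / len(ys))]
    for _ in range(max_iter):
        # Point of maximum error between the data and the current crisp approximation.
        err, x_star, y_star = max((abs(y - approximate(x, rules)), x, y) for x, y in data)
        if err < threshold:
            break  # Step 3: approximation error below the threshold
        centers = [cx for cx, _ in rules]
        if x_star in centers:
            # Assumption: an input MF already peaks here, so only its output peak moves.
            rules[centers.index(x_star)] = (x_star, y_star)
        elif len(rules) < max_mfs:
            # Step 2: new input and output fuzzy values at the max-error point,
            # paired into one new rule; the next pass re-approximates.
            rules = sorted(rules + [(x_star, y_star)])
        else:
            break  # Step 3: maximum number of fuzzy values reached
    return rules


# Hypothetical samples, not the data set used in the figures.
SAMPLES = [(0.0, 0.0), (0.2, 0.1), (0.5, 1.0), (0.9, 0.2), (1.0, 0.0)]

if __name__ == "__main__":
    for label, (cx, cy) in zip("ABCD", granulate(SAMPLES)):
        print(f"IF x is {label}x (peak {cx:.2f}), THEN y is {label}y (peak {cy:.2f})")
```

With these samples the sketch picks the peak at x = 0.5 first, then x = 0.0 and x = 0.2, and stops at four MFs, one of which is the initial MF at the data mean (x about 0.52). The sample at x = 1.0 is still off by about 0.26 at that point, which shows how the stopping rule and the early focus on extreme points interact.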
Figure 14.18. The first step in automatically determining fuzzy granulation.
Figure 14.19. The second step (first iteration) in automatically determining granulation.
Figure 14.20. The second step (second iteration) in automatically determining granulation.
Figure 14.21. The second step (third iteration) in automatically determining granulation.
14.7 DATA MINING AND FUZZY SETS
Fuzzy set technology plays a growing and indisputable role in data mining. In a data mining process, discovered models, learned concepts, or patterns of interest are often vague and