Data Mining - Mehmed Kantardzic [228]
That was the final processing step in the first iteration of the GA. The results, in the form of the new population CR1′ to CR4′, are used in the next iteration, which starts again with an evaluation process.
13.3.6 Evaluation (Second Iteration)
The process of evaluation is repeated in the new population. These results are given in Table 13.3.
TABLE 13.3. Evaluation of the Second Generation of Chromosomes
The process of optimization with additional iterations of the GA can be continued in accordance with Figure 13.3. We will stop here with a presentation of computational steps for our example, and give some additional analyses of results useful for a deeper understanding of a GA.
Although the search techniques used in the GA are based on many random parameters, they are capable of achieving a better solution by exploiting the best alternatives in each population. A comparison of sums, and average and max values from Tables 13.2 and 13.3
shows that the new, second population is approaching closer to the maximum of the function f(x). The best result obtained from the evaluation of chromosomes in the first two iterations is the chromosome CR3′ = 11011 and it corresponds to the feature’s value x = 27 (theoretically it is known that the maximum of f(x) is for x = 31, where f(x) reaches the value 961). This increase will not be obtained in each GA iteration, but on average, the final population is much closer to a solution after a large number of iterations. The number of iterations is one possible stopping criterion for the GA algorithm. The other possibilities for stopping the GA are when the difference between the sums in two successive iterations is less than the given threshold, when a suitable fitness value is achieved, and when the computation time is limited.
13.4 SCHEMATA
The theoretical foundations of GAs rely on a binary-string representation of solutions, and on the notation of a schema—a template allowing exploration of similarities among chromosomes. To introduce the concept of a schema, we have to first formalize some related terms. The search space Ω is the complete set of possible chromosomes or strings. In a fixed-length string l, where each bit (gene) can take on a value in the alphabet A of size k, the resulting size of the search space is kl. For example, in binary-coded strings where the length of the string is 8, the size of the search space is 28 = 256. A string in the population S is denoted by a vector x ∈ Ω. So, in the previously described example, x would be an element of {0, 1}8. A schema is a similarity template that defines a subset of strings with fixed values in certain positions.
A schema is built by introducing a don’t care symbol (*) into the alphabet of genes. Each position in the scheme can take on the values of the alphabet (fixed positions) or a “don’t care” symbol. In the binary case, for example, the schemata of the length l are defined as H ∈ {0, 1, *}l. A schema represents all the strings that match it on all positions other than “*”. In other words, a schema defines a subset of the search space, or a hyperplane partition of this search space. For example, let us consider the strings and schemata of the length ten. The schema
matches two strings
and the schema
matches four strings
Of course, the schema
represents one string only, and the schema
represents all strings of length 10. In general, the total number of possible schemata is (k + 1)l, where k is the number of symbols in the alphabet and l is the length of the string. In the binary example of coding strings, with a length of 10, it is (2+1)10