Data Mining - Mehmed Kantardzic [230]
It is natural to represent each solution of the problem, even if it is not optimal, as a permutation of the cities. The terminal city can be omitted in the representation since it should always be the same as the initial city. For the computation of the total distance of each tour, the terminal city must be counted.
By representing each solution as a permutation of the cities, each city will be visited exactly once. Not every permutation, however, represents a valid solution, since some cities are not directly connected (e.g., A and E in Fig. 10.6). One practical approach is to assign an artificially large distance between cities that are not directly connected. In this way, invalid solutions that contain consecutive, nonadjacent cities will disappear, and all solutions will be allowed.
Our objective here is to minimize the total distance of each tour. We can select different fitness functions that will reflect this objective. For example, if the total distance is s, then f(s) could be a simple f(s) = s if we minimize the fitness function; alternatives for the maximization of a fitness function are f(s) = 1/s, f(s) = 1/s2, f(s) = K − s, where K is a positive constant that makes f(s) ≥ 0. There is no general formula to design the best fitness function. But, when we do not adequately reflect the goodness of the solution in the fitness function, finding an optimal or near-optimal solution will not be effective.
When dealing with permutations as solutions, simple crossover operations will result in invalid solutions. For example, for the problem in Figure 10.6, a crossover of two solutions after the third position in the strings
will produce new strings
which are invalid solutions because they do not represent permutations of initial elements in the strings. To avoid this problem, a modified crossover operation is introduced that directly operates on permutations and still gives permutations. This is a partially matched crossover (PMC) operation. It can be used not only for the TSPs, but also for any other problems that involve permutations in a solution’s representation. We illustrate the effects of the PMC operation by an example. Assume that two solutions are given as permutations of the same symbols, and suppose that the PMC is a two-point operation. Selecting two strings and two random crossing points is the first step in the process of applying the PMC operation.
The substrings between crossing points are called matching sections. In our example, we have two elements in the matching sections: E B for the first string and C D for the second one. The crossover operation requires an exchange of the symbols E with C, denoted as an ordered pair (E, C), and B with D, represented as (B, D). The next step in the PMC operation is to permute each of these two-element permutations in each string. In other words, it is necessary to exchange the places for pairs (E, C) and (B, D) in both strings. The result of (E, C) changes in the first string is A D C B E, and after the second pair (B, D) has been changed, the final version of the first string is A B C D E. The second string after application of the same permutations will become A C E D B first, and then A C E B D finally. If we analyze the two strings obtained by PMC operation
we can see that middle parts of the strings were really exchanged as in a standard crossover operation. On the other hand, the two new strings remain as valid permutations since the symbols are actually permuted within each string.
The other steps of a GA applied to the TSP are unchanged. A GA based on the above operator outperforms a random search for TSP, but still leaves much room for improvement. Typical results from the algorithm, when applied to 100 randomly generated cities gave after 20,000 generations, a value of the whole tour 9.4% above minimum.
13.6 MACHINE LEARNING USING GAs
Optimization problems are one of the most common application categories of GAs. In general, an optimization