Complexity_ A Guided Tour - Melanie Mitchell [63]
Robby’s situation in figure 9.1 is
To decide what to do next, Robby simply looks up this situation in his strategy table, and finds that the corresponding action is MoveWest. So he moves west. And crashes into a wall.
I never said this was a good strategy. Finding a good strategy isn’t our job; it’s the job of the genetic algorithm.
TABLE 9.1
I wrote the code for a genetic algorithm to evolve strategies for Robby. In my GA, each individual in the population is a strategy—a listing of the actions that correspond to each possible situation. That is, given a strategy such as the one in table 9.1, an individual to be evolved by the GA is just a listing of the 243 actions in the rightmost column, in the order given:
MoveNorth MoveEast MoveRandom PickUpCan … MoveWest … StayPut
The GA remembers that the first action in the string (here MoveNorth) goes with the first situation (“Empty Empty Empty Empty Empty”), the second action (here MoveEast) goes with the second situation (“Empty Empty Empty Empty Can”), and so on. In other words, I don’t have to explicitly list the situations corresponding to these actions; instead the GA remembers the order in which they are listed. For example, suppose Robby happened to observe that he was in the following situation:
Empty Empty Empty Empty Can

I built into the GA the knowledge that this is situation number 2. The GA would look at the strategy table and see that the action in position 2 is MoveEast. Robby moves east, and then observes his next situation; the GA again looks up the corresponding action in the table, and so forth.
My GA is written in the programming language C. I won’t include the actual program here, but this is how it works.
1. Generate the initial population. The GA starts with an initial population of 200 random individuals (strategies).
A random population is illustrated in figure 9.2. Each individual strategy is a list of 243 “genes.” Each gene is a number between 0 and 6, which stands for an action (0 = MoveNorth, 1 = MoveSouth, 2 = MoveEast, 3 = MoveWest, 4 = StayPut, 5 = PickUp, and 6 = RandomMove). In the initial population, these numbers are filled in at random. For this (and all other probabilistic or random choices), the GA uses a pseudo-random-number generator.

Repeat the following for 1,000 generations:
2. Calculate the fitness of each individual in the population. In my program, the fitness of a strategy is determined by seeing how well the strategy lets Robby do on 100 different cleaning sessions. A cleaning session consists of putting Robby at site 0, 0, and throwing down a bunch of cans at random (each site can contain at most one can; the probability of a given site containing a can is 50%). Robby then follows the strategy for 200 actions in each session. The score of the strategy in each session is the number of reward points Robby accumulates minus the total fines he incurs. The strategy’s fitness is its average score over 100 different cleaning sessions, each of which has a different configuration of cans.
FIGURE 9.2. Random initial population. Each individual consists of 243 numbers, each of which is between 0 and 6, and each of which encodes an action. The location of a number in a string indicates to which situation the action corresponds.
3. Apply evolution to the current population of strategies to create a new population. That is, repeat the following until the new population has 200 individuals:
(a) Choose two parent individuals from the current population probabilistically based on fitness. That is, the higher a strategy’s fitness, the more chance it has to be chosen as a parent.
(b) Mate the two parents to create two children. That is, randomly choose a position at which to split the two number strings; form one child by taking the numbers before that position from parent A and after that position from parent B, and vice versa to form the second child.
(c) With a small probability, mutate numbers in each child. That is, with a small probability, choose one or more numbers and replace them each with a randomly generated number between 0 and 6.