Data Mining - Mehmed Kantardzic [232]
The crossover does not require any modification. We take advantage of the fact that all classifiers are of equal length. Therefore, to crossover two selected parents, say Q1 and Q2
we generate a random crossover-position point. Suppose that we crossover after the third character in the string as marked; then the offspring are
The strength of the offspring is an average (possibly weighted) of the parents’ strengths. Now the system is ready to continue its learning process: starting another cycle, accepting further positive and negative samples from the training set, and modifying the strengths of classifiers as a measure of fitness. We can note that the training data set is included in the learning process through evaluation of schema’s strengths for each iteration. We expect that the population of classifiers converges to some rules with very high strengths.
One of the possible implementations of the previous ideas is the GIL system, which moves the GA closer to the symbolic level—mainly by defining specialized operators that manipulate binary strings. Previous symbolic classifiers are translated into binary strings, where for each attribute a binary string of fixed length is generated. The length is equal to the number of possible values for the given attribute. In the string, the required value is set to 1 and all others are set to 0. For example, if attribute A1 has the value z, it is represented with the binary string 001 (0’s are for values x, y). If the value of some attribute is *, that means that all values are possible, so it is represented with value 1 in all positions of the binary string.
For our previous example, with six attributes and a total number of 17 different values for all attributes, the classifier symbolically represented as
can be transformed into the binary representation
where bars separate bit sets for each attribute. The operators of the GIL system are modeled on inductive reasoning, which includes various inductive operators such as RuleExchange, RuleCopy, RuleGeneralization, RuleSpecialization, RuleSplit, SelectorDrop, ReferenceChange, and ReferenceExtension. We discuss some of them in turn.
13.6.1 RuleExchange
The RuleExchange operator is similar to a crossover of the classical GA as it exchanges selected complex between two parent chromosomes. For example, two parents (two rules)
may produce the following offspring (new rules)
13.6.2 RuleGeneralization
This unary operator generalizes a random subset of complexes. For example, for a parent
and the second and third complexes selected for generalization, the bits are ORed and the following offspring is produced:
13.6.3 RuleSpecialization
This unary operator specializes a random subset of complexes. For example, for a parent
and the second and third complexes selected for specialization, the bits are ANDed, and the following offspring is produced:
13.6.4 RuleSplit
This operator acts on a single complex, splitting it into a number of complexes. For example, a parent
may produce the following offspring (the operator splits the second selector):
The GIL system is a complex, inductive-learning system based on GA principles. It requires a number of parameters, such as the probabilities of applying each operator. The process is iterative. At each iteration, all chromosomes are evaluated with respect to their completeness, consistency, and fitness