Online Book Reader

Home Category

Choose a category
All
Classic-Fiction

Data Mining - Mehmed Kantardzic [232]

By Root 829 0

with respect to the fitness function of the rules to training data set. The operators used in this learning technique are, again, mutation and crossover. However, some modifications are necessary for mutation. Let us consider the first attribute A1 with its domain {x, y, z, *}. Thus, when mutation is called, we would change the mutated character (code) to one of the other three values that have equal probability. The strength of the offspring is usually the same as that of its parents. For example, if mutation is the rule Q3 on the randomly selected position 2, replacing the value y with a randomly selected value *, the new mutated classifier will be

The crossover does not require any modification. We take advantage of the fact that all classifiers are of equal length. Therefore, to crossover two selected parents, say Q1 and Q2

we generate a random crossover-position point. Suppose that we crossover after the third character in the string as marked; then the offspring are

The strength of the offspring is an average (possibly weighted) of the parents’ strengths. Now the system is ready to continue its learning process: starting another cycle, accepting further positive and negative samples from the training set, and modifying the strengths of classifiers as a measure of fitness. We can note that the training data set is included in the learning process through evaluation of schema’s strengths for each iteration. We expect that the population of classifiers converges to some rules with very high strengths.

One of the possible implementations of the previous ideas is the GIL system, which moves the GA closer to the symbolic level—mainly by defining specialized operators that manipulate binary strings. Previous symbolic classifiers are translated into binary strings, where for each attribute a binary string of fixed length is generated. The length is equal to the number of possible values for the given attribute. In the string, the required value is set to 1 and all others are set to 0. For example, if attribute A1 has the value z, it is represented with the binary string 001 (0’s are for values x, y). If the value of some attribute is *, that means that all values are possible, so it is represented with value 1 in all positions of the binary string.

For our previous example, with six attributes and a total number of 17 different values for all attributes, the classifier symbolically represented as

can be transformed into the binary representation

where bars separate bit sets for each attribute. The operators of the GIL system are modeled on inductive reasoning, which includes various inductive operators such as RuleExchange, RuleCopy, RuleGeneralization, RuleSpecialization, RuleSplit, SelectorDrop, ReferenceChange, and ReferenceExtension. We discuss some of them in turn.

13.6.1 RuleExchange

The RuleExchange operator is similar to a crossover of the classical GA as it exchanges selected complex between two parent chromosomes. For example, two parents (two rules)

may produce the following offspring (new rules)

13.6.2 RuleGeneralization

This unary operator generalizes a random subset of complexes. For example, for a parent

and the second and third complexes selected for generalization, the bits are ORed and the following offspring is produced:

13.6.3 RuleSpecialization

This unary operator specializes a random subset of complexes. For example, for a parent

and the second and third complexes selected for specialization, the bits are ANDed, and the following offspring is produced:

13.6.4 RuleSplit

This operator acts on a single complex, splitting it into a number of complexes. For example, a parent

may produce the following offspring (the operator splits the second selector):

The GIL system is a complex, inductive-learning system based on GA principles. It requires a number of parameters, such as the probabilities of applying each operator. The process is iterative. At each iteration, all chromosomes are evaluated with respect to their completeness, consistency, and fitness

Online Book Reader

Data Mining - Mehmed Kantardzic [232]

®Online Book Reader