Data Mining - Mehmed Kantardzic [134]
Apart from shrinking the neighborhood, it has also been found that the SOM algorithm converges more quickly if the adaptation rate of nodes in the network is reduced over time. Initially the adaptation rate should be high, producing a coarse clustering of nodes. Once this coarse representation has been formed, the adaptation rate is reduced so that smaller changes are made to the weight vectors at each node, and regions of the map become fine-tuned to the input-training vectors. Thus every node within the BMU’s neighborhood, including the BMU itself, has its weight vector adjusted during the learning process. The previous equation for the weight-correction factor hi(t) may include an exponential decrease of the “winner’s influence” by introducing the learning rate α(t) as a monotonically decreasing function as well.
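The combined effect of a decaying adaptation rate α(t) and a shrinking neighborhood can be sketched as follows. This is an illustrative implementation only: the exponential decay schedules, the Gaussian neighborhood, the 1-D grid, and the function name `som_update` are assumptions, not the book's exact formulation, which requires only that α(t) and the neighborhood radius decrease monotonically.

```python
import numpy as np

def som_update(weights, x, bmu_idx, t, t_max, alpha0=0.9, sigma0=3.0):
    """One SOM update step: every node in the BMU's neighborhood
    (including the BMU) moves toward input x, with both the learning
    rate and the neighborhood radius shrinking over time.
    Decay schedules here are illustrative assumptions."""
    alpha = alpha0 * np.exp(-t / t_max)          # monotonically decreasing alpha(t)
    sigma = sigma0 * np.exp(-t / t_max)          # shrinking neighborhood radius
    grid = np.arange(len(weights))               # 1-D grid positions for simplicity
    d2 = (grid - bmu_idx) ** 2                   # squared grid distance to the BMU
    h = alpha * np.exp(-d2 / (2 * sigma ** 2))   # h_i(t): the "winner's influence"
    return weights + h[:, None] * (x - weights)  # move each node toward x
```

Early in training (small t) α and σ are large, so many nodes move substantially and a coarse ordering emerges; later, only the BMU and its close neighbors move, and only slightly, fine-tuning the map.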
The number of output neurons in an SOM (i.e., the map size) is important for detecting deviations in the data. If the map size is too small, the map may fail to reveal important differences that should be detected between input samples. Conversely, if the map size is too big, the detected differences become too small to be meaningful. In practical applications, if no additional heuristics are available, the number of output neurons can be selected by iterating over different SOM architectures.
The main advantages of SOM technology are as follows: the presented results are very easy to understand and interpret; the technology is very simple to implement; and, most important, it works well in many practical problems. Of course, there are also some disadvantages. SOMs are computationally expensive; they are very sensitive to the chosen measure of similarity; and, finally, they are not directly applicable to real-world data sets with missing values. There are several possible improvements in implementations of SOMs. To reduce the number of iterations in the learning process, good initialization of the weight factors is essential; initializing along the principal components of the input data can make computation of the SOM orders of magnitude faster. Practical experience also shows that hexagonal grids give output results of better quality (improved isotropy of the display). Finally, the selection of a distance measure is important, as in any clustering algorithm. Euclidean distance is almost standard, but that does not mean it is always the best choice.
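The principal-component initialization mentioned above can be sketched as follows. This is a minimal illustration, not the book's prescribed procedure: the function name `pca_init`, the linear [−1, 1] grid spans, and the scaling by singular values are all assumptions about one reasonable way to place initial weights on the plane of the first two principal components.

```python
import numpy as np

def pca_init(data, rows, cols):
    """Initialize SOM weight vectors on the plane spanned by the first
    two principal components of the data, so the map starts already
    roughly ordered (the exact spans/scaling are a design choice)."""
    mean = data.mean(axis=0)
    centered = data - mean
    # SVD of the centered data: rows of vt are principal directions
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    u = np.linspace(-1, 1, rows)                 # grid coordinate along PC1
    v = np.linspace(-1, 1, cols)                 # grid coordinate along PC2
    scale = s[:2] / np.sqrt(len(data))           # std. dev. along each component
    weights = np.array([[mean + a * scale[0] * vt[0] + b * scale[1] * vt[1]
                         for b in v] for a in u])
    return weights                               # shape: (rows, cols, dim)
```

Because the initial map already spans the data's two dominant directions, far fewer iterations are needed for the coarse-ordering phase of training.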
SOMs have been used in a large spectrum of applications, such as automatic speech recognition, clinical data analysis, monitoring of the condition of industrial plants and processes, classification from satellite images, analysis of genetic information, analysis of electrical signals from the brain, and retrieval from large document collections. Illustrative examples are given in Figure 7.18.
Figure 7.18. SOM applications. (a) Drugs binding to human cytochrome; (b) interest rate classification; (c) analysis of book-buying behavior.
7.8 REVIEW QUESTIONS AND PROBLEMS
1. Explain the fundamental differences between the design of an ANN and “classical” information-processing systems.
2. Why is the fault-tolerance property one of the most important characteristics and capabilities of ANNs?
3. What are the basic components of the neuron’s model?
4. Why are continuous functions such as log-sigmoid or hyperbolic tangent considered common activation functions in real-world applications of ANNs?
5. Discuss the differences between feedforward and recurrent neural networks.
6. Given a two-input neuron with the following parameters: bias b = 1.2, weight factors W = [w1, w2] = [3, 2], and input vector X = [−5, 6]T; calculate the neuron’s output for the following activation functions:
(a) a symmetrical hard limit
(b) a log-sigmoid
(c) a hyperbolic tangent
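Problem 6 can be checked numerically. The net input is n = w1·x1 + w2·x2 + b = 3·(−5) + 2·6 + 1.2 = −1.8, and each activation function is then applied to n. A short sketch (the helper name `neuron_output` is illustrative):

```python
import math

def neuron_output(weights, inputs, bias, activation):
    """Single neuron: net input n = sum(w_i * x_i) + b, then activation(n)."""
    n = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(n)

hardlims = lambda n: 1 if n >= 0 else -1           # (a) symmetrical hard limit
logsig = lambda n: 1.0 / (1.0 + math.exp(-n))      # (b) log-sigmoid
                                                   # (c) hyperbolic tangent: math.tanh
W, X, b = [3, 2], [-5, 6], 1.2                     # n = -15 + 12 + 1.2 = -1.8

print(neuron_output(W, X, b, hardlims))                  # -1
print(round(neuron_output(W, X, b, logsig), 4))          # 0.1419
print(round(neuron_output(W, X, b, math.tanh), 4))       # -0.9468
```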
7. Consider a two-input neuron with the following weight factors W and input vector X:
We would like to have an output of 0.5.
(a) Is there a transfer function from Table 7.1 that will do the job if the bias is zero?
(b) Is there a bias that will do the job?