Online Book Reader

Home Category

Choose a category
All
Classic-Fiction

Data Mining - Mehmed Kantardzic [74]

By Root 675 0

of parameters for SVM is very important and the quality of results depends on these parameters. Two most important parameters are cost C and parameter γ for Gaussian kernel. It is not known beforehand which C and σ are the best for one problem; consequently, some kind of parameter search must be done. The goal is to identify good (C; σ) so that the classifier can accurately predict unknown data (i.e., testing data). Note that it may not be required to achieve high-training accuracy. Small cost C is appropriate for close to linear separable samples. If we select small C for nonlinear classification problem, it will cause under-fitted learning. Large C for nonlinear problems is appropriate, but not too much because the classification margin will become very thin resulting in over-fitted learning. Similar analysis is for Gaussian kernel σ parameter. Small σ will cause close to linear kernel with no significant transformation in future space and less flexible solutions. Large σ generates extremely complex nonlinear classification solution.

The experience in many real-world SVM applications suggests that, in general, RBF model is a reasonable first choice. The RBF kernel nonlinearly maps samples into a higher dimensional space, so, unlike the linear kernel, it can handle the case when the relation between classes is highly nonlinear. The linear kernel should be treated a special case of RBF. The second reason for RBF selection is the number of hyper-parameters that influences the complexity of model selection. For example, polynomial kernels have more parameters than the RBF kernel and the tuning proves is much more complex and time-consuming. However, there are some situations where the RBF kernel is not suitable, and one may just use the linear kernel with extremely good results. The question is when to use the linear kernel as a first choice. If the number of features is large, one may not need to map data to a higher dimensional space. Experiments showed that the nonlinear mapping does not improve the SVM performance. Using the linear kernel is good enough, and C is the only tuning parameter. Many microarray data in bioinformatics and collection of electronic documents for classification are examples of this data set type. As the number of features is smaller, and the number of samples increases, SVM successfully maps data to higher dimensional spaces using nonlinear kernels.

One of the methods for finding optimal parameter values for an SVM is a grid search. The algorithm tries values of each parameter across the specified search range using geometric steps. Grid searches are computationally expensive because the model must be evaluated at many points within the grid for each parameter. For example, if a grid search is used with 10 search intervals and an RBF kernel function is used with two parameters (C and σ), then the model must be evaluated at 10*10 = 100 grid points, that is, 100 iterations in a parameter-selection process.

At the end we should highlight main strengths of the SVM methodology. First, a training process is relatively easy with a small number of parameters and the final model is never presented with local optima, unlike some other techniques. Also, SVM methodology scales relatively well to high-dimensional data and it represents a trade-off between a classifier’s complexity and accuracy. Nontraditional data structures like strings and trees can be used as input samples to SVM, and the technique is not only applicable to classification problems, it is also accommodated for prediction. Weaknesses of SVMs include computational inefficiency and the need to experimentally choose a “good” kernel function.

The SVM methodology is becoming increasingly popular in the data-mining community. Software tools that include SVM are becoming more professional, user-friendly, and applicable to many real-world problems where data sets are extremely large. It has been shown that SVM outperforms other techniques such as logistic regression or artificial neural networks on a wide variety of real-world problems. Some of

Online Book Reader

Data Mining - Mehmed Kantardzic [74]

®Online Book Reader