Online Book Reader

Home Category

Data Mining - Mehmed Kantardzic [64]

By Root 788 0
There are three commonly used optimization approaches:

1. Stochastic Approximation (or Gradient Descent). Given an initial estimate of the values for the approximating functions of parameter w, the optimal parameter values are found by repeatedly updating them. In each step, while computing the gradient of the risk function, the updated values of the parameters cause a small movement in the direction of the steepest descent along the risk (error) function.

2. Iterative Methods. Parameter values w are estimated iteratively so that at each iteration the value of the empirical risk is decreased. In contrast to stochastic approximation, iterative methods do not use gradient estimates; instead, they rely on a particular form of approximating functions with a special iterative parameter.

3. Greedy Optimization. The greedy method is used when the set of approximating functions is a linear combination of some basic functions. Initially, only the first term of the approximating functions is used and the corresponding parameters optimized. Optimization corresponds to minimizing the differences between the training data set and the estimated model. This term is then held fixed, and the next term is optimized. The optimization process is repeated until values are found for all parameters w and for all terms in the approximating functions.

These typical optimization approaches and also other more specific techniques have one or more of the following problems:

1. Sensitivity to Initial Conditions. The final solution is very sensitive to the initial values of the approximation function parameters.

2. Sensitivity to Stopping Rules. Nonlinear approximating functions often have regions that are very flat, where some optimization algorithms can become “stuck” for a long time (for a large number of iterations). With poorly designed stopping rules these regions can be interpreted falsely as local minima by the optimization algorithm.

3. Sensitivity to Multiple Local Minima. Nonlinear functions may have many local minima, and optimization methods can find, at best, one of them without trying to reach global minimum. Various heuristics can be used to explore the solution space and move from a local solution toward a globally optimal solution.

Working with finite data sets, SLT reaches several conclusions that are important guidelines in a practical implementation of data-mining techniques. Let us briefly explain two of these useful principles. First, when solving a problem of inductive learning based on finite information, one should keep in mind the following general commonsense principle: Do not attempt to solve a specified problem by indirectly solving a harder general problem as an intermediate step. We are interested in solving a specific task, and we should solve it directly. Following SLT results, we stress that for estimation with finite samples, it is always better to solve a specific learning problem rather than attempt a general one. Conceptually, this means that posing the problem directly will then require fewer samples for a specified level of accuracy in the solution. This point, while obvious, has not been clearly stated in most of the classical textbooks on data analysis.

Second, there is a general belief that for inductive-learning methods with finite data sets, the best performance is provided by a model of optimal complexity, where the optimization is based on the general philosophical principle known as Occam’s razor. According to this principle, limiting the model complexity is more important than using true assumptions with all details. We should seek simpler models over complex ones and optimize the trade-off between model complexity and the accuracy of the model’s description and fit to the training data set. Models that are too complex and fit the training data very well or too simple and fit the data poorly are both not good models because they often do not predict future data very well. Model complexity is usually controlled in accordance with Occam’s razor principle by a priori knowledge.

Summarizing SLT,

Return Main Page Previous Page Next Page

®Online Book Reader