Data Mining - Mehmed Kantardzic [26]
where the minimum and the maximum values for the feature v are either computed automatically on the data set or estimated by an expert in the given domain. A similar transformation may be used for the normalized interval [−1, 1]. The automatic computation of min and max values requires one additional pass through the entire data set, but computationally the procedure is very simple. On the other hand, expert estimations of min and max values may cause unintentional accumulation of normalized values.
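The min-max transformation discussed above can be sketched in a few lines of Python. This is a minimal illustration, assuming the usual form v*(i) = (v(i) − min(v)) / (max(v) − min(v)). The function name min_max_normalize and its parameters are hypothetical, introduced only for this example. The optional lo and hi arguments select the target interval, and v_min and v_max allow expert-estimated bounds to replace the computed ones.

```python
def min_max_normalize(values, lo=0.0, hi=1.0, v_min=None, v_max=None):
    """Scale values linearly so that [v_min, v_max] maps onto [lo, hi].

    v_min and v_max default to the observed minimum and maximum
    (one extra pass over the data); estimated bounds may be passed instead.
    """
    v_min = min(values) if v_min is None else v_min
    v_max = max(values) if v_max is None else v_max
    if v_max == v_min:
        raise ValueError("feature is constant; min-max scaling is undefined")
    return [lo + (v - v_min) / (v_max - v_min) * (hi - lo) for v in values]


unit = min_max_normalize([10, 20, 30])                     # [0.0, 0.5, 1.0]
signed = min_max_normalize([10, 20, 30], lo=-1.0, hi=1.0)  # [-1.0, 0.0, 1.0]

# Hypothetical, overly wide expert bounds: every normalized value
# ends up in the lowest few percent of the interval [0, 1].
crowded = min_max_normalize([10, 20, 30], v_min=0, v_max=1000)
```

The last call shows one possible reading of the accumulation effect mentioned above: when estimated bounds are much wider than the observed range, the normalized values crowd into a small part of the target interval.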
Standard Deviation Normalization.
Normalization by standard deviation often works well with distance measures, but it transforms the data into a form unrecognizable from the original data. For a feature v, the mean value mean(v) and the standard deviation sd(v) are computed for the entire data set. Then, for a case i, the feature value is transformed using the equation

v*(i) = (v(i) − mean(v)) / sd(v)
For example, if the initial set of values of the attribute is v = {1, 2, 3}, then mean(v) = 2, sd(v) = 1, and the new set of normalized values is v* = {−1, 0, 1}.
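The worked example can be checked with a short Python sketch, assuming the usual form v*(i) = (v(i) − mean(v)) / sd(v). The function name sd_normalize is hypothetical. Note that sd(v) = 1 for v = {1, 2, 3} corresponds to the sample standard deviation (denominator n − 1), which is what statistics.stdev computes; the population standard deviation of the same set would be about 0.816.

```python
from statistics import mean, stdev


def sd_normalize(values):
    """Subtract the mean and divide by the sample standard deviation."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 in the denominator)
    if s == 0:
        raise ValueError("feature is constant; standard deviation is zero")
    return [(v - m) / s for v in values]


print(sd_normalize([1, 2, 3]))  # [-1.0, 0.0, 1.0]
```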
Why not treat normalization as an implicit part of a data-mining method? The simple answer is that normalizations are useful for several diverse methods of data mining. Equally important, normalization is not a one-time or a one-phase event. If a method requires normalized data, the available data should be initially transformed and prepared for the selected data-mining technique, but an identical normalization must be applied in all other phases of data mining and to all new and future data. Therefore, the normalization parameters must be saved along with the solution.
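One way to keep normalization parameters with a solution is to compute them once on the training data, store them, and reuse them unchanged for every later data set. The following is a minimal sketch under that assumption. The function names fit_sd_params and apply_sd_params, and the choice of JSON as the storage format, are illustrative only.

```python
import json
from statistics import mean, stdev


def fit_sd_params(values):
    """Compute the normalization parameters once, on the training data."""
    return {"mean": mean(values), "sd": stdev(values)}


def apply_sd_params(values, params):
    """Normalize any data set with previously saved parameters."""
    return [(v - params["mean"]) / params["sd"] for v in values]


training = [1, 2, 3]
params = fit_sd_params(training)

saved = json.dumps(params)     # store this string alongside the model
restored = json.loads(saved)   # reload it in a later phase

new_cases = [4, 0]
print(apply_sd_params(new_cases, restored))  # [2.0, -2.0]
```

The new cases are scaled with the training mean and standard deviation, not with statistics recomputed from the new cases themselves, so the values stay comparable with those the method saw during training.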
2.3.2 Data Smoothing
A numeric feature, y, may range over many distinct values, sometimes as many as the number of training cases. For many data-mining techniques, minor differences among these values are not significant and may degrade the performance of the method and the final results. Such differences may be considered random variations of the same underlying value. Hence, it can sometimes be advantageous to smooth the values of the variable.
Many simple smoothers can be specified that average similar measured values. For example, if the values are real numbers with several decimal places, rounding the values to a given precision could be a simple smoothing algorithm for a large number of samples, where each sample has its own real value. If the set of values for the given feature F is {0.93, 1.01, 1.001, 3.02, 2.99, 5.03, 5.01, 4.98}, then rounding to the nearest integer gives the smoothed values Fsmoothed = {1.0, 1.0, 1.0, 3.0, 3.0, 5.0, 5.0, 5.0}. This simple transformation is performed without losing any quality in the data set, and, at the same time, it reduces the number of different real values for the feature to only three.
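The rounding smoother from this example can be sketched as follows. The function name smooth_by_rounding and its ndigits parameter are hypothetical. The smoothed values listed above correspond to rounding to the nearest integer (ndigits=0); rounding to one decimal place would instead map 0.93 to 0.9.

```python
def smooth_by_rounding(values, ndigits=0):
    """Smooth a feature by rounding each value to ndigits decimal places."""
    return [float(round(v, ndigits)) for v in values]


F = [0.93, 1.01, 1.001, 3.02, 2.99, 5.03, 5.01, 4.98]
F_smoothed = smooth_by_rounding(F)

print(F_smoothed)               # [1.0, 1.0, 1.0, 3.0, 3.0, 5.0, 5.0, 5.0]
print(sorted(set(F_smoothed)))  # [1.0, 3.0, 5.0]
```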
Some of these smoothing algorithms are more complex, and they are explained in Section 3.2. Reducing the number of distinct values for a feature means reducing the dimensionality of the data space at the same time. Reduced values are particularly useful for logic-based methods of data mining, as will be explained in Chapter 6. Smoothers in this case can be used to discretize continuous features into a set of features with binary true–false values.
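As a rough illustration of the last point, a smoothed feature with a few distinct levels can be turned into one true/false feature per level. This is only a sketch under stated assumptions, not the procedure of Section 3.2 or Chapter 6: the function name to_binary_features, the "is_<level>" naming scheme, and the use of integer rounding as the smoother are all hypothetical.

```python
def to_binary_features(values, levels):
    """Return, for each value, one true/false flag per smoothed level."""
    return [
        {f"is_{level}": round(v) == level for level in levels}
        for v in values
    ]


rows = to_binary_features([0.93, 3.02, 5.01], levels=[1, 3, 5])
for row in rows:
    print(row)
# {'is_1': True, 'is_3': False, 'is_5': False}
# {'is_1': False, 'is_3': True, 'is_5': False}
# {'is_1': False, 'is_3': False, 'is_5': True}
```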
2.3.3 Differences and Ratios
Even small changes to features can produce significant improvements in data-mining performance. The effects of relatively minor transformations of input or output features are particularly important in the specification of the data-mining goals. Two types of simple transformations, differences and ratios, can improve goal specification, especially if they are applied to the output features.
These transformations sometimes produce better results than the simple, initial goal of predicting a number. In one application, for example, the objective is to move the controls for a manufacturing process to an optimal setting. But instead of optimizing the absolute magnitude specification