Data Mining - Mehmed Kantardzic [41]
A feature-ranking algorithm returns a list of features ordered according to a specific evaluation measure. The measure may be based on the accuracy of the available data, consistency, information content, distances between samples, or statistical dependencies between features. These algorithms do not tell you what the minimum set of features for further analysis is; they indicate only the relevance of each feature relative to the others. Minimum-subset algorithms, on the other hand, return a minimum feature subset in which no distinctions are made among features—all have the same rank. The features in the subset are relevant for the mining process; the others are irrelevant. For both types of algorithms, it is important to establish a feature-evaluation scheme: the way in which the features are evaluated and then ranked, or added to the selected subset.
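As a minimal sketch of the feature-ranking idea, the following example scores each feature by its absolute Pearson correlation with a target variable; the correlation measure and the toy data here are illustrative assumptions, not a prescription from the text, which lists several alternative evaluation measures.

```python
# Sketch of feature ranking: order features by decreasing absolute
# Pearson correlation with the target (one possible evaluation measure).
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def rank_features(samples, target):
    """Return feature indices ordered by decreasing |correlation| with target."""
    n_features = len(samples[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in samples]
        scores.append((abs(pearson(column, target)), j))
    scores.sort(reverse=True)
    return [j for _, j in scores]

# Toy data set: four samples, three features (hypothetical values).
samples = [[1, 10, 5], [2, 9, 1], [3, 11, 4], [4, 8, 2]]
target = [1, 2, 3, 4]
print(rank_features(samples, target))  # → [0, 2, 1]
```

Note that, exactly as the text says, the result is only an ordering of all features by relevance; it does not decide where to cut the list to obtain a reduced subset.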
Feature selection in general can be viewed as a search problem, with each state in the search space specifying a subset of the possible features. If, for example, a data set has three features, {A1, A2, A3}, and the presence of a feature is coded with 1 and its absence with 0, then there are a total of 2^3 = 8 reduced-feature subsets, coded {0, 0, 0}, {1, 0, 0}, {0, 1, 0}, {0, 0, 1}, {1, 1, 0}, {1, 0, 1}, {0, 1, 1}, and {1, 1, 1}. The problem of feature selection is relatively trivial if the search space is small, since we can analyze all subsets in any order and the search completes in a short time. The search space, however, is usually not small: it has 2^N states, and the number of dimensions N in typical data-mining applications is large (N > 20). This makes the starting point and the search strategy very important. An exhaustive search of all subsets of features is therefore very often replaced with a heuristic search procedure. Using knowledge of the problem, these procedures find near-optimal subsets of features that further improve the quality of the data-mining process.
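The presence/absence encoding of the search space above can be generated directly; this small sketch enumerates all 2^3 = 8 states for the three-feature example:

```python
# Enumerate the feature-subset search space for N = 3 features:
# each state is a presence (1) / absence (0) vector, giving 2**3 = 8 subsets.
from itertools import product

features = ["A1", "A2", "A3"]
states = list(product([0, 1], repeat=len(features)))
print(len(states))  # → 8

for state in states:
    # Decode the bit vector into the feature subset it represents.
    subset = [f for f, bit in zip(features, state) if bit]
    print(state, subset)
```

For N = 20 the same enumeration already yields 2^20 (over a million) states, which is why exhaustive search gives way to heuristics.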
The objective of feature selection is to find a subset of features with data-mining performance comparable to that of the full set. Given a set of m features, the number of candidate subsets to evaluate for column reduction is finite, but still far too large for exhaustive iteration through all cases. For practical reasons, an optimal search is not feasible, and simplifications are made to produce reasonable, acceptable, and timely results. If the reduction task is to create a subset, one possibility—the so-called bottom-up approach—starts with an empty set and fills it by choosing the most relevant features from the initial feature set, based on some heuristic criterion for feature evaluation. The top-down approach, on the other hand, begins with the full set of original features and removes, one by one, those shown to be irrelevant by the selected heuristic evaluation measure. Additional approximations to the optimal approach are
1. to examine only promising subsets of features where promising subsets are usually obtained heuristically—this provides enough room for exploration of competing alternatives;
2. to substitute computationally simple distance measures for the error measures—this approximation will reduce the computation time yet give satisfactory results for comparison of subset alternatives;
3. to select features based only on a subset of the large amount of available data; the subsequent data-mining steps are then applied to the whole data set.
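The bottom-up strategy described above can be sketched as a greedy forward selection. The `evaluate` function here is a hypothetical placeholder for whatever heuristic evaluation measure is chosen (accuracy estimate, distance, consistency, and so on); the fixed relevance weights are an assumption for demonstration only.

```python
# Sketch of the bottom-up (forward-selection) strategy: start from the
# empty set and greedily add the feature whose inclusion scores best
# under a heuristic evaluation measure.

def forward_select(all_features, evaluate, k):
    """Greedily pick up to k features that maximize evaluate(subset)."""
    selected = []
    remaining = list(all_features)
    while remaining and len(selected) < k:
        # Choose the candidate whose addition gives the best subset score.
        best = max(remaining, key=lambda f: evaluate(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy evaluation measure (assumption for the demo): each feature has a
# fixed relevance weight, and a subset's score is the sum of its weights.
weights = {"A1": 0.9, "A2": 0.2, "A3": 0.5}
score = lambda subset: sum(weights[f] for f in subset)

print(forward_select(["A1", "A2", "A3"], score, 2))  # → ['A1', 'A3']
```

The top-down approach would be the mirror image: start from the full set and repeatedly remove the feature whose deletion degrades the evaluation measure least.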
The application of feature selection and reduction of data dimensionality may be used