features, and

(d) extraction of independent features that determine the model; and

3. to help the user visualize alternative results, which have fewer dimensions, to improve decision making.

3.3 RELIEF ALGORITHM


Relief is a feature weight-based algorithm for feature selection, inspired by instance-based learning. It relies on a relevance evaluation of each feature in a training data set where samples are labeled (classification problems). The main idea of Relief is to compute a ranking score for every feature, indicating how well that feature separates neighboring samples. The authors of the Relief algorithm, Kira and Rendell, showed that the ranking score is large for relevant features and small for irrelevant ones.

The core of the Relief algorithm is to estimate the quality of features according to how well their values distinguish between samples that are close to each other. Given training data S, the algorithm randomly selects a subset of m samples, where m is a user-defined parameter, and analyzes each feature based on this subset. For each randomly selected sample X, Relief searches the training data set for its two nearest neighbors: one from the same class, called the nearest hit H, and one from a different class, called the nearest miss M. An example for two-dimensional data is given in Figure 3.2.
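In code, the neighbor search may be sketched as follows. This is a minimal illustration with names of our own choosing; it assumes numerical features and Euclidean distance, which the algorithm itself does not mandate:

    import numpy as np

    def nearest_hit_miss(i, S, y):
        """Return the nearest hit H (same class as S[i]) and the
        nearest miss M (different class) under Euclidean distance."""
        d = np.sqrt(((S - S[i]) ** 2).sum(axis=1))
        d[i] = np.inf                     # exclude the sample itself
        same = y == y[i]                  # mask of same-class samples
        H = S[np.where(same)[0][np.argmin(d[same])]]
        M = S[np.where(~same)[0][np.argmin(d[~same])]]
        return H, M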

Figure 3.2. Determining nearest hit H and nearest miss M samples.

The Relief algorithm updates the quality score W(Ai) of every feature Ai depending on the differences in its values for the samples X, M, and H. In the standard formulation of Kira and Rendell, each iteration performs the update

W(Ai) = W(Ai) − (diff(X[Ai], H[Ai]))² + (diff(X[Ai], M[Ai]))²

where diff denotes the difference between two values of the feature Ai: for nominal features it is 0 if the two values are equal and 1 otherwise, while for numerical features it is the absolute difference (possibly normalized by the feature's range). A relevant feature is expected to have similar values for X and H (small hit difference) and different values for X and M (large miss difference), so its score grows.

The process is repeated m times for randomly selected samples from the training data set, and the scores W(Ai) are accumulated for each feature. Finally, using a relevancy threshold τ, the algorithm retains those features that are statistically relevant to the target classification, namely the features with W(Ai) ≥ τ. We assume the scale of every feature is either nominal (including Boolean) or numerical (integer or real). The main steps of the Relief algorithm may be formalized as follows (a code sketch is given after the pseudocode):

Initialize: W(Aj) = 0; j = 1, … , p (p is the number of features)

For i = 1 to m

Randomly select sample X from training data set S.

Find nearest hit H and nearest miss M samples.

For j = 1 to p

W(Aj) = W(Aj) − (diff(X[Aj], H[Aj]))² + (diff(X[Aj], M[Aj]))²

End.

End.

Output: Subset of features where W(Aj) ≥ τ
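As a concrete illustration, here is a minimal Python sketch of these steps. The function name relief, the use of NumPy, and the choice of Euclidean distance over range-normalized numerical features are our own assumptions, not part of the original algorithm specification:

    import random

    import numpy as np

    def relief(S, y, m, tau, rng=random):
        """Minimal Relief sketch. S is an (n x p) array of numerical
        features, y the class labels, m the number of sampled
        instances, tau the relevancy threshold."""
        n, p = S.shape
        span = S.max(axis=0) - S.min(axis=0)  # feature ranges for normalization
        span[span == 0] = 1.0                 # guard against constant features
        W = np.zeros(p)
        for _ in range(m):
            i = rng.randrange(n)
            X = S[i]
            # Distances from X to all samples, on range-normalized values.
            d = np.sqrt((((S - X) / span) ** 2).sum(axis=1))
            d[i] = np.inf                     # exclude X itself
            same = y == y[i]
            H = S[np.where(same)[0][np.argmin(d[same])]]    # nearest hit
            M = S[np.where(~same)[0][np.argmin(d[~same])]]  # nearest miss
            # Accumulate the quality score of every feature.
            W += ((X - M) / span) ** 2 - ((X - H) / span) ** 2
        return W, [j for j in range(p) if W[j] >= tau]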

For example, if the available training set is given in Table 3.2, with three features (the last of them being the classification decision) and four samples, the scoring values W for the features F1 and F2 may be calculated using Relief:

TABLE 3.2. Training Data Set for Applying Relief Algorithm

In this example the number of samples is low, and therefore we use all samples (m = n) to estimate the feature scores. Based on these results, feature F1 is much more relevant for classification than feature F2. If we set the threshold value τ = 5, it is possible to eliminate feature F2 and build the classification model based on feature F1 alone.
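The same kind of calculation can be reproduced with the relief sketch above. The small data set below is hypothetical (it is not the data of Table 3.2), and the threshold of 0.5 is chosen to fit the sketch's range-normalized scores:

    import numpy as np

    # Hypothetical training set: two numerical features, F1 and F2,
    # and a class label (not the book's Table 3.2).
    S = np.array([[1.0, 4.0],
                  [2.0, 5.0],
                  [8.0, 4.5],
                  [9.0, 5.5]])
    y = np.array(["C1", "C1", "C2", "C2"])

    # Few samples, so sample as many times as there are samples (m = n).
    W, selected = relief(S, y, m=len(S), tau=0.5)
    print(W)         # per-feature scores; F1 clearly dominates on this data
    print(selected)  # indices of features with W >= tau, here [0] (i.e., F1)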

Relief is a simple algorithm that relies entirely on a statistical methodology. The algorithm employs few heuristics, and therefore it is efficient: its computational complexity is O(mpn). With m, the number of randomly selected training samples, being a user-defined constant, the time complexity becomes O(pn), which makes the algorithm very scalable to data sets with both a huge number of samples n and a very high dimensionality p. When n is very large, it is assumed that m ≪ n. Relief is one of the few algorithms that can evaluate features for real-world problems with a large feature space and a large number of samples. It is also noise-tolerant and robust to feature interactions, which is especially important for hard data-mining applications. However, Relief does not help with removing redundant features: as long as features are deemed relevant to the class concept, they will all be selected, even though many of them may be highly correlated with each other.

One of the open issues with Relief is picking a proper value of τ. Theoretically, τ may be estimated using Chebyshev's inequality; Kira and Rendell derive the bound

0 < τ ≤ 1/√(α · m)

where α is the probability of accepting an irrelevant feature as relevant (a Type I error).
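As a quick worked example of this estimate (the numbers are ours, chosen only for illustration): with m = 100 samplings and an acceptable error probability α = 0.05, the estimate gives τ = 1/√(0.05 · 100) = 1/√5 ≈ 0.45.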
