
Data Mining - Mehmed Kantardzic [83]

threshold values. For example, starting from 0 and increasing by 0.05 until 1, we have 21 different threshold values. That will generate enough points to reconstruct an ROC curve.

Figure 4.34. Computing points on an ROC curve. (a) Threshold = 0.5; (b) threshold = 0.8.
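The threshold sweep described above can be sketched in a few lines. This is a minimal illustration, not the book's own procedure: the function name `roc_points`, the 0/1 label encoding, and the convention that a sample is predicted positive when its score is greater than or equal to the threshold are all assumptions made for the example.

```python
def roc_points(y_true, scores, step=0.05):
    """Return a list of (threshold, FPR, TPR) for thresholds 0, step, ..., 1.

    Assumptions (not from the text): labels are 0/1, scores lie in [0, 1],
    and a sample is predicted positive when score >= threshold.
    """
    n_steps = round(1 / step)              # 20 steps give 21 thresholds
    positives = sum(1 for y in y_true if y == 1)
    negatives = len(y_true) - positives
    points = []
    for k in range(n_steps + 1):
        threshold = k / n_steps            # avoids accumulating float error
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        tpr = tp / positives if positives else 0.0   # true positive rate
        fpr = fp / negatives if negatives else 0.0   # false positive rate
        points.append((threshold, fpr, tpr))
    return points


# Hypothetical toy data: six samples with classifier scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(labels, scores)
```

Plotting the FPR values against the TPR values of `points` gives the reconstructed ROC curve; a smaller `step` gives more points at the cost of more passes over the data.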

When comparing two classification algorithms, we may compare measures such as accuracy or the F measure and conclude that one model gives better results than the other. We may also compare lift charts, ROI charts, or ROC curves, and if one curve lies above the other we may conclude that the corresponding model is more appropriate. In neither case, however, can we conclude that the difference between the models is statistically significant, that is, that one model performs better than the other with statistical significance. Some simple tests can verify these differences. The first is McNemar’s test. After testing both classifiers, we create a contingency table from their classification results on the testing data. The components of the contingency table are explained in Table 4.5.

TABLE 4.5. Contingency Table for McNemar’s Test

e00: Number of samples misclassified by both classifiers | e01: Number of samples misclassified by classifier 1, but not classifier 2
e10: Number of samples misclassified by classifier 2, but not classifier 1 | e11: Number of samples correctly classified by both classifiers

After computing the components of the contingency table, we may apply the χ2 statistic with one degree of freedom to the following expression:

$$\frac{(|e_{01} - e_{10}| - 1)^2}{e_{01} + e_{10}} \sim \chi^2_1$$

McNemar’s test rejects the hypothesis that the two algorithms have the same error rate at the significance level α if this value is greater than $\chi^2_{\alpha,1}$. For example, for α = 0.05, $\chi^2_{0.05,1} = 3.84$.
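As a rough sketch of how the test could be carried out, the code below counts the two disagreement cells of the contingency table and evaluates the statistic. It assumes the commonly used continuity-corrected form, (|e01 − e10| − 1)² / (e01 + e10); the function names and the toy predictions are hypothetical, and returning 0.0 when the classifiers never disagree is a choice made for this example only.

```python
def disagreement_counts(y_true, pred1, pred2):
    """Count e01 (only classifier 1 wrong) and e10 (only classifier 2 wrong)."""
    e01 = e10 = 0
    for y, p1, p2 in zip(y_true, pred1, pred2):
        wrong1, wrong2 = p1 != y, p2 != y
        if wrong1 and not wrong2:
            e01 += 1
        elif wrong2 and not wrong1:
            e10 += 1
    return e01, e10


def mcnemar_statistic(e01, e10):
    """Continuity-corrected McNemar statistic (assumed form, see lead-in)."""
    if e01 + e10 == 0:
        # Assumption: no disagreements means no evidence of a difference.
        return 0.0
    return (abs(e01 - e10) - 1) ** 2 / (e01 + e10)


CHI2_CRITICAL_005 = 3.84  # chi-square critical value for alpha = 0.05, 1 d.o.f.

# Hypothetical toy data: 20 test samples, all of true class 1.
y_true = [1] * 20
pred1 = [0] * 10 + [1] * 10           # classifier 1 errs on the first 10
pred2 = [1] * 10 + [0] * 2 + [1] * 8  # classifier 2 errs on samples 10 and 11

e01, e10 = disagreement_counts(y_true, pred1, pred2)
statistic = mcnemar_statistic(e01, e10)
reject_equal_error = statistic > CHI2_CRITICAL_005
```

Only the off-diagonal cells e01 and e10 enter the statistic; samples on which both classifiers agree carry no information about which one is better.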

The other test applies when we compare two classification models that are tested with the K-fold cross-validation process. The test starts with the results of K-fold cross-validation obtained from K training/validation set pairs. We compare the error percentages of the two classification algorithms based on the errors on the K validation sets, recorded for the two models as $p_i^1$ and $p_i^2$, i = 1, … , K.

The difference in error rates on fold i is $p_i = p_i^1 - p_i^2$. Then, we can compute:

$$m = \frac{1}{K}\sum_{i=1}^{K} p_i, \qquad S^2 = \frac{\sum_{i=1}^{K} (p_i - m)^2}{K - 1}$$

We have a statistic that is t distributed with K − 1 degrees of freedom, and the following test:

$$\frac{\sqrt{K}\, m}{S} \sim t_{K-1}$$

Thus, the K-fold cross-validation paired t-test rejects the hypothesis that the two algorithms have the same error rate at significance level α if this value is outside the interval $(-t_{\alpha/2,K-1},\ t_{\alpha/2,K-1})$. For example, for α = 0.05 and K = 10 or 30, the threshold values are $t_{0.025,9} = 2.26$ and $t_{0.025,29} = 2.05$.
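The paired t-test can be sketched as follows, assuming the usual definitions: the per-fold difference in error, its mean m, its sample variance S² with K − 1 in the denominator, and the statistic √K · m / S. The function name and the per-fold error percentages are hypothetical and serve only to illustrate the computation.

```python
import math


def paired_t_statistic(errors1, errors2):
    """K-fold cross-validation paired t statistic for two lists of fold errors."""
    if len(errors1) != len(errors2) or len(errors1) < 2:
        raise ValueError("need the same number of folds (at least 2) for both models")
    k = len(errors1)
    diffs = [a - b for a, b in zip(errors1, errors2)]   # per-fold differences
    m = sum(diffs) / k                                  # mean difference
    s2 = sum((d - m) ** 2 for d in diffs) / (k - 1)     # sample variance
    if s2 == 0:
        raise ValueError("all fold differences are identical; statistic undefined")
    return math.sqrt(k) * m / math.sqrt(s2)


T_CRITICAL_K10 = 2.26  # t(0.025, 9), the value quoted in the text for K = 10

# Hypothetical error percentages on K = 10 validation folds.
errors_model1 = [12, 14, 13, 15, 16, 12, 14, 13, 15, 16]
errors_model2 = [11, 12, 10, 11, 11, 11, 12, 10, 11, 11]

t_value = paired_t_statistic(errors_model1, errors_model2)
reject_equal_error = abs(t_value) > T_CRITICAL_K10
```

The test is two-sided, so the comparison uses the absolute value of the statistic; its sign only indicates which model had the higher average error across folds.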

Over time, all systems evolve. Thus, from time to time the model will have to be retested, retrained, and possibly completely rebuilt. Charts of the residual differences between forecasted and observed values are an excellent way to monitor model results.

4.9 90% ACCURACY: NOW WHAT?


A discussion of the deployment process is often forgotten in texts on data mining. Any data-mining student may produce a model with relatively high accuracy over some small data set using the tools available. An experienced data miner, however, looks beyond the creation of a model during the planning stages. A plan is needed to evaluate how useful a data-mining model is to a business and how the model will be rolled out. In a business setting, the value of a data-mining model is not simply its accuracy but how the model affects the company’s bottom line. For example, in fraud detection, algorithm A may achieve an accuracy of 90% while algorithm B achieves 85% on training data. However, an evaluation of the business impact of each may reveal that algorithm A would likely underperform algorithm B because of a larger number of very expensive false negative cases. Additional financial evaluation may then recommend algorithm B for final deployment because this solution saves the company more money. A careful analysis of the business impacts of data-mining decisions gives much greater insight into a data-mining
