Cost-sensitive classification conclusions

Ian Witten discusses the effect of cost-sensitive evaluation and cost-sensitive classification on the credit-g dataset, using Naive Bayes and J48.

Even if you don’t do the exercise, you should look at the numbers and note that they support the following general conclusions for the credit-g dataset with a particular cost matrix.

  • Naive Bayes is generally better than J48 on this problem.

  • Using either cost-sensitive classification or cost-sensitive learning always improves on the plain, cost-insensitive result (both setups are sketched in the code after this list).

  • For Naive Bayes, cost-sensitive learning gives about the same result as cost-sensitive classification.

  • For J48, cost-sensitive learning gives a better result than cost-sensitive classification.

  • Bagging improves the result of cost-sensitive classification using J48.
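
The activity runs everything through the Weka Explorer and Experimenter, but the following sketch shows how the same two setups can be reproduced with Weka's Java API, using the CostSensitiveClassifier meta-classifier. The cost values (5 for classifying a bad risk as good, 1 for the opposite mistake), the file path and the random seed are illustrative assumptions rather than the activity's exact settings; substitute the cost matrix the activity specifies and check that its row/column order matches the class order in your copy of credit-g.arff.

    import java.util.Random;

    import weka.classifiers.Classifier;
    import weka.classifiers.CostMatrix;
    import weka.classifiers.Evaluation;
    import weka.classifiers.bayes.NaiveBayes;
    import weka.classifiers.meta.CostSensitiveClassifier;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class CostSensitiveCreditG {

        public static void main(String[] args) throws Exception {
            // Load the data; the path is a placeholder for your copy of the dataset.
            Instances data = DataSource.read("credit-g.arff");
            data.setClassIndex(data.numAttributes() - 1);

            // Illustrative 2x2 cost matrix: rows are actual classes, columns are predictions.
            // Assumed here: misclassifying a bad risk as good costs 5, the reverse costs 1.
            CostMatrix costs = new CostMatrix(2);
            costs.setCell(0, 1, 1.0);   // actual good, predicted bad
            costs.setCell(1, 0, 5.0);   // actual bad, predicted good

            // Cost-sensitive classification: the base model is trained as usual and the
            // cost matrix is applied only at prediction time to minimize expected cost.
            CostSensitiveClassifier csClassification = new CostSensitiveClassifier();
            csClassification.setClassifier(new NaiveBayes());
            csClassification.setCostMatrix(costs);
            csClassification.setMinimizeExpectedCost(true);

            // Cost-sensitive learning: the training instances are reweighted according to
            // the cost matrix before the base learner (here J48) is built.
            CostSensitiveClassifier csLearning = new CostSensitiveClassifier();
            csLearning.setClassifier(new J48());
            csLearning.setCostMatrix(costs);
            csLearning.setMinimizeExpectedCost(false);

            evaluate("Naive Bayes, cost-sensitive classification", csClassification, data, costs);
            evaluate("J48, cost-sensitive learning", csLearning, data, costs);
        }

        // 10-fold cross-validation with cost-based evaluation.
        private static void evaluate(String label, Classifier cls, Instances data,
                                     CostMatrix costs) throws Exception {
            Evaluation eval = new Evaluation(data, costs);
            eval.crossValidateModel(cls, data, 10, new Random(1));
            System.out.printf("%s: total cost = %.0f, average cost per instance = %.4f%n",
                    label, eval.totalCost(), eval.avgCost());
        }
    }

To obtain the bagged variant from the last bullet, wrap the base learner first (for example, a weka.classifiers.meta.Bagging instance whose own base classifier is J48) and pass that to setClassifier on the CostSensitiveClassifier.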

(Note, incidentally, that the Experimenter differs from the Explorer in that it calculates the four elements of the confusion matrix (Num_false_positives, etc.) separately for each cross-validation fold and averages them, whereas the Explorer sums them over the entire dataset. Thus, since this activity uses 10-fold cross-validation, the counts and the costs produced by the Experimenter are one tenth of those produced by the Explorer. The Experimenter works this way because a separate confusion matrix is needed for each fold in order to compute the standard deviation of each element and to run significance tests between corresponding elements produced by different classifiers.)
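
To make the arithmetic concrete, here is a small sketch with hypothetical per-fold counts (the numbers are invented purely for illustration, not taken from the activity). The Explorer-style figure is the sum over the ten folds; the Experimenter-style figure is the per-fold mean, exactly one tenth of that sum; and keeping the ten values separate is what allows a standard deviation, and hence significance tests, to be computed.

    import java.util.Arrays;

    public class FoldAveragingDemo {
        public static void main(String[] args) {
            // Hypothetical false-positive counts for the ten folds (illustration only).
            double[] foldFP = {17, 15, 19, 16, 18, 14, 20, 17, 15, 19};

            double total = Arrays.stream(foldFP).sum();   // Explorer: sum over all folds
            double mean = total / foldFP.length;          // Experimenter: average per fold

            // The standard deviation needs the individual per-fold values, which is why
            // the Experimenter keeps a separate confusion matrix for each fold.
            double ss = Arrays.stream(foldFP).map(x -> (x - mean) * (x - mean)).sum();
            double stdDev = Math.sqrt(ss / (foldFP.length - 1));

            System.out.printf("sum = %.0f, mean = %.1f (one tenth of the sum), std dev = %.2f%n",
                    total, mean, stdDev);
        }
    }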

This article is from the free online course More Data Mining with Weka on FutureLearn.
