
Evaluating 2-class classification

Threshold curves show different tradeoffs between error types. Ian Witten explains how the area under the ROC curve measures a classifier's accuracy.

In the last lesson we encountered a two-class dataset where the accuracy on one class was high and the accuracy on the other was low. Because the first class contained an overwhelming majority of the instances, the overall accuracy looked high. But life’s not so simple. In practice, there’s a tradeoff between the two error types – a different classifier may produce higher accuracy on one class at the expense of lower accuracy on the other. We need a more subtle way of evaluating classifiers, one that makes this tradeoff explicit. Enter the ROC curve …
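To make the idea concrete, here is a minimal sketch (not Weka’s own code) of how an ROC curve and its area can be computed from scored predictions: sweep the threshold over the classifier’s scores for the positive class, record the false-positive and true-positive rates at each step, and take the area under the resulting curve. The function names and the simple tie handling are illustrative assumptions.

```python
def roc_points(scores, labels):
    """Sweep the threshold over all scores; return (FPR, TPR) pairs.

    scores: classifier's score for the positive class, one per instance.
    labels: 1 for the positive class, 0 for the negative class.
    (Illustrative sketch: ties in scores are broken by sort order.)
    """
    pos = sum(labels)
    neg = len(labels) - pos
    # Rank instances by descending score for the positive class.
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for _score, label in ranked:
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Trapezoidal area under the ROC curve."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# A perfect ranking puts every positive ahead of every negative: AUC = 1.0.
pts = roc_points([0.9, 0.8, 0.4, 0.3], [1, 1, 0, 0])
print(auc(pts))  # → 1.0
```

A classifier that ranks instances randomly gives an AUC near 0.5, so the area can be read as the probability that a randomly chosen positive instance is ranked above a randomly chosen negative one.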

Note: Current versions of Weka have a different interface for outputting predictions than the one shown in the video (at 2:42). Instead of ticking “Output predictions”, you now choose PlainText in the “Output predictions” selector. To get the output shown in the video, configure PlainText (double-click it) and set outputDistribution to True.

This article is from the free online course More Data Mining with Weka.

