Skip to 0 minutes and 11 seconds Hello again! In this lesson we’re going to look at an important new concept called “baseline accuracy”. We’re going to use a new dataset, the “diabetes” dataset. I’ve got Weka here, and I’m going to open diabetes.arff.
Skip to 0 minutes and 31 seconds There it is. Let’s have a quick look at this dataset. The class is tested_negative or tested_positive for diabetes. We’ve got attributes like “preg”, which I think is the number of times the patient has been pregnant, and “age”, which is the age. Of course, we can learn more about this dataset by looking at the ARFF file itself. Here’s the diabetes dataset. You can see it’s diabetes in Pima Indians.
Skip to 1 minute and 11 seconds There’s a lot of information here.
Skip to 1 minute and 13 seconds The attributes are number of times pregnant, plasma glucose concentration, diabetes pedigree function, and so on. I’m going to use percentage split evaluation and try a few different classifiers. Let’s look at J48 first, our old friend J48.
Skip to 1 minute and 44 seconds We get 76% with J48.
Skip to 1 minute and 49 seconds I’m going to look at some other classifiers. You’ll learn about these classifiers later on in this course, but right now we’re just going to try a few. Let’s choose the NaiveBayes classifier in the “bayes” category and run it. Here we get 77%, a little bit better, but probably not significantly. Next, in the “lazy” category, let’s choose IBk; again, we’ll learn about this later on. Here we get 73%, quite a bit worse. We’ll try one final one, PART (“partial rules”) in the “rules” category. Here we get 74%. These are just different classifiers, alternatives to J48.
Skip to 2 minutes and 43 seconds You can see that J48 and NaiveBayes are pretty good, probably about the same: the 1% difference between them probably isn’t significant. IBk and PART probably perform about the same; again, there’s 1% between them. There’s a fair gap, I guess, between those bottom two and the top two, which probably is significant. I’d like to think about these figures. Is it good to get 76% accuracy? If we go back and look at the class attribute of this dataset, we see that there are 500 negative instances and 268 positive instances.
Skip to 3 minutes and 21 seconds If you had to guess, you’d guess “negative”. If you always guessed “negative”, you’d be right 500 times out of 768 (the sum of the two counts, which is the total number of instances), and that works out to 65%. Actually, there’s a classifier in the “rules” category called ZeroR that does exactly that: it just looks for the most popular class and guesses that all the time. If I run it on the training set, it gives us exactly the same number, 500/768, which is 65%. It’s a very simple, almost trivial, classifier that always guesses the most popular class.
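The arithmetic above can be checked with a short Python sketch of the ZeroR idea for a nominal class. This is not Weka’s implementation: the functions `zero_r` and `accuracy` are hypothetical, and the labels are rebuilt from the class counts quoted in the lesson (500 tested_negative, 268 tested_positive).

```python
from collections import Counter


def zero_r(train_labels):
    """Hypothetical ZeroR-style learner for a nominal class:
    return the most frequent label in the training data."""
    return Counter(train_labels).most_common(1)[0][0]


def accuracy(prediction, labels):
    """Fraction of labels that equal one constant prediction."""
    return sum(1 for label in labels if label == prediction) / len(labels)


# Class counts for the diabetes data as quoted in the lesson.
labels = ["tested_negative"] * 500 + ["tested_positive"] * 268
majority = zero_r(labels)
print(majority, f"{accuracy(majority, labels):.1%}")  # tested_negative 65.1%
```

Evaluating on the same labels it was built from reproduces 500/768, because the only thing this learner takes from the data is the majority label.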
Skip to 4 minutes and 15 seconds It’s OK to evaluate that on the training set, because it hardly uses the training set at all to form the classifier. That’s what we call the “baseline”. The baseline gives 65% accuracy, and J48 gives 76% accuracy. That’s significantly above the baseline, but not by all that much. When you’re looking at these figures, it’s always good to consider what the very simplest kind of classifier, the baseline classifier, would get you. Sometimes the baseline might even give you the best results. I’m going to open a dataset here. We’re not going to discuss this dataset; it’s a bit of a strange one, not really designed for this kind of classification. It’s called “supermarket”.
Skip to 5 minutes and 1 second I’m going to open “supermarket”, and without even looking at it I’m just going to apply a few schemes. I’m going to apply ZeroR, and I get 64%. I’m going to apply J48, and I think I’ll use a percentage split for evaluation, because it’s not fair to use the training set here. Now I get 63%. That’s worse than the baseline! If I try NaiveBayes (these are the ones I tried before), I again get 63%, worse than the baseline. If I choose IBk, this is going to take a little while, because it’s a rather slow scheme.
Skip to 5 minutes and 57 seconds Here we are; it’s finished now: only 38%! That’s way, way worse than the baseline.
Skip to 6 minutes and 5 seconds We’ll just try PART, partial decision rules: here we get 63%. The upshot is that the baseline actually gave a better performance than any of these classifiers, and one of them was really atrocious compared with the baseline. This is because, for this dataset, the attributes are not really informative.
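Comparing each scheme with the baseline is easy to do mechanically. The sketch below takes the supermarket accuracies quoted above and lists every scheme that scores below ZeroR; `below_baseline` is a hypothetical helper, and the figures are copied from the lesson rather than recomputed.

```python
def below_baseline(results, baseline="ZeroR"):
    """Hypothetical helper: names of schemes whose accuracy is below the baseline's."""
    reference = results[baseline]
    return [name for name, acc in results.items() if name != baseline and acc < reference]


# Accuracies for "supermarket" as quoted in the lesson.
supermarket = {"ZeroR": 0.64, "J48": 0.63, "NaiveBayes": 0.63, "IBk": 0.38, "PART": 0.63}
print(below_baseline(supermarket))  # ['J48', 'NaiveBayes', 'IBk', 'PART']
```

Given the diabetes figures instead (ZeroR 65%, J48 76%, NaiveBayes 77%, IBk 73%, PART 74%), the same helper returns an empty list.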
Skip to 6 minutes and 32 seconds The rule here is: don’t just apply Weka to a dataset blindly. You need to understand what’s going on. When you do apply Weka to a dataset, always make sure that you try the baseline classifier, ZeroR, before doing anything else. In general, simplicity is best: always try simple classifiers before you try more complicated ones. Also, when you get these small differences, you should consider whether they are likely to be significant. We saw 1% differences in the last lesson that were probably not at all significant. So always try a simple baseline, look at the dataset, and don’t apply Weka blindly: try to understand what’s going on.
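One rough way to judge whether a 1% difference might be significant is to estimate how much an accuracy figure could vary, given the size of the test set. The Python sketch below uses the normal approximation to the binomial distribution. It assumes that a 66% split of the 768 diabetes instances leaves about 261 test instances, and `accuracy_interval` is a hypothetical helper. Overlapping intervals are only a heuristic, not a formal significance test.

```python
import math


def accuracy_interval(accuracy: float, n_test: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate confidence interval (95% for z = 1.96) for an accuracy
    measured on n_test instances, via the normal approximation to the binomial."""
    standard_error = math.sqrt(accuracy * (1.0 - accuracy) / n_test)
    return accuracy - z * standard_error, accuracy + z * standard_error


# Assumption: a 66% percentage split of the 768 diabetes instances
# leaves 768 - round(0.66 * 768) = 261 instances for testing.
n_test = 768 - round(0.66 * 768)

for name, acc in [("J48", 0.76), ("NaiveBayes", 0.77)]:
    low, high = accuracy_interval(acc, n_test)
    print(f"{name}: {acc:.0%}, roughly {low:.0%} to {high:.0%}")
# J48: 76%, roughly 71% to 81%
# NaiveBayes: 77%, roughly 72% to 82%
```

With only a few hundred test instances, each figure could plausibly move by about five percentage points either way, so a 1% gap between J48 and NaiveBayes is well within that range.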
The diabetes dataset has several attributes and a class that is either tested_negative or tested_positive (for diabetes). With Percentage split evaluation (66% training set, 34% test set), J48 yields 76% correctly classified instances. You can try other classifiers such as NaiveBayes (77%), IBk (73%), and PART (74%). These results should be compared with a simple “baseline” classifier: ZeroR, which always predicts the most popular class, yields 65%. Here the other classifiers beat the baseline, but in other situations the baseline does as well as, and sometimes much better than, more sophisticated classifiers. Beware!
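A percentage split can also be sketched for the baseline itself: shuffle the instances, learn the majority class from the first 66%, and measure accuracy on the remaining 34%. This Python sketch uses only the class labels, and `percentage_split_zero_r` is a hypothetical helper. Its shuffle uses Python’s `random` module, so the split will not match the one Weka produces, and the exact accuracy depends on the seed.

```python
import random
from collections import Counter


def percentage_split_zero_r(labels, train_percent=66.0, seed=1):
    """Hypothetical helper: evaluate a majority-class (ZeroR-style) baseline
    with a single random percentage split. Returns (majority, accuracy, n_test)."""
    shuffled = list(labels)
    # Python's own shuffle, so the split differs from Weka's randomization.
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_percent / 100)
    train, test = shuffled[:n_train], shuffled[n_train:]
    # "Training" this baseline means nothing more than counting labels.
    majority = Counter(train).most_common(1)[0][0]
    correct = sum(1 for label in test if label == majority)
    return majority, correct / len(test), len(test)


labels = ["tested_negative"] * 500 + ["tested_positive"] * 268
majority, acc, n_test = percentage_split_zero_r(labels)
print(f"predicts {majority} on {n_test} test instances: {acc:.1%} correct")
```

Because the baseline learns only one label from the training portion, its test-set accuracy should stay close to the overall 65% majority fraction. This is why evaluating ZeroR on the training set, as in the lesson, is reasonable.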
© University of Waikato, New Zealand. Creative Commons Attribution 4.0 International License.