
Generating good decision rules

Ian Witten explains how both PART and RIPPER can make good rules. RIPPER uses incremental reduced-error pruning followed by a complex global optimization step.
In the last lesson, we looked at decision rules versus decision trees. They are kind of similar representations, but they are interestingly different in how they express things. We also looked at a bottom-up covering algorithm for generating rules called PRISM. You will realize that PRISM is not really a terribly good machine learning method. It’s not really meant for serious use. In this lesson, we’re going to look at a couple of schemes for rule learning and show how to use them in Weka. The first scheme we’re going to look at is called PART, and it’s a way of forming rules from partial decision trees.
It’s the basic separate-and-conquer algorithm: make a rule, remove the instances it covers, and continue creating rules for the remaining instances. To make a rule, PART builds a tree. It builds and prunes a decision tree for the current set of instances, and reads off the rule for the largest leaf – the most important rule, if you like. Then it throws the tree away, and carries on with the covering algorithm. It seems very wasteful, and I suppose perhaps it is a bit wasteful, but it turns out you can build just a partial tree – you don’t have to build a whole tree. That’s how PART works.
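To make that loop concrete, here is a minimal sketch of separate-and-conquer – not Weka’s actual PART code. The Rule type and the buildPrunedTree and ruleForLargestLeaf helpers are stand-ins for the tree-building and rule-reading steps, which is where the real work happens.

```java
import java.util.ArrayList;
import java.util.List;

public class PartSketch {
    // Hypothetical rule type: something that can say whether it covers an instance.
    interface Rule { boolean covers(String[] instance); }

    // Separate-and-conquer: make a rule, remove the instances it covers,
    // and carry on making rules for whatever is left.
    static List<Rule> separateAndConquer(List<String[]> instances) {
        List<Rule> ruleSet = new ArrayList<>();
        List<String[]> remaining = new ArrayList<>(instances);
        while (!remaining.isEmpty()) {
            // Build and prune a (partial) decision tree for the current instances,
            // then read off the rule for its largest leaf; both helpers are
            // illustrative stand-ins, not Weka code.
            Rule rule = ruleForLargestLeaf(buildPrunedTree(remaining));
            ruleSet.add(rule);
            remaining.removeIf(rule::covers);   // discard what the rule covers
            // the tree itself is simply thrown away at this point
        }
        return ruleSet;
    }

    // Stubs so the sketch compiles; the real work lives in these two steps.
    static Object buildPrunedTree(List<String[]> instances) { return new Object(); }
    static Rule ruleForLargestLeaf(Object tree) { return instance -> true; }
}
```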
The second method is called RIPPER: in fact, in Weka it’s called JRip. It’s a basic incremental reduced-error pruning algorithm. There’s a class of algorithms that go by this name. The idea is that PRISM is good at producing rules, but it produces exact rules. Typically, we want to produce rules that are not necessarily exact, but merely very good. What incremental reduced-error pruning does is to take the instances, the training set,
and split them into two sets, one called Grow and one called Prune, in the ratio 2:1. It uses the Grow set for growing rules, adding conditions to a rule until you get a perfect rule. Then it uses the Prune set for pruning rules, deleting conditions from the rule until you’re left with a good rule. For each class, while there are instances of that class in both of these sets, we’re going to use PRISM to create the best perfect rule for that class. Then we’re going to calculate the worth of the rule. Now, we need some measure of the “worth” of a rule.
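Here is a minimal sketch of that 2:1 Grow/Prune split, just to pin down the bookkeeping; representing an instance as a plain string array is an assumption for illustration, not how Weka stores data.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class GrowPruneSplit {
    // Split a training set 2:1 into a Grow set (for building rules)
    // and a Prune set (for cutting them back).
    static List<List<String[]>> split(List<String[]> train, long seed) {
        List<String[]> shuffled = new ArrayList<>(train);
        Collections.shuffle(shuffled, new Random(seed));
        int cut = 2 * shuffled.size() / 3;   // the 2:1 ratio
        List<String[]> grow = new ArrayList<>(shuffled.subList(0, cut));
        List<String[]> prune = new ArrayList<>(shuffled.subList(cut, shuffled.size()));
        return List.of(grow, prune);
    }
}
```

Growing the “perfect” rule on the Grow set is then the PRISM-style loop from the last lesson: keep adding whichever condition looks best until the rule covers only instances of the target class.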
There are different ways of measuring the worth of a rule, and different incremental reduced-error pruning algorithms do different things. For example, you might just use the success rate, or you might use something more complicated, perhaps even some entropy metric. Anyway, whatever you do, let’s assume you’ve got a way of measuring the worth of a rule. We calculate the worth of that rule, and then we omit the final condition – the last one that we added – and look at the worth of that. If the rule is at least as worthwhile without it, then we take away that final condition and carry on trying to remove conditions from the rule until we get an optimal version of the rule.
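As a concrete, deliberately simple choice, here is a sketch that uses success rate on the Prune set as the worth measure and prunes backwards, dropping the most recently added condition for as long as that doesn’t make the rule worth less. The Condition record is a toy rule representation for illustration, not JRip’s internals.

```java
import java.util.ArrayList;
import java.util.List;

public class PruneRuleSketch {
    // Toy rule representation: a rule is a conjunction of attribute-value tests.
    record Condition(int attribute, String value) {
        boolean holds(String[] inst) { return inst[attribute].equals(value); }
    }

    // Worth of a rule here = success rate of its predictions on the Prune set.
    static double worth(List<Condition> rule, List<String[]> prune,
                        String target, int classIdx) {
        long covered = 0, correct = 0;
        for (String[] inst : prune) {
            if (rule.stream().allMatch(c -> c.holds(inst))) {
                covered++;
                if (inst[classIdx].equals(target)) correct++;
            }
        }
        return covered == 0 ? 0.0 : (double) correct / covered;
    }

    // Prune backwards: keep dropping the final (most recently added) condition
    // as long as the shorter rule is worth at least as much on the Prune set.
    static List<Condition> pruneRule(List<Condition> rule, List<String[]> prune,
                                     String target, int classIdx) {
        List<Condition> pruned = new ArrayList<>(rule);
        while (pruned.size() > 1) {
            List<Condition> shorter = new ArrayList<>(pruned.subList(0, pruned.size() - 1));
            if (worth(shorter, prune, target, classIdx)
                    >= worth(pruned, prune, target, classIdx)) {
                pruned = shorter;   // dropping the last condition helped (or tied)
            } else {
                break;              // stop: the longer rule is worth more
            }
        }
        return pruned;
    }
}
```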
So we build up the rule on the Grow set, and then we prune it back on the Prune set until we get a rule whose worth is good. It turns out it’s better to prune backwards than to prune on the way forwards. Again, it sounds a bit wasteful, but it’s a good idea to build up the whole rule and then prune backwards. Then we just carry on. We select the rule with the largest worth, and we prune it and remove the instances it covers, carrying on with the basic covering algorithm. RIPPER follows this by a fiendishly complicated global optimization step that’s really detailed, really complex, not really very principled, but works really well.
I’m not going to tell you about that. It’s just not worthwhile – you’d never remember it, it’s just too hard to – I mean, I don’t remember it. It’s just really complicated. But this is the basic kind of incremental reduced-error pruning algorithm that it uses to generate the rule set in the first place. All right, let’s go to Weka. I’ve loaded the diabetes dataset. I’m going to try J48, PART, and JRip. So here we are in Weka.
Here’s the diabetes dataset: 768 instances. I go to Classify, and I’ve already run J48. This is the result from J48.
I’ve got a decision tree here, quite a complicated decision tree: it’s got 20 leaves and a total of 39 nodes in this tree. It gets 74% accuracy. PART produces a rule set that looks like this. See these rules – “If plas <= 127 and mass <= 26.4, etc. then tested_negative” (in this dataset, there are two classes, negative and positive). There’s a rule for negative, and a rule for positive, and so on. This rule set has got 13 rules, involving 25 separate tests.
We get 75% accuracy. RIPPER does really well, 76% accuracy, there at the top. It has only 4 rules. Amazing, eh? In fact, going back to the slide, here are the results, and here are the 4 rules. Actually, RIPPER starts out by taking the majority class – in this case tested_negative – and leaving that to the end. So it only produces rules for the other classes, and then leaves the majority class for the default clause like this. So tested_positive is the smaller class, and tested_negative is the larger class. These are the rules it’s come up with. Only 4 rules, 9 tests, and the best performance of all. That’s pretty amazing. PART is quick and quite an elegant algorithm, really.
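If you want to reproduce this comparison outside the Explorer, the same experiment can be scripted against Weka’s Java API. This sketch assumes weka.jar is on the classpath and that diabetes.arff (it ships in Weka’s data directory) is in the working directory; 10-fold cross-validation should give figures close to the ones quoted above.

```java
import java.util.Random;
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.rules.JRip;
import weka.classifiers.rules.PART;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class CompareRuleLearners {
    public static void main(String[] args) throws Exception {
        // Load the diabetes data and use the last attribute as the class.
        Instances data = DataSource.read("diabetes.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // Compare a decision tree with the two rule learners.
        Classifier[] learners = { new J48(), new PART(), new JRip() };
        for (Classifier learner : learners) {
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(learner, data, 10, new Random(1));
            System.out.println(learner.getClass().getSimpleName()
                    + ": " + String.format("%.1f%% correct", eval.pctCorrect()));

            // Build on the full training set to print the tree / rule set itself.
            learner.buildClassifier(data);
            System.out.println(learner);
        }
    }
}
```

Printing each built classifier shows the tree or rule set itself, so you can count leaves, rules, and tests just as in the Explorer output.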
Repeatedly constructing decision trees and discarding them is less wasteful than it sounds. Incremental reduced-error pruning is a standard technique.
RIPPER does incremental reduced-error pruning followed by a global optimization step: it usually produces fewer rules than PART.

Here are a couple of schemes for rule learning. The first, called PART, is a way of forming rules from partial decision trees. The second, called RIPPER (JRip in Weka), uses incremental reduced-error pruning, followed by a fiendishly complicated global optimization step that’s detailed, complex, unprincipled, but works well and often generates unbelievably small rule sets that do an excellent job. I’ll spare you the details – you really don’t want to know! Both methods are easy to use in Weka.
