
Association rules

Instead of predicting a "class", association rules describe relations between any of the attributes. Ian Witten explains how performance is measured.
Association rules are about finding associations between attributes. Between any attributes: there’s no particular class attribute. Rules can predict any attribute, or indeed any combination of attributes. For this we need a different kind of algorithm. The one that we use in Weka, the most popular association rule algorithm, is called Apriori. You may remember the weather data from Data Mining with Weka: a little dataset with 14 instances and a few attributes. Here are some association rules. “If outlook=overcast, then play=yes.” If you look at that, there are 4 “overcast” instances, and it’s “yes” for all of them: that rule is 100% correct.
“If temperature=cool, then humidity=normal”; that’s also 100% correct. “If outlook=sunny and play=no, then humidity=high.” We don’t have to predict “play”, or indeed any particular attribute. If you look at rule #4, “outlook=sunny and play=no”, the first 2 instances satisfy that rule, and no other instances do. So it’s 100% correct, but it only covers 2 instances. There are lots of 100% correct rules for the weather data; I think there are 336 of them. Somehow we need to discriminate between these rules. The way we’re going to do this is to look at the “support”: the number of instances that satisfy a rule.
The “confidence” is the proportion of instances satisfying the left-hand side of a rule for which the conclusion also holds, and the “support” is the number of instances that satisfy the whole rule. Here I’ve got the same rules. They all have 100% confidence, but they’ve got different degrees of support, different numbers of instances. We’re looking for high-support, high-confidence rules, but we don’t really want to specify 100% confidence and look for all of those rules, because, as I said, there are hundreds of them and a lot of them have very low support. Typically what we do is specify a minimum degree of confidence and seek the rules with the greatest support at that minimum confidence.
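
To make those two definitions concrete, here is a minimal Java sketch (my own illustration, not Weka’s implementation) that counts support and confidence for the first rule above, “if outlook=overcast, then play=yes”, over the standard 14-instance nominal weather data:

import java.util.Arrays;
import java.util.Map;

// A rough sketch of counting support and confidence on the
// standard 14-instance nominal weather data.
public class SupportConfidence {

    // Each instance: outlook, temperature, humidity, windy, play
    static final String[][] DATA = {
        {"sunny","hot","high","false","no"},      {"sunny","hot","high","true","no"},
        {"overcast","hot","high","false","yes"},  {"rainy","mild","high","false","yes"},
        {"rainy","cool","normal","false","yes"},  {"rainy","cool","normal","true","no"},
        {"overcast","cool","normal","true","yes"},{"sunny","mild","high","false","no"},
        {"sunny","cool","normal","false","yes"},  {"rainy","mild","normal","false","yes"},
        {"sunny","mild","normal","true","yes"},   {"overcast","mild","high","true","yes"},
        {"overcast","hot","normal","false","yes"},{"rainy","mild","high","true","no"}
    };
    static final String[] ATTRS = {"outlook","temperature","humidity","windy","play"};

    // Number of instances matching every attribute=value condition given.
    static int count(Map<String,String> conditions) {
        int matches = 0;
        for (String[] row : DATA) {
            boolean ok = true;
            for (Map.Entry<String,String> c : conditions.entrySet()) {
                int i = Arrays.asList(ATTRS).indexOf(c.getKey());
                if (!row[i].equals(c.getValue())) { ok = false; break; }
            }
            if (ok) matches++;
        }
        return matches;
    }

    public static void main(String[] args) {
        // Rule: outlook=overcast => play=yes
        int support = count(Map.of("outlook","overcast", "play","yes")); // whole rule: 4
        int lhs     = count(Map.of("outlook","overcast"));               // left-hand side: 4
        System.out.println("support = " + support
                + ", confidence = " + support + "/" + lhs);              // 4/4 = 100%
    }
}

Running it prints support = 4, confidence = 4/4, matching the figures quoted above.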
Now I want to introduce you to the idea of an “itemset”. An itemset is a set of attribute-value pairs, like “humidity=normal and windy=false and play=yes”. An itemset has a certain support in a given dataset: here there are 4 instances in the dataset that contain that itemset. We can split that itemset in 7 different ways to produce rules, all of which have a support of 4. “If humidity=normal and windy=false, then play=yes” has a support of 4 and a confidence of 4/4 – that’s 100% – because all of the instances for which humidity=normal and windy=false have play=yes. As we go down this list of rules, we get a lower degree of confidence.
The last rule, for example, doesn’t have anything on the left-hand side: “anything implies humidity=normal, windy=false and play=yes”. It has a support of 4, but all 14 instances satisfy the (empty) left-hand side, so the confidence is 4/14. You can see that as you go down this list of rules, the confidence decreases from 100% through 4/6 (67%) down to quite a low value, 4/14. What Apriori does is generate high-support itemsets; then, given an itemset, it derives all the rules from it and keeps just those with more than a minimum specified degree of confidence.
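
As a sketch of that generation step (assuming it sits in the same package as the SupportConfidence class above, so it can reuse its count helper), the following enumerates the 7 ways of splitting the 3-item itemset into a left-hand and a right-hand side and reports each rule’s confidence; the support, 4, is the same for all of them:

import java.util.LinkedHashMap;
import java.util.Map;

// Derive all 7 rules from the itemset {humidity=normal, windy=false, play=yes}.
// Every proper subset of the itemset can be the left-hand side; the remaining
// items form the right-hand side, so a 3-item itemset yields 2^3 - 1 = 7 rules.
public class ItemsetRules {
    public static void main(String[] args) {
        String[][] items = {{"humidity","normal"},{"windy","false"},{"play","yes"}};
        Map<String,String> whole = new LinkedHashMap<>();
        for (String[] it : items) whole.put(it[0], it[1]);
        int support = SupportConfidence.count(whole);        // 4 instances contain the itemset

        for (int mask = (1 << items.length) - 2; mask >= 0; mask--) {
            Map<String,String> lhs = new LinkedHashMap<>();  // items chosen for the left-hand side
            for (int i = 0; i < items.length; i++)
                if ((mask & (1 << i)) != 0) lhs.put(items[i][0], items[i][1]);
            int lhsCount = SupportConfidence.count(lhs);     // empty LHS matches all 14 instances
            Object label = lhs.isEmpty() ? "(anything)" : lhs;
            System.out.println(label + " => rest: confidence = " + support + "/" + lhsCount);
        }
    }
}

The confidences it prints range from 4/4 down to 4/14 for the rule with the empty left-hand side, just as in the lesson.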
The strategy is to iteratively reduce the minimum support until the required number of rules is found with a given minimum confidence. That’s it for this lesson. There are far more association rules than classification rules, so we need different techniques. “Support” and “confidence” are two important measures. Apriori is the standard algorithm, and I just want to show you that algorithm over here in Weka. In order to use Apriori, I go to the Associate panel. There are a few association rule algorithms, of which by far the most popular is Apriori; that’s the default one. Then I just run it to get association rules.
We want to specify a minimum confidence value and seek the rules with the most support; the details of that are in the next lesson.
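
For reference, the same run can be scripted against Weka’s Java API instead of the Explorer. This is a minimal sketch; the file path is an assumption, and the parameter values shown are just Apriori’s defaults made explicit:

import weka.associations.Apriori;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class RunApriori {
    public static void main(String[] args) throws Exception {
        // Path is an assumption: point it at the weather.nominal.arff file
        // shipped in your Weka installation's data/ directory.
        Instances data = DataSource.read("data/weather.nominal.arff");

        Apriori apriori = new Apriori();       // Weka's default associator
        apriori.setNumRules(10);               // how many rules to report
        apriori.setMinMetric(0.9);             // minimum confidence
        apriori.setDelta(0.05);                // step by which minimum support is lowered
        apriori.setLowerBoundMinSupport(0.1);  // stop lowering support at this point

        apriori.buildAssociations(data);       // iteratively lowers minimum support until
        System.out.println(apriori);           // enough rules are found, then prints them
    }
}

This mirrors what happens when you press Start in the Associate panel.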

Association rule learners find associations between attributes. Between any attributes: there’s no particular class attribute. Rules can predict any attribute, or indeed any combination of attributes. To find them we need a different kind of algorithm. “Support” and “confidence” are two measures used to evaluate and rank rules. The most popular association rule learner, and the one used in Weka, is called Apriori.

This article is from the free online course More Data Mining with Weka.
