Try some market basket analysis

Try this (instead of a quiz). Market basket analysis aims to discover interesting purchasing patterns in large datasets of transactional records. Typically these are the contents of individual shoppers’ baskets …

Learning association rules

Apriori’s strategy for generating association rules is to specify a minimum “confidence” and iteratively reduce “support” until enough rules are found. Doing this efficiently involves an interesting and subtle algorithm. …
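
The strategy described above (fix a minimum confidence, then lower the support threshold until enough rules appear) can be sketched roughly as follows. This is only an illustration, not Weka's implementation: the itemsets are enumerated by brute force rather than with Apriori's efficient candidate generation, and every function name, parameter name, default value, and the toy `baskets` data are assumptions made for the example.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute-force search for itemsets with support >= min_support.

    NOTE: this is NOT the efficient Apriori candidate generation; it
    simply tries every combination, which is fine for a toy example.
    """
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    result = {}
    for size in range(1, len(items) + 1):
        found = False
        for candidate in combinations(items, size):
            cset = frozenset(candidate)
            support = sum(1 for t in transactions if cset <= t) / n
            if support >= min_support:
                result[cset] = support
                found = True
        if not found:
            break  # no frequent itemset of this size, so none larger either
    return result


def rules_from(itemsets, min_confidence):
    """Turn frequent itemsets into rules lhs -> rhs with enough confidence."""
    rules = []
    for itemset, support in itemsets.items():
        for k in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), k):
                lhs = frozenset(lhs)
                # Every subset of a frequent itemset is itself frequent,
                # so itemsets[lhs] is always present.
                confidence = support / itemsets[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, support, confidence))
    return rules


def mine_rules(transactions, min_confidence=0.9, num_rules=10,
               upper=1.0, lower=0.1, delta=0.05):
    """Keep confidence fixed; lower support step by step until enough rules."""
    steps = int(round((upper - lower) / delta))
    support, rules = upper, []
    for i in range(steps + 1):
        support = round(upper - i * delta, 10)
        rules = rules_from(frequent_itemsets(transactions, support),
                           min_confidence)
        if len(rules) >= num_rules:
            break
    return support, rules


# Hypothetical toy baskets, purely for illustration.
baskets = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "milk", "eggs"}),
    frozenset({"bread", "eggs"}),
    frozenset({"milk", "eggs"}),
    frozenset({"bread", "milk"}),
]
final_support, found_rules = mine_rules(baskets, min_confidence=0.6, num_rules=2)
for lhs, rhs, support, confidence in found_rules:
    print(sorted(lhs), "->", sorted(rhs), support, confidence)
```

The loop stops at the first support level that yields the requested number of rules, so the rules returned are the ones with the highest support that still meet the confidence threshold.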

Generating good decision rules

Here are a couple of schemes for rule learning. The first, called PART, is a way of forming rules from partial decision trees. The second, called Ripper (JRip in Weka), …

Association rules

Association rule learners find associations between attributes. Between any attributes: there’s no particular class attribute. Rules can predict any attribute, or indeed any combination of attributes. To find them we …

Adding PRISM to Weka

Before attempting the quiz that follows, you will need to download Weka’s PRISM package. We explained Weka’s “package” system in the previous course Data Mining with Weka. Here’s a refresher. …

Decision trees and rules

Any decision tree has an equivalent set of rules … and any rule set has an equivalent decision tree. So they’re the same? It’s not so simple. If you read …
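
One direction of this equivalence is easy to illustrate: each path from the root of a tree to a leaf can be read off as one rule. The sketch below assumes a hypothetical nested-dictionary representation of a tree (it is not Weka's data structure), and the toy tree is made up for the example.

```python
def tree_to_rules(node, conditions=()):
    """Read one rule off each root-to-leaf path of a decision tree.

    A tree node is assumed to be a dict with an "attribute" name and a
    "branches" mapping from attribute value to subtree; a leaf is just
    a class label. This representation is hypothetical.
    """
    if not isinstance(node, dict):
        return [(list(conditions), node)]  # leaf: emit the finished rule
    rules = []
    for value, child in node["branches"].items():
        test = (node["attribute"], value)
        rules.extend(tree_to_rules(child, conditions + (test,)))
    return rules


# Toy tree, invented for illustration.
toy_tree = {
    "attribute": "outlook",
    "branches": {
        "sunny": {"attribute": "humidity",
                  "branches": {"high": "no", "normal": "yes"}},
        "overcast": "yes",
        "rainy": {"attribute": "windy",
                  "branches": {"true": "no", "false": "yes"}},
    },
}

for conditions, label in tree_to_rules(toy_tree):
    tests = " and ".join(f"{attr} = {val}" for attr, val in conditions)
    print(f"if {tests} then {label}")
```

Rules produced this way can be applied in any order, because the conditions on different paths are mutually exclusive.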

Multinomial Naive Bayes

Naive Bayes has three flaws when applied to document classification. First, a word’s non-appearance counts just as much as its appearance, whereas surely a document’s class is determined by the words …

Evaluating 2-class classification

In the last lesson we encountered a two-class dataset where the accuracy on one class was high and the accuracy on the other was low. Because the first class contained …

How can you discretize numeric attributes?

Converting numeric attributes to nominal is called “discretization”. But wait! Why would you want to do this? Well, for one thing, some machine learning methods only work on nominal attributes. …
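
As a small illustration of what discretization does, here is a sketch of one simple unsupervised scheme, equal-width binning. It is only one of several possible methods and is not Weka's filter; the function names, bin labels, and sample values are assumptions made for the example.

```python
from bisect import bisect_right


def equal_width_cut_points(values, bins):
    """Split the range [min, max] of the values into equally wide bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    return [lo + width * i for i in range(1, bins)]


def discretize(value, cut_points):
    """Map a number to a nominal label such as 'bin1', 'bin2', ..."""
    return f"bin{bisect_right(cut_points, value) + 1}"


# Hypothetical numeric attribute values.
temperatures = [64, 65, 68, 69, 70, 71, 72, 75, 80, 85]
cuts = equal_width_cut_points(temperatures, bins=3)  # [71.0, 78.0]
labels = [discretize(t, cuts) for t in temperatures]
print(cuts)
print(labels)
```

Each numeric value is replaced by the label of the interval it falls into, so a learner that only accepts nominal attributes can use the result.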

Document classification

A document classification problem can be represented in the ARFF format with two attributes per instance, the document text in a “string” attribute and the document class as a nominal …
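
To make the representation concrete, the following sketch writes a tiny ARFF file with one string attribute for the text and one nominal attribute for the class. The helper name `to_arff`, the attribute names, the quoting scheme, and the two sample documents are assumptions for illustration only.

```python
def to_arff(relation, documents, classes):
    """Build ARFF text with a string attribute and a nominal class.

    `documents` is a list of (text, label) pairs. Texts are wrapped in
    single quotes with backslash escaping, which is an assumption about
    how quoting should be handled in this sketch.
    """
    lines = [
        f"@relation {relation}",
        "",
        "@attribute text string",
        "@attribute class {" + ",".join(classes) + "}",
        "",
        "@data",
    ]
    for text, label in documents:
        escaped = text.replace("\\", "\\\\").replace("'", "\\'")
        lines.append(f"'{escaped}',{label}")
    return "\n".join(lines)


# Two made-up documents.
sample_docs = [
    ("The price of crude oil has increased significantly", "yes"),
    ("Some people do not like the flavor of olive oil", "no"),
]
arff_text = to_arff("toy_documents", sample_docs, ["yes", "no"])
print(arff_text)
```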

Discretization in J48

J48 effectively discretizes numeric attributes as it goes along, which sounds good because split points are chosen in a local context, taking into account just the instances that reach that …

How do you classify documents?

Document classification is a popular and important application of data mining. But how can it be done? Weka allows string attributes, and it’s simple to load an entire document into …

Supervised discretization

“Supervised” discretization methods take the class into account when setting discretization boundaries, which is often a very good thing to do. But wait! You mustn’t use the test data when …

Discretizing numeric attributes

Discretizing is transforming numeric attributes to nominal. You might want to do that in order to use a classification method that can’t handle numeric attributes (unlikely), or to produce better …

Can Weka process big data?

What is “big data” anyway? The term is typically used to refer to data sets that are so large that data mining tools have difficulty in dealing with them. But …